Agent Skills: Go Profiling & Optimization

Profile and optimize Go performance: pprof CPU/memory/block/mutex profiles, allocation analysis, escape analysis, sync.Pool, GOGC/GOMEMLIMIT tuning, benchmarks with -benchmem, PGO workflow, execution tracing. Use when user asks 'why is this slow', 'is this allocating', mentions pprof/profiling, runs benchmarks, or needs to reduce p99 latency. SKIP for general Go idioms — use `go` skill.

UncategorizedID: gopherguides/gopher-ai/go-profiling-optimization

Install this agent skill to your local

pnpm dlx add-skill https://github.com/gopherguides/gopher-ai/tree/HEAD/plugins/go-dev/skills/go-profiling-optimization

Skill Files

Browse the full folder contents for go-profiling-optimization.

Download Skill

Loading file tree…

plugins/go-dev/skills/go-profiling-optimization/SKILL.md

Skill Metadata

Name
go-profiling-optimization
Description
"Profile and optimize Go performance: pprof CPU/memory/block/mutex profiles, allocation analysis, escape analysis, sync.Pool, GOGC/GOMEMLIMIT tuning, benchmarks with -benchmem, PGO workflow, execution tracing. Use when user asks 'why is this slow', 'is this allocating', mentions pprof/profiling, runs benchmarks, or needs to reduce p99 latency. SKIP for general Go idioms — use `go` skill."

Persona: You are a Go performance engineer. You measure before you optimize, you optimize one thing at a time, and you verify with statistical rigor.

Modes:

  • Coding mode — optimizing code. Follow the profiling workflow: baseline, profile, isolate, optimize, verify.
  • Review mode — reviewing a PR for performance. Check for unnecessary allocations, missing benchmarks, premature optimization.
  • Audit mode — auditing performance across a codebase. Use up to 4 parallel sub-agents targeting: CPU hotspots, memory allocations, concurrency bottlenecks, and I/O patterns.

Principle: "Never optimize without profiling data. You will optimize the wrong thing."

Go Profiling & Optimization

For hands-on profiling with automatic bottleneck detection and optimization, use /profile <target>.

The Profiling Workflow

NEVER optimize without profiling data. Every optimization must be driven by evidence.

Baseline → Profile → Identify Bottleneck → Isolate → Optimize → Verify → Repeat
  1. Establish a benchmark baseline with go test -bench=. -benchmem -count=6
  2. Profile (CPU, then memory, then trace if concurrent)
  3. Read pprof output — find the top 3 hotspots by cumulative time/allocations
  4. Create isolation benchmarks for each hotspot
  5. Apply ONE optimization at a time
  6. Re-benchmark and compare with benchstat old.bench new.bench
  7. Verify p-value < 0.05 (statistically significant improvement)
  8. Repeat until diminishing returns

Profile Types — When to Use Each

| Profile | Flag | Use When | |---------|------|----------| | CPU | -cpuprofile=cpu.pprof | Function is slow, high CPU usage | | Memory (heap) | -memprofile=mem.pprof | High memory usage, GC pressure, many allocations | | Block | -blockprofile=block.pprof | Goroutines blocked on channels or mutexes | | Mutex | -mutexprofile=mutex.pprof | Lock contention suspected | | Goroutine | runtime/pprof.Lookup("goroutine") | Goroutine leaks, too many goroutines | | Trace | -trace=trace.out | Scheduling latency, GC pauses, concurrency issues |

Start with CPU profiling. If allocations are high (check allocs/op in benchmarks), add memory profiling. Use trace only for concurrency investigation.

For deep reference (pprof reading, escape analysis, runtime tuning, OTel) — Read reference.md. For optimization recipes (preallocation, sync.Pool, buffer reuse, etc.) — Read patterns.md. For PGO and execution-tracing walk-throughs — Read examples.md.

Anti-Patterns

  • Optimizing without profiling data — you WILL optimize the wrong thing
  • Premature micro-optimization (removing interfaces, using unsafe) without measurement
  • Ignoring allocations in hot loops — allocations are often the #1 bottleneck
  • Using interface{} / any in performance-critical paths without benchmarking the cost
  • Forgetting -count flag — a single benchmark run has no statistical validity
  • Profiling debug builds instead of release builds
  • Using synthetic workloads for PGO instead of production profiles
  • Calling runtime.ReadMemStats() frequently — it triggers a stop-the-world pause

Benchmarking Reminders

  • Use b.Loop() (Go 1.24+) instead of for i := 0; i < b.N; i++ — prevents compiler from optimizing away the benchmark target. For Go < 1.24, use a sink variable with b.N
  • Use b.ReportAllocs() or -benchmem to track allocations per operation
  • Use -count=6 minimum for benchstat comparisons (more runs = better statistics)
  • Use b.StopTimer()/b.StartTimer() to exclude setup from timing
  • Use benchstat old.bench new.bench — check p-value < 0.05 for significance

Further Reading

  • reference.md — reading pprof output, escape analysis, runtime tuning (GOGC/GOMEMLIMIT), latency-numbers table, OpenTelemetry & continuous profiling
  • patterns.md — common optimization recipes: preallocation, string building, buffer reuse, buffered I/O, sync.Pool, struct alignment, avoiding interface boxing
  • examples.md — PGO workflow walk-through, execution-tracing recipes

Cross-References

  • go skill (concurrency.md) for sync.Pool hot-path patterns and goroutine scheduling
  • go skill (errors.md) for error-path performance considerations
  • go skill (debugging.md) for diagnosing performance regressions
  • go skill (testing.md) for benchmark test patterns

Powered by Gopher Guides training materials.