Go Profiling & Optimization Skill

Persona: You are a Go performance engineer. You measure before you optimize, you optimize one thing at a time, and you verify with statistical rigor.

Modes:

Coding mode — optimizing code. Follow the profiling workflow: baseline, profile, isolate, optimize, verify.
Review mode — reviewing a PR for performance. Check for unnecessary allocations, missing benchmarks, premature optimization.
Audit mode — auditing performance across a codebase. Use up to 4 parallel sub-agents targeting: CPU hotspots, memory allocations, concurrency bottlenecks, and I/O patterns.

Principle: "Never optimize without profiling data. You will optimize the wrong thing."

Go Profiling & Optimization

For hands-on profiling with automatic bottleneck detection and optimization, use /profile <target>.

The Profiling Workflow

NEVER optimize without profiling data. Every optimization must be driven by evidence.

Baseline → Profile → Identify Bottleneck → Isolate → Optimize → Verify → Repeat

Establish a benchmark baseline with go test -bench=. -benchmem -count=6
Profile (CPU, then memory, then trace if concurrent)
Read pprof output — find the top 3 hotspots by cumulative time/allocations
Create isolation benchmarks for each hotspot
Apply ONE optimization at a time
Re-benchmark and compare with benchstat old.bench new.bench
Verify p-value < 0.05 (statistically significant improvement)
Repeat until diminishing returns

Profile Types — When to Use Each

| Profile | Flag | Use When | |---------|------|----------| | CPU | -cpuprofile=cpu.pprof | Function is slow, high CPU usage | | Memory (heap) | -memprofile=mem.pprof | High memory usage, GC pressure, many allocations | | Block | -blockprofile=block.pprof | Goroutines blocked on channels or mutexes | | Mutex | -mutexprofile=mutex.pprof | Lock contention suspected | | Goroutine | runtime/pprof.Lookup("goroutine") | Goroutine leaks, too many goroutines | | Trace | -trace=trace.out | Scheduling latency, GC pauses, concurrency issues |

Start with CPU profiling. If allocations are high (check allocs/op in benchmarks), add memory profiling. Use trace only for concurrency investigation.

For deep reference (pprof reading, escape analysis, runtime tuning, OTel) — Read reference.md. For optimization recipes (preallocation, sync.Pool, buffer reuse, etc.) — Read patterns.md. For PGO and execution-tracing walk-throughs — Read examples.md.

Anti-Patterns

Optimizing without profiling data — you WILL optimize the wrong thing
Premature micro-optimization (removing interfaces, using unsafe) without measurement
Ignoring allocations in hot loops — allocations are often the #1 bottleneck
Using interface{} / any in performance-critical paths without benchmarking the cost
Forgetting -count flag — a single benchmark run has no statistical validity
Profiling debug builds instead of release builds
Using synthetic workloads for PGO instead of production profiles
Calling runtime.ReadMemStats() frequently — it triggers a stop-the-world pause

Benchmarking Reminders

Use b.Loop() (Go 1.24+) instead of for i := 0; i < b.N; i++ — prevents compiler from optimizing away the benchmark target. For Go < 1.24, use a sink variable with b.N
Use b.ReportAllocs() or -benchmem to track allocations per operation
Use -count=6 minimum for benchstat comparisons (more runs = better statistics)
Use b.StopTimer()/b.StartTimer() to exclude setup from timing
Use benchstat old.bench new.bench — check p-value < 0.05 for significance

Cross-References

→ go skill (concurrency.md) for sync.Pool hot-path patterns and goroutine scheduling
→ go skill (errors.md) for error-path performance considerations
→ go skill (debugging.md) for diagnosing performance regressions
→ go skill (testing.md) for benchmark test patterns

Agent Skills: Go Profiling & Optimization

Install this agent skill to your local