Python Performance Optimization
Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.
When to Use This Skill
- Code runs too slowly for production requirements
- High CPU usage or memory consumption issues
- Need to reduce API response times or batch processing duration
- Application fails to scale under load
- Optimizing data processing pipelines or scientific computing
- Reducing cloud infrastructure costs through efficiency gains
- Profile-guided optimization after measuring performance bottlenecks
Core Concepts
The Golden Rule: Never optimize without profiling first. 80% of execution time is spent in 20% of code.
Optimization Hierarchy (in priority order):
- Algorithm complexity - O(n²) → O(n log n) provides exponential gains
- Data structure choice - List → Set for lookups (10,000x faster)
- Language features - Comprehensions, built-ins, generators
- Caching - Memoization for repeated calculations
- Compiled extensions - NumPy, Numba, Cython for hot paths
- Parallelism - Multiprocessing for CPU-bound work
Key Principle: Algorithmic improvements beat micro-optimizations every time.
Quick Reference
Load detailed guides for specific optimization areas:
| Task | Load reference |
| --- | --- |
| Profile code and find bottlenecks | skills/python-performance-optimization/references/profiling.md |
| Algorithm and data structure optimization | skills/python-performance-optimization/references/algorithms.md |
| Memory optimization and generators | skills/python-performance-optimization/references/memory.md |
| String concatenation and file I/O | skills/python-performance-optimization/references/string-io.md |
| NumPy, Numba, Cython, multiprocessing | skills/python-performance-optimization/references/acceleration.md |
Optimization Workflow
Phase 1: Measure
- Profile with cProfile - Identify slow functions
- Line profile hot paths - Find exact slow lines
- Memory profile - Check for memory bottlenecks
- Benchmark baseline - Record current performance
Phase 2: Analyze
- Check algorithm complexity - Is it O(n²) or worse?
- Evaluate data structures - Are you using lists for lookups?
- Identify repeated work - Can results be cached?
- Find I/O bottlenecks - Database queries, file operations
Phase 3: Optimize
- Improve algorithms first - Biggest impact
- Use appropriate data structures - Set/dict for O(1) lookups
- Apply caching -
@lru_cachefor expensive functions - Use generators - For large datasets
- Leverage NumPy/Numba - For numerical code
- Parallelize - Multiprocessing for CPU-bound tasks
Phase 4: Validate
- Re-profile - Verify improvements
- Benchmark - Measure speedup quantitatively
- Test correctness - Ensure optimizations didn't break functionality
- Document - Explain why optimization was needed
Common Optimization Patterns
Pattern 1: Replace List with Set for Lookups
# Slow: O(n) lookup
if item in large_list: # Bad
# Fast: O(1) lookup
if item in large_set: # Good
Pattern 2: Use Comprehensions
# Slower
result = []
for i in range(n):
result.append(i * 2)
# Faster (35% speedup)
result = [i * 2 for i in range(n)]
Pattern 3: Cache Expensive Calculations
from functools import lru_cache
@lru_cache(maxsize=None)
def expensive_function(n):
# Result cached automatically
return complex_calculation(n)
Pattern 4: Use Generators for Large Data
# Memory inefficient
def read_file(path):
return [line for line in open(path)] # Loads entire file
# Memory efficient
def read_file(path):
for line in open(path): # Streams line by line
yield line.strip()
Pattern 5: Vectorize with NumPy
# Pure Python: ~500ms
result = sum(i**2 for i in range(1000000))
# NumPy: ~5ms (100x faster)
import numpy as np
result = np.sum(np.arange(1000000)**2)
Common Mistakes to Avoid
- Optimizing before profiling - You'll optimize the wrong code
- Using lists for membership tests - Use sets/dicts instead
- String concatenation in loops - Use
"".join()orStringIO - Loading entire files into memory - Use generators
- N+1 database queries - Use JOINs or batch queries
- Ignoring built-in functions - They're C-optimized and fast
- Premature optimization - Focus on algorithmic improvements first
- Not benchmarking - Always measure improvements quantitatively
Decision Tree
Start here: Profile with cProfile to find bottlenecks
Hot path is algorithm?
- Yes → Check complexity, improve algorithm, use better data structures
- No → Continue
Hot path is computation?
- Numerical loops → NumPy or Numba
- CPU-bound → Multiprocessing
- Already fast enough → Done
Hot path is memory?
- Large data → Generators, streaming
- Many objects →
__slots__, object pooling - Caching needed →
@lru_cacheor custom cache
Hot path is I/O?
- Database → Batch queries, indexes, connection pooling
- Files → Buffering, streaming
- Network → Async I/O, request batching
Best Practices
- Profile before optimizing - Measure to find real bottlenecks
- Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
- Use appropriate data structures - Set/dict for lookups, not lists
- Leverage built-ins - C-implemented built-ins are faster than pure Python
- Avoid premature optimization - Optimize hot paths identified by profiling
- Use generators for large data - Reduce memory usage with lazy evaluation
- Batch operations - Minimize overhead from syscalls and network requests
- Cache expensive computations - Use
@lru_cacheor custom caching - Consider NumPy/Numba - Vectorization and JIT for numerical code
- Parallelize CPU-bound work - Use multiprocessing to utilize all cores
Resources
- Python Performance: https://wiki.python.org/moin/PythonSpeed
- cProfile: https://docs.python.org/3/library/profile.html
- NumPy: https://numpy.org/doc/stable/user/absolute_beginners.html
- Numba: https://numba.pydata.org/
- Cython: https://cython.readthedocs.io/
- High Performance Python (Book by Gorelick & Ozsvald)