Overview
NumPy's random module has moved from the legacy RandomState to the Generator API (introduced in NumPy 1.17). Generator uses the PCG64 bit generator by default and provides better statistical properties and faster sampling algorithms. SeedSequence adds a robust system for seeding parallel random streams.
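A minimal sketch of the basic Generator workflow follows. The seed 12345 and the sample sizes are arbitrary illustrative values, not recommendations.

```python
import numpy as np

# Create a Generator backed by the default bit generator (PCG64).
rng = np.random.default_rng(12345)  # arbitrary example seed

ints = rng.integers(low=0, high=10, size=5)     # integers in [0, 10)
floats = rng.random(3)                          # floats in [0.0, 1.0)
normals = rng.normal(loc=0.0, scale=1.0, size=4)

# Legacy style (avoid in new code): np.random.seed(0) seeds a hidden global
# RandomState that only the np.random.* convenience functions use; it has
# no effect on `rng` above.
```

All sampling goes through methods on the `rng` object, so no hidden global state is involved.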
When to Use
- Stochastic simulations requiring high-quality random bits.
- Shuffling datasets for machine learning training.
- Generating independent random streams for parallel computing workers.
- Creating reproducible experiments across different runs.
Decision Tree
- Starting a new project?
  - Use np.random.default_rng(). Do not use np.random.seed().
- Need independent streams for multiple CPUs?
  - Use SeedSequence.spawn() to create children.
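- Shuffling in-place?
  - Use rng.shuffle(arr), which reorders arr along axis 0 and returns None. For a shuffled copy with the same semantics, use rng.permutation(arr). Use rng.permuted(arr, axis=...) only when each slice should be shuffled independently.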
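The shuffling branch of the decision tree covers three calls that are easy to confuse. The sketch below contrasts them on a small 2-D array. The seed and array shape are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary example seed
arr = np.arange(12).reshape(3, 4)

# In place: shuffle reorders whole rows (axis 0) and returns None.
inplace = arr.copy()
result = rng.shuffle(inplace)

# Copy with the same semantics: permutation returns rows in a new order
# and leaves `arr` untouched.
rows_copy = rng.permutation(arr)

# Copy with different semantics: permuted shuffles each slice along `axis`
# independently, so with axis=1 every row gets its own column order.
each_row = rng.permuted(arr, axis=1)
```

With shuffle and permutation, each row stays intact and only the row order changes. With permuted, the values within each row are mixed.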
Workflows
- Parallel Random Stream Generation
  - Initialize a SeedSequence with a high-quality entropy source. Calling SeedSequence() with no arguments draws fresh entropy from the OS.
  - Use the .spawn(n) method to create independent child seed sequences for the workers.
  - Instantiate a new Generator for each worker from its own child sequence.
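The parallel workflow can be sketched as follows. `worker` and `run_parallel` are hypothetical helper names. Workers run serially here to keep the example self-contained. Each child SeedSequence is picklable and could be handed to a process pool instead.

```python
import numpy as np

def worker(child_seq: np.random.SeedSequence, n: int = 1000) -> float:
    """Hypothetical task: build a private Generator and estimate a mean."""
    rng = np.random.default_rng(child_seq)
    return float(rng.random(n).mean())

def run_parallel(n_workers: int, entropy=None):
    # With entropy=None, SeedSequence draws fresh 128-bit OS entropy.
    root = np.random.SeedSequence(entropy)
    children = root.spawn(n_workers)  # independent child sequences
    # Serial here; e.g. concurrent.futures.ProcessPoolExecutor could map
    # `worker` over `children` instead.
    results = [worker(child) for child in children]
    return root.entropy, results

entropy, results = run_parallel(4)
print(f"root entropy (log this to reproduce): {entropy}")
```

Passing the logged `entropy` back into `run_parallel` rebuilds the same root. That root spawns the same children and reproduces every worker's stream.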
- Reproducible Simulation Setup
  - Obtain a 128-bit seed (e.g., using secrets.randbits(128)).
  - Initialize the generator: rng = np.random.default_rng(seed).
  - Log the seed so the stochastic results can be reproduced exactly in future runs on the same NumPy version.
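A minimal sketch of the reproducible setup follows. The `make_rng` helper and the "simulation" logger name are hypothetical. The NumPy version is logged next to the seed because Generator streams may change between releases.

```python
import logging
import secrets

import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("simulation")  # hypothetical logger name

def make_rng(seed=None):
    """Hypothetical helper: return (seed, Generator), drawing a 128-bit seed if none is given."""
    if seed is None:
        seed = secrets.randbits(128)
    log.info("RNG seed: %d (numpy %s)", seed, np.__version__)
    return seed, np.random.default_rng(seed)

seed, rng = make_rng()
first_run = rng.normal(size=3)

# Later: replay the logged seed to reproduce the same draws.
_, replay_rng = make_rng(seed)
second_run = replay_rng.normal(size=3)
```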
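- In-Place Array Shuffling
  - Create a Generator instance.
  - Pass an existing array to rng.shuffle(arr) to modify it in-place.
  - Set the axis parameter to choose which axis is rearranged. The default axis=0 reorders rows and axis=1 reorders columns. Only the order of slices along that axis changes, and each slice's contents stay together.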
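The in-place shuffling steps look like this on a small feature matrix. The seed and the 4x3 shape are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(7)       # arbitrary example seed
data = np.arange(12).reshape(4, 3)   # 4 rows (samples) x 3 columns (features)

# axis=0 (the default): reorder whole rows; each row stays intact.
rows = data.copy()
rng.shuffle(rows, axis=0)

# axis=1: reorder whole columns; the same column order applies to every row.
cols = data.copy()
rng.shuffle(cols, axis=1)
```

Features and labels can be shuffled in lockstep without shuffling each one separately. Draw one index with `idx = rng.permutation(len(X))` and apply it to both arrays as `X[idx]` and `y[idx]`.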
Non-Obvious Insights
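- Legacy Discouragement: RandomState is essentially in maintenance mode. Generator is faster and has better statistical distribution qualities.
- Small Seed Limitation: Seeding with small integers (e.g., 0-100) yields only a handful of distinct starting states, so separate experiments can easily reuse the same stream. SeedSequence spreads whatever seed it receives across the full generator state, but it cannot add entropy, so prefer a 128-bit seed.
- Bitstream Instability: Even with the same seed, the stream produced by Generator is not guaranteed to be identical across NumPy versions, because its algorithms may be improved. Record the NumPy version alongside the seed.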
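The seeding insights above can be checked directly. The sketch below shows that an integer passed to default_rng goes through SeedSequence. It also shows what SeedSequence produces with and without an explicit seed. The value 42 is an arbitrary small seed.

```python
import numpy as np

# default_rng(42) wraps the integer in a SeedSequence before seeding PCG64,
# so these two constructions produce identical streams.
a = np.random.default_rng(42).random(3)
b = np.random.Generator(np.random.PCG64(np.random.SeedSequence(42))).random(3)

# SeedSequence mixes its entropy into as many state words as requested,
# but it cannot add entropy: seed 42 is still one of very few starting points.
small = np.random.SeedSequence(42)
words = small.generate_state(4)  # four well-mixed uint32 words derived from 42

# With no argument, SeedSequence draws 128 bits of fresh OS entropy.
fresh = np.random.SeedSequence()
print("log this to reproduce:", fresh.entropy)
```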
Evidence
- "In general, users will create a Generator instance with default_rng and call the various methods on it to obtain samples." Source
- "SeedSequence mixes sources of entropy in a reproducible way to set the initial state for independent and very probably non-overlapping BitGenerators." Source
Scripts
- scripts/numpy-random_tool.py: Implements parallel seed spawning and reproducible RNG.
- scripts/numpy-random_tool.js: Basic random sampling logic.
Dependencies
- numpy (Python)