Overview
NumPy sorting provides efficient tools for ordering data. Beyond basic sorting, it includes partitioning for top-k selection and vectorized binary search for finding insertion points in sorted data.
When to Use
- Finding the top $k$ largest or smallest elements without a full sort ($O(N)$).
- Ordering data based on multiple criteria (e.g., sort by Date, then by Price).
- Mapping data into bins or ranges using binary search.
- Handling datasets containing NaNs where sorting order is sensitive.
Decision Tree
- Need the indices of the sorted order (not the values)?
- Use
np.argsort.
- Use
- Only need the $k$ smallest elements?
- Use
np.partition(arr, k). Elements to the left of index $k$ are smaller.
- Use
- Finding where to insert a value to keep order?
- Use
np.searchsorted(sorted_arr, value).
- Use
Workflows
-
Efficiently Finding the Smallest K Elements
- Identify an unsorted array.
- Call
np.partition(arr, kth=k). - Select the first k elements:
result[:k].
-
Vectorized Lookup in Sorted Ranges
- Ensure the target array 'A' is sorted.
- Pass a list of values 'V' to
np.searchsorted(A, V). - Use the returned indices to map values to specific bins or ranges.
-
Indirect Multi-Key Sort
- Define primary and secondary key arrays.
- Use
np.lexsort((secondary, primary))to get the index array. - Apply the indices to the data to achieve the desired hierarchical sort order.
Non-Obvious Insights
- NaN Position:
np.nanis treated as larger thannp.infand is always sorted to the end of the array. - Partition Performance: Partitioning along the last axis is significantly faster and uses less memory than partitioning along any other axis.
- Lexsort Order:
lexsorttakes keys in reverse order of importance; the last key in the sequence is the primary sort key.
Evidence
- "In the output array, all elements smaller than the k-th element are located to the left of this element and all equal or greater are located to its right." Source
- "Binary search is used to find the required insertion points." Source
Scripts
scripts/numpy-sorting_tool.py: Implements top-k selection and hierarchical lexsort.scripts/numpy-sorting_tool.js: Basic sort simulation.
Dependencies
numpy(Python)