pytdc
Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.
evaluating-llms-harness
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
java-performance
JVM performance tuning - GC optimization, profiling, memory analysis, benchmarking
redis-performance
Master Redis performance - memory optimization, slow log analysis, benchmarking, monitoring, and tuning strategies
rust-performance
Master Rust performance - profiling, benchmarking, and optimization
benchmark-datasets
Standard datasets and benchmarks for evaluating AI security, robustness, and safety
performance
>
llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
ctx:performance
Analyze and optimize parallel workflow performance. Use when users report slow parallel execution, want to improve speed, or need performance analysis. Activate for questions about bottlenecks, time savings, optimization opportunities, or benchmarking parallel workflows.
performance-at-scale
Spatial indexing and world streaming for Three.js building games with thousands of pieces. Use when optimizing building games, implementing spatial queries, chunk loading, or profiling performance. Includes spatial hash grids, octrees, chunk managers, and benchmarking tools.
optimization-performance
|
performance-testing
Performance testing guidance including load testing with k6, locust, and artillery, benchmarking strategies, profiling techniques, metrics analysis, performance budgets, and bottleneck identification. Use when setting up performance tests, analyzing system behavior under load, or optimizing application performance. Trigger keywords: performance testing, load testing, k6, locust, artillery, benchmarking, profiling, latency, throughput, performance budget, bottleneck, stress testing, scalability testing.
rep-performance-scorecard
Multi-dimensional rep evaluation: activity, conversion, velocity, deal size. Peer benchmarking and coaching priority identification.
profiling-optimization
Profile application performance, identify bottlenecks, and optimize hot paths using CPU profiling, flame graphs, and benchmarking. Use when investigating performance issues or optimizing critical code paths.