Agent Skills with tag: benchmarking

15 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

pytdc

Therapeutics Data Commons: AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, and molecular oracles for therapeutic ML and pharmacological prediction.

drug-discovery, pharmacology, molecular-featurization, therapeutics
ovachiever
81

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

llm-evaluation, benchmarking, academic-benchmarks, huggingface
ovachiever
81

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

llm-evaluation, benchmarking, automated-metrics, human-feedback
ovachiever
81

java-performance

JVM performance tuning - GC optimization, profiling, memory analysis, benchmarking

JVM, GC-optimization, profiling, memory-analysis
pluginagentmarketplace
1

redis-performance

Master Redis performance - memory optimization, slow log analysis, benchmarking, monitoring, and tuning strategies

redis, performance-tuning, memory-optimization, monitoring
pluginagentmarketplace
1

rust-performance

Master Rust performance - profiling, benchmarking, and optimization

rust, profiling, benchmarking, optimization
pluginagentmarketplace
1

benchmark-datasets

Standard datasets and benchmarks for evaluating AI security, robustness, and safety

benchmarking, datasets, ai-security, robustness
pluginagentmarketplace
1

performance

performance-tuning, performance-optimization, performance-testing, profiling
pluginagentmarketplace
1

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

llm, evaluation-framework, automated-metrics, human-feedback
camoneart
4

ctx:performance

Analyze and optimize parallel workflow performance. Use when users report slow parallel execution, want to improve speed, or need performance analysis. Activate for questions about bottlenecks, time savings, optimization opportunities, or benchmarking parallel workflows.

parallel-execution, performance-optimization, bottleneck-identification, benchmarking
Shakes-tzd
2

performance-at-scale

Spatial indexing and world streaming for Three.js building games with thousands of pieces. Use when optimizing building games, implementing spatial queries, chunk loading, or profiling performance. Includes spatial hash grids, octrees, chunk managers, and benchmarking tools.

threejs, spatial-indexing, octrees, chunk-loading
Bbeierle12
3

optimization-performance

performance-tuning, parallelism, profiling, benchmarking
pluginagentmarketplace
2

performance-testing

Performance testing guidance including load testing with k6, locust, and artillery, benchmarking strategies, profiling techniques, metrics analysis, performance budgets, and bottleneck identification. Use when setting up performance tests, analyzing system behavior under load, or optimizing application performance. Trigger keywords: performance testing, load testing, k6, locust, artillery, benchmarking, profiling, latency, throughput, performance budget, bottleneck, stress testing, scalability testing.

performance-testing, load-testing, benchmarking, profiling
cosmix
3

rep-performance-scorecard

Multi-dimensional rep evaluation: activity, conversion, velocity, deal size. Peer benchmarking and coaching priority identification.

KPI, performance-tracking, dashboards, benchmarking
OneWave-AI
237

profiling-optimization

Profile application performance, identify bottlenecks, and optimize hot paths using CPU profiling, flame graphs, and benchmarking. Use when investigating performance issues or optimizing critical code paths.

performance-tuning, cpu-profiling, flame-graphs, benchmarking
aj-geddes
301