WAV Audio Analysis Skill

Description

Analyze WAV audio files to debug audio generation pipelines. Provides statistical analysis, format validation, and quality metrics for diagnosing issues with generated speech.

Triggers: wav, audio, waveform, samples, amplitude, audio analysis, sound quality, audio debug

Analysis Capabilities

Basic Statistics

Sample count and duration
Min/max amplitude
Standard deviation in int16 units (expected ~3000-8000 for speech)
Near-silent sample percentage (|sample| < 100)

Quality Indicators

Zero crossing rate (speech typically 50-200 per 1000 samples at 24 kHz)
Clipping detection (samples pinned at the int16 limits, -32768 or 32767)
NaN/Inf detection (if processing raw floats)
DC offset analysis
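The clipping, NaN/Inf, and DC-offset indicators can be computed in a few lines of NumPy. This is a minimal sketch: the function name `quality_indicators` is hypothetical, and float input is assumed to use a nominal [-1.0, 1.0] range, which is an assumption about the pipeline rather than something stated above.

```python
import numpy as np


def quality_indicators(samples: np.ndarray) -> dict:
    """Clipping, NaN/Inf, and DC-offset checks for a mono signal.

    Accepts int16 PCM, or float samples assumed to lie in a nominal
    [-1.0, 1.0] range (an assumption about the upstream pipeline).
    """
    x = np.asarray(samples)
    report = {"nan_count": 0, "inf_count": 0}
    if np.issubdtype(x.dtype, np.floating):
        # NaN/Inf only exist in float data; int16 cannot represent them.
        report["nan_count"] = int(np.isnan(x).sum())
        report["inf_count"] = int(np.isinf(x).sum())
        x = x[np.isfinite(x)].astype(np.float64)
        # Float clipping: samples at or beyond full scale.
        report["clipped"] = int(np.sum(np.abs(x) >= 1.0))
    else:
        # Integer input is treated as int16 PCM: clipping means samples
        # pinned at either end of the int16 range.
        report["clipped"] = int(np.sum((x >= 32767) | (x <= -32768)))
    # DC offset: mean sample value in the input's own units (ideally near 0).
    report["dc_offset"] = float(np.mean(x)) if x.size else 0.0
    return report
```

Run it on the float tensor before int16 conversion as well as on the final WAV samples. NaN/Inf only show up in the float stage, while clipping is easiest to see in the final output.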

Format Validation

Sample rate verification (24kHz for Qwen3-Omni TTS)
Bit depth check
Channel count
RIFF header validation
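The format checks can be done by walking the RIFF chunk list directly, without assuming a fixed 44-byte header. This is a sketch, not a full WAV parser. The function name `validate_wav_header` is hypothetical. The 16-bit mono defaults are assumed targets for TTS output. `WAVE_FORMAT_EXTENSIBLE` files are reported as non-PCM rather than decoded.

```python
import struct


def validate_wav_header(data: bytes, expected_rate: int = 24000,
                        expected_bits: int = 16,
                        expected_channels: int = 1) -> list:
    """Return a list of problems found in a WAV file's RIFF structure.

    An empty list means every check passed. The defaults (24 kHz, 16-bit,
    mono) are assumed targets for TTS output, not values read from the file.
    """
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return ["missing RIFF/WAVE signature"]
    problems = []
    riff_size = struct.unpack_from("<I", data, 4)[0]
    if riff_size + 8 != len(data):
        problems.append(f"RIFF size field says {riff_size + 8} bytes, file has {len(data)}")
    fmt, data_size, pos = None, None, 12
    # Walk the chunk list instead of assuming a fixed 44-byte header.
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt " and size >= 16 and pos + 24 <= len(data):
            fmt = struct.unpack_from("<HHIIHH", data, pos + 8)
        elif chunk_id == b"data":
            data_size = size
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    if fmt is None:
        return problems + ["no fmt chunk"]
    audio_format, channels, rate, byte_rate, block_align, bits = fmt
    if audio_format != 1:
        problems.append(f"format tag {audio_format}, expected 1 (plain PCM)")
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate}")
    if bits != expected_bits:
        problems.append(f"bit depth {bits}, expected {expected_bits}")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    if block_align != channels * bits // 8 or byte_rate != rate * block_align:
        problems.append("block_align/byte_rate inconsistent with fmt fields")
    if data_size is None:
        problems.append("no data chunk")
    elif block_align and data_size % block_align:
        problems.append(f"data size {data_size} is not a whole number of frames")
    return problems
```

Usage: `validate_wav_header(open("audio.wav", "rb").read())`. A RIFF size mismatch often means the writer never patched the header after streaming.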

Usage

To analyze a WAV file, provide the path and I'll run comprehensive diagnostics:

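```python
import wave

import numpy as np

# The wave module parses the RIFF chunks, so a longer header (e.g. with a
# LIST chunk) is not misread as audio the way a fixed 44-byte skip would be.
with wave.open("audio.wav", "rb") as wf:
    rate = wf.getframerate()
    channels = wf.getnchannels()
    sampwidth = wf.getsampwidth()
    data = wf.readframes(wf.getnframes())

if sampwidth != 2:
    raise ValueError(f"expected 16-bit PCM, got {8 * sampwidth}-bit")
samples = np.frombuffer(data, dtype="<i2")  # WAV PCM is little-endian int16
print(f"Format: {rate} Hz, {channels} ch, 16-bit")
print(f"Samples: {len(samples)}")
print(f"Duration: {len(samples) / (rate * channels):.2f} sec")
print(f"Min/Max: {samples.min()} / {samples.max()}")
print(f"Std dev: {np.std(samples):.1f}")

# Quality check (widen first: np.abs(-32768) overflows in int16)
near_silent = np.sum(np.abs(samples.astype(np.int32)) < 100)
print(f"Near-silent: {100 * near_silent / len(samples):.1f}%")

# Zero crossings (voice activity indicator); exact zeros count as non-negative
if len(samples) >= 1000:
    head = samples[:1000]
    zc = np.sum((head[1:] < 0) != (head[:-1] < 0))
    print(f"Zero crossings (first 1000): {zc}")
```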

Typical Values for Good Speech Audio

| Metric         | Expected Range | Meaning                  |
| -------------- | -------------- | ------------------------ |
| Std dev        | 3000-8000      | Audio energy level       |
| Near-silent    | <5%            | Minimal silent padding   |
| Zero crossings | 50-200/1000    | Voice frequency activity |
| Min/Max        | ±20000-32000   | Healthy amplitude range  |
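These ranges can be turned into an automatic pass/fail check. This sketch makes a few assumptions. The names `speech_metrics`, `out_of_range`, and `EXPECTED_RANGES` are hypothetical. The Min/Max row is interpreted as the peak absolute sample. Zero crossings are measured over the whole file rather than only the first 1000 samples. Treat a flag as a prompt to listen to the file, not as a verdict.

```python
import numpy as np

# Ranges transcribed from the table above; they are rough heuristics.
EXPECTED_RANGES = {
    "std_dev": (3000.0, 8000.0),
    "near_silent_pct": (0.0, 5.0),
    "zc_per_1000": (50.0, 200.0),
    "peak": (20000.0, 32000.0),
}


def speech_metrics(samples: np.ndarray) -> dict:
    """Compute the table's metrics for a non-empty mono int16 signal."""
    x = np.asarray(samples, dtype=np.float64)  # float math avoids int16 overflow
    crossings = np.sum((x[1:] < 0) != (x[:-1] < 0))
    return {
        "std_dev": float(np.std(x)),
        "near_silent_pct": float(100.0 * np.mean(np.abs(x) < 100)),
        "zc_per_1000": float(1000.0 * crossings / max(len(x) - 1, 1)),
        "peak": float(np.max(np.abs(x))),
    }


def out_of_range(metrics: dict) -> list:
    """Names of metrics that fall outside EXPECTED_RANGES."""
    return [name for name, (low, high) in EXPECTED_RANGES.items()
            if not low <= metrics[name] <= high]
```

A pure sine makes a useful sanity check. Its peak is only √2 times its standard deviation, so it cannot satisfy both the std-dev and peak rows at once. Real speech has a much higher peak-to-RMS ratio.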

Common Issues

99% Near-Silent

Cause: NaN values converted to zeros
Fix: Check for numerical overflow in pipeline

Low Std Dev (<1000)

Cause: Signal level too low before output normalization
Fix: Check gain stages, ensure proper scaling
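For the scaling fix, one common option is peak normalization of the float signal before int16 conversion. This is a sketch under assumptions. It assumes float audio in [-1.0, 1.0]. The function name `peak_normalize` and the 0.9 target (about -0.9 dBFS) are illustrative choices, not settings taken from this pipeline.

```python
import numpy as np


def peak_normalize(audio: np.ndarray, target_peak: float = 0.9) -> np.ndarray:
    """Scale float audio so its largest absolute sample equals target_peak.

    Silent (all-zero) or empty input is returned unchanged, which avoids a
    divide-by-zero that would otherwise turn the whole signal into NaN.
    """
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        return audio
    return audio * (target_peak / peak)
```

Peak normalization only fixes overall level. If std dev stays low after it, look for an earlier gain stage that attenuates the signal, or a float signal that was never multiplied up to the int16 range.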

Constant Value Runs

Cause: Chunked processing with context overlap issues
Fix: Verify chunk stitching logic
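Before inspecting the stitching code, it helps to locate where the constant runs occur and check whether they line up with chunk boundaries. This sketch is hypothetical. The function name `constant_runs` is made up. The `min_len=64` threshold (about 2.7 ms at 24 kHz) is an arbitrary choice. Runs of exact zeros will also be reported, and those may be legitimate silent padding.

```python
import numpy as np


def constant_runs(samples: np.ndarray, min_len: int = 64) -> list:
    """Return (start, length, value) for runs of identical consecutive samples."""
    x = np.asarray(samples)
    if x.size == 0:
        return []
    # Indices where the value changes mark the start of a new run.
    change = np.flatnonzero(x[1:] != x[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [x.size]))
    lengths = ends - starts
    keep = lengths >= min_len
    return [(int(s), int(n), x[s].item()) for s, n in zip(starts[keep], lengths[keep])]
```

If the reported start offsets are multiples of the chunk size (or the chunk size minus the overlap), the stitching logic is the likely culprit.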

Clipping (values at -32768 or 32767)

Cause: Overflow or missing tanh/clamp
Fix: Add output clamping before int16 conversion
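A conversion step that applies this fix can also catch the NaN case behind "99% Near-Silent", because it fails loudly instead of letting NaNs become zeros. This is a minimal sketch. It assumes float input in a nominal [-1.0, 1.0] range and scales by 32767. The name `float_to_int16` is hypothetical. `np.tanh` could replace `np.clip` if soft saturation is preferred to hard clamping.

```python
import numpy as np


def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float audio in nominal [-1.0, 1.0] to int16 PCM.

    Non-finite values raise instead of being silently zeroed, and samples are
    clamped before scaling so out-of-range peaks saturate at full scale
    rather than overflowing the int16 cast.
    """
    audio = np.asarray(audio, dtype=np.float32)
    bad = ~np.isfinite(audio)
    if bad.any():
        raise ValueError(f"{int(bad.sum())} non-finite samples before int16 conversion")
    clamped = np.clip(audio, -1.0, 1.0)
    return np.round(clamped * 32767.0).astype(np.int16)
```

Clamping prevents wraparound, but a large share of samples at ±32767 still indicates that gain upstream is too high.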
