Agent Skills: People Analytics

Expert people analytics covering workforce analytics, HR metrics, predictive modeling, employee insights, and data-driven HR decisions.

ID: borghei/claude-skills/people-analytics

Install this agent skill locally:

pnpm dlx add-skill https://github.com/borghei/Claude-Skills/tree/HEAD/hr-operations/people-analytics

hr-operations/people-analytics/SKILL.md

Skill Metadata

Name: people-analytics

People Analytics

The agent operates as a senior people analytics partner, translating workforce data into actionable insights using statistical modeling, segmentation analysis, and data governance best practices.

Workflow

  1. Frame the question -- Clarify the business question with the HR or business stakeholder. Examples: "Why is Sales attrition 2x the company average?" or "Are we paying equitably across gender?" Define the success metric for the analysis.
  2. Assess data readiness -- Identify required data sources (HRIS, ATS, survey platform, payroll). Check for completeness, recency, and quality. Flag any gaps before proceeding.
  3. Analyze -- Apply the appropriate method from the analytics toolkit (descriptive stats, regression, classification, segmentation). Document assumptions and limitations.
  4. Validate findings -- Sense-check results with domain experts (HRBPs, managers). Test for statistical significance and practical significance. Check predictive models for bias across protected groups.
  5. Recommend -- Translate findings into 2-3 specific, actionable recommendations with expected impact and cost.
  6. Deliver and monitor -- Present insights using the dashboard framework. Set up ongoing monitoring for key metrics with alert thresholds.

Checkpoint: After step 2, confirm that all data has been anonymized or aggregated to comply with privacy policy before analysis begins.
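
The data readiness check in step 2 can be sketched with the standard library. This is a minimal illustration, not part of the skill's scripts: `field_completeness` and `readiness_gaps` are hypothetical helper names, records are assumed to be plain dicts exported from the HRIS, and the 0.95 default mirrors the completeness target listed under Success Criteria.

```python
def field_completeness(records, required_fields):
    """Return the share of records with a non-empty value for each required field."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in required_fields}
    completeness = {}
    for field in required_fields:
        # None and the empty string both count as missing.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / len(records)
    return completeness


def readiness_gaps(records, required_fields, min_completeness=0.95):
    """List the fields whose completeness falls below the threshold."""
    completeness = field_completeness(records, required_fields)
    return sorted(f for f, share in completeness.items() if share < min_completeness)
```

Any field returned by `readiness_gaps` would be flagged to the stakeholder before analysis starts.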

Analytics Maturity Model

| Level | Name | Capabilities | Typical Questions Answered |
|-------|------|-------------|---------------------------|
| 1 | Operational Reporting | Headcount, compliance, ad-hoc queries | "How many people do we have?" |
| 2 | Advanced Reporting | Dashboards, trends, benchmarking, segmentation | "How has attrition changed by quarter?" |
| 3 | Analytics | Statistical analysis, correlation, root cause | "What drives attrition in Sales?" |
| 4 | Predictive | Turnover prediction, performance modeling, risk scoring | "Who is likely to leave in the next 6 months?" |
| 5 | Prescriptive | Automated recommendations, real-time interventions | "What should we do to retain this person?" |

Core HR Metrics

Workforce Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| Turnover Rate | (Separations / Avg HC) x 100 | 10-15% |
| Retention Rate | (Retained / Starting HC) x 100 | 85-90% |
| Time to Fill | Days from req open to offer accept | 30-45 days |
| Cost per Hire | Total recruiting cost / Hires | $3-5K |
| Regrettable Turnover | Regrettable exits / Total exits | < 30% |

Performance Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| High Performers | % rated top tier | 15-20% |
| Goal Completion | Goals achieved / Goals set | 80%+ |
| Promotion Rate | Promotions / Headcount | 8-12% |

Engagement Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| eNPS | Promoters % - Detractors % | 20-40 |
| Engagement Score | Survey composite (1-100) | 70+ |
| Absenteeism | Absent days / Work days | < 3% |
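
The formulas in the tables above translate directly into small helpers. The sketch below is illustrative only: the function names are hypothetical, and the eNPS helper assumes the conventional 0-10 "would you recommend" question with 9-10 counted as promoters and 0-6 as detractors, which the table does not spell out.

```python
def turnover_rate(separations, avg_headcount):
    """(Separations / Avg HC) x 100."""
    return 100 * separations / avg_headcount


def retention_rate(retained, starting_headcount):
    """(Retained / Starting HC) x 100."""
    return 100 * retained / starting_headcount


def enps(scores):
    """Promoters % minus Detractors %, from 0-10 recommend scores (assumed cutoffs)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)
```

For instance, `turnover_rate(45, 205)` gives roughly 22, which sits above the 10-15% benchmark range in the Workforce Metrics table.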

Turnover Prediction Model

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def build_turnover_model(employee_data: pd.DataFrame) -> dict:
    """
    Build and evaluate a turnover prediction model.

    Input: DataFrame with columns for features + 'left_company' (0/1).
    Output: dict with model, feature importance, and evaluation metrics.
    """
    features = [
        'tenure_months', 'salary_ratio_to_market', 'performance_rating',
        'months_since_last_promotion', 'manager_tenure', 'team_size',
        'engagement_score', 'training_hours_ytd', 'projects_completed'
    ]

    X = employee_data[features]
    y = employee_data['left_company']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)

    importance = (
        pd.DataFrame({'feature': features, 'importance': model.feature_importances_})
        .sort_values('importance', ascending=False)
    )

    return {'model': model, 'importance': importance, 'evaluation': report}


def score_flight_risk(model, current_employees: pd.DataFrame) -> pd.DataFrame:
    """
    Score current employees for flight risk.

    Returns DataFrame with employee_id, flight_risk_score (0-1), and risk_level.
    """
    probabilities = model.predict_proba(current_employees[model.feature_names_in_])[:, 1]

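    # include_lowest=True keeps a probability of exactly 0.0 in the 'Low' band
    # instead of leaving it unlabelled (NaN)
    risk_levels = pd.cut(
        probabilities,
        bins=[0, 0.25, 0.50, 0.75, 1.0],
        labels=['Low', 'Medium', 'High', 'Critical'],
        include_lowest=True
    )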

    return pd.DataFrame({
        'employee_id': current_employees['employee_id'],
        'flight_risk_score': probabilities.round(3),
        'risk_level': risk_levels
    }).sort_values('flight_risk_score', ascending=False)

Example: Sales Attrition Root-Cause Analysis

QUESTION
  Sales voluntary turnover is 22% vs 12% company average. Why?

DATA
  Source: HRIS + engagement survey + exit interviews (n=45 exits, trailing 12 mo)

ANALYSIS
  Segmentation by tenure band:
    < 1 yr: 35% of exits (onboarding/ramp issues)
    1-2 yr: 40% of exits (comp dissatisfaction + career path)
    2+ yr: 25% of exits (manager relationship)

  Regression on exit survey scores (n=38 respondents):
    Top drivers of intent-to-leave:
      1. "I am paid fairly" (beta = -0.42, p < 0.01)
      2. "I see a career path here" (beta = -0.31, p < 0.01)
      3. "My manager supports my development" (beta = -0.28, p < 0.05)

  Compensation benchmark:
    Sales IC3 compa-ratio: 0.88 (12% below midpoint)
    Sales IC2 compa-ratio: 0.91 (9% below midpoint)
    Rest of company average: 0.98

FINDINGS
  1. Sales comp is significantly below market, especially at IC2-IC3
  2. No defined career ladder for Sales ICs beyond IC3
  3. New hires (< 1 yr) leaving due to unrealistic ramp expectations

RECOMMENDATIONS
  1. Market adjustment: Bring Sales IC2-IC3 to a 0.95 compa-ratio ($180K budget)
  2. Publish a Sales career ladder through IC5 with clear promotion criteria
  3. Redesign onboarding: extend ramp period from 30 to 90 days with milestone targets

EXPECTED IMPACT
  Reduce Sales attrition from 22% to 14-16% within 12 months
  ROI: $180K adjustment saves ~$450K in replacement costs (10 fewer exits x $45K/hire)
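
The arithmetic behind the example's compensation benchmark and expected impact can be checked with a few lines of Python. This is only a sketch: `below_midpoint_pct` and `retention_roi` are hypothetical helper names, and the inputs are the figures quoted in the example above.

```python
def below_midpoint_pct(compa_ratio):
    """Percentage by which pay sits below the range midpoint (compa-ratio 1.0)."""
    return round((1 - compa_ratio) * 100, 1)


def retention_roi(adjustment_cost, exits_avoided, replacement_cost_per_exit):
    """Compare the cost of a pay adjustment with the replacement costs it avoids."""
    savings = exits_avoided * replacement_cost_per_exit
    return {
        "savings": savings,
        "net_benefit": savings - adjustment_cost,
        "savings_per_dollar": savings / adjustment_cost,
    }


impact = retention_roi(adjustment_cost=180_000, exits_avoided=10,
                       replacement_cost_per_exit=45_000)
```

With the example's inputs, `impact["savings"]` is 450,000 and `impact["net_benefit"]` is 270,000, so the adjustment pays for itself only if the assumed 10 avoided exits actually materialize.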

Pay Equity Analysis

import pandas as pd
import statsmodels.api as sm

def analyze_pay_equity(employee_data: pd.DataFrame) -> dict:
    """
    Conduct pay equity analysis controlling for legitimate pay factors.

    Returns raw gap, adjusted gap, model fit, and employees flagged for review.
    """
    # Raw gap
    avg_by_gender = employee_data.groupby('gender')['salary'].mean()
    raw_gap = (avg_by_gender['Female'] - avg_by_gender['Male']) / avg_by_gender['Male']

    # Adjusted gap (control for level, tenure, performance, department, location).
    # dtype=float keeps the dummy columns numeric: recent pandas versions emit
    # boolean dummies, which statsmodels can reject when mixed with numeric columns.
    controls = pd.get_dummies(
        employee_data[['job_level', 'tenure_years', 'performance_rating', 'department', 'location']],
        drop_first=True,
        dtype=float
    )
    controls = sm.add_constant(controls)
    controls['is_female'] = (employee_data['gender'] == 'Female').astype(int)

    model = sm.OLS(employee_data['salary'], controls).fit()
    adjusted_gap = model.params['is_female']

    # Flag outliers (residual > 2 std dev). Work on a copy so the caller's
    # DataFrame is not modified as a side effect.
    employee_data = employee_data.copy()
    employee_data['predicted'] = model.predict(controls)
    employee_data['residual'] = employee_data['salary'] - employee_data['predicted']
    threshold = 2 * employee_data['residual'].std()
    flagged = employee_data[abs(employee_data['residual']) > threshold]

    return {
        'raw_gap_pct': round(raw_gap * 100, 1),
        'adjusted_gap_usd': round(adjusted_gap, 0),
        'model_r_squared': round(model.rsquared, 3),
        'employees_flagged': len(flagged),
        'flagged_details': flagged[['employee_id', 'salary', 'predicted', 'residual']]
    }

Engagement Survey Analysis

  1. Calculate response rate -- Target 80%+ for statistical validity. Flag departments below 60%.
  2. Compute category scores -- Average Likert responses by category (Manager, Growth, Culture, Compensation). Compare to prior period.
  3. Run driver analysis -- Regress category scores against overall engagement to identify which categories have the highest impact on engagement.
  4. Segment -- Break results by department, level, tenure band, and location. Identify where scores diverge most from company average.
  5. Prioritize -- Plot categories on a 2x2 matrix (Impact vs Score). "High impact, low score" quadrant = priority action areas.
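
Steps 3 and 5 above can be sketched as follows. This is a simplified illustration using only the standard library: it substitutes a per-category Pearson correlation for a full multivariate regression, and the function names and the `impact_cut` and `score_cut` thresholds are assumptions chosen for the example, not values defined by the skill.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def prioritize(category_scores, overall, impact_cut=0.5, score_cut=3.5):
    """Place each category on the Impact vs Score 2x2.

    category_scores: {category: [score per respondent]} on a 1-5 Likert scale
    overall: [overall engagement score per respondent], in the same order
    """
    result = {}
    for category, scores in category_scores.items():
        impact = pearson(scores, overall)   # correlation as a proxy for impact
        mean_score = sum(scores) / len(scores)
        impact_label = "high" if impact >= impact_cut else "low"
        score_label = "low" if mean_score < score_cut else "high"
        result[category] = {
            "impact": round(impact, 2),
            "mean_score": round(mean_score, 2),
            "quadrant": f"{impact_label} impact, {score_label} score",
        }
    return result
```

Categories that land in the "high impact, low score" quadrant are the priority action areas described in step 5.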

Checkpoint: Suppress results for any segment with fewer than 5 respondents to protect anonymity.
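
The suppression rule in this checkpoint could be enforced at reporting time along these lines. It is a minimal sketch, assuming responses arrive as dicts; `segment_means`, `MIN_GROUP_SIZE`, and the decision to hide the respondent count as well as the mean are illustrative choices rather than part of the skill's scripts.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # threshold from the checkpoint above


def segment_means(responses, segment_key, score_key, min_n=MIN_GROUP_SIZE):
    """Average a score by segment, suppressing segments below the minimum size."""
    groups = defaultdict(list)
    for response in responses:
        groups[response[segment_key]].append(response[score_key])

    report = {}
    for segment, scores in groups.items():
        if len(scores) < min_n:
            # Hide both the mean and the exact count for small groups.
            report[segment] = {"n": None, "mean": None, "suppressed": True}
        else:
            report[segment] = {
                "n": len(scores),
                "mean": round(sum(scores) / len(scores), 2),
                "suppressed": False,
            }
    return report
```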

DEI Metrics Framework

| Domain | Metrics | Data Source |
|--------|---------|-------------|
| Representation | Gender / ethnicity distribution by level | HRIS |
| Pay equity | Raw gap, adjusted gap (controlled regression) | Payroll + HRIS |
| Progression | Promotion rates by demographic group | HRIS |
| Hiring | Offer and accept rates by demographic group | ATS |
| Inclusion | Inclusion index, belonging score, psychological safety | Survey |
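
As one illustration of the Progression row, promotion rates by demographic group could be computed as below. The record layout (`gender` and `promoted` keys) and the function name are assumptions for the example, and any real report would also need to apply the minimum group size rule before publishing.

```python
from collections import Counter


def promotion_rates(employees, group_key="gender", promoted_key="promoted"):
    """Promotions / Headcount for each demographic group."""
    headcount = Counter()
    promotions = Counter()
    for employee in employees:
        group = employee[group_key]
        headcount[group] += 1
        if employee[promoted_key]:
            promotions[group] += 1
    return {group: promotions[group] / headcount[group] for group in headcount}
```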

Data Governance Checklist

Before starting any people analytics project:

  • [ ] Business question and purpose clearly documented
  • [ ] Data minimization applied (only collect what is needed)
  • [ ] Privacy impact assessment completed
  • [ ] Anonymization or aggregation applied where possible
  • [ ] Predictive models tested for bias across protected groups
  • [ ] Role-based access controls implemented
  • [ ] Data retention policy defined
  • [ ] Employee communication planned (transparency principle)

Reference Materials

  • references/hr_metrics.md - Complete HR metrics guide
  • references/predictive_models.md - Predictive modeling approaches
  • references/survey_design.md - Survey methodology
  • references/data_ethics.md - Ethical analytics practices

Scripts

# Analyze engagement survey results with driver analysis
python scripts/survey_analyzer.py --file survey_results.csv
python scripts/survey_analyzer.py --file survey_results.csv --prior prior_survey.csv --json

# Score attrition risk from employee data
python scripts/attrition_predictor.py --file employees.csv
python scripts/attrition_predictor.py --file employees.csv --threshold 0.7 --json

# Workforce headcount planning calculations
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12 --json
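
The internals of `headcount_planner.py` are not shown here, so the following is only a rough sketch of how `--growth` and `--attrition` inputs might combine into a hiring number. The function name, the rounding, and the simplification that attrition applies to the starting headcount are all assumptions.

```python
def hires_needed(current_headcount, growth, attrition):
    """Estimate hires for a period: net growth plus backfills for expected exits."""
    growth_hires = round(current_headcount * growth)       # net new positions
    backfill_hires = round(current_headcount * attrition)  # replacements for leavers
    return {
        "target_headcount": current_headcount + growth_hires,
        "growth_hires": growth_hires,
        "backfill_hires": backfill_hires,
        "total_hires": growth_hires + backfill_hires,
    }


plan = hires_needed(current_headcount=200, growth=0.15, attrition=0.12)
```

Under these assumptions a 200-person organization would need 54 hires (30 for growth, 24 backfills) to end the period at 230.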

Troubleshooting

| Problem | Root Cause | Resolution |
|---------|-----------|------------|
| Low survey response rate (< 70%) | Survey fatigue, lack of trust in anonymity, or no visible action from prior surveys | Shorten survey to 15-20 questions max; communicate anonymity safeguards clearly; publish and act on top 3 findings from prior survey before launching next one |
| Attrition model produces too many false positives | Overfitting on historical data, missing key features, or class imbalance | Add regularization; use SMOTE or class weights to handle imbalance; validate with cross-validation not just train/test split; include manager quality and comp-ratio as features |
| Stakeholders distrust analytics findings | Results contradict lived experience, or methodology is opaque | Present methodology transparently; validate findings with HRBPs before publishing; use confidence intervals not point estimates; start with descriptive analytics to build trust before predictive |
| Data quality issues across HRIS sources | Inconsistent coding, missing fields, stale records, or duplicate entries | Establish data governance council; define data owners per field; run quarterly data quality audits; build automated validation checks at ingestion |
| Privacy concerns block analysis | Insufficient anonymization, no consent framework, or regulatory gaps | Apply k-anonymity (minimum group size of 5); conduct privacy impact assessment before each project; engage Legal early; use aggregated data when individual-level is not required |
| Engagement scores are flat despite interventions | Measuring wrong drivers, action plans not executed, or survey is too generic | Run driver analysis to identify high-impact low-score areas; assign action owners with quarterly check-ins; customize survey questions by department or function |
| Leadership does not act on insights | Insights are too academic, lack business framing, or arrive too late | Lead with business impact (revenue, cost, risk); limit recommendations to 2-3 with clear owners and timelines; deliver insights within 2 weeks of data collection |

Success Criteria

| Dimension | Metric | Target | Measurement |
|-----------|--------|--------|-------------|
| Data Quality | HRIS data completeness | > 95% of required fields populated | Quarterly data audit report |
| Data Quality | Data freshness | All records updated within 30 days | HRIS last-modified timestamps |
| Adoption | Stakeholder usage of dashboards | > 70% of HRBPs and VPs access monthly | Dashboard analytics / login tracking |
| Adoption | Insight-to-action rate | > 60% of recommendations result in initiatives | Quarterly tracking of recommendation outcomes |
| Accuracy | Attrition prediction precision | > 70% precision at 50% recall | Model evaluation against actuals (6-month lag) |
| Accuracy | Survey driver analysis validity | Top 3 drivers validated by qualitative data | Cross-reference with exit interviews and focus groups |
| Impact | Regrettable attrition reduction | 10-20% reduction within 12 months of intervention | HRIS voluntary termination data, regrettable flag |
| Impact | Time from question to insight | < 2 weeks for standard analyses | Request-to-delivery tracking |
| Compliance | Privacy incidents | Zero breaches of anonymity thresholds | Audit log of all queries; minimum group size enforcement |
| Maturity | Analytics maturity level progression | Advance 1 level per 12-18 months | Self-assessment against the Analytics Maturity Model |
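
The "precision at 50% recall" target can be evaluated once actual departures are known. The sketch below is one way to do it with the standard library; `precision_at_recall` is a hypothetical helper, and it ranks employees by score without any special handling of tied scores.

```python
def precision_at_recall(y_true, scores, target_recall=0.5):
    """Precision at the smallest top-k cutoff whose recall reaches the target.

    y_true: 1 if the employee actually left, else 0
    scores: predicted flight-risk scores, higher meaning riskier
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("No actual leavers; recall is undefined.")

    # Walk down the ranking from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            return true_positives / k
    return true_positives / len(ranked)  # only reached if target_recall > 1
```

A result below 0.70 at `target_recall=0.5` would miss the accuracy target in the table above.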

Scope & Limitations

In Scope:

  • Workforce descriptive analytics: headcount, turnover, retention, demographics, tenure distribution
  • Engagement survey design, analysis, driver identification, and benchmarking
  • Attrition risk scoring using rule-based and statistical methods (standard library only)
  • Pay equity analysis: raw gap, controlled gap, outlier flagging
  • DEI metrics: representation, progression rates, hiring funnel equity
  • Workforce planning: headcount forecasting, scenario modeling, gap analysis
  • Dashboard design and KPI framework recommendations

Out of Scope:

  • Real-time predictive models requiring ML frameworks (scikit-learn, TensorFlow) -- scripts use rule-based scoring for portability
  • Sentiment analysis of free-text survey responses (requires NLP libraries)
  • Individual employee profiling or surveillance -- all analysis uses aggregated or anonymized data
  • HRIS system administration, data pipeline engineering, or ETL development
  • Legal interpretation of pay equity findings (requires Employment Law counsel)
  • Organizational network analysis requiring email/calendar metadata

Known Limitations:

  • Attrition risk scoring in scripts uses weighted heuristics, not trained ML models; accuracy depends on feature quality and weight calibration
  • Pay equity analysis in the SKILL.md examples requires statsmodels (external dependency); scripts use standard-library approximations
  • Survey analysis assumes Likert scale (1-5) responses; other formats require preprocessing
  • Small population segments (< 30) produce unreliable statistical results; flag these in reporting
  • Historical data biases (e.g., biased performance ratings) propagate into predictive models if not addressed
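
To make the "weighted heuristics" limitation concrete, here is a minimal sketch of what rule-based risk scoring can look like. It does not reproduce `attrition_predictor.py`: the weights, thresholds, and function names are hypothetical, the field names are borrowed from the feature list in the turnover model example, and the bands reuse the 0.25 / 0.50 / 0.75 cut points from `score_flight_risk`.

```python
# Hypothetical weights for illustration only; they must be calibrated on real data.
WEIGHTS = {
    "low_comp": 0.35,
    "stalled_promotion": 0.25,
    "low_engagement": 0.25,
    "short_tenure": 0.15,
}


def risk_score(employee):
    """Sum the weights of every risk signal that fires (result between 0 and 1)."""
    signals = {
        "low_comp": employee["salary_ratio_to_market"] < 0.90,
        "stalled_promotion": employee["months_since_last_promotion"] > 36,
        "low_engagement": employee["engagement_score"] < 60,
        "short_tenure": employee["tenure_months"] < 12,
    }
    return round(sum(WEIGHTS[name] for name, fired in signals.items() if fired), 2)


def risk_level(score):
    """Map a 0-1 score to a band using right-inclusive cut points."""
    if score <= 0.25:
        return "Low"
    if score <= 0.50:
        return "Medium"
    if score <= 0.75:
        return "High"
    return "Critical"
```

Because every weight and threshold here is hand-picked, the output is only as good as that calibration, which is the limitation noted in the list above.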

Integration Points

| System / Skill | Integration | Data Flow |
|----------------|-------------|-----------|
| HRIS (Workday, BambooHR, HiBob) | Employee master data, tenure, compensation, performance ratings | HRIS -> analytics data lake; analytics insights -> HRBP workforce plans |
| ATS (Greenhouse, Lever) | Hiring funnel data, source-of-hire, time-to-fill | ATS -> hiring analytics; quality-of-hire scoring feeds back to TA strategy |
| Survey Platform (Culture Amp, Qualtrics, Lattice) | Engagement survey responses, eNPS, pulse check data | Survey platform -> survey_analyzer.py; driver analysis -> action planning |
| Talent Acquisition skill | Hiring funnel metrics, source effectiveness, quality of hire | TA pipeline data -> analytics models; analytics insights -> sourcing optimization |
| HR Business Partner skill | Workforce planning inputs, org health scoring, retention strategy | Analytics insights -> HRBP recommendations; HRBP questions -> analytics projects |
| Operations Manager skill | Headcount forecasting, capacity planning, productivity metrics | Ops demand forecast -> headcount_planner.py; workforce metrics -> ops capacity models |
| Finance skill | Compensation budgets, cost modeling, headcount budget vs actual | Finance comp data -> pay equity analysis; headcount plan -> Finance budget model |
| Payroll (ADP, Gusto) | Compensation actuals, bonus payouts, overtime data | Payroll -> comp analysis; pay equity findings -> comp adjustment recommendations |
| BI Platform (Tableau, Looker, Power BI) | Dashboard hosting, self-service analytics, scheduled reporting | Analytics outputs -> BI dashboards; BI usage metrics -> adoption tracking |
