# People Analytics
The agent operates as a senior people analytics partner, translating workforce data into actionable insights using statistical modeling, segmentation analysis, and data governance best practices.
## Workflow
- Frame the question -- Clarify the business question with the HR or business stakeholder. Examples: "Why is Sales attrition 2x the company average?" or "Are we paying equitably across gender?" Define the success metric for the analysis.
- Assess data readiness -- Identify required data sources (HRIS, ATS, survey platform, payroll). Check for completeness, recency, and quality. Flag any gaps before proceeding.
- Analyze -- Apply the appropriate method from the analytics toolkit (descriptive stats, regression, classification, segmentation). Document assumptions and limitations.
- Validate findings -- Sense-check results with domain experts (HRBPs, managers). Test for statistical significance and practical significance. Check predictive models for bias across protected groups.
- Recommend -- Translate findings into 2-3 specific, actionable recommendations with expected impact and cost.
- Deliver and monitor -- Present insights using the dashboard framework. Set up ongoing monitoring for key metrics with alert thresholds.
Checkpoint: After step 2, confirm that all data has been anonymized or aggregated to comply with privacy policy before analysis begins.
## Analytics Maturity Model

| Level | Name | Capabilities | Typical Questions Answered |
|-------|------|--------------|----------------------------|
| 1 | Operational Reporting | Headcount, compliance, ad-hoc queries | "How many people do we have?" |
| 2 | Advanced Reporting | Dashboards, trends, benchmarking, segmentation | "How has attrition changed by quarter?" |
| 3 | Analytics | Statistical analysis, correlation, root cause | "What drives attrition in Sales?" |
| 4 | Predictive | Turnover prediction, performance modeling, risk scoring | "Who is likely to leave in the next 6 months?" |
| 5 | Prescriptive | Automated recommendations, real-time interventions | "What should we do to retain this person?" |
## Core HR Metrics
### Workforce Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| Turnover Rate | (Separations / Avg HC) x 100 | 10-15% |
| Retention Rate | (Retained / Starting HC) x 100 | 85-90% |
| Time to Fill | Days from req open to offer accept | 30-45 days |
| Cost per Hire | Total recruiting cost / Hires | $3-5K |
| Regrettable Turnover | Regrettable exits / Total exits | < 30% |
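A minimal sketch of the first two formulas above (function names are illustrative, not part of the bundled scripts):

```python
def turnover_rate(separations: int, avg_headcount: float) -> float:
    """Turnover rate: (Separations / Avg HC) x 100."""
    return round(separations / avg_headcount * 100, 1)


def retention_rate(retained: int, starting_headcount: int) -> float:
    """Retention rate: (Retained / Starting HC) x 100."""
    return round(retained / starting_headcount * 100, 1)


# Example: 18 separations against an average headcount of 150
print(turnover_rate(18, 150))    # 12.0 -- inside the 10-15% benchmark
print(retention_rate(132, 150))  # 88.0
```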
### Performance Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| High Performers | % rated top tier | 15-20% |
| Goal Completion | Goals achieved / Goals set | 80%+ |
| Promotion Rate | Promotions / Headcount | 8-12% |
### Engagement Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| eNPS | Promoters % - Detractors % | 20-40 |
| Engagement Score | Survey composite (1-100) | 70+ |
| Absenteeism | Absent days / Work days | < 3% |
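A sketch of the eNPS formula, assuming the standard NPS cutoffs on a 0-10 "would you recommend" scale (9-10 promoters, 0-6 detractors, 7-8 passives counted only in the denominator):

```python
def enps(scores: list[int]) -> float:
    """eNPS: Promoters % minus Detractors %, from 0-10 recommendation scores."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round((promoters - detractors) / len(scores) * 100, 1)


scores = [10, 9, 9, 8, 8, 7, 6, 5, 9, 10]  # 5 promoters, 2 detractors, n=10
print(enps(scores))  # 30.0 -- within the 20-40 benchmark
```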
## Turnover Prediction Model

An illustrative modeling example. It requires pandas and scikit-learn; the bundled scripts use rule-based scoring instead (see Scope & Limitations).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split


def build_turnover_model(employee_data: pd.DataFrame) -> dict:
    """
    Build and evaluate a turnover prediction model.
    Input: DataFrame with columns for features + 'left_company' (0/1).
    Output: dict with model, feature importance, and evaluation metrics.
    """
    features = [
        'tenure_months', 'salary_ratio_to_market', 'performance_rating',
        'months_since_last_promotion', 'manager_tenure', 'team_size',
        'engagement_score', 'training_hours_ytd', 'projects_completed',
    ]
    X = employee_data[features]
    y = employee_data['left_company']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)
    importance = (
        pd.DataFrame({'feature': features, 'importance': model.feature_importances_})
        .sort_values('importance', ascending=False)
    )
    return {'model': model, 'importance': importance, 'evaluation': report}


def score_flight_risk(model, current_employees: pd.DataFrame) -> pd.DataFrame:
    """
    Score current employees for flight risk.
    Returns DataFrame with employee_id, flight_risk_score (0-1), and risk_level.
    """
    probabilities = model.predict_proba(current_employees[model.feature_names_in_])[:, 1]
    risk_levels = pd.cut(
        probabilities,
        bins=[0, 0.25, 0.50, 0.75, 1.0],
        labels=['Low', 'Medium', 'High', 'Critical'],
        include_lowest=True,  # a probability of exactly 0 still maps to 'Low'
    )
    return pd.DataFrame({
        # .to_numpy() avoids index misalignment with the probability array
        'employee_id': current_employees['employee_id'].to_numpy(),
        'flight_risk_score': probabilities.round(3),
        'risk_level': risk_levels,
    }).sort_values('flight_risk_score', ascending=False)
```
## Example: Sales Attrition Root-Cause Analysis

**Question.** Sales voluntary turnover is 22% vs the 12% company average. Why?

**Data.** HRIS + engagement survey + exit interviews (n=45 exits, trailing 12 months).

**Analysis.**

Segmentation by tenure band:

- < 1 yr: 35% of exits (onboarding/ramp issues)
- 1-2 yr: 40% of exits (comp dissatisfaction + career path)
- 2+ yr: 25% of exits (manager relationship)

Regression on exit survey scores (n=38 respondents); top drivers of intent-to-leave:

1. "I am paid fairly" (beta = -0.42, p < 0.01)
2. "I see a career path here" (beta = -0.31, p < 0.01)
3. "My manager supports my development" (beta = -0.28, p < 0.05)

Compensation benchmark:

- Sales IC3 compa-ratio: 0.88 (12% below midpoint)
- Sales IC2 compa-ratio: 0.91 (9% below midpoint)
- Rest-of-company average: 0.98

**Findings.**

1. Sales comp is significantly below market, especially at IC2-IC3
2. No defined career ladder for Sales ICs beyond IC3
3. New hires (< 1 yr) leave due to unrealistic ramp expectations

**Recommendations.**

1. Market adjustment: bring Sales IC2-IC3 to a 0.95 compa-ratio ($180K budget)
2. Publish a Sales career ladder through IC5 with clear promotion criteria
3. Redesign onboarding: extend the ramp period from 30 to 90 days with milestone targets

**Expected impact.**

- Reduce Sales attrition from 22% to 14-16% within 12 months
- ROI: the $180K adjustment saves ~$450K in replacement costs (10 fewer exits x $45K/hire)
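The ROI arithmetic above can be sketched as a reusable helper (the function name and return fields are illustrative):

```python
def retention_roi(exits_avoided: int, cost_per_exit: float, program_cost: float) -> dict:
    """Replacement-cost savings vs. program cost for a retention intervention."""
    savings = exits_avoided * cost_per_exit
    return {
        "savings": savings,
        "net_benefit": savings - program_cost,
        "roi_ratio": round(savings / program_cost, 2),
    }


# The example above: $180K market adjustment, ~10 fewer exits at $45K each
print(retention_roi(10, 45_000, 180_000))
# {'savings': 450000, 'net_benefit': 270000, 'roi_ratio': 2.5}
```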
## Pay Equity Analysis

```python
import pandas as pd
import statsmodels.api as sm


def analyze_pay_equity(employee_data: pd.DataFrame) -> dict:
    """
    Conduct pay equity analysis controlling for legitimate pay factors.
    Returns raw gap, adjusted gap, model fit, and employees flagged for review.
    Requires statsmodels; the bundled scripts use standard-library approximations.
    """
    # Raw gap: unadjusted difference in mean salary
    avg_by_gender = employee_data.groupby('gender')['salary'].mean()
    raw_gap = (avg_by_gender['Female'] - avg_by_gender['Male']) / avg_by_gender['Male']

    # Adjusted gap: control for level, tenure, performance, department, location
    controls = pd.get_dummies(
        employee_data[['job_level', 'tenure_years', 'performance_rating',
                       'department', 'location']],
        drop_first=True,
        dtype=float,  # statsmodels expects numeric (not bool) regressors
    )
    controls = sm.add_constant(controls)
    controls['is_female'] = (employee_data['gender'] == 'Female').astype(int)
    model = sm.OLS(employee_data['salary'], controls).fit()
    adjusted_gap = model.params['is_female']

    # Flag outliers (residual > 2 standard deviations)
    employee_data = employee_data.copy()  # avoid mutating the caller's frame
    employee_data['predicted'] = model.predict(controls)
    employee_data['residual'] = employee_data['salary'] - employee_data['predicted']
    threshold = 2 * employee_data['residual'].std()
    flagged = employee_data[abs(employee_data['residual']) > threshold]

    return {
        'raw_gap_pct': round(raw_gap * 100, 1),
        'adjusted_gap_usd': round(adjusted_gap, 0),
        'model_r_squared': round(model.rsquared, 3),
        'employees_flagged': len(flagged),
        'flagged_details': flagged[['employee_id', 'salary', 'predicted', 'residual']],
    }
```
## Engagement Survey Analysis
- Calculate response rate -- Target 80%+ for statistical validity. Flag departments below 60%.
- Compute category scores -- Average Likert responses by category (Manager, Growth, Culture, Compensation). Compare to prior period.
- Run driver analysis -- Regress category scores against overall engagement to identify which categories have the highest impact on engagement.
- Segment -- Break results by department, level, tenure band, and location. Identify where scores diverge most from company average.
- Prioritize -- Plot categories on a 2x2 matrix (Impact vs Score). "High impact, low score" quadrant = priority action areas.
Checkpoint: Suppress results for any segment with fewer than 5 respondents to protect anonymity.
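The driver-analysis step above can be sketched with the standard library alone. This is a correlation-based approximation of the regression described in the steps, with hypothetical category names:

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, standard library only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def driver_analysis(responses: list[dict], outcome: str = "engagement") -> list[tuple]:
    """Rank survey categories by correlation with overall engagement.

    Each category's Pearson r against the outcome, sorted descending.
    `responses` holds one dict of category scores per respondent.
    """
    y = [r[outcome] for r in responses]
    categories = [k for k in responses[0] if k != outcome]
    drivers = [
        (cat, round(pearson([r[cat] for r in responses], y), 2))
        for cat in categories
    ]
    return sorted(drivers, key=lambda d: d[1], reverse=True)


responses = [
    {"manager": 4, "growth": 3, "culture": 4, "engagement": 4},
    {"manager": 2, "growth": 2, "culture": 3, "engagement": 2},
    {"manager": 5, "growth": 3, "culture": 4, "engagement": 5},
    {"manager": 3, "growth": 4, "culture": 3, "engagement": 3},
]
print(driver_analysis(responses))
# [('manager', 1.0), ('culture', 0.89), ('growth', 0.32)]
```

In practice a multivariate regression (as in the pay equity example) separates correlated drivers better; the correlation ranking is a first pass for the 2x2 prioritization matrix.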
## DEI Metrics Framework

| Domain | Metrics | Data Source |
|--------|---------|-------------|
| Representation | Gender / ethnicity distribution by level | HRIS |
| Pay equity | Raw gap, adjusted gap (controlled regression) | Payroll + HRIS |
| Progression | Promotion rates by demographic group | HRIS |
| Hiring | Offer and accept rates by demographic group | ATS |
| Inclusion | Inclusion index, belonging score, psychological safety | Survey |
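The representation row can be computed with a sketch like this (the `level` and `gender` field names are hypothetical HRIS columns):

```python
from collections import Counter


def representation_by_level(employees: list[dict]) -> dict:
    """Percent gender distribution within each job level."""
    out = {}
    for level in sorted({e["level"] for e in employees}):
        genders = [e["gender"] for e in employees if e["level"] == level]
        out[level] = {
            g: round(n / len(genders) * 100, 1)
            for g, n in Counter(genders).items()
        }
    return out


employees = [
    {"level": "IC2", "gender": "F"}, {"level": "IC2", "gender": "M"},
    {"level": "IC2", "gender": "M"}, {"level": "M1", "gender": "F"},
    {"level": "M1", "gender": "M"},
]
print(representation_by_level(employees))
# {'IC2': {'F': 33.3, 'M': 66.7}, 'M1': {'F': 50.0, 'M': 50.0}}
```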
## Data Governance Checklist
Before starting any people analytics project:
- [ ] Business question and purpose clearly documented
- [ ] Data minimization applied (only collect what is needed)
- [ ] Privacy impact assessment completed
- [ ] Anonymization or aggregation applied where possible
- [ ] Predictive models tested for bias across protected groups
- [ ] Role-based access controls implemented
- [ ] Data retention policy defined
- [ ] Employee communication planned (transparency principle)
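The anonymization item can be enforced mechanically. A minimal sketch, assuming pre-aggregated segment counts and the minimum group size of 5 used elsewhere in this document:

```python
def suppress_small_groups(segment_counts: dict, min_n: int = 5) -> dict:
    """Enforce a minimum group size before reporting segment-level results.

    Segments below min_n are masked, matching the survey checkpoint's
    anonymity threshold of 5 respondents.
    """
    return {
        segment: (count if count >= min_n else f"suppressed (n < {min_n})")
        for segment, count in segment_counts.items()
    }


print(suppress_small_groups({"Sales": 42, "Engineering": 120, "Legal": 3}))
# {'Sales': 42, 'Engineering': 120, 'Legal': 'suppressed (n < 5)'}
```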
## Reference Materials

- references/hr_metrics.md -- Complete HR metrics guide
- references/predictive_models.md -- Predictive modeling approaches
- references/survey_design.md -- Survey methodology
- references/data_ethics.md -- Ethical analytics practices
## Scripts

```shell
# Analyze engagement survey results with driver analysis
python scripts/survey_analyzer.py --file survey_results.csv
python scripts/survey_analyzer.py --file survey_results.csv --prior prior_survey.csv --json

# Score attrition risk from employee data
python scripts/attrition_predictor.py --file employees.csv
python scripts/attrition_predictor.py --file employees.csv --threshold 0.7 --json

# Workforce headcount planning calculations
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12 --json
```
## Troubleshooting

| Problem | Root Cause | Resolution |
|---------|------------|------------|
| Low survey response rate (< 70%) | Survey fatigue, lack of trust in anonymity, or no visible action from prior surveys | Shorten survey to 15-20 questions max; communicate anonymity safeguards clearly; publish and act on top 3 findings from prior survey before launching next one |
| Attrition model produces too many false positives | Overfitting on historical data, missing key features, or class imbalance | Add regularization; use SMOTE or class weights to handle imbalance; validate with cross-validation, not just a train/test split; include manager quality and compa-ratio as features |
| Stakeholders distrust analytics findings | Results contradict lived experience, or methodology is opaque | Present methodology transparently; validate findings with HRBPs before publishing; use confidence intervals, not point estimates; start with descriptive analytics to build trust before predictive |
| Data quality issues across HRIS sources | Inconsistent coding, missing fields, stale records, or duplicate entries | Establish data governance council; define data owners per field; run quarterly data quality audits; build automated validation checks at ingestion |
| Privacy concerns block analysis | Insufficient anonymization, no consent framework, or regulatory gaps | Apply k-anonymity (minimum group size of 5); conduct privacy impact assessment before each project; engage Legal early; use aggregated data when individual-level is not required |
| Engagement scores are flat despite interventions | Measuring wrong drivers, action plans not executed, or survey is too generic | Run driver analysis to identify high-impact, low-score areas; assign action owners with quarterly check-ins; customize survey questions by department or function |
| Leadership does not act on insights | Insights are too academic, lack business framing, or arrive too late | Lead with business impact (revenue, cost, risk); limit recommendations to 2-3 with clear owners and timelines; deliver insights within 2 weeks of data collection |
## Success Criteria

| Dimension | Metric | Target | Measurement |
|-----------|--------|--------|-------------|
| Data Quality | HRIS data completeness | > 95% of required fields populated | Quarterly data audit report |
| Data Quality | Data freshness | All records updated within 30 days | HRIS last-modified timestamps |
| Adoption | Stakeholder usage of dashboards | > 70% of HRBPs and VPs access monthly | Dashboard analytics / login tracking |
| Adoption | Insight-to-action rate | > 60% of recommendations result in initiatives | Quarterly tracking of recommendation outcomes |
| Accuracy | Attrition prediction precision | > 70% precision at 50% recall | Model evaluation against actuals (6-month lag) |
| Accuracy | Survey driver analysis validity | Top 3 drivers validated by qualitative data | Cross-reference with exit interviews and focus groups |
| Impact | Regrettable attrition reduction | 10-20% reduction within 12 months of intervention | HRIS voluntary termination data, regrettable flag |
| Impact | Time from question to insight | < 2 weeks for standard analyses | Request-to-delivery tracking |
| Compliance | Privacy incidents | Zero breaches of anonymity thresholds | Audit log of all queries; minimum group size enforcement |
| Maturity | Analytics maturity level progression | Advance 1 level per 12-18 months | Self-assessment against the Analytics Maturity Model |
## Scope & Limitations
In Scope:
- Workforce descriptive analytics: headcount, turnover, retention, demographics, tenure distribution
- Engagement survey design, analysis, driver identification, and benchmarking
- Attrition risk scoring using rule-based and statistical methods (standard library only)
- Pay equity analysis: raw gap, controlled gap, outlier flagging
- DEI metrics: representation, progression rates, hiring funnel equity
- Workforce planning: headcount forecasting, scenario modeling, gap analysis
- Dashboard design and KPI framework recommendations
Out of Scope:
- Real-time predictive models requiring ML frameworks (scikit-learn, TensorFlow) -- scripts use rule-based scoring for portability
- Sentiment analysis of free-text survey responses (requires NLP libraries)
- Individual employee profiling or surveillance -- all analysis uses aggregated or anonymized data
- HRIS system administration, data pipeline engineering, or ETL development
- Legal interpretation of pay equity findings (requires Employment Law counsel)
- Organizational network analysis requiring email/calendar metadata
Known Limitations:
- Attrition risk scoring in scripts uses weighted heuristics, not trained ML models; accuracy depends on feature quality and weight calibration
- Pay equity analysis in the SKILL.md examples requires statsmodels (external dependency); scripts use standard-library approximations
- Survey analysis assumes Likert scale (1-5) responses; other formats require preprocessing
- Small population segments (< 30) produce unreliable statistical results; flag these in reporting
- Historical data biases (e.g., biased performance ratings) propagate into predictive models if not addressed
Integration Points
| System / Skill | Integration | Data Flow | |----------------|-------------|-----------| | HRIS (Workday, BambooHR, HiBob) | Employee master data, tenure, compensation, performance ratings | HRIS -> analytics data lake; analytics insights -> HRBP workforce plans | | ATS (Greenhouse, Lever) | Hiring funnel data, source-of-hire, time-to-fill | ATS -> hiring analytics; quality-of-hire scoring feeds back to TA strategy | | Survey Platform (Culture Amp, Qualtrics, Lattice) | Engagement survey responses, eNPS, pulse check data | Survey platform -> survey_analyzer.py; driver analysis -> action planning | | Talent Acquisition skill | Hiring funnel metrics, source effectiveness, quality of hire | TA pipeline data -> analytics models; analytics insights -> sourcing optimization | | HR Business Partner skill | Workforce planning inputs, org health scoring, retention strategy | Analytics insights -> HRBP recommendations; HRBP questions -> analytics projects | | Operations Manager skill | Headcount forecasting, capacity planning, productivity metrics | Ops demand forecast -> headcount_planner.py; workforce metrics -> ops capacity models | | Finance skill | Compensation budgets, cost modeling, headcount budget vs actual | Finance comp data -> pay equity analysis; headcount plan -> Finance budget model | | Payroll (ADP, Gusto) | Compensation actuals, bonus payouts, overtime data | Payroll -> comp analysis; pay equity findings -> comp adjustment recommendations | | BI Platform (Tableau, Looker, Power BI) | Dashboard hosting, self-service analytics, scheduled reporting | Analytics outputs -> BI dashboards; BI usage metrics -> adoption tracking |