Comprehensive skills and occupation taxonomy extracted from 900M+ job postings. It includes 37,000+ technical skills, 3,000+ professional certifications, and 400+ soft skills, identified through a two-step process: Aho-Corasick dictionary matching, followed by NLP relevance filtering to remove spurious matches. Coverage exceeds 85% for descriptions longer than 200 characters, with 5-15 skills extracted per posting on average (2023 onward). SOC classification uses both title and description context rather than title-only matching, achieving >95% accuracy at the 2-digit level and 85-92% at the 6-digit level. The taxonomy match rate exceeds 90%.
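The two-step extraction described above can be illustrated with a small Python sketch. It is not the pipeline's actual code: the Aho-Corasick automaton is a textbook implementation, the NLP relevance filter is replaced by a hypothetical whole-word check (`is_relevant`) that only rejects hits buried inside longer words, and the skill dictionary and posting text are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Build a textbook Aho-Corasick automaton (goto, fail, output) over lowercased patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link for each state
    output = [[]]  # dictionary entries recognized when reaching each state
    for pattern in patterns:
        state = 0
        for ch in pattern.lower():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Breadth-first pass: depth-1 states fail to the root; deeper states follow
    # their parent's failure chain until the same character can be consumed.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state (shorter suffixes).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_matches(text, automaton):
    """Step 1: yield (start, end, pattern) for every dictionary hit in the lowercased text."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text.lower()):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            yield i - len(pattern) + 1, i + 1, pattern


def is_relevant(text, start, end):
    """Step 2 (hypothetical stand-in for the NLP filter): keep only whole-word hits."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return not (before.isalnum() or after.isalnum())


def extract_skills(text, automaton):
    """Run dictionary matching, then drop spurious hits with the relevance check."""
    lowered = text.lower()
    return sorted({p for s, e, p in find_matches(text, automaton) if is_relevant(lowered, s, e)})


# Made-up dictionary and posting for illustration only.
automaton = build_automaton(["Python", "SQL", "R", "machine learning", "Java"])
posting = "Seeking a data engineer with Python, SQL and machine learning experience; JavaScript a plus."
print(extract_skills(posting, automaton))  # ['Python', 'SQL', 'machine learning']
```

Once the automaton is built, matching runs in time linear in the text length plus the number of hits, independent of dictionary size, which is what makes this approach practical for a 37,000+ entry skill list. The second step matters because raw string matching also fires on "R" inside "engineer" and "Java" inside "JavaScript"; the real pipeline uses an NLP model for this, which a word-boundary check only crudely approximates.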
All taxonomy fields are derived from the Model Garden NLP pipeline. Skills, SOC codes, seniority, and normalized titles are available on every job posting record.
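The gap between the 2-digit and 6-digit SOC figures follows from the code structure: the 2-digit level is the SOC major group (the first two digits of a code such as 15-1252), so a prediction with the right major group but the wrong detailed occupation still counts as correct at 2 digits. The Python sketch below scores agreement at both levels. It assumes, without confirmation from this page, that the `nlpSocCode` field holds hyphenated SOC 2018 codes; the helper names and sample pairs are hypothetical, not the vendor's evaluation data.

```python
import re

# Assumption: nlpSocCode values are hyphenated SOC 2018 codes such as "15-1252".
SOC_PATTERN = re.compile(r"^(\d{2})-(\d{4})$")


def soc_level(code, digits):
    """Reduce a 'XX-XXXX' code to its 2-digit major group or its full 6-digit code."""
    match = SOC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a SOC code: {code!r}")
    if digits == 2:
        return match.group(1)
    if digits == 6:
        return match.group(1) + match.group(2)
    raise ValueError("digits must be 2 or 6")


def agreement(pairs, digits):
    """Share of (predicted, reference) pairs that agree at the given SOC level."""
    hits = sum(soc_level(p, digits) == soc_level(r, digits) for p, r in pairs)
    return hits / len(pairs)


# Hypothetical (predicted, hand-labeled) codes for four postings.
pairs = [
    ("15-1252", "15-1252"),  # exact match
    ("15-1251", "15-1252"),  # right major group, wrong detailed occupation
    ("29-1141", "29-1141"),  # exact match
    ("13-2011", "11-3031"),  # wrong at both levels
]
print(agreement(pairs, 2))  # 0.75
print(agreement(pairs, 6))  # 0.5
```

By construction, 2-digit agreement can never be lower than 6-digit agreement on the same set of predictions.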
Fields: nlpSkills, nlpSoftSkills, nlpCertifications, nlpQualifications, nlpSocCode, nlpSocTitle, nlpNormalizedTitle, nlpNormalizedTitleScore, nlpSeniority
Job Market Data for HR Tech Platforms
Add salary benchmarking and skills intelligence to your platform without building ML
Job Market Training Data for AI & ML Teams
Pre-enriched, deduplicated job market training data. Skip 6 months of pipeline building.
Job Market Data for Academic Research
Longitudinal dataset for labor economics: wage dynamics, skill demand, remote work adoption