The flagship Canaria dataset: 907M deduplicated job postings aggregated from 15+ sources and enriched with 82 structured fields through our NLP enrichment pipeline. Sources include Indeed (226M), PJF (216M), LinkedIn (176M), SimplyHired (105M), 200,000+ ATS employer career portals, CareerBuilder, and more. Semantic deduplication removes 40-60% of cross-source duplicates using vector similarity, MinHash/Jaccard, and graph-based transitive matching. Every record includes SOC classification, seniority (100% complete), salary prediction (MAPE <15%), work mode detection, and skills extracted from a taxonomy of 37,000+ skills.
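To illustrate how Jaccard similarity and graph-based transitive matching can combine, here is a minimal sketch. It is not Canaria's pipeline: the function names, the 3-word shingle size, and the 0.5 threshold are all assumptions chosen for the example. It also compares every pair directly, whereas a system at this scale would typically use MinHash signatures to approximate Jaccard and avoid all-pairs comparison.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles for a posting's text (k is an assumed value)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find(parent: list[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def dedup_groups(postings: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group posting indices that are linked, directly or transitively, by similarity."""
    sets = [shingles(p) for p in postings]
    parent = list(range(len(postings)))
    # Exact all-pairs comparison: fine for a demo, quadratic in the number of postings.
    for i, j in combinations(range(len(postings)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(parent, i)] = find(parent, j)  # merge the two groups
    groups: dict[int, list[int]] = {}
    for i in range(len(postings)):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    sample = [
        "senior software engineer mountain view hybrid role",
        "senior software engineer mountain view hybrid position",
        "registered nurse nashville on-site night shift",
    ]
    print(dedup_groups(sample))
```

The union-find step is what makes the matching transitive: if posting A matches B and B matches C, all three land in one group even when A and C fall below the threshold on their own.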
All records are fully enriched through our NLP pipeline. Raw and enriched fields are delivered together for full transparency.
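For readers unfamiliar with the MAPE figure quoted for salary prediction, the sketch below shows the standard definition of mean absolute percentage error. How Canaria computes and validates its own figure is not stated here; the function name and the sample salaries are hypothetical.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, returned as a percentage."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical posted salaries vs. model predictions (annual).
    posted = [100_000, 80_000]
    predicted = [110_000, 72_000]
    print(f"MAPE: {mape(posted, predicted):.1f}%")
```

In this made-up example each prediction is off by 10% of the posted salary, so the average error is also about 10%, which would fall under a 15% bound.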
jobTitle, companyName, location, description, datePosted, normTitle, soc, socTitle, seniority, salaryAvgAnnual, nlpSkills, nlpSoftSkills, remote, employment, srcBase
Interactive charts from our 900M+ deduplicated job postings, updated daily.
A preview of real records from this dataset. Unlock all fields by requesting a free sample.
| Job Title | Company | City | State | Seniority | Work Mode | Min Salary | Max Salary | SOC Code | SOC Title | +8 more |
|---|---|---|---|---|---|---|---|---|---|---|
| Senior Software Engineer | | Mountain View | CA | Senior | Hybrid | 185,000 | 255,000 | 15-1252 | Software Developers | … |
| Data Analyst | JPMorgan Chase | New York | NY | Mid | On-site | 95,000 | 130,000 | 15-2051 | Data Scientists | … |
| Product Manager | Meta | Menlo Park | CA | Senior | Hybrid | 172,000 | 240,000 | 11-2021 | Marketing Managers | … |
| Registered Nurse | HCA Healthcare | Nashville | TN | Mid | On-site | 72,000 | 95,000 | 29-1141 | Registered Nurses | … |
| DevOps Engineer | Datadog | Boston | MA | Mid | Remote | 140,000 | 185,000 | 15-1244 | Network and Computer Systems Administrators | … |
See how Canaria transforms a basic job posting into a fully enriched record with 82 structured fields.
What scrapers give you
No SOC codes, no salary prediction, no skills extraction, no seniority, no deduplication...
Building enrichment in-house costs $500K-$1M in Year 1
82 fields per record
+ 62 more fields (location, dedup metadata, qualifications, clearance, travel...)
Labor Market Data for Investment Professionals
Hiring velocity as an economic leading indicator with SOC-level granularity
Job Market Data for Competitive Intelligence
Competitor hiring patterns, skills trends, and geographic expansion signals
Job Market Data for HR Tech Platforms
Add salary benchmarking and skills intelligence to your platform without building ML
Job Market Training Data for AI & ML Teams
Pre-enriched, deduplicated job market training data. Skip 6 months of pipeline building.
Job Market Data for Workforce Planning
Enterprise-grade labor market intelligence with full data transparency, no vendor lock-in
Job Market Data for Academic Research
Longitudinal dataset for labor economics: wage dynamics, skill demand, remote work adoption
Job Market Data for Recruiting & Staffing
Fresh daily job market data for content, outreach, and market positioning.
Job Market Data for Consulting Firms
Project-ready labor market data delivered in 24 hours, no annual contract required.
Job Market Data for DEI and ESG Analytics
Benchmark employer benefits across 286M+ records: 401(k), PTO, health insurance, equity compensation, and 17 more benefit types.
Job Market Data for Healthcare Workforce Planning
Track clinical hiring pipelines with degree-level granularity: 338M+ degree requirement records across nursing, allied health, and physician roles.