Job Postings Data

The flagship Canaria dataset: 907M deduplicated job postings aggregated from 15+ sources and enriched with 82 structured fields through our NLP enrichment pipeline. Sources include Indeed (226M), LinkedIn (176M), PJF (216M), SimplyHired (105M), 200,000+ ATS employer career portals, CareerBuilder, and more. Semantic deduplication removes 40-60% of raw records as cross-source duplicates, using vector similarity, MinHash/Jaccard similarity, and graph-based transitive matching. Every record includes SOC classification, seniority (100% complete), salary prediction (MAPE <15%), work mode detection, and skills extracted against a taxonomy of 37,000+ skills.
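The page names MinHash/Jaccard similarity and graph-based transitive matching but does not say how they fit together. Below is a minimal sketch under stated assumptions, not Canaria's pipeline: word 3-shingles, a hypothetical 64-slot MinHash signature built from seeded BLAKE2b hashes, a hypothetical 0.8 similarity cutoff, and union-find so that matches merge transitively (if A matches B and B matches C, all three land in one cluster even when A and C fall below the cutoff). It omits the vector-similarity stage, and it compares all pairs for clarity where a large-scale system would bucket signatures with LSH.

```python
import hashlib
from itertools import combinations

NUM_HASHES = 64  # hypothetical signature length
THRESHOLD = 0.8  # hypothetical estimated-Jaccard cutoff for "same posting"


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word shingles of a lowercased posting."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big"
            )
            for item in items
        )
        for seed in range(num_hashes)
    ]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class UnionFind:
    """Disjoint sets, so pairwise matches merge transitively into clusters."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def dedupe(postings: list[str]) -> list[list[int]]:
    """Group posting indices into duplicate clusters."""
    sigs = [minhash_signature(shingles(p)) for p in postings]
    uf = UnionFind(len(postings))
    # All-pairs comparison for clarity only; LSH banding avoids this at scale.
    for i, j in combinations(range(len(postings)), 2):
        if estimated_jaccard(sigs[i], sigs[j]) >= THRESHOLD:
            uf.union(i, j)
    clusters: dict[int, list[int]] = {}
    for i in range(len(postings)):
        clusters.setdefault(uf.find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    postings = [
        "Sr. Software Engineer at Acme Corp in San Francisco building Python services on AWS",
        "Sr. Software Engineer at Acme Corp in San Francisco building Python services on AWS",
        "Registered Nurse at HCA Healthcare in Nashville providing bedside patient care",
    ]
    print(dedupe(postings))  # [[0, 1], [2]]
```

The two identical postings collapse into one cluster, while the nursing posting shares no shingles with them and stays on its own.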

All records are fully enriched through our NLP pipeline. Raw and enriched fields are delivered together for full transparency.

Key Highlights

Multi-source semantic deduplication that removes 40-60% of records as duplicates, using vector similarity, MinHash/Jaccard, and graph-based transitive matching
Full NLP enrichment pipeline producing 82 structured fields per record
SOC classification using title + description context (>95% accuracy at the 2-digit major-group level, 85-92% at the 6-digit detailed-occupation level)
Seniority classification: 100% complete (always returns a value)
Work mode detection (remote, hybrid, on-site) extracted from description text
Salary prediction (MAPE <15%) trained on 50M+ Glassdoor/Indeed observations
Source composition: Indeed 226M, LinkedIn 176M, PJF 216M, SimplyHired 105M, 200K+ ATS portals, and more
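The accuracy figures above rest on two standard metrics. As a sketch (the page does not publish its evaluation code), MAPE is the mean of |actual - predicted| / actual expressed as a percentage, and SOC accuracy "at 2-digit" can be read as agreement on the major group (the `15` in `15-1252`), while "at 6-digit" requires the full code to match. The function names and toy values below are illustrative only and are not Canaria results.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Undefined when an actual value is 0."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def soc_accuracy(true_codes: list[str], predicted_codes: list[str], digits: int) -> float:
    """Share of predictions whose first `digits` SOC digits match the label (hyphen ignored)."""

    def prefix(code: str) -> str:
        return code.replace("-", "")[:digits]

    hits = sum(prefix(t) == prefix(p) for t, p in zip(true_codes, predicted_codes))
    return hits / len(true_codes)


if __name__ == "__main__":
    # Toy values for illustration only, not Canaria evaluation data.
    print(mape([100_000, 200_000], [90_000, 220_000]))  # 10.0
    labels = ["15-1252", "29-1141"]
    preds = ["15-1244", "29-1141"]
    print(soc_accuracy(labels, preds, digits=2))  # 1.0: major groups 15 and 29 both match
    print(soc_accuracy(labels, preds, digits=6))  # 0.5: only 29-1141 matches exactly
```

This is why 2-digit accuracy is always at least as high as 6-digit accuracy: a wrong detailed code inside the right major group still counts at the coarser level.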

Use Cases

Market research and competitive intelligence
Economic indicators and labor market signals
Workforce planning and talent strategy
AI/ML training data
Academic research and longitudinal analysis
Recruiting and staffing intelligence

Sample Fields

jobTitle, companyName, location, description, datePosted, normTitle, soc, socTitle, seniority, salaryAvgAnnual, nlpSkills, nlpSoftSkills, remote, employment, srcBase
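To show how a few of these fields combine in analysis, the sketch below groups records by SOC major group and reports the median `salaryAvgAnnual` and the most frequent `nlpSkills`. It assumes records arrive as Python dicts keyed by the field names above, with `soc` formatted like `15-1252`, `salaryAvgAnnual` numeric or absent, and `nlpSkills` a list of strings; the delivered types may differ, so treat these as assumptions to check against the full schema. The record values are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def summarize_by_major_group(records: list[dict]) -> dict[str, dict]:
    """Median salary and top-3 skills per SOC major group (first two digits of `soc`)."""
    salaries: dict[str, list[float]] = defaultdict(list)
    skills: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        group = rec["soc"][:2]  # "15-1252" -> "15"
        if rec.get("salaryAvgAnnual") is not None:
            salaries[group].append(rec["salaryAvgAnnual"])
        skills[group].update(rec.get("nlpSkills") or [])
    return {
        group: {
            "medianSalary": median(salaries[group]) if salaries[group] else None,
            # Ties keep first-seen order, per Counter.most_common.
            "topSkills": [skill for skill, _ in skills[group].most_common(3)],
        }
        for group in skills
    }


if __name__ == "__main__":
    # Hypothetical records, assuming the field types described above.
    records = [
        {"soc": "15-1252", "salaryAvgAnnual": 220000, "nlpSkills": ["Python", "AWS", "Kubernetes"]},
        {"soc": "15-2051", "salaryAvgAnnual": 112500, "nlpSkills": ["Python", "SQL"]},
        {"soc": "29-1141", "salaryAvgAnnual": 83500, "nlpSkills": ["Patient Care"]},
    ]
    for group, stats in summarize_by_major_group(records).items():
        print(group, stats)
```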

Delivery Formats

CSV, Parquet, S3, GCS, Snowflake, SFTP


See This Data Live

Interactive charts from our 900M+ deduplicated job postings, updated daily.


Sample Records

A preview of real records from this dataset. Unlock all fields by requesting a free sample.

Job Title	Company	City	State	Seniority	Work Mode	Min Salary	Max Salary	SOC Code	SOC Title	+8 more
Senior Software Engineer	Google	Mountain View	CA	Senior	Hybrid	185,000	255,000	15-1252	Software Developers	…
Data Analyst	JPMorgan Chase	New York	NY	Mid	On-site	95,000	130,000	15-2051	Data Scientists	…
Product Manager	Meta	Menlo Park	CA	Senior	Hybrid	172,000	240,000	11-2021	Marketing Managers	…
Registered Nurse	HCA Healthcare	Nashville	TN	Mid	On-site	72,000	95,000	29-1141	Registered Nurses	…
DevOps Engineer	Datadog	Boston	MA	Mid	Remote	140,000	185,000	15-1244	Network and Computer Systems Administrators	…

Showing 5 records with 10 of 82 fields visible
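Because SOC codes and titles are delivered as separate fields (`soc` and `socTitle`), a simple consistency check against the official 2018 SOC titles can catch mislabeled pairs. A minimal sketch, assuming a hand-built lookup that covers only a few codes; a real check would load the complete BLS 2018 SOC structure rather than this hypothetical `SOC_2018_TITLES` dict.

```python
# Partial, hand-copied subset of 2018 SOC detailed occupations (illustrative only).
SOC_2018_TITLES = {
    "11-2021": "Marketing Managers",
    "15-1241": "Computer Network Architects",
    "15-1244": "Network and Computer Systems Administrators",
    "15-1252": "Software Developers",
    "15-2051": "Data Scientists",
    "29-1141": "Registered Nurses",
}


def mismatched_soc_pairs(rows: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (code, title, problem) for pairs whose title is not the official one."""
    problems = []
    for code, title in rows:
        official = SOC_2018_TITLES.get(code)
        if official is None:
            problems.append((code, title, "unknown code"))
        elif official != title:
            problems.append((code, title, f"expected {official!r}"))
    return problems


if __name__ == "__main__":
    rows = [
        ("15-1252", "Software Developers"),
        ("15-1244", "Network Architects"),  # hypothetical mislabeled pair
    ]
    for problem in mismatched_soc_pairs(rows):
        print(problem)
```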

Raw vs. Enriched

See how Canaria transforms a basic job posting into a fully enriched record with 82 structured fields.

Raw Data

What scrapers give you

Job Title: Sr. Software Engineer

Company: Acme Corp

Location: San Francisco, CA

Salary: $180,000 - $240,000

Employment: Full-time

URL: indeed.com/viewjob?jk=abc123

No SOC codes, no salary prediction, no skills extraction, no seniority, no deduplication...

Building this enrichment in-house costs $500K-$1M in the first year

Canaria Enriched

82 fields per record

Job Title: Sr. Software Engineer

Company: Acme Corp

Location: San Francisco, CA 94105

Posted Salary: $180,000 - $240,000

Employment: Full-time

+ Normalized Title: Software Engineer

+ SOC Code: 15-1252 (Software Developers)

+ Seniority: Senior

+ Work Mode: Hybrid

+ Predicted Salary: $195,000 - $225,000 (confidence: 0.92)

+ Skills: Python, AWS, Kubernetes, PostgreSQL, React

+ Soft Skills: Leadership, Communication

+ Certifications: AWS Solutions Architect

+ Benefits: Health Insurance, 401k, Stock Options, PTO

+ Visa Sponsorship: Yes

+ Managerial: No

+ Industry: Technology

+ Company Size: 1,001 - 5,000

+ Degree Required: Bachelor's

+ Experience: 5-8 years of software engineering

+ 62 more fields (location, dedup metadata, qualifications, clearance, travel...)
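The raw-to-enriched example turns "Sr. Software Engineer" into a normalized title plus a seniority level, and attaches a work mode. Canaria describes this as NLP enrichment; the sketch below is only a toy rule-based stand-in that shows the shape of the transformation. The prefix table, the "Mid" default (mirroring the "always returns a value" behavior), and the keyword rules for work mode are all hypothetical assumptions, not the production classifier.

```python
import re

# Hypothetical leading-token -> seniority table (toy rules, not Canaria's classifier).
SENIORITY_PREFIXES = {
    "jr": "Junior",
    "junior": "Junior",
    "sr": "Senior",
    "senior": "Senior",
    "principal": "Principal",
}
DEFAULT_SENIORITY = "Mid"  # fallback so every record gets a seniority value


def normalize_title(raw_title: str) -> tuple[str, str]:
    """Split a leading seniority marker off a title: 'Sr. X' -> ('X', 'Senior')."""
    words = raw_title.split()
    first = words[0].rstrip(".").lower() if words else ""
    if first in SENIORITY_PREFIXES and len(words) > 1:
        return " ".join(words[1:]), SENIORITY_PREFIXES[first]
    return raw_title.strip(), DEFAULT_SENIORITY


def detect_work_mode(description: str) -> str:
    """Keyword heuristic: 'hybrid' wins over 'remote'; no signal means on-site."""
    text = description.lower()
    if "hybrid" in text:
        return "Hybrid"
    if re.search(r"\bremote\b", text):
        return "Remote"
    return "On-site"


if __name__ == "__main__":
    print(normalize_title("Sr. Software Engineer"))  # ('Software Engineer', 'Senior')
    print(normalize_title("Data Analyst"))  # ('Data Analyst', 'Mid')
    print(detect_work_mode("Hybrid schedule: three days a week in our San Francisco office."))
```

Keyword rules like these break on negations ("this role is not remote") and on titles where the seniority marker sits mid-string, which is one reason a context-aware model over title and description is used instead.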


Relevant Solutions

Labor Market Data for Investment Professionals

Hiring velocity as an economic leading indicator with SOC-level granularity

Job Market Data for Competitive Intelligence

Competitor hiring patterns, skills trends, and geographic expansion signals

Job Market Data for HR Tech Platforms

Add salary benchmarking and skills intelligence to your platform without building your own ML models

Job Market Training Data for AI & ML Teams

Pre-enriched, deduplicated job market training data. Skip 6 months of pipeline building.

Job Market Data for Workforce Planning

Enterprise-grade labor market intelligence with full data transparency, no vendor lock-in

Job Market Data for Academic Research

Longitudinal dataset for labor economics: wage dynamics, skill demand, remote work adoption

Job Market Data for Recruiting & Staffing

Fresh daily job market data for content, outreach, and market positioning.

Job Market Data for Consulting Firms

Project-ready labor market data delivered in 24 hours, no annual contract required.

Job Market Data for DEI and ESG Analytics

Benchmark employer benefits across 286M+ records: 401k, PTO, health insurance, equity compensation, and 17 more benefit types.

Job Market Data for Healthcare Workforce Planning

Track clinical hiring pipelines with degree-level granularity: 338M+ degree requirement records across nursing, allied health, and physician roles.

Data Schema

Explore field definitions and coverage

Methodology

How we collect and enrich data

Provider Comparison

How we compare to alternatives
