Open Data Open Jobs
Earlier this year, the Commonwealth Center for Advanced Research and Statistics (CCARS) initiated a pilot project to create an open "real-time" data set of advertised job postings in Virginia. This data set is the initial outcome of the pilot. Work will continue to build on what was accomplished in just a few months, adding more data streams offered in "real-time" and more robust data enrichments.
Currently, the Open Data Open Jobs data set combines data from three sources:
Students from the Discovery Analytics Center at Virginia Tech ingested, cleansed, enriched, and de-duplicated the data from each source to come up with a single data set. Below is an overview of the major steps taken to achieve this result:
Mapped job postings from all three sources to the job-posting schema standard
Enriched job postings with average wage data from Georgetown University’s Center for Education and the Workforce and with job title normalization assistance from Glassdoor
De-duplicated the data using an algorithm to identify identical job postings
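As an illustration of the de-duplication step, the following is a minimal sketch and not the algorithm the Virginia Tech students used, which is not described here. It assumes each posting has already been mapped to flat string fields named after the schema.org JobPosting properties `title`, `hiringOrganization`, and `jobLocation` (a simplification, since those properties are normally nested objects), and it treats two postings as identical when those fields match after normalization. The function names and the choice of fields are hypothetical.

```python
import hashlib
import re

# Fields assumed (for illustration only) to identify a posting.
KEY_FIELDS = ("title", "hiringOrganization", "jobLocation")


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", (text or "").lower())
    return " ".join(text.split())


def posting_key(posting):
    """Build a fingerprint from the normalized identifying fields."""
    parts = [normalize(posting.get(field)) for field in KEY_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(postings):
    """Keep the first posting seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for posting in postings:
        key = posting_key(posting)
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique
```

Exact matching on normalized fields only catches postings that differ in case, punctuation, or spacing. Postings that describe the same job with different wording would need a fuzzy comparison, which this sketch does not attempt.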
Currently, this job posting data set is limited in its reach and should be used with some caveats in mind. It does not cover all job openings in Virginia advertised online. Not all sources of data used to create this set are "real-time": data supplied from the Virginia Workforce Connection is currently a snapshot from a point in time, and efforts are underway to explore access to these jobs in "real-time." Additionally, the schema-tagged jobs pulled into this data set are limited to those tagged with a "Veteran Hiring Commitment." Work is in progress to find an alternative source of these jobs that extends beyond the veteran population. Partnerships for skills and education data enrichments are in progress as well.