The Times Switches to CDC Covid Data, Ending Daily Collection
After more than three years of daily reporting on the number of Covid-19 cases and deaths in every county in the United States, The New York Times is ending its Covid data-gathering operation. The Times will continue to publish its Covid tracking pages for the United States, but they will now be based on the latest information available from the federal government rather than on The Times's own data set.
The tracking pages will still show data about hospital patients with Covid; reported cases and tests; and how many people have died from the virus. Data on vaccination rates and comparisons between vaccinated and unvaccinated populations will also remain.
A new interactive county map will show local levels of Covid-19 from the Centers for Disease Control and Prevention, which combine case and hospitalization data to determine the current impact of the virus on communities.
The data will be updated weekly instead of daily, and charts will include historical revisions as reported by the CDC.
Why we’re making this change
Since nearly the beginning of the pandemic, The Times has been collecting and standardizing Covid data from hundreds of state and local sources. The CDC now has a similar process: The agency collects data from hospitals, counties and states, and then it standardizes and reports the data to the public.
While Covid still kills thousands of people across the United States every week, state and local sources now report data less frequently and less reliably than they did earlier in the pandemic. The comprehensive real-time reporting that The Times has prioritized is no longer possible.
At the same time, the data offered by the federal government has become more consistent, and it is sometimes the only source of information about Covid in parts of the country. Several states report data to the CDC but no longer report this information directly to the public.
Nebraska and Florida were the first states to significantly reduce public data reporting in the summer of 2021. Since then, most states have reduced the frequency of updates to once a week, and several no longer maintain public dashboards or reports.
What we can learn from individual metrics has also changed. Cases are widely undercounted because of the rise of at-home testing, the results of which are mostly unreported. Test positivity rates, which can still be useful as an indicator that infections are rising or falling, are much higher than earlier in the pandemic because more negative tests go unreported.
Hospitalization data is a more reliable indicator of trends in infections and severe cases at the local level because testing is more common in hospitals. Going forward, The Times's tracking pages will highlight hospitalizations, which are reported directly by individual hospitals to the federal government.
How The Times filled a public health data gap
As the virus began to spread rapidly in the United States in March 2020, it became clear that there was no single source that tracked infections at the local level. In the absence of comprehensive government data, The Times quickly built a custom system for gathering, vetting and publishing data from more than 100 state and local government sources.
By collecting the data continuously, and from multiple levels of government, The Times was able to map the spread of the virus, with updated information published several times a day.
The Times's open-source data set was first made available to the public in March 2020. The project eventually grew to include 350 custom data scrapers, and it was supplemented daily by manual data collection for locations that published their updates as images, as social media posts or in other irregular formats. Because states and counties often had their own reporting methodologies, The Times worked every day to standardize the data for every county in the country.
To pull this off, more than 160 Times staffers, freelancers and college students worked on the project over the last three years.
Closely tracking the data revealed weaknesses in the nation's public health system, which is largely decentralized and dependent on local health departments that often have inadequate staffing and outdated technology.
Because public health data was not readily available, The Times and other organizations, such as the COVID Tracking Project and Johns Hopkins University, stepped in to provide important, timely information well before the federal government, which took many more months to report similar data.
The final database that powered The Times’s public data included more than 62 million rows of information. Times scrapers retrieved data from public sources more than 353,000 times.