Data scrapers

These scripts pull data from various sources for use in Covasim. To run all scrapers, simply type

./run_scrapers
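The contents of run_scrapers are not shown here. As a rough sketch only, assuming it does nothing more than invoke each loader script named in the sections below in turn, an equivalent Python driver might look like this. The names LOADERS, build_commands, and run_all are hypothetical and not part of Covasim.

```python
import subprocess
import sys

# Loader scripts named in this document. That run_scrapers calls exactly
# these, in this order, is an assumption.
LOADERS = [
    "data/load_corona_data_scraper_data.py",
    "data/load_ecdp_data.py",
    "data/load_covid_tracking_project_data.py",
    "data/load_demographic_data.py",
]


def build_commands(python=sys.executable):
    """Return one argv list per loader script."""
    return [[python, script] for script in LOADERS]


def run_all(dry_run=False):
    """Run each loader in turn, stopping at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a loader exits non-zero
            subprocess.run(cmd, check=True)


# Dry run: only print the commands that would be executed.
run_all(dry_run=True)
```

Calling run_all() without dry_run=True would actually execute the loaders, so it should be run from the repository root.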

1. Corona Data Scraper

To quote the Corona Data Scraper web page,

“Corona Data Scraper pulls COVID-19 Coronavirus case data from verified sources.”

The loader below scrapes this data and places it, in CSV format, in the data/epi_data/corona-data-scraper-project directory.

Here is a sample of the data.

,key,population,aggregate,cum_positives,cum_death,cum_recovered,cum_active,cum_tests,cum_hospitalized,cum_discharged,date,day,positives,death,tests,hospitalized,discharged,recovered,active
57089,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-21,0,1.0,,,,,,
57090,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-22,1,0.0,,,,,,
57091,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-23,2,0.0,,,,,,
57092,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-24,3,0.0,,,,,,
57093,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-25,4,0.0,,,,,,
57094,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-26,5,0.0,,,,,,
57095,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-27,6,0.0,,,,,,
57096,"Roane County, Tennessee, United States",53382.0,county,1.0,,,,,,,2020-03-28,7,0.0,,,,,,
57097,"Roane County, Tennessee, United States",53382.0,county,2.0,,,,,,,2020-03-29,8,1.0,,,,,,
57098,"Roane County, Tennessee, United States",53382.0,county,2.0,,,,,,,2020-03-30,9,0.0,,,,,,
57099,"Roane County, Tennessee, United States",53382.0,county,2.0,,,,88.0,,,2020-03-31,10,0.0,,88.0,,,,
57100,"Roane County, Tennessee, United States",53382.0,county,2.0,,,,91.0,,,2020-04-01,11,0.0,,3.0,,,,
57101,"Roane County, Tennessee, United States",53382.0,county,3.0,,,,131.0,,,2020-04-02,12,1.0,,40.0,,,,
57102,"Roane County, Tennessee, United States",53382.0,county,3.0,,,,150.0,,,2020-04-03,13,0.0,,19.0,,,,
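As a minimal sketch of how the cumulative and daily columns relate, the snippet below parses a few of the sample rows with Python's standard csv module and checks that the daily tests column is the day-to-day change in cum_tests. The inline CSV text (a subset of the columns) and the helper names to_float and daily_from_cumulative are assumptions for illustration; the files written by the loader may be laid out differently.

```python
import csv
import io

# A few of the sample rows above, reduced to the columns used here.
SAMPLE = '''key,date,day,cum_positives,positives,cum_tests,tests
"Roane County, Tennessee, United States",2020-03-31,10,2.0,0.0,88.0,88.0
"Roane County, Tennessee, United States",2020-04-01,11,2.0,0.0,91.0,3.0
"Roane County, Tennessee, United States",2020-04-02,12,3.0,1.0,131.0,40.0
"Roane County, Tennessee, United States",2020-04-03,13,3.0,0.0,150.0,19.0
'''


def to_float(text):
    """Convert a CSV cell to float, mapping an empty cell to None."""
    return float(text) if text.strip() else None


def daily_from_cumulative(values):
    """Difference consecutive cumulative values (one fewer result than inputs)."""
    return [b - a for a, b in zip(values, values[1:])]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cum_tests = [to_float(r["cum_tests"]) for r in rows]
tests = [to_float(r["tests"]) for r in rows]

# After the first row, each daily value should equal the change in cum_tests.
print(daily_from_cumulative(cum_tests))  # [3.0, 40.0, 19.0]
print(tests[1:])                         # [3.0, 40.0, 19.0]
```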

Updating:: To update the Corona Data Scraper data, run

python data/load_corona_data_scraper_data.py

As of April 4, 2020, there are apparently 3874 data sets.

2. European Centre for Disease Prevention and Control

To quote the European Centre for Disease Prevention and Control web page,

“Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has been collecting the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process is carried out on a daily basis. To insure the accuracy and reliability of the data, this process is being constantly refined. This helps to monitor and interpret the dynamics of the COVID-19 pandemic not only in the European Union (EU), the European Economic Area (EEA), but also worldwide.”

The data is stored in CSV format in the data/epi_data/european-centre-for-disease-prevention-and-control directory.

Here is a sample of the data:

,day,new_positives,new_death,key,population,date
3960,0,2,0,Greenland,56025.0,2020-03-20
3959,1,0,0,Greenland,56025.0,2020-03-21
3958,2,0,0,Greenland,56025.0,2020-03-22
3957,3,0,0,Greenland,56025.0,2020-03-23
3956,4,2,0,Greenland,56025.0,2020-03-24
3955,5,0,0,Greenland,56025.0,2020-03-25
3954,6,1,0,Greenland,56025.0,2020-03-26
3953,7,1,0,Greenland,56025.0,2020-03-27
3952,8,3,0,Greenland,56025.0,2020-03-28
3951,9,1,0,Greenland,56025.0,2020-03-29
3950,10,0,0,Greenland,56025.0,2020-03-30
3949,11,0,0,Greenland,56025.0,2020-03-31
3948,12,0,0,Greenland,56025.0,2020-04-01
3947,13,0,0,Greenland,56025.0,2020-04-02
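This sample holds daily counts only (new_positives, new_death), with no cumulative columns. As a small sketch, a running total can be derived from the daily values with itertools.accumulate. The list below is copied by hand from the Greenland rows above, and the variable names are illustrative rather than part of the loader.

```python
from itertools import accumulate

# new_positives for Greenland, days 0 through 13, from the sample above.
new_positives = [2, 0, 0, 0, 2, 0, 1, 1, 3, 1, 0, 0, 0, 0]

# Running total of reported cases, one entry per day.
cum_positives = list(accumulate(new_positives))

print(cum_positives)      # [2, 2, 2, 2, 4, 4, 5, 6, 9, 10, 10, 10, 10, 10]
print(cum_positives[-1])  # 10
```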

Updating:: To update the European Centre for Disease Prevention and Control data, run

python data/load_ecdp_data.py

This adds 10,538 records (as of April 15, 2020) covering countries and territories across Africa, Asia, the Americas, Europe, and Oceania. More details at: https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases

3. The COVID Tracking Project

The COVID Tracking Project “obtains, organizes, and publishes high-quality data required to understand and respond to the COVID-19 outbreak in the United States.” The project website is https://covidtracking.com

We transform this data into the Covasim parameter format. It is stored in CSV format in the data/epi_data/covid-tracking-project directory.

Here is a sample of the data:

,date,key,cum_hospitalized,cum_in_icu,cum_on_ventilator,death,new_death,new_hospitalized,new_negatives,new_positives,new_tests,day,num_icu,num_on_ventilator
2210,2020-03-04,NY,,,,,,,,,,0,,
2191,2020-03-05,NY,,,,,0.0,0.0,28.0,16.0,44.0,1,,
2163,2020-03-06,NY,,,,,0.0,0.0,16.0,11.0,27.0,2,,
2122,2020-03-07,NY,,,,,0.0,0.0,0.0,43.0,43.0,3,,
2071,2020-03-08,NY,,,,,0.0,0.0,0.0,29.0,29.0,4,,
2020,2020-03-09,NY,,,,,0.0,0.0,0.0,37.0,37.0,5,,
1969,2020-03-10,NY,,,,,0.0,0.0,0.0,31.0,31.0,6,,
1918,2020-03-11,NY,,,,,0.0,0.0,0.0,43.0,43.0,7,,
1867,2020-03-12,NY,,,,,0.0,0.0,0.0,0.0,0.0,8,,
1816,2020-03-13,NY,,,,,0.0,0.0,2687.0,205.0,2892.0,9,,
1765,2020-03-14,NY,,,,,0.0,0.0,0.0,103.0,103.0,10,,
1714,2020-03-15,NY,,,,3.0,3.0,0.0,1764.0,205.0,1969.0,11,,
1661,2020-03-16,NY,,,,7.0,4.0,0.0,0.0,221.0,221.0,12,,
1605,2020-03-17,NY,,,,7.0,0.0,0.0,963.0,750.0,1713.0,13,,
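In the sample above, new_tests equals new_negatives plus new_positives on every row where the values are present. A quick consistency check along these lines can catch transformation errors after an update. This is only a sketch: the tuples are copied by hand from four of the NY rows, and inconsistent_rows is a hypothetical helper, not part of the loader.

```python
# (date, new_negatives, new_positives, new_tests) for NY, from the sample above.
rows = [
    ("2020-03-05", 28.0, 16.0, 44.0),
    ("2020-03-13", 2687.0, 205.0, 2892.0),
    ("2020-03-15", 1764.0, 205.0, 1969.0),
    ("2020-03-17", 963.0, 750.0, 1713.0),
]


def inconsistent_rows(rows):
    """Return the dates where new_tests != new_negatives + new_positives."""
    return [date for date, neg, pos, tests in rows if neg + pos != tests]


print(inconsistent_rows(rows))  # []
```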

Updating:: To update the COVID Tracking Project data, run

python data/load_covid_tracking_project_data.py

4. Demographic data scraper

To scrape demographic data, run

python data/load_demographic_data.py