Data scrapers

These scripts pull data from various sources for use in Covasim. To run all scrapers, simply type

./run_scrapers
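The same effect can be approximated from Python by calling each loader script named in this document in turn. This is only a sketch under that assumption, not the contents of ./run_scrapers; the names LOADERS, build_commands, and run_all are hypothetical, and the script paths are assumed to be relative to the repository root.

```python
import subprocess
import sys

# Loader scripts named in this document (paths assumed relative to the repository root).
LOADERS = [
    "data/load_corona_data_scraper_data.py",
    "data/load_ecdp_data.py",
    "data/load_covid_tracking_project_data.py",
    "data/load_demographic_data.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per loader script."""
    return [[python, script] for script in LOADERS]


def run_all(dry_run=True):
    """Run each loader in turn; with dry_run=True, only print the commands."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first loader that exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True nothing is executed, which makes it easy to confirm the command list before running the scrapers for real.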

1. Corona Data Scraper

To quote the Corona Data Scraper web page,

“Corona Data Scraper pulls COVID-19 Coronavirus case data from verified sources.”

The loader script described below scrapes this data and places it in the data/epi_data/corona-data-scraper-project directory in CSV format.

Here is a sample of the data.

The first column is the unnamed row index. The file also contains the columns cum_death, cum_recovered, cum_active, cum_hospitalized, cum_discharged, death, hospitalized, discharged, recovered, and active; they are empty for every row in this sample and are omitted from the table.

=====  ======================================  ==========  =========  =============  =========  ==========  ===  =========  =====
index  key                                     population  aggregate  cum_positives  cum_tests  date        day  positives  tests
=====  ======================================  ==========  =========  =============  =========  ==========  ===  =========  =====
57089  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-21  0    1.0
57090  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-22  1    0.0
57091  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-23  2    0.0
57092  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-24  3    0.0
57093  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-25  4    0.0
57094  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-26  5    0.0
57095  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-27  6    0.0
57096  Roane County, Tennessee, United States  53382.0     county     1.0                       2020-03-28  7    0.0
57097  Roane County, Tennessee, United States  53382.0     county     2.0                       2020-03-29  8    1.0
57098  Roane County, Tennessee, United States  53382.0     county     2.0                       2020-03-30  9    0.0
57099  Roane County, Tennessee, United States  53382.0     county     2.0            88.0       2020-03-31  10   0.0        88.0
57100  Roane County, Tennessee, United States  53382.0     county     2.0            91.0       2020-04-01  11   0.0        3.0
57101  Roane County, Tennessee, United States  53382.0     county     3.0            131.0      2020-04-02  12   1.0        40.0
57102  Roane County, Tennessee, United States  53382.0     county     3.0            150.0      2020-04-03  13   0.0        19.0
=====  ======================================  ==========  =========  =============  =========  ==========  ===  =========  =====
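In the sample above, the daily positives column appears to be the day-to-day difference of cum_positives. The sketch below checks that relationship on a few of the sample rows using only the standard library. It assumes the values map to the cum_positives and positives columns as read from the sample; the helper name daily_from_cumulative is hypothetical and is not part of the loader.

```python
import csv
import io

# A few rows from the sample above, restricted to the columns used here.
SAMPLE = """key,date,day,cum_positives,positives
"Roane County, Tennessee, United States",2020-03-27,6,1.0,0.0
"Roane County, Tennessee, United States",2020-03-28,7,1.0,0.0
"Roane County, Tennessee, United States",2020-03-29,8,2.0,1.0
"Roane County, Tennessee, United States",2020-03-30,9,2.0,0.0
"""


def daily_from_cumulative(values):
    """Difference a cumulative series; the first entry has no predecessor, so it is None."""
    return [None] + [b - a for a, b in zip(values, values[1:])]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cum = [float(r["cum_positives"]) for r in rows]
reported = [float(r["positives"]) for r in rows]
derived = daily_from_cumulative(cum)

# Skip the first row, whose difference is undefined within this excerpt.
matches = derived[1:] == reported[1:]
```

The first row is skipped in the comparison because its predecessor is not part of the excerpt.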

Updating: To update the Corona Data Scraper data, run

python data/load_corona_data_scraper_data.py

As of April 4, 2020, there are apparently 3874 data sets.

2. European Centre for Disease Prevention and Control

To quote the European Centre for Disease Prevention and Control web page,

“Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has been collecting the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process is carried out on a daily basis. To insure the accuracy and reliability of the data, this process is being constantly refined. This helps to monitor and interpret the dynamics of the COVID-19 pandemic not only in the European Union (EU), the European Economic Area (EEA), but also worldwide.”

The data is stored in CSV format in the data/epi_data/european-centre-for-disease-prevention-and-control directory.

Here is a sample of the data:

(The first column is the unnamed row index.)

=====  ===  =============  =========  =========  ==========  ==========
index  day  new_positives  new_death  key        population  date
=====  ===  =============  =========  =========  ==========  ==========
3960   0    2              0          Greenland  56025.0     2020-03-20
3959   1    0              0          Greenland  56025.0     2020-03-21
3958   2    0              0          Greenland  56025.0     2020-03-22
3957   3    0              0          Greenland  56025.0     2020-03-23
3956   4    2              0          Greenland  56025.0     2020-03-24
3955   5    0              0          Greenland  56025.0     2020-03-25
3954   6    1              0          Greenland  56025.0     2020-03-26
3953   7    1              0          Greenland  56025.0     2020-03-27
3952   8    3              0          Greenland  56025.0     2020-03-28
3951   9    1              0          Greenland  56025.0     2020-03-29
3950   10   0              0          Greenland  56025.0     2020-03-30
3949   11   0              0          Greenland  56025.0     2020-03-31
3948   12   0              0          Greenland  56025.0     2020-04-01
3947   13   0              0          Greenland  56025.0     2020-04-02
=====  ===  =============  =========  =========  ==========  ==========
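Unlike the Corona Data Scraper files, this sample carries only daily counts (new_positives, new_death). If a running total is needed, one option is to sort by date and accumulate. The sketch below does this for the first six Greenland rows of the sample; the function name cumulative_positives is hypothetical, and the totals are relative to the start of the excerpt rather than the true cumulative count.

```python
import csv
import io
from itertools import accumulate

# Rows copied from the sample above, deliberately listed out of date order.
SAMPLE = """day,new_positives,new_death,key,population,date
4,2,0,Greenland,56025.0,2020-03-24
5,0,0,Greenland,56025.0,2020-03-25
0,2,0,Greenland,56025.0,2020-03-20
1,0,0,Greenland,56025.0,2020-03-21
2,0,0,Greenland,56025.0,2020-03-22
3,0,0,Greenland,56025.0,2020-03-23
"""


def cumulative_positives(rows):
    """Sort rows by ISO date and return (date, running total of new_positives) pairs."""
    ordered = sorted(rows, key=lambda r: r["date"])  # ISO dates sort correctly as strings
    totals = accumulate(int(r["new_positives"]) for r in ordered)
    return [(r["date"], total) for r, total in zip(ordered, totals)]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cum_positives = cumulative_positives(rows)
```

Sorting first matters because the row index in the sample runs in the opposite direction to the date.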

Updating: To update the European Centre for Disease Prevention and Control data, run

python data/load_ecdp_data.py

This adds 10,538 records (as of April 15, 2020) for countries and territories across Africa, Asia, the Americas, Europe, and Oceania. More details are available at https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases

3. The COVID Tracking Project

The COVID Tracking Project “obtains, organizes, and publishes high-quality data required to understand and respond to the COVID-19 outbreak in the United States.” The project website is https://covidtracking.com

We transform this data into the Covasim parameter format. It is stored in CSV format in the data/epi_data/covid-tracking-project directory.

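Here is a sample of the data. The first column is the unnamed row index. The file also contains the columns cum_hospitalized, cum_in_icu, cum_on_ventilator, num_icu, and num_on_ventilator; they are empty for every row in this sample and are omitted from the table.

=====  ==========  ===  =====  =========  ================  =============  =============  =========  ===
index  date        key  death  new_death  new_hospitalized  new_negatives  new_positives  new_tests  day
=====  ==========  ===  =====  =========  ================  =============  =============  =========  ===
2210   2020-03-04  NY                                                                                0
2191   2020-03-05  NY          0.0        0.0               28.0           16.0           44.0       1
2163   2020-03-06  NY          0.0        0.0               16.0           11.0           27.0       2
2122   2020-03-07  NY          0.0        0.0               0.0            43.0           43.0       3
2071   2020-03-08  NY          0.0        0.0               0.0            29.0           29.0       4
2020   2020-03-09  NY          0.0        0.0               0.0            37.0           37.0       5
1969   2020-03-10  NY          0.0        0.0               0.0            31.0           31.0       6
1918   2020-03-11  NY          0.0        0.0               0.0            43.0           43.0       7
1867   2020-03-12  NY          0.0        0.0               0.0            0.0            0.0        8
1816   2020-03-13  NY          0.0        0.0               2687.0         205.0          2892.0     9
1765   2020-03-14  NY          0.0        0.0               0.0            103.0          103.0      10
1714   2020-03-15  NY   3.0    3.0        0.0               1764.0         205.0          1969.0     11
1661   2020-03-16  NY   7.0    4.0        0.0               0.0            221.0          221.0      12
1605   2020-03-17  NY   7.0    0.0        0.0               963.0          750.0          1713.0     13
=====  ==========  ===  =====  =========  ================  =============  =============  =========  ===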
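In the sample rows above, new_tests appears to equal new_negatives plus new_positives, which suggests a simple sanity check after each update. The sketch below applies that check to four of the sample rows with the standard library. It assumes the values map to the named columns as read from the sample; the helper names to_float and inconsistent_dates are hypothetical.

```python
import csv
import io

# Four rows from the sample above; the empty death field on 2020-03-14 is a missing value.
SAMPLE = """date,key,death,new_death,new_negatives,new_positives,new_tests
2020-03-14,NY,,0.0,0.0,103.0,103.0
2020-03-15,NY,3.0,3.0,1764.0,205.0,1969.0
2020-03-16,NY,7.0,4.0,0.0,221.0,221.0
2020-03-17,NY,7.0,0.0,963.0,750.0,1713.0
"""


def to_float(text):
    """Convert a CSV field to float, treating an empty field as missing (None)."""
    return float(text) if text != "" else None


def inconsistent_dates(rows):
    """Return the dates on which new_tests != new_negatives + new_positives."""
    bad = []
    for row in rows:
        neg = to_float(row["new_negatives"])
        pos = to_float(row["new_positives"])
        tests = to_float(row["new_tests"])
        if neg is None or pos is None or tests is None:
            continue  # rows with missing values cannot be checked
        if neg + pos != tests:
            bad.append(row["date"])
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
bad_dates = inconsistent_dates(rows)
```

An empty bad_dates list means every checked row is internally consistent; rows with missing values are skipped rather than flagged.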

Updating: To update the COVID Tracking Project data, run

python data/load_covid_tracking_project_data.py

4. Demographic data scraper

To scrape demographic data, run

python data/load_demographic_data.py