# Covid-19 Public Data Collaboration Project

This project aggregates data from various public sources to better understand the spread and effect of COVID-19. The goal is to provide a central place where data, analysis, and discussion can be conducted and shared by a global community struggling to make sense of the current public health emergency.

For each data source, we provide a simple summary notebook with interactive figures:

* [Summary of global data from JHU CSSE](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/Dashboard.run.ipynb)
* [Global data from ECDC](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/covid-19-ecdc.run.ipynb)
* [U.S. state-level data from covidtracking.com](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/covidtracking.run.ipynb)
* [U.S. county-level data from the New York Times](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/covid-19-us-nyt.run.ipynb)
* [Regional data for Italy from the Italian Civil Protection](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/covid-19-italy.run.ipynb)
* [Switzerland cantonal data collected by the Zürich Statistical Office](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/runs/openzh-covid-19.run.ipynb)

Case data is complemented by population figures from various sources. A summary of all the data can be found in the table below.

## Getting started with the project

The goal of this project is not to build yet another dashboard. Instead, it provides a place for easy access to the relevant data for the purposes of analysis and collaboration. This project is envisioned to be hands-on; with a few clicks you can be analysing the latest data from around the globe.

The simplest way to start is to create an account or log in, and then fork the project.
Then, [start an interactive environment](https://renkulab.io/projects/covid-19/covid-19-public-data/environments/new) and use the hosted JupyterLab or RStudio to explore the data.

If you don't know how to do something, drop us a line [on Discourse](https://renku.discourse.group), chat with us on [Gitter](https://gitter.im/SwissDataScienceCenter/renku), or [open an issue](https://renkulab.io/projects/covid-19/covid-19-public-data/collaboration/issues) and someone will be able to help out.

Is there a great data source that you wish we had included? Start a [discussion](https://renkulab.io/projects/covid-19/covid-19-public-data/collaboration/issues)!

## Working with the data

A summary of the datasets available in this project is in the table below. To make working with the data more efficient, we have implemented a set of "converters" that standardize the various datasets to a subset of useful fields. Each converter is aware of the details of its dataset and produces a view that is homogenized with the others. In this way, data from different sources can be used with minimal boilerplate code.

For example, to work with the JHU-CSSE country-level data as well as the more detailed dataset from Spain:

```python
from covid_19_utils.converters import CaseConverter

converter = CaseConverter('./data/atlas')

# Each call returns a DataFrame in the common, standardized format
jhu_df = converter.read_convert('./data/covid-19_jhu-csse')
spain_df = converter.read_convert('./data/covid-19-spain')
```

The resulting DataFrames have exactly the same structure, so they can be used interchangeably in any analysis or plotting code. See the [standardization notebook](https://renkulab.io/projects/covid-19/covid-19-public-data/files/blob/notebooks/process/standardize_datasets.ipynb) for a more complete example.

### Updating your branch or fork

The data in the master branch of this project is updated daily. How can you keep your fork or branch up to date?
We recommend that you do not change the files and directories that are updated automatically, so as to avoid merge conflicts as much as possible. This includes the datasets in the `data/` directory and the notebooks in `notebooks/` and `runs/`. For notebooks especially, the easiest way to avoid conflicts is to put your work in a new directory.

When you are ready to pull in changes from master, run the following from a terminal while working on your branch or fork:

```
git remote add upstream https://renkulab.io/gitlab/covid-19/covid-19-public-data.git
git fetch upstream
git merge upstream/master
```

This will sync your branch or fork with the latest changes from the master branch of the parent repository.

### Project structure

`data/`: contains all of the datasets.

`notebooks/`: contains the sample notebooks. The ones in the base directory are executed automatically every time the project is updated, and their rendered versions can be found in the `runs/` directory.

`runs/`: contains executed (rendered) versions of various pre- and post-processing notebooks.

`src/covid-19/covid_19_utils`: contains the data converters as well as some useful helper and plotting functions that are used in the sample notebooks.

## Dataset Summary

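
The `Location` column in the table below gives each dataset's path relative to the project root. As a minimal sketch, the following checks which of those locations exist in a local checkout before any analysis code runs. The helper name `find_missing_datasets` is hypothetical and is not part of `covid_19_utils`; only the example path is taken from the table.

```python
from pathlib import Path


def find_missing_datasets(project_root, locations):
    """Return the dataset locations that do not exist under project_root.

    Hypothetical helper for illustration; not part of covid_19_utils.
    """
    root = Path(project_root)
    return [loc for loc in locations if not (root / loc).exists()]


if __name__ == "__main__":
    # Location copied from the table below; add other rows as needed.
    missing = find_missing_datasets(".", ["data/covid-19_rates"])
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All listed datasets are present.")
```
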
| Dataset | Location | Code |
|---|---|---|
| Case population rates | data/covid-19_rates | notebooks/process/ToRates.ipynb |