Commit 64898ca9 authored by CR (covid cron), committed by renku 0.10.0

renku update --with-siblings

parent bda5230b
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default:
      class: File
      path: ../../notebooks/covidtracking-dashboard.ipynb
    streamable: false
    type: File
  input_10:
    default:
      class: File
      path: ../../data/ch-population-statistics/ch-population-by-age-canton.xls
    streamable: false
    type: File
  input_11:
    default: ts_folder
    streamable: false
    type: string
  input_12:
    default:
      class: Directory
      listing: []
      path: /tmp/covid-19-public-data/data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_13:
    default: rates_folder
    streamable: false
    type: string
  input_14:
    default: geodata_path
    streamable: false
    type: string
  input_15:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_16:
    default:
      class: File
      path: ../../notebooks/Dashboard.ipynb
    streamable: false
    type: File
  input_17:
    default: runs/Dashboard.run.ipynb
    streamable: false
    type: string
  input_18:
    default:
      class: File
      path: ../../notebooks/examples/italy-covid-19.ipynb
    streamable: false
    type: File
  input_19:
    default: runs/italy-covid-19.ipynb
    streamable: false
    type: string
  input_2:
    default: runs/covidtracking-dashboard.ipynb
    streamable: false
    type: string
  input_20:
    default: data_folder
    streamable: false
    type: string
  input_21:
    default:
      class: Directory
      listing: []
      path: /tmp/covid-19-public-data/data/covid-19-italy
    streamable: false
    type: Directory
  input_22:
    default: ts_folder
    streamable: false
    type: string
  input_23:
    default: runs/ToRates.run.ipynb
    streamable: false
    type: string
  input_24:
    default:
      class: Directory
      listing: []
      path: /tmp/covid-19-public-data/data/covid-19_jhu-csse
    streamable: false
    type: Directory
  input_25:
    default: wb_path
    streamable: false
    type: string
  input_26:
    default:
      class: File
      path: ../../data/worldbank/SP.POP.TOTL.zip
    streamable: false
    type: File
  input_27:
    default: geodata_path
    streamable: false
    type: string
  input_28:
    default:
      class: File
      path: ../../data/geodata/geo_data.csv
    streamable: false
    type: File
  input_29:
    default: out_folder
    streamable: false
    type: string
  input_3:
    default: data_path
    streamable: false
    type: string
  input_30:
    default: data/covid-19_rates
    streamable: false
    type: string
  input_31:
    default:
      class: File
      path: ../../notebooks/process/ToRates.ipynb
    streamable: false
    type: File
  input_4:
    default:
      class: Directory
      listing: []
      path: /tmp/covid-19-public-data/data/covidtracking
    streamable: false
    type: Directory
  input_5:
    default:
      class: File
      path: ../../data/geodata/us_pop_fung_2019.csv
    streamable: false
    type: File
  input_6:
    default:
      class: File
      path: ../../notebooks/openzh-covid-19-dashboard.ipynb
    streamable: false
    type: File
  input_7:
    default: runs/openzh-covid-19-dashboard.run.ipynb
    streamable: false
    type: string
  input_8:
    default: data_path
    streamable: false
    type: string
  input_9:
    default:
      class: Directory
      listing: []
      path: /tmp/covid-19-public-data/data/openzh-covid-19
    streamable: false
    type: Directory
outputs:
  output_0:
    outputSource: step_3/output_0
    streamable: false
    type: File
  output_1:
    outputSource: step_1/output_0
    streamable: false
    type: File
  output_2:
    outputSource: step_2/output_0
    streamable: false
    type: File
  output_3:
    outputSource: step_5/output_1
    streamable: false
    type: Directory
  output_4:
    outputSource: step_4/output_0
    streamable: false
    type: File
  output_5:
    outputSource: step_5/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
      input_5: input_5
    out:
    - output_0
    run: 30e7c6a1fdb74ccea2652434f5c83d13_papermill.cwl
  step_2:
    in:
      input_1: input_6
      input_2: input_7
      input_3: input_8
      input_4: input_9
      input_5: input_10
    out:
    - output_0
    run: 0415b23203ef422185f0ecf77290cbd3_papermill.cwl
  step_3:
    in:
      input_2: input_11
      input_3: input_12
      input_4: input_13
      input_5: step_5/output_1
      input_6: input_14
      input_7: input_15
      input_8: input_16
      input_9: input_17
    out:
    - output_0
    run: e85c3f23b37044c580388dc2ce1fb946_papermill.cwl
  step_4:
    in:
      input_1: input_18
      input_2: input_19
      input_3: input_20
      input_4: input_21
    out:
    - output_0
    run: f2c255fbcb87444ea80501341f0b7f1a_papermill.cwl
  step_5:
    in:
      input_1: input_22
      input_10: input_23
      input_2: input_24
      input_3: input_25
      input_4: input_26
      input_5: input_27
      input_6: input_28
      input_7: input_29
      input_8: input_30
      input_9: input_31
    out:
    - output_0
    - output_1
    run: 342d5ac23ef74852a0ecfa61ff854182_papermill.cwl
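In the workflow above, only `step_3` consumes another step's output (`step_5/output_1`); every other step input is bound to a plain workflow input. The following is a minimal sketch of how a valid execution order could be derived from those bindings. The `step_sources` dict and the `step_dependencies` helper are hypothetical names written by hand for illustration, not part of renku or any CWL runner, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hand-copied from the workflow above: for each step, only the input sources
# that point at another step ("step_N/output_M"). Plain workflow inputs
# (input_1, input_2, ...) are omitted because they create no ordering constraint.
step_sources = {
    "step_1": [],
    "step_2": [],
    "step_3": ["step_5/output_1"],
    "step_4": [],
    "step_5": [],
}


def step_dependencies(sources):
    """Map each step to the set of steps whose outputs it consumes."""
    return {
        step: {src.split("/", 1)[0] for src in srcs if "/" in src}
        for step, srcs in sources.items()
    }


deps = step_dependencies(step_sources)
# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any order that places `step_5` before `step_3` is valid; the remaining steps are independent of each other.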
%% Cell type:markdown id: tags:
# Convert Series to Rates per 100,000
%% Cell type:code id: tags:
``` python
import pandas as pd
import os
```
%% Cell type:code id: tags:parameters
``` python
ts_folder = "../../data/covid-19_jhu-csse/"
wb_path = "../../data/worldbank/SP.POP.TOTL.zip"
geodata_path = "../../data/geodata/geo_data.csv"
out_folder = None
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
PAPERMILL_INPUT_PATH = "/tmp/8p13hxw8/notebooks/process/ToRates.ipynb"
PAPERMILL_INPUT_PATH = "/tmp/0xt6p8xn/notebooks/process/ToRates.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/ToRates.run.ipynb"
ts_folder = "/tmp/8p13hxw8/data/covid-19_jhu-csse"
wb_path = "/tmp/8p13hxw8/data/worldbank/SP.POP.TOTL.zip"
geodata_path = "/tmp/8p13hxw8/data/geodata/geo_data.csv"
ts_folder = "/tmp/0xt6p8xn/data/covid-19_jhu-csse"
wb_path = "/tmp/0xt6p8xn/data/worldbank/SP.POP.TOTL.zip"
geodata_path = "/tmp/0xt6p8xn/data/geodata/geo_data.csv"
out_folder = "data/covid-19_rates"
```
%% Cell type:markdown id: tags:parameters
## Read in JHU CSSE data
I will switch to [xarray](http://xarray.pydata.org/en/stable/), but ATM, it's easier like this...
%% Cell type:code id: tags:
``` python
def read_jhu_covid_region_df(name):
    filename = os.path.join(ts_folder, f"time_series_covid19_{name}_global.csv")
    df = pd.read_csv(filename)
    df = df.set_index(['Country/Region', 'Province/State', 'Lat', 'Long'])
    df.columns = pd.to_datetime(df.columns)
    region_df = df.groupby(level='Country/Region').sum()
    loc_df = df.reset_index([2,3]).groupby(level='Country/Region').mean()[['Long', 'Lat']]
    return region_df.join(loc_df).set_index(['Long', 'Lat'], append=True)
```
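The function above collapses province-level rows into one row per country/region: case counts are summed and coordinates are averaged. The following is a minimal sketch of that aggregation on a toy frame. All numbers are invented, the column layout is only assumed to resemble the JHU CSSE files, and the grouping is done on columns rather than on index levels for brevity.

```python
import pandas as pd

# Toy stand-in for a JHU CSSE global time-series file (invented values).
raw = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Italy"],
    "Lat": [30.0, 40.0, 43.0],
    "Long": [112.0, 116.0, 12.0],
    "3/30/20": [10, 2, 5],
    "3/31/20": [12, 3, 8],
})
date_cols = ["3/30/20", "3/31/20"]

# Sum the case counts of all provinces within each country/region.
counts = raw.groupby("Country/Region")[date_cols].sum()
counts.columns = pd.to_datetime(counts.columns, format="%m/%d/%y")

# Use the mean of the province coordinates as the region's location.
coords = raw.groupby("Country/Region")[["Long", "Lat"]].mean()

# Attach the coordinates as extra index levels, as the notebook does.
region_df = counts.join(coords).set_index(["Long", "Lat"], append=True)
print(region_df)
```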
%% Cell type:code id: tags:
``` python
frames_map = {
    "confirmed": read_jhu_covid_region_df("confirmed"),
    "deaths": read_jhu_covid_region_df("deaths"),
}
```
%% Cell type:markdown id: tags:
# Read in World Bank data
%% Cell type:code id: tags:
``` python
import zipfile
zf = zipfile.ZipFile(wb_path)
pop_df = pd.read_csv(zf.open("API_SP.POP.TOTL_DS2_en_csv_v2_821007.csv"), skiprows=4)
```
%% Cell type:markdown id: tags:
There is 2018 pop data for all countries/regions except Eritrea
%% Cell type:code id: tags:
``` python
pop_df[pd.isna(pop_df['2018'])]
```
%%%% Output: execute_result
Country Name Country Code Indicator Name Indicator Code 1960 \
67 Eritrea ERI Population, total SP.POP.TOTL 1007590.0
108 Not classified INX Population, total SP.POP.TOTL NaN
1961 1962 1963 1964 1965 ... 2011 \
67 1033328.0 1060486.0 1088854.0 1118159.0 1148189.0 ... 3213972.0
108 NaN NaN NaN NaN NaN ... NaN
2012 2013 2014 2015 2016 2017 2018 2019 Unnamed: 64
67 NaN NaN NaN NaN NaN NaN NaN NaN NaN
108 NaN NaN NaN NaN NaN NaN NaN NaN NaN
[2 rows x 65 columns]
%% Cell type:markdown id: tags:
Fix the country/region names that differ between the World Bank population data and the JHU CSSE data.
%% Cell type:code id: tags:
``` python
region_wb_jhu_map = {
    'Brunei Darussalam': 'Brunei',
    'Czech Republic': 'Czechia',
    'Egypt, Arab Rep.': 'Egypt',
    'Hong Kong SAR, China': 'Hong Kong SAR',
    'Iran, Islamic Rep.': 'Iran',
    'Korea, Rep.': 'Korea, South',
    'Macao SAR, China': 'Macao SAR',
    'Russian Federation': 'Russia',
    'Slovak Republic': 'Slovakia',
    'St. Martin (French part)': 'Saint Martin',
    'United States': 'US'
}
current_pop_ser = pop_df[['Country Name', '2018']].copy().replace(region_wb_jhu_map).set_index('Country Name')['2018']
data_pop_ser = current_pop_ser[current_pop_ser.index.isin(frames_map['confirmed'].index.levels[0])]
```
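The cell above renames World Bank country names to their JHU CSSE spelling and then keeps only the populations whose name appears in the JHU data. A minimal sketch of the same idea on toy data follows. The populations are invented, `name_map` and `jhu_names` are hypothetical stand-ins for `region_wb_jhu_map` and the JHU index, and the replacement is applied to the name column only.

```python
import pandas as pd

# Toy World Bank style table (invented populations).
pop_df = pd.DataFrame({
    "Country Name": ["Czech Republic", "United States", "Italy"],
    "2018": [10.0, 300.0, 60.0],
})
name_map = {"Czech Republic": "Czechia", "United States": "US"}
jhu_names = ["Czechia", "Italy", "US", "Taiwan*"]

# Rename to the JHU spelling, then index the populations by country name.
names = pop_df["Country Name"].replace(name_map)
pop_ser = pd.Series(
    pop_df["2018"].to_numpy(),
    index=pd.Index(names, name="Country Name"),
    name="2018",
)

# Keep only countries present in the JHU data, and list the JHU names
# for which no population was found.
matched = pop_ser[pop_ser.index.isin(jhu_names)]
unmatched = sorted(set(jhu_names) - set(matched.index))
print(matched)
print(unmatched)
```

Names left in `unmatched` correspond to the unresolved regions that the notebook lists in its next cells.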
%% Cell type:code id: tags:
``` python
# Use this to find the name in the series
# current_pop_ser[current_pop_ser.index.str.contains('Czech')]
```
%% Cell type:markdown id: tags:
There are some regions that we cannot resolve, but we will just ignore these.
%% Cell type:code id: tags:
``` python
frames_map['confirmed'].loc[
    frames_map['confirmed'].index.levels[0].isin(data_pop_ser.index) == False
].iloc[:,-2:]
```
%%%% Output: execute_result
- 2020-03-29 00:00:00 \
+ 2020-03-30 00:00:00 \
  Country/Region Long Lat
- Bahamas -77.396300 25.034300 11
- Burma 95.956000 21.916200 10
+ Bahamas -77.396300 25.034300 14
+ Burma 95.956000 21.916200 14
  Congo (Brazzaville) 21.758700 -4.038300 19
- Congo (Kinshasa) 21.758700 -4.038300 65
+ Congo (Kinshasa) 21.758700 -4.038300 81
  Diamond Princess 0.000000 0.000000 712
  Gambia -15.310100 13.443200 4
  Holy See 12.453400 41.902900 6
- Kyrgyzstan 74.766100 41.204400 84
+ Kyrgyzstan 74.766100 41.204400 94
  Laos 102.495496 19.856270 8
  MS Zaandam 0.000000 0.000000 2
- Saint Kitts and Nevis -62.782998 17.357822 2
+ Saint Kitts and Nevis -62.782998 17.357822 7
  Saint Lucia -60.978900 13.909400 9
  Saint Vincent and the Grenadines -61.287200 12.984300 1
- Syria 38.996815 34.802075 9
- Taiwan* 121.000000 23.700000 298
- Venezuela -66.589700 6.423800 119
+ Syria 38.996815 34.802075 10
+ Taiwan* 121.000000 23.700000 306
+ Venezuela -66.589700 6.423800 135
- 2020-03-30 00:00:00
+ 2020-03-31 00:00:00
  Country/Region Long Lat
  Bahamas -77.396300 25.034300 14
- Burma 95.956000 21.916200 14
+ Burma 95.956000 21.916200 15
  Congo (Brazzaville) 21.758700 -4.038300 19
- Congo (Kinshasa) 21.758700 -4.038300 81
+ Congo (Kinshasa) 21.758700 -4.038300 98
  Diamond Princess 0.000000 0.000000 712
  Gambia -15.310100 13.443200 4
  Holy See 12.453400 41.902900 6
- Kyrgyzstan 74.766100 41.204400 94
- Laos 102.495496 19.856270 8
+ Kyrgyzstan 74.766100 41.204400 107
+ Laos 102.495496 19.856270 9
  MS Zaandam 0.000000 0.000000 2
- Saint Kitts and Nevis -62.782998 17.357822 7
- Saint Lucia -60.978900 13.909400 9
+ Saint Kitts and Nevis -62.782998 17.357822 8
+ Saint Lucia -60.978900 13.909400 13
  Saint Vincent and the Grenadines -61.287200 12.984300 1
  Syria 38.996815 34.802075 10
- Taiwan* 121.000000 23.700000 306
+ Taiwan* 121.000000 23.700000 322
  Venezuela -66.589700 6.423800 135
%% Cell type:markdown id: tags:
# Read in geodata to get additional population numbers
%% Cell type:code id: tags:
``` python
geodata_df = pd.read_csv(geodata_path).drop('Unnamed: 0', axis=1).set_index('name_jhu')
```
%% Cell type:markdown id: tags:
Add in populations for missing countries
%% Cell type:code id: tags:
``` python
missing_countries = frames_map['confirmed'].loc[
    frames_map['confirmed'].index.levels[0].isin(data_pop_ser.index) == False
].iloc[:,-2:].reset_index()['Country/Region']
display(geodata_df.loc[geodata_df.index.isin(missing_countries)])
data_pop_ser = data_pop_ser.append(geodata_df.loc[geodata_df.index.isin(missing_countries), 'pop_est'])
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
# Compute rates per 100,000 for regions
%% Cell type:code id: tags:
``` python
def cases_to_rates_df(df):
    per_100000_df = df.reset_index([1, 2], drop=True)
    per_100000_df = per_100000_df.div(data_pop_ser, 'index').mul(100000).dropna()
    per_100000_df.index.name = 'Country/Region'
    return per_100000_df

def frames_to_rates(frames_map):
    return {k: cases_to_rates_df(v) for k,v in frames_map.items()}

rates_map = frames_to_rates(frames_map)
```
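The conversion above divides each region's case counts by its population and multiplies by 100,000; regions without a population entry become NaN and are dropped. A minimal worked sketch with invented numbers follows (regions `A`, `B`, `C` and their values are hypothetical).

```python
import pandas as pd

# Invented case counts for three regions on two dates.
cases = pd.DataFrame(
    {"2020-03-30": [50, 200, 1], "2020-03-31": [100, 400, 2]},
    index=pd.Index(["A", "B", "C"], name="Country/Region"),
)
# Invented populations; region "C" is deliberately missing.
population = pd.Series({"A": 1_000_000, "B": 4_000_000})

# Row-wise division aligned on the region name, scaled to rates per 100,000.
# "C" has no population, so its row is NaN and dropna() removes it.
rates = cases.div(population, axis="index").mul(100_000).dropna()
print(rates)
```

For region `A`, 50 cases in a population of 1,000,000 works out to about 5 cases per 100,000.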
%% Cell type:code id: tags:
``` python
if PAPERMILL_OUTPUT_PATH:
    for k, v in rates_map.items():
        out_path = os.path.join(out_folder, f"ts_rates_19-covid-{k}.csv")
        v.reset_index().to_csv(out_path)
```