Commit 9a18ef0a authored by CR (covid cron), committed by renku 0.10.2

renku rerun data/covidtracking/states-metadata.json data/covidtracking/states-daily.json

parent cbf9bc60
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: states-metadata.json
    streamable: false
    type: string
  input_2:
    default: out_folder
    streamable: false
    type: string
  input_3:
    default: data/covidtracking
    streamable: false
    type: string
  input_4:
    default:
      class: File
      path: ../../notebooks/process/download-covidtracking-data.ipynb
    streamable: false
    type: File
  input_5:
    default: runs/download-covidtracking-data.runs.ipynb
    streamable: false
    type: string
  input_6:
    default: states-daily.json
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_2/output_1
    streamable: false
    type: Directory
  output_2:
    outputSource: step_2/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      filename: input_1
      input_directory: step_2/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-metadata.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - &id001
        class: InlineJavascriptRequirement
      - &id002
        class: InitialWorkDirRequirement
        listing: $(inputs.input_directory.listing)
      successCodes: []
      temporaryFailCodes: []
  step_2:
    in:
      input_1: input_2
      input_2: input_3
      input_3: input_4
      input_4: input_5
    out:
    - output_0
    - output_1
    run: a17d560c41a54f5aa307ce5f3c5effe5_papermill.cwl
  step_3:
    in:
      filename: input_6
      input_directory: step_2/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-daily.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - *id001
      - *id002
      successCodes: []
      temporaryFailCodes: []
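Steps `step_1` and `step_3` run no real command (`baseCommand: 'true'` is a no-op). Their `InitialWorkDirRequirement` stages the listing of the directory produced by `step_2` (the notebook's `out_folder`) into the step's working directory. The `outputBinding` glob `$(inputs.filename)` then collects one named file back out as a `File` output. In other words, they pull `states-metadata.json` and `states-daily.json` out of that directory. Below is a rough Python model of this behaviour, not how a CWL runner is implemented. It assumes copying stands in for whatever staging (copy or link) the runner uses, and `extract_file` is a hypothetical name.

```python
import glob
import os
import shutil
import tempfile


def extract_file(input_directory: str, filename: str) -> str:
    """Rough model of step_1/step_3: return a staged copy of `filename` from `input_directory`."""
    # A fresh working directory, as a CWL runner creates for each step.
    workdir = tempfile.mkdtemp(prefix="cwl-step-")
    # InitialWorkDirRequirement with listing: $(inputs.input_directory.listing):
    # every entry of the input directory is staged into the working directory
    # (copied here; a real runner may link instead).
    for entry in os.listdir(input_directory):
        src = os.path.join(input_directory, entry)
        dst = os.path.join(workdir, entry)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
    # baseCommand 'true' does nothing, so outputs are collected straight away.
    # outputBinding glob: $(inputs.filename) matches the staged file.
    matches = glob.glob(os.path.join(workdir, filename))
    if not matches:
        raise FileNotFoundError(f"{filename!r} not found in {input_directory!r}")
    if len(matches) > 1:
        raise ValueError(f"{filename!r} matched {len(matches)} files; a File output needs one")
    return matches[0]
```

For example, `extract_file("data/covidtracking", "states-daily.json")` would return the path of a staged copy of the daily file. That is the role `step_3/output_file` plays in the workflow.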
The diffs of two further files are not displayed because they are stored in Git LFS.
%% Cell type:code id: tags:
``` python
import requests
import os
import pandas as pd
```
%% Cell type:code id: tags:parameters
``` python
out_folder = "../data/covidtracking/"
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
PAPERMILL_INPUT_PATH = "/tmp/e17s3uzn/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "data/covidtracking"
```
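Papermill overrides notebook defaults by adding a cell tagged `injected-parameters` right after the cell tagged `parameters`. It replaces an earlier injected cell if one is already present. That is where `out_folder = "data/covidtracking"` above comes from. Papermill can also inject `PAPERMILL_INPUT_PATH` and `PAPERMILL_OUTPUT_PATH` (its `--inject-paths` option), and the notebook's `if PAPERMILL_OUTPUT_PATH:` guards rely on those. The sketch below is a simplified model of that placement logic on an nbformat-style notebook dict, not papermill's own code. `inject_parameters` and `_first_tagged` are hypothetical helpers. Values are written with `repr` (single quotes), whereas papermill's Python translator emits double-quoted strings.

```python
import copy


def _first_tagged(cells: list, tag: str) -> int:
    """Index of the first cell whose metadata tags include `tag`, or -1."""
    for i, cell in enumerate(cells):
        if tag in cell.get("metadata", {}).get("tags", []):
            return i
    return -1


def inject_parameters(notebook: dict, parameters: dict) -> dict:
    """Return a copy of `notebook` with an "injected-parameters" cell holding `parameters`."""
    nb = copy.deepcopy(notebook)
    cells = nb["cells"]
    source = "# Parameters\n" + "".join(f"{name} = {value!r}\n" for name, value in parameters.items())
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    injected = _first_tagged(cells, "injected-parameters")
    params = _first_tagged(cells, "parameters")
    if injected >= 0:
        # Re-running an already executed notebook: replace the old injected cell.
        cells[injected] = new_cell
    elif params >= 0:
        # Usual case: the overrides run right after the defaults they replace.
        cells.insert(params + 1, new_cell)
    else:
        # No parameters cell: put the overrides at the top.
        cells.insert(0, new_cell)
    return nb
```

Because the injected cell runs after the `parameters` cell, its assignments simply shadow the defaults. The notebook code itself does not need to change between interactive use and a papermill run.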
%% Cell type:markdown id: tags:
# Download state metadata
Download metadata for each US state and several territories, including the URLs of their COVID-19 data sources. See [Google Doc](https://docs.google.com/spreadsheets/d/18oVRrHj3c183mHmq3m89_163yuYltLNlOmPerQ18E8w/htmlview?sle=true).
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/info'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop here instead of saving an HTTP error page as JSON
states_metadata_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-metadata.json')
    with open(out_path, 'wb') as f:
        f.write(states_metadata_json)
```
%% Cell type:code id: tags:
``` python
metadata_df = pd.read_json(states_metadata_json)
print(len(metadata_df), "states and territories have metadata")
metadata_df.head(2)
```
%%%% Output: stream
56 states and territories have metadata
%%%% Output: execute_result
  state                                     covid19SiteOld  \
0    AK  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1    AL  http://www.alabamapublichealth.gov/infectiousd...

                                         covid19Site  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1  https://alpublichealth.maps.arcgis.com/apps/op...

                                covid19SiteSecondary          twitter  \
0  http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...     @Alaska_DHSS
1                                               None  @alpublichealth

        pui    pum                                              notes  fips  \
0  All data  False  Total tests are taken from the annotations on ...     2
1   No data  False  Negatives = (Totals - Positives) \nPositives o...     1

      name
0   Alaska
1  Alabama
%% Cell type:markdown id: tags:
# Download daily state data
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/daily'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop here instead of saving an HTTP error page as JSON
states_daily_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-daily.json')
    with open(out_path, 'wb') as f:
        f.write(states_daily_json)
```
%% Cell type:code id: tags:
``` python
data_df = pd.read_json(states_daily_json)
print(len(data_df), "data points")
data_df.head(2)
```
%%%% Output: stream
1821 data points
%%%% Output: execute_result
       date  state  positive  negative  pending  hospitalizedCurrently  \
0  20200407     AK     213.0    6700.0      NaN                    NaN
1  20200407     AL    2119.0   12797.0      NaN                    NaN

   hospitalizedCumulative  inIcuCurrently  inIcuCumulative  \
0                    23.0             NaN              NaN
1                   271.0             NaN              NaN

   onVentilatorCurrently  ...  hospitalized  total  totalTestResults  posNeg  \
0                    NaN  ...          23.0   6913              6913    6913
1                    NaN  ...         271.0  14916             14916   14916

   fips  deathIncrease  hospitalizedIncrease  negativeIncrease  \
0     2            0.0                   0.0               8.0
1     1            6.0                  31.0               0.0

   positiveIncrease  totalTestResultsIncrease
0              22.0                      30.0
1             151.0                     151.0

[2 rows x 25 columns]
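In the rows shown, `totalTestResults` equals `positive + negative` (AK: 213 + 6700 = 6913). Each `*Increase` column appears to be the change from the same state's previous day. Subtracting the increases gives the 2020-04-06 values, for example AK positive 213 − 22 = 191. Below is a minimal pandas sketch that recomputes those increases with a per-state `groupby(...).diff()`. It assumes one row per state and date, with `date` as a sortable `YYYYMMDD` integer. `add_daily_increases` is a hypothetical helper, not part of the notebook or the covidtracking API.

```python
import pandas as pd


def add_daily_increases(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Add '<col>Increase' columns holding each state's day-over-day change."""
    # Sort so that diff() compares each day with the previous day of the same state;
    # the API returns the newest rows first.
    out = df.sort_values(["state", "date"]).copy()
    for col in columns:
        # The first date of each state has no predecessor and gets NaN.
        out[col + "Increase"] = out.groupby("state")[col].diff()
    return out


# AK and AL figures for 2020-04-06 and 2020-04-07, as in this notebook's outputs.
daily = pd.DataFrame({
    "date": [20200407, 20200407, 20200406, 20200406],
    "state": ["AK", "AL", "AK", "AL"],
    "positive": [213.0, 2119.0, 191.0, 1968.0],
    "negative": [6700.0, 12797.0, 6692.0, 12797.0],
})
daily["totalTestResults"] = daily["positive"] + daily["negative"]
result = add_daily_increases(daily, ["positive", "negative", "totalTestResults"])
latest = result[result["date"] == 20200407].set_index("state")
print(latest[["positiveIncrease", "negativeIncrease", "totalTestResultsIncrease"]])
```

The recomputed 2020-04-07 increases match the `positiveIncrease`, `negativeIncrease` and `totalTestResultsIncrease` values in the output above. A check like this could flag a download where the cumulative and increase columns disagree.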
......