Commit c4a3a9b2 authored by CI-bot, committed by renku 0.10.4.dev13

renku rerun data/covidtracking/states-metadata.json data/covidtracking/states-daily.json

parent 01a58d19
class: Workflow
cwlVersion: v1.0
hints: []
inputs:
  input_1:
    default: out_folder
    streamable: false
    type: string
  input_2:
    default: data/covidtracking
    streamable: false
    type: string
  input_3:
    default:
      class: File
      path: ../../notebooks/process/download-covidtracking-data.ipynb
    streamable: false
    type: File
  input_4:
    default: runs/download-covidtracking-data.runs.ipynb
    streamable: false
    type: string
  input_5:
    default: states-metadata.json
    streamable: false
    type: string
  input_6:
    default: states-daily.json
    streamable: false
    type: string
outputs:
  output_0:
    outputSource: step_1/output_1
    streamable: false
    type: Directory
  output_2:
    outputSource: step_1/output_0
    streamable: false
    type: File
requirements: []
steps:
  step_1:
    in:
      input_1: input_1
      input_2: input_2
      input_3: input_3
      input_4: input_4
    out:
    - output_1
    - output_0
    run: a17d560c41a54f5aa307ce5f3c5effe5_papermill.cwl
  step_2:
    in:
      filename: input_5
      input_directory: step_1/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-metadata.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - &id001
        class: InlineJavascriptRequirement
      - &id002
        class: InitialWorkDirRequirement
        listing: $(inputs.input_directory.listing)
      successCodes: []
      temporaryFailCodes: []
  step_3:
    in:
      filename: input_6
      input_directory: step_1/output_1
    out:
    - output_file
    run:
      arguments: []
      baseCommand:
      - 'true'
      class: CommandLineTool
      cwlVersion: v1.0
      hints: []
      inputs:
        filename:
          default: states-daily.json
          streamable: false
          type: string
        input_directory:
          streamable: false
          type: Directory
      outputs:
        output_file:
          outputBinding:
            glob: $(inputs.filename)
          streamable: false
          type: File
      permanentFailCodes: []
      requirements:
      - *id001
      - *id002
      successCodes: []
      temporaryFailCodes: []
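As far as the workflow definition shows, `step_2` and `step_3` do no work of their own: the base command is `true`, the `InitialWorkDirRequirement` stages the directory produced by `step_1`, and the `glob` output binding exposes one named file from it. A rough Python sketch of that selection follows. The helper name `pick_output` is hypothetical and is not part of renku or any CWL runner; it only illustrates the idea.

```python
import shutil
import tempfile
from pathlib import Path


def pick_output(input_directory: str, filename: str) -> Path:
    """Hypothetical stand-in for step_2/step_3: stage a directory, then glob one file."""
    workdir = Path(tempfile.mkdtemp())
    # Roughly what InitialWorkDirRequirement does: place the directory's
    # entries into the step's working directory.
    for entry in Path(input_directory).iterdir():
        if entry.is_file():
            shutil.copy2(entry, workdir / entry.name)
        else:
            shutil.copytree(entry, workdir / entry.name)
    # The base command is `true`, so nothing is executed at this point.
    # Roughly what outputBinding.glob does: match the requested name.
    matches = sorted(workdir.glob(filename))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one match for {filename!r}, got {len(matches)}"
        )
    return matches[0]
```

Calling `pick_output("data/covidtracking", "states-daily.json")` would return the staged copy of that file, which mirrors how `output_file` is resolved in each step.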
This source diff could not be displayed because it is stored in LFS. You can view the blob instead.
This source diff could not be displayed because it is stored in LFS. You can view the blob instead.
%% Cell type:code id: tags:
``` python
import requests
import os
import pandas as pd
```
%% Cell type:code id: tags:parameters
``` python
out_folder = "../data/covidtracking/"
PAPERMILL_OUTPUT_PATH = None
```
%% Cell type:code id: tags:injected-parameters
``` python
# Parameters
PAPERMILL_INPUT_PATH = "/tmp/osodv0vi/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_INPUT_PATH = "/tmp/x8u9p2lc/notebooks/process/download-covidtracking-data.ipynb"
PAPERMILL_OUTPUT_PATH = "runs/download-covidtracking-data.runs.ipynb"
out_folder = "data/covidtracking"
```
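The `injected-parameters` cell above is what papermill adds after the cell tagged `parameters`, so the injected values override the notebook defaults when the cells run in order. The sketch below imitates that ordering with plain dictionaries. It is a simplified illustration, not papermill's implementation, and the names `inject_parameters` and `run_cells` are hypothetical.

```python
def inject_parameters(cells, parameters):
    """Return a new cell list with an override cell placed after the `parameters` cell."""
    lines = ["# Parameters"]
    lines += [f"{name} = {value!r}" for name, value in parameters.items()]
    injected = {"tags": ["injected-parameters"], "source": "\n".join(lines)}
    for index, cell in enumerate(cells):
        if "parameters" in cell["tags"]:
            return cells[: index + 1] + [injected] + cells[index + 1:]
    # Assumption for this sketch: with no tagged cell, overrides go first.
    return [injected] + cells


def run_cells(cells):
    """Execute cell sources in order in one shared namespace."""
    namespace = {}
    for cell in cells:
        exec(cell["source"], namespace)
    return namespace


cells = [
    {"tags": [], "source": "import os"},
    {
        "tags": ["parameters"],
        "source": 'out_folder = "../data/covidtracking/"\nPAPERMILL_OUTPUT_PATH = None',
    },
]
updated = inject_parameters(
    cells,
    {
        "PAPERMILL_OUTPUT_PATH": "runs/download-covidtracking-data.runs.ipynb",
        "out_folder": "data/covidtracking",
    },
)
values = run_cells(updated)
```

After the run, `values["out_folder"]` holds the injected value `data/covidtracking` rather than the default, because the injected cell executes later.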
%% Cell type:markdown id: tags:
# Download state metadata
Download a dataset of URLs for data for each US state and several territories. See [Google Doc](https://docs.google.com/spreadsheets/d/18oVRrHj3c183mHmq3m89_163yuYltLNlOmPerQ18E8w/htmlview?sle=true).
%% Cell type:code id: tags:
``` python
url = 'http://covidtracking.com/api/states/info'
r = requests.get(url, allow_redirects=True)
states_metadata_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-metadata.json')
    with open(out_path, 'wb') as f:
        f.write(states_metadata_json)
```
%% Cell type:code id: tags:
``` python
metadata_df = pd.read_json(states_metadata_json)
print(len(metadata_df), "states and territories have metadata")
metadata_df.head(2)
```
%%%% Output: stream
56 states and territories have metadata
%%%% Output: execute_result
state covid19SiteOld \
0 AK http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1 AL http://www.alabamapublichealth.gov/infectiousd...
covid19Site \
0 http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-...
1 https://alpublichealth.maps.arcgis.com/apps/op...
covid19SiteSecondary twitter \
0 http://dhss.alaska.gov/dph/Epi/id/Pages/COVID-... @Alaska_DHSS
1 https://dph1.adph.state.al.us/covid-19/ @alpublichealth
pui pum notes fips \
0 All data False Total tests are taken from the annotations on ... 2
1 No data False Negatives = (Totals - Positives) \nPositives o... 1
name
0 Alaska
1 Alabama
%% Cell type:markdown id: tags:
# Download daily state data
%% Cell type:code id: tags:
``` python
url = 'https://covidtracking.com/api/states/daily'
r = requests.get(url, allow_redirects=True)
states_daily_json = r.content
```
%% Cell type:code id: tags:
``` python
# save the result
if PAPERMILL_OUTPUT_PATH:
    out_path = os.path.join(out_folder, 'states-daily.json')
    with open(out_path, 'wb') as f:
        f.write(states_daily_json)
```
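Both save cells write into `out_folder` with `open(..., 'wb')`, which fails if that directory does not exist yet. One possible way to share the logic and create the folder first is sketched below. The helper name `save_bytes` is an assumption for illustration and does not appear in the notebook.

```python
import os


def save_bytes(out_folder: str, filename: str, payload: bytes) -> str:
    """Write `payload` to out_folder/filename, creating the folder if needed."""
    os.makedirs(out_folder, exist_ok=True)
    out_path = os.path.join(out_folder, filename)
    with open(out_path, "wb") as f:
        f.write(payload)
    return out_path
```

With such a helper, each save cell could reduce to a guarded call like `save_bytes(out_folder, 'states-daily.json', states_daily_json)` inside the existing `if PAPERMILL_OUTPUT_PATH:` check.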
%% Cell type:code id: tags:
``` python
data_df = pd.read_json(states_daily_json)
print(len(data_df), "data points")
data_df.head(2)
```
%%%% Output: stream
3769 data points
3825 data points
%%%% Output: execute_result
date state positive negative pending hospitalizedCurrently \
0 20200511 AK 381.0 28299.0 NaN 7.0
1 20200511 AL 10009.0 119435.0 NaN NaN
0 20200512 AK 383.0 29578.0 NaN 10.0
1 20200512 AL 10310.0 122908.0 NaN NaN
hospitalizedCumulative inIcuCurrently inIcuCumulative \
0 NaN NaN NaN
1 1256.0 NaN 463.0
1 1287.0 NaN 468.0
onVentilatorCurrently ... hospitalized total totalTestResults posNeg \
0 NaN ... NaN 28680 28680 28680
1 NaN ... 1256.0 129444 129444 129444
0 NaN ... NaN 29961 29961 29961
1 NaN ... 1287.0 133218 133218 133218
fips deathIncrease hospitalizedIncrease negativeIncrease \
0 2 0.0 0.0 1314.0
1 1 8.0 16.0 1791.0
0 2 0.0 0.0 1279.0
1 1 28.0 31.0 3473.0
positiveIncrease totalTestResultsIncrease
0 2.0 1316.0
1 232.0 2023.0
0 2.0 1281.0
1 301.0 3774.0
[2 rows x 27 columns]