Commit c68646ca authored by Aaron Spring

showcase different data access options

parent a9710f00
# CHANGELOG
### unreleased
- Add notebooks showcasing how to access output of different models from different sources: (!2, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
  - S2S-Project models:
    - from European Weather Cloud:
      - [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge/) [recommended], see [`climetlab-s2s-ai-challenge` notebooks](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge/tree/main/notebooks)
      - `curl` & `wget`, see [wget_curl.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/wget_curl.ipynb)
      - `intake`, see [intake.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/intake.ipynb)
    - `IRIDL` including overview, see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
  - SubX-Project models: `IRIDL` including overview, see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
  - How to access password-protected S2S-Project output from IRIDL with `xarray`: see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
- pin `netcdf4` to version `1.5.4` so that `opendap` access works lazily with `xarray` (!2, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
### 2021-05-31: `v0.2` *release*
After this `v0.2` release, this CHANGELOG.md will describe all changes made in this template repository.
- update `README` on how to join the competition; please `git pull` if you forked before
- find the status of your submission in `s2s-ai-competition-scoring-image`: https://renkulab.io/gitlab/tasko.olevski/s2s-ai-competition-scoring-image/-/blob/master/README.md
- calculate `RPSS` with respect to climatology (not ECMWF anymore) ([Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- update `RPSS_verification.ipynb`
- update `scorer`: https://renkulab.io/gitlab/tasko.olevski/s2s-ai-competition-scoring-image ([Tasko Olevski](https://renkulab.io/gitlab/tasko.olevski), [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- Averaged ECMWF RPSS skill value to beat at least: -0.0070
### 2021-05-26: `v0.1` *pre-release*
- update `README` on how to join the competition (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- git lfs track zarr: `git lfs track "**/*.zarr/**"` (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- add notebooks: (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
  - create renku datasets: `renku_datasets_biweekly.ipynb`
  - RPSS verification: `RPSS_verification.ipynb`
  - ML train and predict based on weatherbench: `ML_train_and_predict.ipynb`
  - mean bias reduction: `mean_bias_reduction.ipynb`
  - template for training and predictions: `ML_forecast_template.ipynb`
- add renku dataset `s2s-ai-challenge` with files: (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
  - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
  - `forecast-like-observations_2020_biweekly_deterministic.zarr`
  - `hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc`
@@ -31,10 +45,10 @@ After this `v0.2` release, this CHANGELOG.md will describe all changes made in t
  - `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`
  - `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`
  - `ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc`
- add reproducibility section below in training (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- how to deal with this dry mask? provide as renku dataset? now implicitly masked in categorized observations `obs_p`
- justify if training takes more than a week (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- show RPS for all years. ~~ToDo: take RPSS~~ (!4, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
@@ -5,27 +5,27 @@ dependencies:
- xarray
# ML
- tensorflow
- pytorch
# viz
- matplotlib-base
# - cartopy
# scoring
- xskillscore>=0.0.20 # includes sklearn
# data access
- intake
- fsspec
- zarr
- s3fs
- intake-xarray
- cfgrib
- nc-time-axis
- pydap
- h5netcdf
- netcdf4==1.5.3
- pip
- pip:
  - climetlab >= 0.7.0
  - climetlab_s2s_ai_challenge >= 0.6.3
  - configargparse # for weatherbench
  - git+https://github.com/phausamann/sklearn-xarray.git@develop
  - netcdf4==1.5.4
prefix: "/opt/conda"
plugins:
  source:
    - module: intake_xarray
sources:
  training-input:
    description: climetlab name in AI/ML community naming for hindcasts as input to the ML-model in training period
    driver: netcdf
    parameters:
      model:
        description: name of the S2S model
        type: str
        default: ecmwf
        allowed: [ecmwf, eccc, ncep]
      param:
        description: variable name
        type: str
        default: tp
        allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
      date:
        description: initialization weekly thursdays
        type: datetime
        default: 2020.01.02
        min: 2020.01.02
        max: 2020.12.31
      version:
        description: versioning of the data
        type: str
        default: 0.3.0
      format:
        description: data type
        type: str
        default: netcdf
        allowed: [netcdf, grib]
      ending:
        description: data format compatible with format; netcdf -> nc, grib -> grib
        type: str
        default: nc
        allowed: [nc, grib]
    xarray_kwargs:
      engine: h5netcdf
    args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
      urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{{version}}/{{format}}/{{model}}-hindcast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
  test-input:
    description: climetlab name in AI/ML community naming for 2020 forecasts as input to ML model in test period 2020
    driver: netcdf
    parameters:
      model:
        description: name of the S2S model
        type: str
        default: ecmwf
        allowed: [ecmwf, eccc, ncep]
      param:
        description: variable name
        type: str
        default: tp
        allowed: [t2m, ci, gh, lsm, msl, q, rsn, sm100, sm20, sp, sst, st100, st20, t, tcc, tcw, ttr, tp, v, u]
      date:
        description: initialization weekly thursdays
        type: datetime
        default: 2020.01.02
        min: 2020.01.02
        max: 2020.12.31
      version:
        description: versioning of the data
        type: str
        default: 0.3.0
      format:
        description: data type
        type: str
        default: netcdf
        allowed: [netcdf, grib]
      ending:
        description: data format compatible with format; netcdf -> nc, grib -> grib
        type: str
        default: nc
        allowed: [nc, grib]
    xarray_kwargs:
      engine: h5netcdf
    args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
      urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{{version}}/{{format}}/{{model}}-forecast-{{param}}-{{date.strftime("%Y%m%d")}}.{{ending}}
  training-output-reference:
    description: climetlab name in AI/ML community naming for observations formatted like the hindcasts, as output reference to compare ML model output to in training period
    driver: netcdf
    parameters:
      param:
        description: variable name
        type: str
        default: tp
        allowed: [t2m, tp]
      date:
        description: initialization weekly thursdays
        type: datetime
        default: 2020.01.02
        min: 2020.01.02
        max: 2020.12.31
    xarray_kwargs:
      engine: h5netcdf
    args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
      urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
  test-output-reference:
    description: climetlab name in AI/ML community naming for observations formatted like the 2020 forecasts, as output reference to compare ML model output to in test period 2020
    driver: netcdf
    parameters:
      param:
        description: variable name
        type: str
        default: tp
        allowed: [t2m, tp]
      date:
        description: initialization weekly thursdays
        type: datetime
        default: 2020.01.02
        min: 2020.01.02
        max: 2020.12.31
    xarray_kwargs:
      engine: h5netcdf
    args: # add simplecache:: for caching: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
      urlpath: https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{{param}}-{{date.strftime("%Y%m%d")}}.nc
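The `{{...}}` placeholders in `urlpath` are filled in from each entry's user parameters (intake renders them as Jinja2 templates). As a rough, intake-free sketch of what the `training-input` and `test-input` templates expand to, the hypothetical helper `render_urlpath` below rebuilds the URL in plain Python and checks values against the catalog's `allowed` lists. Deriving `ending` from `format` (netcdf -> nc, grib -> grib) so that the two cannot disagree is an assumption of this sketch, not something the catalog enforces.

```python
from datetime import date

# `allowed` lists copied from the catalog's `parameters` section
ALLOWED = {
    "model": {"ecmwf", "eccc", "ncep"},
    "param": {"t2m", "ci", "gh", "lsm", "msl", "q", "rsn", "sm100", "sm20", "sp",
              "sst", "st100", "st20", "t", "tcc", "tcw", "ttr", "tp", "v", "u"},
    "format": {"netcdf", "grib"},
}
# documented pairing of `format` and `ending`: netcdf -> nc, grib -> grib
ENDING = {"netcdf": "nc", "grib": "grib"}
# training-input holds hindcasts, test-input holds forecasts
PRODUCT = {"training-input": "hindcast", "test-input": "forecast"}


def render_urlpath(source, model="ecmwf", param="tp", init=date(2020, 1, 2),
                   version="0.3.0", fmt="netcdf"):
    """Hypothetical helper mimicking the catalog's urlpath template."""
    for name, value in (("model", model), ("param", param), ("format", fmt)):
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not in the catalog's allowed list")
    return (
        "https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/"
        f"{source}/{version}/{fmt}/{model}-{PRODUCT[source]}-{param}-"
        f"{init.strftime('%Y%m%d')}.{ENDING[fmt]}"
    )


print(render_urlpath("test-input", param="t2m", fmt="grib"))
# → https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/grib/ecmwf-forecast-t2m-20200102.grib
```

With intake itself, the same values are passed as keyword arguments when the catalog entry is selected, and intake performs the template rendering.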
# Data Access
- European Weather Cloud:
  - [`climetlab-s2s-ai-challenge`](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge)
  - `wget`: wget_curl.ipynb
  - `curl`: wget_curl.ipynb
  - mouse click on the html index pages: wget_curl.ipynb
  - `intake`: intake.ipynb
- [IRI Data Library](http://iridl.ldeo.columbia.edu/): IRIDL.ipynb
  - S2S: http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/ (restricted access explained in IRIDL.ipynb)
  - SubX: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
  - NMME: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/
- s2sprediction.net
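The IRIDL links above point to browsable catalog pages. IRIDL also serves datasets over OPeNDAP when `dods` is appended to a dataset URL, which is what lets `xarray` open them lazily. A minimal sketch, assuming that convention; `iridl_opendap_url` is a hypothetical helper, and the directory-level URLs listed here first have to be narrowed down to a concrete dataset or variable path (not given in this README) before `xr.open_dataset` can succeed.

```python
from urllib.parse import urlsplit


def iridl_opendap_url(dataset_url):
    """Hypothetical helper: append IRIDL's `dods` suffix to get the OPeNDAP endpoint."""
    if urlsplit(dataset_url).netloc != "iridl.ldeo.columbia.edu":
        raise ValueError(f"not an IRIDL URL: {dataset_url}")
    return dataset_url.rstrip("/") + "/dods"


url = iridl_opendap_url("http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/")
print(url)  # → http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/dods

# import xarray as xr
# ds = xr.open_dataset(url)  # needs network access and a concrete dataset path
```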
%% Cell type:markdown id: tags:
# Data Access via `curl` or `wget`
Data are easily available via `climetlab`: https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge
The data holdings are listed at:
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/0.3.0/netcdf/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/index.html
- https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/index.html
Since these are plain HTTPS links, the S3 data are also accessible with `curl` or `wget`. Alternatively, you can open the html index pages and download files with a mouse click.
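The index pages can also be read programmatically to list the available files. A minimal standard-library sketch, assuming the pages list files as plain `<a href>` links (not verified here); `LinkCollector` and `list_files` are hypothetical helpers, and the HTML string stands in for a downloaded `index.html`.
%% Cell type:code id: tags:
```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def list_files(index_html, index_url, suffix=".nc"):
    """Return absolute URLs of all links on the page that end with `suffix`."""
    collector = LinkCollector()
    collector.feed(index_html)
    # relative links are resolved against the URL of the index page
    return [urljoin(index_url, href) for href in collector.hrefs if href.endswith(suffix)]


# stand-in for the text of a downloaded index.html, e.g. obtained with
# urllib.request.urlopen(index_url).read().decode()
sample_html = '<a href="../">parent</a> <a href="ecmwf-hindcast-t2m-20200102.nc">file</a>'
index_url = ("https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/"
             "data/training-input/0.3.0/netcdf/index.html")
print(list_files(sample_html, index_url))
```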
%% Cell type:code id: tags:
``` python
import xarray as xr
import os
xr.set_options(display_style='text')
```
%%%% Output: stream
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
%%%% Output: execute_result
<xarray.core.options.set_options at 0x7f5170570520>
%% Cell type:code id: tags:
``` python
# version of the EWC data
version = '0.3.0'
```
%% Cell type:markdown id: tags:
# `hindcast-input`
on-the-fly hindcasts corresponding to the 2020 forecasts
%% Cell type:code id: tags:
``` python
parameter = 't2m'
date = '20200102'
model = 'ecmwf'
```
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input/{version}/netcdf/{model}-hindcast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-hindcast-{parameter}-{date}.nc')
```
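%% Cell type:markdown id: tags:
The `date` above is one of the weekly Thursday initializations of 2020 (2020-01-02 to 2020-12-31). A sketch that enumerates those dates and builds the matching `hindcast-input` URLs, assuming a file exists for every Thursday in that range (not checked here); `weekly_thursdays` is a hypothetical helper. `datetime` is imported as `dt` so the notebook's `date` variable is not overwritten.
%% Cell type:code id: tags:
```python
import datetime as dt


def weekly_thursdays(start=dt.date(2020, 1, 2), end=dt.date(2020, 12, 31)):
    """Return all Thursdays between start and end (inclusive) as YYYYMMDD strings."""
    # shift to the first Thursday on or after start (Monday=0, ..., Thursday=3)
    day = start + dt.timedelta(days=(3 - start.weekday()) % 7)
    dates = []
    while day <= end:
        dates.append(day.strftime("%Y%m%d"))
        day += dt.timedelta(weeks=1)
    return dates


dates = weekly_thursdays()
print(len(dates), dates[0], dates[-1])  # → 53 20200102 20201231

# one hindcast-input URL per initialization (version, model, parameter as in the cells above)
base = "https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-input"
urls = [f"{base}/0.3.0/netcdf/ecmwf-hindcast-t2m-{d}.nc" for d in dates]
```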
%% Cell type:markdown id: tags:
# `forecast-input`
forecasts initialized in 2020 (test period)
%% Cell type:code id: tags:
``` python
url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-input/{version}/netcdf/{model}-forecast-{parameter}-{date}.nc'
os.system(f'wget {url}')
assert os.path.exists(f'{model}-forecast-{parameter}-{date}.nc')
```
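%% Cell type:markdown id: tags:
This notebook is titled `curl` or `wget` but only uses `wget`, and `os.system` merely returns the exit status, so a failed download goes unnoticed until the `assert`. A minimal sketch of a `curl` (or `wget`) download through `subprocess.run(..., check=True)`, which raises `CalledProcessError` on a non-zero exit code; `download_cmd` and `download` are hypothetical helpers, and the actual download is left commented out because it needs network access.
%% Cell type:code id: tags:
```python
import shutil
import subprocess


def download_cmd(url, dest, tool="curl"):
    """Build the command line that downloads `url` to the local file `dest`."""
    if tool == "curl":
        # -f: non-zero exit on HTTP errors, -L: follow redirects, -o: output file
        return ["curl", "-fL", "-o", dest, url]
    if tool == "wget":
        # -O: write the download to `dest`
        return ["wget", "-O", dest, url]
    raise ValueError(f"unsupported tool: {tool!r}")


def download(url, dest, tool="curl"):
    """Download `url` to `dest`; raise instead of failing silently."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(download_cmd(url, dest, tool), check=True)


example_url = ("https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/"
               "test-input/0.3.0/netcdf/ecmwf-forecast-t2m-20200102.nc")
print(download_cmd(example_url, "ecmwf-forecast-t2m-20200102.nc"))
# download(example_url, "ecmwf-forecast-t2m-20200102.nc")  # needs network access
```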
%% Cell type:markdown id: tags:
# `hindcast-like-observations`
CPC observations formatted like training period hindcasts
%% Cell type:code id: tags:
``` python
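url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/training-output-reference/{parameter}-{date}.nc'
# save into its own directory: both observation references are named {parameter}-{date}.nc
os.system(f'wget -P training-output-reference {url}')
assert os.path.exists(f'training-output-reference/{parameter}-{date}.nc')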
```
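%% Cell type:markdown id: tags:
A successful `wget` only tells you that a file was written, not that it is a usable netCDF file. A cheap sanity check, sketched as the hypothetical helper `looks_like_netcdf`, compares the first bytes with the netCDF classic signature (`CDF` plus a version byte) and with the HDF5 signature used by netCDF-4 files (the intake catalog in this commit opens the files with the `h5netcdf` engine, which suggests netCDF-4). It only inspects offset 0, so an HDF5 file with a user block would be misreported.
%% Cell type:code id: tags:
```python
import os
import tempfile

# netCDF classic (CDF-1), 64-bit offset (CDF-2) and CDF-5 files start with b"CDF" + version byte
NETCDF_CLASSIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
# netCDF-4 files are HDF5 files; this is the 8-byte HDF5 format signature
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_netcdf(path):
    """Return True if `path` starts with a netCDF classic or an HDF5 signature."""
    with open(path, "rb") as f:
        head = f.read(8)
    return head.startswith(NETCDF_CLASSIC) or head == HDF5_SIGNATURE


# demonstration on a fake download, e.g. an HTML error page saved under a .nc name
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "not-really.nc")
    with open(fake, "wb") as f:
        f.write(b"<html>404 Not Found</html>")
    print(looks_like_netcdf(fake))  # → False
```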
%% Cell type:markdown id: tags:
# `forecast-like-observations`
CPC observations formatted like test period 2020 forecasts
%% Cell type:code id: tags:
``` python
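url = f'https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data/test-output-reference/{parameter}-{date}.nc'
# save into its own directory so this file does not collide with the training-output-reference file of the same name
os.system(f'wget -P test-output-reference {url}')
assert os.path.exists(f'test-output-reference/{parameter}-{date}.nc')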
```
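%% Cell type:markdown id: tags:
The two observation references share the file name `{parameter}-{date}.nc` and differ only in their EWC directory, so downloading both into one folder makes the second download collide with the first. A sketch of one way to organize the files, assuming one local sub-directory per EWC directory; `EWC_DIR`, `ewc_url` and `local_path` are hypothetical names, and the section-name-to-directory mapping is read off the URLs used in the cells above.
%% Cell type:code id: tags:
```python
import os

EWC = "https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/data"

# notebook section name -> EWC directory, as used in the URLs of the cells above
EWC_DIR = {
    "hindcast-input": "training-input",
    "forecast-input": "test-input",
    "hindcast-like-observations": "training-output-reference",
    "forecast-like-observations": "test-output-reference",
}


def ewc_url(kind, parameter, date, model="ecmwf", version="0.3.0"):
    """Return the EWC URL of one file of dataset `kind`."""
    directory = EWC_DIR[kind]
    if kind == "hindcast-input":
        return f"{EWC}/{directory}/{version}/netcdf/{model}-hindcast-{parameter}-{date}.nc"
    if kind == "forecast-input":
        return f"{EWC}/{directory}/{version}/netcdf/{model}-forecast-{parameter}-{date}.nc"
    # the observation-reference URLs above carry neither a version nor a model name
    return f"{EWC}/{directory}/{parameter}-{date}.nc"


def local_path(kind, parameter, date, model="ecmwf"):
    """Hypothetical local layout: one sub-directory per EWC directory."""
    filename = os.path.basename(ewc_url(kind, parameter, date, model=model))
    return os.path.join(EWC_DIR[kind], filename)


for kind in EWC_DIR:
    print(local_path(kind, "t2m", "20200102"))
```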