Commit dc38ff62 authored by Aaron Spring

Merge branch 'SubX_magic' into 'master'

IRIDL server-side processing

Server-side processing:
- [x] select and aggregate `lead_time` `L` and `forecast_time` `S`
- [x] regrid `X` and `Y` to 1.5 deg grid
- [x] SubX intake catalog biweekly 
- [x] requires unreleased climetlab_s2s_ai_challenge plugin
- [ ] init frequency forecast: keep as todo
- [x] IRIDL cat S2S 
- [x] IRIDL obs regrid
- [x] check proper lead ints or floats to match my int lead_time

closes #9

See merge request aaron.spring/s2s-ai-challenge-template!10
parents ae40fed3 fc6223c7
Pipeline #209132 passed with stage
in 19 seconds
......@@ -20,8 +20,10 @@
- `curl` & `wget`, see [wget_curl.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/wget_curl.ipynb)
- `intake`, see [intake.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/intake.ipynb)
- `IRIDL` including overview, see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
- `intake` catalogs for `IRIDL` [`SubX`](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/SubX_catalog.yml) and [`S2S`](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/S2S_catalog.yml), see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb) (#9 !10, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- SubX-Project models: `IRIDL` including overview, see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
- How to access password-protected S2S-Project output from IRIDL with xarray? see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb)
- Access NOAA CPC observations via opendap from `IRIDL`, see [IRIDL.ipynb](https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/blob/master/notebooks/data_access/IRIDL.ipynb) (#9 !10, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
- fix `netcdf4` version to `1.5.4` for `opendap` to work lazily with `xarray` (!2, !7, [Aaron Spring](https://renkulab.io/gitlab/aaron.spring))
......
......@@ -24,8 +24,8 @@ dependencies:
- netcdf4
- pip
- pip:
- climetlab >= 0.7.0
- climetlab_s2s_ai_challenge >= 0.6.3
- climetlab >= 0.8.0
- climetlab_s2s_ai_challenge >= 0.7.1
- configargparse # for weatherbench
- netcdf4==1.5.4
prefix: "/opt/conda"
%% Cell type:markdown id: tags:
# Access from `iridl.ldeo.columbia.edu`
# `iridl.ldeo.columbia.edu`
%% Cell type:markdown id: tags:
IRI Data Library (IRIDL) hosts various subseasonal initialized forecast and hindcast simulations:
IRI Data Library (IRIDL) hosts various subseasonal initialized forecast, hindcast simulations and observations:
- `S2S project`:
- http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/
- hindcast/reforecast: one variable, one model:
- login required
- `SubX project`:
- http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
- hindcast/reforecast: one variable, one model:
- login not required
- `NOAA CPC` observations:
- ground truth for `s2s-ai-challenge`
- login not required
- `pr`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/.extREALTIME/.rain/
- `t2m`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.temperature/.daily/
---
- Notes:
- Output on IRIDL is not always on the 1.5 degree grid requested for the competition. Also dimension names and coordinates differ.
- Beware that most models are not initialized only on Thursdays. Using simulations started on other weekdays is not forbidden, but please note that you may only use information available at `forecast_time`, i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.
- Beware that conventions (variable name, standard_name, coordinates, output format) from `SubX` and `S2S` might differ.
- First check whether you can find 2020 `test` and previous `training` data for a given feature.
---
This notebook also provides opendap magic, i.e. commands added to the opendap URL which preprocess data server-side. (not implemented)
This notebook also provides opendap magic, i.e. commands added to the opendap URL which preprocess data server-side: select and aggregate `lead_time` `L` and `forecast_time` `S`. Regridding `X` and `Y`.
---
%% Cell type:markdown id: tags:
Todo:
- [ ] how to handle `hdate` best?
- [ ] how to not download empty `S`? specify weekly stride in `S` via ingrid
%% Cell type:markdown id: tags:
# `IRIDL` cookie
%% Cell type:markdown id: tags:
Here are instructions for configuring xarray to open protected Data Library datasets, after you have created a Data Library account and accepted the terms and conditions for the dataset.
1. Visit https://iridl.ldeo.columbia.edu/auth/genkey . Log in to the Data Library. Copy the key from the response.
2. Create a file with the following content, substituting the key from step 1 for `"xxxx"`:
`Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu`
3. Put the following in `~/.daprc`, which is `/home/jovyan/.daprc` on renku, substituting the path to the above file for `/path/to/cookie/file`:
`HTTP.COOKIEJAR=/path/to/cookie/file`. You may need to copy `.daprc` to `/home/jovyan` on renku, because `/home/jovyan` is not tracked by `git`.
%% Cell type:code id: tags:
``` python
%%writefile /work/s2s-ai-challenge-template/.daprc
HTTP.COOKIEJAR=/work/s2s-ai-challenge-template/.cookie_iridl
```
%%%% Output: stream
Writing /work/s2s-ai-challenge-template/.daprc
Overwriting /work/s2s-ai-challenge-template/.daprc
%% Cell type:code id: tags:
``` python
!cp /work/s2s-ai-challenge-template/.daprc /home/jovyan
```
%% Cell type:code id: tags:
``` python
#%writefile /work/s2s-ai-challenge-template/.cookie_iridl
#Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu
```
%% Cell type:code id: tags:
``` python
%%writefile /work/s2s-ai-challenge-template/.cookie_iridl
Set-Cookie: __dlauth_id=6d3f0d342e1bdd448b287481f6d7989673305eeba2fa65fabb2709e2d76101b21ae816ffe0560b1a25ed3c8d0bf8884eab7d4bc2; domain=.iridl.ldeo.columbia.edu
```
%%%% Output: stream
Writing /work/s2s-ai-challenge-template/.cookie_iridl
Overwriting /work/s2s-ai-challenge-template/.cookie_iridl
%% Cell type:code id: tags:
``` python
import xarray as xr
xr.set_options(display_style='text')
import pandas as pd
```
%%%% Output: stream
/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
warnings.warn(
%%%% Output: execute_result
<xarray.core.options.set_options at 0x7efe3cb51fd0>
%% Cell type:markdown id: tags:
# S2S
%% Cell type:markdown id: tags:
Please be aware that most models are not initialized only on Thursdays.
Using simulations started on other weekdays is not forbidden,
but please note that you may only use information available at `forecast_time`,
i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.
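
The offset arithmetic above can be sketched as follows. This is a minimal illustration, not part of the challenge tooling: the helper name `week34_lead_days` and the example dates are assumptions, and it takes week 3-4 to mean day 14 to day 27 after `forecast_time`, as in the Monday example.

```python
from datetime import date


def week34_lead_days(init_date, forecast_time):
    """Return (first, last) lead day, counted from the model initialization,
    that covers week 3-4 (day 14 to day 27) after `forecast_time`.

    Hypothetical helper for illustration only; both arguments are ISO dates."""
    offset = (date.fromisoformat(forecast_time) - date.fromisoformat(init_date)).days
    if offset < 0:
        raise ValueError('model must be initialized on or before forecast_time')
    return 14 + offset, 27 + offset


# model initialized on Monday 2020-01-06, forecast_time on Thursday 2020-01-09
first, last = week34_lead_days('2020-01-06', '2020-01-09')
print(first, last)  # prints: 17 30
```

A model initialized on the Thursday itself has an offset of 0 and keeps the plain day 14 to day 27 window.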
%% Cell type:code id: tags:
``` python
ds = xr.open_dataset('https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.control/.2m_above_ground/.2t/dods',
chunks='auto', decode_times=False)
```
%%%% Output: stream
/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:
/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
%% Cell type:code id: tags:
``` python
# calendar '360' not recognized, but '360_day'
if ds.hdate.attrs['calendar'] == '360':
    ds.hdate.attrs['calendar'] = '360_day'
```
%% Cell type:code id: tags:
``` python
ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'LA': 'lead_time', '2t':'t2m'})
ds['t2m']
```
%%%% Output: execute_result
<xarray.DataArray 't2m' (hdate: 26, forecast_time: 637, lead_time: 46, latitude: 121, longitude: 240)>
dask.array<open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 637, 46, 121, 240), dtype=float32, chunksize=(8, 91, 15, 46, 60), chunktype=numpy.ndarray>
<xarray.DataArray 't2m' (hdate: 26, forecast_time: 641, lead_time: 46, latitude: 121, longitude: 240)>
dask.array<open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 641, 46, 121, 240), dtype=float32, chunksize=(7, 197, 12, 33, 60), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...
* hdate (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00
* forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-06-17
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...
* forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Attributes:
pointwidth: 0
gribPDSpattern: 04XXXX003D0000
standard_name: air_temperature
long_name: 2-meter Temperature
units: K
standard_name: air_temperature
%% Cell type:code id: tags:
``` python
ds.nbytes/1e9,'GB'
```
%%%% Output: execute_result
(88.496735436, 'GB')
(89.052444908, 'GB')
%% Cell type:code id: tags:
``` python
# hdate gets the previous years' reforecast for that dayofyear
```
%% Cell type:markdown id: tags:
## Hindcast Availability
- BOM: BoM POAMA Ensemble.
- CMA: Beijing Climate Center (BCC) Climate Prediction System for S2S.
- CNRM: CNRM Ensemble Prediction System.
- ECCC: ECCC Ensemble Prediction System.
- ECMF: ECMWF Ensemble.
- HMCR: HMCR Ensemble.
- ISAC: ISAC-CNR Ensemble.
- JMA: JMA Ensemble System.
- KMA: KMA Seasonal Prediction System.
- NCEP: NCEP CFSv2 Ensemble.
- UKMO: UKMO Ensemble Prediction System.
%% Cell type:code id: tags:
``` python
models = ['BOM','CNRM','ECCC','ECMF','HMCR','ISAC','JMA','KMA','NCEP','UKMO']
for model in models:
    try:
        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{model}/.reforecast/.perturbed/.2m_above_ground/.2t/dods',
                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'LA':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})
        # calendar '360' not recognized, but '360_day'
        for c in ['hdate','forecast_time']:
            if c in ds.coords:
                if ds[c].attrs['calendar'] == '360':
                    ds[c].attrs['calendar'] = '360_day'
        ds = xr.decode_cf(ds)
        onthefly = True if 'hdate' in ds.coords else False
        forecast_time_freq = xr.infer_freq(ds.forecast_time)
        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',
              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',
              '\n',ds.coords,'\n',ds.sizes,ds.nbytes/1e9,'GB','\n')
    except Exception as e:
        print(f'model={model} failed due to {type(e).__name__}: {e} \n')
```
%%%% Output: stream
BOM not on-the-fly forecast_time freq not found
Coordinates:
* latitude (latitude) float32 88.1 85.64 83.16 ... -83.16 -85.64 -88.1
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 61 days 12...
* forecast_time (forecast_time) datetime64[ns] 1981-01-01 ... 2013-12-26
* realization (realization) float32 1.0 2.0 3.0 4.0 ... 29.0 30.0 31.0 32.0
* longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5
Frozen(SortedKeysDict({'latitude': 72, 'lead_time': 62, 'forecast_time': 2376, 'realization': 32, 'longitude': 144})) 195.498364944 GB
CNRM not on-the-fly forecast_time freq not found
Coordinates:
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...
* forecast_time (forecast_time) datetime64[ns] 1993-01-01 ... 2014-12-15
* realization (realization) float32 1.0 2.0 3.0 4.0 ... 11.0 12.0 13.0 14.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 61, 'forecast_time': 528, 'realization': 14, 'longitude': 240})) 52.377944132 GB
ECCC on-the-fly forecast_time freq:W-THU
Coordinates:
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* hdate (hdate) object 1995-07-01 00:00:00 ... 2017-07-01 00:00:00
* forecast_time (forecast_time) datetime64[ns] 2016-01-07 ... 2021-06-03
* forecast_time (forecast_time) datetime64[ns] 2016-01-07 ... 2021-06-10
* realization (realization) float32 1.0 2.0 3.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* hdate (hdate) object 1995-07-01 00:00:00 ... 2017-07-01 00:00:00
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...
Frozen(SortedKeysDict({'latitude': 121, 'hdate': 23, 'forecast_time': 283, 'realization': 3, 'longitude': 240, 'lead_time': 32})) 72.5842064 GB
Frozen(SortedKeysDict({'latitude': 121, 'forecast_time': 284, 'realization': 3, 'longitude': 240, 'hdate': 23, 'lead_time': 32})) 72.840687688 GB
ECMF on-the-fly forecast_time freq not found
Coordinates:
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...
* hdate (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00
* forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-06-17
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...
* forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01
* realization (realization) float32 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 46, 'hdate': 26, 'forecast_time': 637, 'realization': 10, 'longitude': 240})) 884.967290356 GB
Frozen(SortedKeysDict({'latitude': 121, 'hdate': 26, 'lead_time': 46, 'forecast_time': 641, 'realization': 10, 'longitude': 240})) 890.524384788 GB
HMCR on-the-fly forecast_time freq not found
Coordinates:
* latitude (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...
* hdate (hdate) object 1985-07-01 00:00:00 ... 2010-07-01 00:00:00
* forecast_time (forecast_time) datetime64[ns] 2015-01-07 ... 2021-06-03
* latitude (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
* forecast_time (forecast_time) datetime64[ns] 2015-01-07 ... 2021-06-10
* realization (realization) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
* longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5
Frozen(SortedKeysDict({'latitude': 73, 'lead_time': 61, 'hdate': 26, 'forecast_time': 335, 'realization': 9, 'longitude': 144})) 201.0647102 GB
* hdate (hdate) object 1985-07-01 00:00:00 ... 2010-07-01 00:00:00
Frozen(SortedKeysDict({'lead_time': 61, 'latitude': 73, 'forecast_time': 336, 'realization': 9, 'longitude': 144, 'hdate': 26})) 201.66490336 GB
model=ISAC failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ISAC/.reforecast/.perturbed/.2m_above_ground/.2t/dods'
JMA not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 32 days 33 days
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* forecast_time (forecast_time) datetime64[ns] 1981-01-10T12:00:00 ... 201...
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 32 days 33 days
Frozen(SortedKeysDict({'latitude': 121, 'forecast_time': 10948, 'realization': 4, 'longitude': 240, 'lead_time': 33})) 167.867087068 GB
Frozen(SortedKeysDict({'lead_time': 33, 'latitude': 121, 'forecast_time': 10948, 'realization': 4, 'longitude': 240})) 167.867087068 GB
KMA on-the-fly forecast_time freq:D
Coordinates:
* hdate (hdate) object 1991-07-01 00:00:00 ... 2010-07-01 00:00:00
* latitude (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...
* hdate (hdate) object 1991-07-01 00:00:00 ... 2010-07-01 00:00:00
* forecast_time (forecast_time) datetime64[ns] 2016-11-01 ... 2021-06-01
* forecast_time (forecast_time) datetime64[ns] 2016-11-01 ... 2021-06-17
* realization (realization) float32 1.0 2.0
* longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5
Frozen(SortedKeysDict({'latitude': 73, 'lead_time': 60, 'hdate': 20, 'forecast_time': 1674, 'realization': 2, 'longitude': 144})) 168.932059708 GB
Frozen(SortedKeysDict({'hdate': 20, 'latitude': 73, 'lead_time': 60, 'forecast_time': 1690, 'realization': 2, 'longitude': 144})) 170.546703036 GB
NCEP not on-the-fly forecast_time freq:D
Coordinates:
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...
* forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2010-12-31
* realization (realization) float32 1.0 2.0 3.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 44, 'forecast_time': 4383, 'realization': 3, 'longitude': 240})) 67.205101832 GB
UKMO on-the-fly forecast_time freq not found
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...
* latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
* forecast_time (forecast_time) datetime64[ns] 2016-01-01 ... 2019-05-09
* realization (realization) float32 1.0 2.0
* longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
* hdate (hdate) object 1993-07-01 00:00:00 ... 2015-07-01 00:00:00
Frozen(SortedKeysDict({'lead_time': 60, 'latitude': 121, 'forecast_time': 162, 'realization': 2, 'longitude': 240, 'hdate': 23})) 51.937462612 GB
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# SubX
%% Cell type:markdown id: tags:
Access to output from the SubX project does not require a login cookie.
%% Cell type:code id: tags:
``` python
ds = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.CESM/.30LCESM1/.hindcast/.tas/dods',
chunks='auto', decode_times=False)
```
%% Cell type:code id: tags:
``` python
# calendar '360' not recognized, but '360_day'
if ds.S.attrs['calendar'] == '360':
    ds.S.attrs['calendar'] = '360_day'
```
%% Cell type:code id: tags:
``` python
ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'L': 'lead_time', 'M':'realization', 'tas':'t2m'})
ds['t2m']
```
%%%% Output: execute_result
<xarray.DataArray 't2m' (forecast_time: 887, realization: 10, lead_time: 45, latitude: 181, longitude: 360)>
dask.array<open_dataset-1bd5755a82e148fd83330ea4db46cbb8tas, shape=(887, 10, 45, 181, 360), dtype=float32, chunksize=(335, 2, 9, 61, 90), chunktype=numpy.ndarray>
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30
* realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Attributes:
pointwidth: 0.0
cell_methods: time: mean
units: Kelvin_scale
standard_name: air_temperature
long_name: 2-meter Air Temperature
level_type: 2 meters above ground
cell_methods: time: mean
units: Kelvin_scale
%% Cell type:code id: tags:
``` python
ds.nbytes/1e9,'GB'
```
%%%% Output: execute_result
(104.03446566, 'GB')
%% Cell type:markdown id: tags:
## Hindcast Availability
%% Cell type:markdown id: tags:
- center: model
- CESM: 30LCESM1 46LCESM1
- ECCC: GEM GEPS6 GEPS5
- EMC: GEFS GEFSv12
- ESRL: FIMr1p1
- GMAO: GEOS_V2p1
- NCEP: CFSv2
- NRL: NESM
- RSMAS: CCSM4
%% Cell type:code id: tags:
``` python
centers = ['CESM', 'CESM', 'ECCC', 'ECCC', 'ECCC', 'EMC', 'EMC', 'ESRL', 'GMAO' , 'NCEP', 'NRL','RSMAS']
models = ['30LCESM1','46LCESM1','GEM','GEPS6','GEPS5','GEFS','GEFSv12','FIMr1p1','GEOS_V2p1','CFSv2','NESM','CCSM4']
for center,model in zip(centers,models):
    try:
        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{center}/.{model}/.hindcast/.tas/dods',
                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'L':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})
        # calendar '360' not recognized, but '360_day'
        for c in ['hdate','forecast_time']:
            if c in ds.coords:
                if ds[c].attrs['calendar'] == '360':
                    ds[c].attrs['calendar'] = '360_day'
        ds = xr.decode_cf(ds)
        onthefly = True if 'hdate' in ds.coords else False
        forecast_time_freq = xr.infer_freq(ds.forecast_time)
        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',
              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',
              '\n',ds.coords,'\n',ds.sizes,ds.nbytes/1e9,'GB','\n')
    except Exception as e:
        print(f'center={center} model={model} failed due to {type(e).__name__}: {e} \n')
```
%%%% Output: stream
30LCESM1 not on-the-fly forecast_time freq:W-WED
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30
* realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB
46LCESM1 not on-the-fly forecast_time freq:W-WED
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30
* realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB
GEM not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1995-01-04 ... 2014-12-28
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB
GEPS6 not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB
GEPS5 not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB
GEFS not on-the-fly forecast_time freq:W-WED
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 34 days 12...
* latitude (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2016-12-28
* realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 35, 'latitude': 181, 'forecast_time': 939, 'realization': 11, 'longitude': 360})) 94.2252796 GB
center=EMC model=GEFSv12 failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFSv12/.hindcast/.tas/dods'
FIMr1p1 not on-the-fly forecast_time freq:W-WED
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2017-06-28
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 965, 'realization': 4, 'longitude': 360})) 32.194262956 GB
GEOS_V2p1 not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2016-12-27
* realization (realization) float32 1.0 2.0 3.0 4.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6571, 'realization': 4, 'longitude': 360})) 308.279834308 GB
CFSv2 not on-the-fly forecast_time freq:6H
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...
* latitude (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2017-09-30
* realization (realization) int32 1
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 44, 'latitude': 181, 'forecast_time': 27389, 'realization': 1, 'longitude': 360})) 314.101655872 GB
NESM not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-02T12:00:00 ... 201...
* realization (realization) int32 1
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6574, 'realization': 1, 'longitude': 360})) 77.10518632 GB
CCSM4 not on-the-fly forecast_time freq:D
Coordinates:
* lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
* latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* forecast_time (forecast_time) datetime64[ns] 1999-01-07 ... 2016-12-31
* realization (realization) float32 1.0 2.0 3.0
* longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6569, 'realization': 3, 'longitude': 360})) 231.139516688 GB
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Opendap magic
Opendap URLs can be extended with commands for server-side preprocessing.
- https://www.opendap.org/support
- http://iridl.ldeo.columbia.edu/dochelp/topics/DODS/fnlist.html
- https://iridl.ldeo.columbia.edu/dochelp/Documentation/funcindex.html?Set-Language=en
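
As a rough sketch of how such commands are appended, the helper below joins Ingrid command strings onto a dataset URL and percent-encodes them. The function name `iridl_url` is hypothetical. The commands (`RANGE`, `RANGEEDGES`, `[L]average`) are the ones used in the `curl` example in this notebook, and the sketch assumes that the `dods` suffix can follow the commands in the same way it follows a plain dataset path.

```python
from urllib.parse import quote

# SubX GEFS hindcast precipitation, as used elsewhere in this notebook
BASE = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr'


def iridl_url(base, commands, suffix='dods'):
    """Append server-side commands to an IRIDL dataset URL (hypothetical helper).

    Brackets and spaces are percent-encoded; parentheses and slashes are kept."""
    encoded = [quote(command, safe='()/') for command in commands]
    return '/'.join([base.rstrip('/')] + encoded + [suffix])


commands = [
    'Y/1/20/RANGE',            # select latitudes 1 to 20
    'X/-20/10/RANGE',          # select longitudes -20 to 10
    'L/(14)/(28)/RANGEEDGES',  # select lead days 14 to 28
    '[L]average',              # average over the selected leads
]
url = iridl_url(BASE, commands)
print(url)
```

Building the URL from a list keeps each server-side step readable and makes it easy to drop or reorder a step while experimenting.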
%% Cell type:markdown id: tags:
## `curl` or `wget`
You can always work file-based and download from IRIDL via `curl` or `wget`. However, direct access via `opendap` and `xarray` is more convenient.
%% Cell type:code id: tags:
``` python
from subprocess import call
fname = 'GEFS_pra_hc.nc'
# endless magic commands selecting week 3-4 and aggregating pr to tp with unit conversion
dset_url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%201999)/(0000%2028%20Dec%202015)/RANGEEDGES/S/(days%20since%201999-01-01)/streamgridunitconvert/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/(Jun-Aug)/VALUES/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.dc9915/.pr/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/to366daysample/%5BYR%5Daverage/S/sampleDOY/sub/c%3A/0.001/(m3%20kg-1)/%3Ac/mul/c%3A/1000/(mm%20m-1)/%3Ac/mul/c%3A/86400/(s%20day-1)/%3Ac/mul/c%3A/7.0//units//days/def/%3Ac/mul/data.nc'
# download data with curl
call(['curl','-k',dset_url, '-o',fname])
```
%%%% Output: execute_result
0
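%% Cell type:markdown id: tags:
The four constants multiplied in at the end of the URL above (`0.001 m3 kg-1`, `1000 mm m-1`, `86400 s day-1` and `7.0 days`) combine into a single factor of 604800. The sketch below only reproduces that arithmetic. It assumes that `pr` arrives as a flux in `kg m-2 s-1`, which the URL itself does not state, and the function name `pr_to_tp` is hypothetical.

```python
import math

# constants and units copied from the URL; the input unit kg m-2 s-1 is an assumption
FACTORS = [
    (0.001, 'm3 kg-1'),    # mass flux -> volume flux (m s-1)
    (1000.0, 'mm m-1'),    # m -> mm
    (86400.0, 's day-1'),  # per second -> per day
    (7.0, 'days'),         # rate per day -> amount over 7 days
]
CONVERSION = math.prod(value for value, _ in FACTORS)


def pr_to_tp(pr_flux):
    """Scale a precipitation flux by the combined factor (hypothetical helper)."""
    return pr_flux * CONVERSION


print(CONVERSION)
```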
%% Cell type:code id: tags:
``` python
import pandas as pd
ds = xr.open_dataset(fname).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'aprod':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))
ds
```
%%%% Output: execute_result
<xarray.Dataset>
Dimensions: (forecast_time: 226, latitude: 20, longitude: 31, realization: 11)
Coordinates:
* latitude (latitude) float32 1.0 2.0 3.0 4.0 ... 17.0 18.0 19.0 20.0
* forecast_time (forecast_time) datetime64[ns] 1999-06-02 ... 2015-08-26
* realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0
* longitude (longitude) float32 -20.0 -19.0 -18.0 -17.0 ... 8.0 9.0 10.0
lead_time timedelta64[ns] 14 days
Data variables:
tp (realization, forecast_time, latitude, longitude) float64 ...
%% Cell type:markdown id: tags:
## opendap
%% Cell type:code id: tags:
``` python