Newer
Older
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"IRI Data Library (IRIDL) hosts various subseasonal initialized forecast, hindcast simulations and observations:\n",
"- `S2S project`:\n",
" - http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/\n",
" - hindcast/reforecast: one variable, one model:\n",
" - login required\n",
"- `SubX project`:\n",
" - http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/\n",
" - hindcast/reforecast: one variable, one model:\n",
" - login not required\n",
"- `NOAA CPC` observations:\n",
" - ground truth for `s2s-ai-challenge`\n",
" - login not required\n",
" - `pr`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/.extREALTIME/.rain/\n",
" - `t2m`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.temperature/.daily/\n",
"---\n",
"- Notes:\n",
" - Output on IRIDL is not always on the 1.5 degree grid requested for the competition. Also dimension names and coordinates differ.\n",
" - Beware that most models are not only initialized on thursdays. It is not forbidden to use simulations which are started on other weekdays, buy please pay attention that you may only use information available on `forecast_time`, i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.\n",
" - Beware that conventions (variable name, standard_name, coordinates, output format) from `SubX` and `S2S` might differ.\n",
" - First check whether you can find 2020 `test` and previous `training` data for a given feature.\n",
"This notebook also provides opendap magic, i.e. commands added to the opendap URL which preprocess data server-side: select and aggregate `lead_time` `L` and `forecast_time` `S`. Regridding `X` and `Y`.\n",
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Todo:\n",
"- [ ] how to handle `hdate` best?\n",
"- [ ] how to not download empty `S`? specify weekly stride in `S` via ingrid"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `IRIDL` cookie"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are instructions for configuring xarray to open protected Data Library datasets, after you have created a Data Library account and accepted the terms and conditions for the dataset.\n",
"1. Visit https://iridl.ldeo.columbia.edu/auth/genkey . Log in to the Data Library. Copy the key from the response.\n",
"\n",
"2. Create a file with the following content, substituting the key from step 1 for `\"xxxx\"`:\n",
"`Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu`\n",
"\n",
"3. Put the following in `~/.daprc`, which is `/home/jovyan/.daprc` on renku, substituting the path to the above file for `/path/to/cookie/file`:\n",
"`HTTP.COOKIEJAR=/path/to/cookie/file`. You may need to copy `.daprc` to `/home/jovyan` on renku, because `/home/jovyan` is not tracked by `git`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting /work/s2s-ai-challenge-template/.daprc\n"
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
]
}
],
"source": [
"%%writefile /work/s2s-ai-challenge-template/.daprc\n",
"HTTP.COOKIEJAR=/work/s2s-ai-challenge-template/.cookie_iridl"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"!cp /work/s2s-ai-challenge-template/.daprc /home/jovyan"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#%writefile /work/s2s-ai-challenge-template/.cookie_iridl\n",
"#Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting /work/s2s-ai-challenge-template/.cookie_iridl\n"
]
}
],
"source": [
"%%writefile /work/s2s-ai-challenge-template/.cookie_iridl\n",
"Set-Cookie: __dlauth_id=6d3f0d342e1bdd448b287481f6d7989673305eeba2fa65fabb2709e2d76101b21ae816ffe0560b1a25ed3c8d0bf8884eab7d4bc2; domain=.iridl.ldeo.columbia.edu"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message\n",
" warnings.warn(\n"
]
}
],
"source": [
"import xarray as xr\n",
"xr.set_options(display_style='text')\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# S2S"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please beawre that most models are not only initialized on thursdays.\n",
"It is not forbidden to use simulations which are started on other weekdays,\n",
"buy please pay attention that you may only use information available on `forecast_time`,\n",
"i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:\n",
"/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header\n",
" warnings.warn(f\"Engine {name!r} loading failed:\\n{ex}\", RuntimeWarning)\n"
]
}
],
"source": [
"ds = xr.open_dataset('https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.control/.2m_above_ground/.2t/dods',\n",
" chunks='auto', decode_times=False)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# calendar '360' not recognized, but '360_day'\n",
"if ds.hdate.attrs['calendar'] == '360':\n",
" ds.hdate.attrs['calendar'] = '360_day'"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre><xarray.DataArray 't2m' (hdate: 26, forecast_time: 641, lead_time: 46, latitude: 121, longitude: 240)>\n",
"dask.array<open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 641, 46, 121, 240), dtype=float32, chunksize=(7, 197, 12, 33, 60), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * hdate (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
"Attributes:\n",
" pointwidth: 0\n",
" gribPDSpattern: 04XXXX003D0000\n",
" long_name: 2-meter Temperature\n",
"<xarray.DataArray 't2m' (hdate: 26, forecast_time: 641, lead_time: 46, latitude: 121, longitude: 240)>\n",
"dask.array<open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 641, 46, 121, 240), dtype=float32, chunksize=(7, 197, 12, 33, 60), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * hdate (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
"Attributes:\n",
" pointwidth: 0\n",
" gribPDSpattern: 04XXXX003D0000\n",
" long_name: 2-meter Temperature\n",
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'LA': 'lead_time', '2t':'t2m'})\n",
"ds['t2m']"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds.nbytes/1e9,'GB'"
]
},
{
"cell_type": "code",
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
"metadata": {},
"outputs": [],
"source": [
"# hdate gets the privous years reforecast for that dayofyear"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hindcast Availability\n",
"\n",
"- BOM:\tBoM POAMA Ensemble.\n",
"- CMA:\tBeijing Climate Center (BCC) Climate Prediction System for S2S.\n",
"- CNRM:\tCNRM Ensemble Prediction System.\n",
"- ECCC:\tECCC Ensemble Prediction System.\n",
"- ECMF:\tECMWF Ensemble.\n",
"- HMCR:\tHMCR Ensemble.\n",
"- ISAC:\tISAC-CNR Ensemble.\n",
"- JMA:\tJMA Ensemble System.\n",
"- KMA:\tKMA Seasonal Prediction System.\n",
"- NCEP:\tNCEP CFSv2 Ensemble.\n",
"- UKMO:\tUKMO Ensemble Prediction System."
]
},
{
"cell_type": "code",
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BOM not on-the-fly forecast_time freq not found \n",
" Coordinates:\n",
" * latitude (latitude) float32 88.1 85.64 83.16 ... -83.16 -85.64 -88.1\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 61 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 1981-01-01 ... 2013-12-26\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0 ... 29.0 30.0 31.0 32.0\n",
" * longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5 \n",
" Frozen(SortedKeysDict({'latitude': 72, 'lead_time': 62, 'forecast_time': 2376, 'realization': 32, 'longitude': 144})) 195.498364944 GB \n",
"\n",
"CNRM not on-the-fly forecast_time freq not found \n",
" Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 1993-01-01 ... 2014-12-15\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0 ... 11.0 12.0 13.0 14.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
" Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 61, 'forecast_time': 528, 'realization': 14, 'longitude': 240})) 52.377944132 GB \n",
"\n",
"ECCC on-the-fly forecast_time freq:W-THU \n",
" Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 2016-01-07 ... 2021-06-10\n",
" * realization (realization) float32 1.0 2.0 3.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * hdate (hdate) object 1995-07-01 00:00:00 ... 2017-07-01 00:00:00\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12... \n",
" Frozen(SortedKeysDict({'latitude': 121, 'forecast_time': 284, 'realization': 3, 'longitude': 240, 'hdate': 23, 'lead_time': 32})) 72.840687688 GB \n",
"\n",
"ECMF on-the-fly forecast_time freq not found \n",
" Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * hdate (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
" Frozen(SortedKeysDict({'latitude': 121, 'hdate': 26, 'lead_time': 46, 'forecast_time': 641, 'realization': 10, 'longitude': 240})) 890.524384788 GB \n",
"\n",
"HMCR on-the-fly forecast_time freq not found \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...\n",
" * latitude (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 2015-01-07 ... 2021-06-10\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
" * longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5\n",
" * hdate (hdate) object 1985-07-01 00:00:00 ... 2010-07-01 00:00:00 \n",
" Frozen(SortedKeysDict({'lead_time': 61, 'latitude': 73, 'forecast_time': 336, 'realization': 9, 'longitude': 144, 'hdate': 26})) 201.66490336 GB \n",
"\n",
"model=ISAC failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ISAC/.reforecast/.perturbed/.2m_above_ground/.2t/dods' \n",
"\n",
"JMA not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 1 days 2 days ... 32 days 33 days\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1981-01-10T12:00:00 ... 201...\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
" Frozen(SortedKeysDict({'lead_time': 33, 'latitude': 121, 'forecast_time': 10948, 'realization': 4, 'longitude': 240})) 167.867087068 GB \n",
"\n",
"KMA on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * hdate (hdate) object 1991-07-01 00:00:00 ... 2010-07-01 00:00:00\n",
" * latitude (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 2016-11-01 ... 2021-06-17\n",
" * realization (realization) float32 1.0 2.0\n",
" * longitude (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5 \n",
" Frozen(SortedKeysDict({'hdate': 20, 'latitude': 73, 'lead_time': 60, 'forecast_time': 1690, 'realization': 2, 'longitude': 144})) 170.546703036 GB \n",
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
"\n",
"NCEP not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2010-12-31\n",
" * realization (realization) float32 1.0 2.0 3.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
" Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 44, 'forecast_time': 4383, 'realization': 3, 'longitude': 240})) 67.205101832 GB \n",
"\n",
"UKMO on-the-fly forecast_time freq not found \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 2016-01-01 ... 2019-05-09\n",
" * realization (realization) float32 1.0 2.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * hdate (hdate) object 1993-07-01 00:00:00 ... 2015-07-01 00:00:00 \n",
" Frozen(SortedKeysDict({'lead_time': 60, 'latitude': 121, 'forecast_time': 162, 'realization': 2, 'longitude': 240, 'hdate': 23})) 51.937462612 GB \n",
"\n"
]
}
],
"source": [
"models = ['BOM','CNRM','ECCC','ECMF','HMCR','ISAC','JMA','KMA','NCEP','UKMO']\n",
"for model in models:\n",
" try:\n",
" ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{model}/.reforecast/.perturbed/.2m_above_ground/.2t/dods',\n",
" chunks='auto', decode_times=False).rename({'S':'forecast_time', 'LA':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})\n",
" # calendar '360' not recognized, but '360_day'\n",
" for c in ['hdate','forecast_time']:\n",
" if c in ds.coords:\n",
" if ds[c].attrs['calendar'] == '360':\n",
" ds[c].attrs['calendar'] = '360_day'\n",
" ds = xr.decode_cf(ds)\n",
" onthefly = True if 'hdate' in ds.coords else False\n",
" forecast_time_freq = xr.infer_freq(ds.forecast_time)\n",
" print(model, 'on-the-fly' if onthefly else 'not on-the-fly',\n",
" 'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',\n",
" '\\n',ds.coords,'\\n',ds.sizes,ds.nbytes/1e9,'GB','\\n')\n",
" except Exception as e:\n",
" print(f'model={model} failed due to {type(e).__name__}: {e} \\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SubX"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The access to output from the SubX project does not require login information via cookie."
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"ds = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.CESM/.30LCESM1/.hindcast/.tas/dods',\n",
" chunks='auto', decode_times=False)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# calendar '360' not recognized, but '360_day'\n",
"if ds.S.attrs['calendar'] == '360':\n",
" ds.S.attrs['calendar'] = '360_day'"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre><xarray.DataArray 't2m' (forecast_time: 887, realization: 10, lead_time: 45, latitude: 181, longitude: 360)>\n",
"dask.array<open_dataset-1bd5755a82e148fd83330ea4db46cbb8tas, shape=(887, 10, 45, 181, 360), dtype=float32, chunksize=(335, 2, 9, 61, 90), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
"Attributes:\n",
" pointwidth: 0.0\n",
" cell_methods: time: mean\n",
" units: Kelvin_scale\n",
" standard_name: air_temperature\n",
" long_name: 2-meter Air Temperature\n",
],
"text/plain": [
"<xarray.DataArray 't2m' (forecast_time: 887, realization: 10, lead_time: 45, latitude: 181, longitude: 360)>\n",
"dask.array<open_dataset-1bd5755a82e148fd83330ea4db46cbb8tas, shape=(887, 10, 45, 181, 360), dtype=float32, chunksize=(335, 2, 9, 61, 90), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
"Attributes:\n",
" pointwidth: 0.0\n",
" cell_methods: time: mean\n",
" units: Kelvin_scale\n",
" standard_name: air_temperature\n",
" long_name: 2-meter Air Temperature\n",
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'L': 'lead_time', 'M':'realization', 'tas':'t2m'})\n",
"ds['t2m']"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(104.03446566, 'GB')"
]
},
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds.nbytes/1e9,'GB'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hindcast Availability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- center: model\n",
"- CESM:\t30LCESM1 46LCESM1\n",
"- ECCC:\tGEM GEPS6 GEPS5\n",
"- EMC:\tGEFS GEFSv12\n",
"- ESRL:\tFIMr1p1\n",
"- GMAO:\tGEOS_V2p1\n",
"- NCEP:\tCFSv2\n",
"- NRL:\tNESM\n",
"- RSMAS:\tCCSM4"
]
},
{
"cell_type": "code",
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"30LCESM1 not on-the-fly forecast_time freq:W-WED \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB \n",
"\n",
"46LCESM1 not on-the-fly forecast_time freq:W-WED \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB \n",
"\n",
"GEM not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1995-01-04 ... 2014-12-28\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
"\n",
"GEPS6 not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
"\n",
"GEPS5 not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
"\n",
"GEFS not on-the-fly forecast_time freq:W-WED \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 34 days 12...\n",
" * latitude (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2016-12-28\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 35, 'latitude': 181, 'forecast_time': 939, 'realization': 11, 'longitude': 360})) 94.2252796 GB \n",
"\n",
"center=EMC model=GEFSv12 failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFSv12/.hindcast/.tas/dods' \n",
"\n",
"FIMr1p1 not on-the-fly forecast_time freq:W-WED \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-06 ... 2017-06-28\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 965, 'realization': 4, 'longitude': 360})) 32.194262956 GB \n",
"\n",
"GEOS_V2p1 not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2016-12-27\n",
" * realization (realization) float32 1.0 2.0 3.0 4.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6571, 'realization': 4, 'longitude': 360})) 308.279834308 GB \n",
"\n",
"CFSv2 not on-the-fly forecast_time freq:6H \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...\n",
" * latitude (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-01 ... 2017-09-30\n",
" * realization (realization) int32 1\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 44, 'latitude': 181, 'forecast_time': 27389, 'realization': 1, 'longitude': 360})) 314.101655872 GB \n",
"\n",
"NESM not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-02T12:00:00 ... 201...\n",
" * realization (realization) int32 1\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6574, 'realization': 1, 'longitude': 360})) 77.10518632 GB \n",
"\n",
"CCSM4 not on-the-fly forecast_time freq:D \n",
" Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
" * latitude (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-01-07 ... 2016-12-31\n",
" * realization (realization) float32 1.0 2.0 3.0\n",
" * longitude (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
" Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6569, 'realization': 3, 'longitude': 360})) 231.139516688 GB \n",
"\n"
]
}
],
"source": [
"centers = ['CESM', 'CESM', 'ECCC', 'ECCC', 'ECCC', 'EMC', 'EMC', 'ESRL', 'GMAO' , 'NCEP', 'NRL','RSMAS']\n",
"models = ['30LCESM1','46LCESM1','GEM','GEPS6','GEPS5','GEFS','GEFSv12','FIMr1p1','GEOS_V2p1','CFSv2','NESM','CCSM4']\n",
"for center,model in zip(centers,models):\n",
" try:\n",
" ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{center}/.{model}/.hindcast/.tas/dods',\n",
" chunks='auto', decode_times=False).rename({'S':'forecast_time', 'L':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})\n",
" # calendar '360' not recognized, but '360_day'\n",
" for c in ['hdate','forecast_time']:\n",
" if c in ds.coords:\n",
" if ds[c].attrs['calendar'] == '360':\n",
" ds[c].attrs['calendar'] = '360_day'\n",
" ds = xr.decode_cf(ds)\n",
" onthefly = True if 'hdate' in ds.coords else False\n",
" forecast_time_freq = xr.infer_freq(ds.forecast_time)\n",
" print(model, 'on-the-fly' if onthefly else 'not on-the-fly',\n",
" 'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',\n",
" '\\n',ds.coords,'\\n',ds.sizes,ds.nbytes/1e9,'GB','\\n')\n",
" except Exception as e:\n",
" print(f'center={center} model={model} failed due to {type(e).__name__}: {e} \\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Opendap magic\n",
"\n",
"Opendap URLs be appended for server-side preprocessing.\n",
"\n",
"- https://www.opendap.org/support\n",
"- http://iridl.ldeo.columbia.edu/dochelp/topics/DODS/fnlist.html\n",
"- https://iridl.ldeo.columbia.edu/dochelp/Documentation/funcindex.html?Set-Language=en"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `curl` or `wget`\n",
"\n",
"You can always work file-based and download from IRIDL via `curl` or `wget`. However, nicer is direct access via `opendap` and `xarray`."
]
},
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from subprocess import call\n",
"fname = 'GEFS_pra_hc.nc'\n",
"# endless magic commands selecting week 3-4 and aggregating pr to tp with unit conversion\n",
"dset_url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%201999)/(0000%2028%20Dec%202015)/RANGEEDGES/S/(days%20since%201999-01-01)/streamgridunitconvert/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/(Jun-Aug)/VALUES/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.dc9915/.pr/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/to366daysample/%5BYR%5Daverage/S/sampleDOY/sub/c%3A/0.001/(m3%20kg-1)/%3Ac/mul/c%3A/1000/(mm%20m-1)/%3Ac/mul/c%3A/86400/(s%20day-1)/%3Ac/mul/c%3A/7.0//units//days/def/%3Ac/mul/data.nc'\n",
"# download data with curl\n",
"call(['curl','-k',dset_url, '-o',fname])"
]
},
{
"cell_type": "code",
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre><xarray.Dataset>\n",
"Dimensions: (forecast_time: 226, latitude: 20, longitude: 31, realization: 11)\n",
"Coordinates:\n",
" * latitude (latitude) float32 1.0 2.0 3.0 4.0 ... 17.0 18.0 19.0 20.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-06-02 ... 2015-08-26\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 -20.0 -19.0 -18.0 -17.0 ... 8.0 9.0 10.0\n",
" lead_time timedelta64[ns] 14 days\n",
"Data variables:\n",
" tp (realization, forecast_time, latitude, longitude) float64 ...</pre>"
],
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (forecast_time: 226, latitude: 20, longitude: 31, realization: 11)\n",
"Coordinates:\n",
" * latitude (latitude) float32 1.0 2.0 3.0 4.0 ... 17.0 18.0 19.0 20.0\n",
" * forecast_time (forecast_time) datetime64[ns] 1999-06-02 ... 2015-08-26\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 -20.0 -19.0 -18.0 -17.0 ... 8.0 9.0 10.0\n",
" lead_time timedelta64[ns] 14 days\n",
"Data variables:\n",
" tp (realization, forecast_time, latitude, longitude) float64 ..."
]
},
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds = xr.open_dataset(fname).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'aprod':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## opendap"
]
},
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
]
},
{
"data": {
"text/html": [
"<pre><xarray.DataArray 'tp' (forecast_time: 886, realization: 11, latitude: 121, longitude: 240)>\n",
"dask.array<mul, shape=(886, 11, 121, 240), dtype=float32, chunksize=(443, 6, 102, 120), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * forecast_time (forecast_time) datetime64[ns] 2000-01-12 ... 2016-12-28\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" lead_time timedelta64[ns] 14 days\n",
"Attributes:\n",
" units: kg m-2\n",
" long_name: total precipitation</pre>"
"<xarray.DataArray 'tp' (forecast_time: 886, realization: 11, latitude: 121, longitude: 240)>\n",
"dask.array<mul, shape=(886, 11, 121, 240), dtype=float32, chunksize=(443, 6, 102, 120), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * forecast_time (forecast_time) datetime64[ns] 2000-01-12 ... 2016-12-28\n",
" * realization (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" lead_time timedelta64[ns] 14 days\n",
"Attributes:\n",
" units: kg m-2\n",
" long_name: total precipitation"
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# server-side aggregate w34 precip to tp and regrid to 1.5 degree\n",
"forecast = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%202000)/(0000%2031%20Dec%202019)/RANGEEDGES/L/(14)/(28)/RANGEEDGES/[L]average/X/0/1.5/358.5/GRID/Y/90/1.5/-90/GRID/dods',\n",
" chunks='auto').rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'pr':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))\n",
"print(forecast.nbytes/1e9, 'GB')\n",
"\n",
"forecast = forecast * 3600 * 24 * 14 # convert from kg m-2 s-1 to to kg m-2 14day-1\n",
"forecast['tp'].attrs.update({'units':'kg m-2', 'long_name': 'total precipitation'}) # omitting biweekly is units\n",
"forecast.tp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### create `forecast_like_observations` from observations with `time` dimension\n",
"\n",
"[climetlab_s2s_ai_challenge.extra.forecast_like_observations](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge/blob/47539214d78251e30b79bd4b0dbb4a6a085cc9bf/climetlab_s2s_ai_challenge/extra.py#L43)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
"source": [
"import climetlab as cml\n",
"from climetlab_s2s_ai_challenge.extra import forecast_like_observations, create_valid_time_from_forecast_time_and_lead_time"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:00<00:00, 147.08it/s]\n",
"/opt/conda/lib/python3.8/site-packages/climetlab/readers/__init__.py:86: UserWarning: Error loading wrapper grib: /opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header\n",
" warnings.warn(f\"Error loading wrapper {name}: {e}\")\n"
]
},
{
"data": {
"text/html": [
"<pre><xarray.Dataset>\n",
"Dimensions: (latitude: 121, longitude: 240, time: 8154)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 1999-01-01 1999-01-02 ... 2021-04-28\n",
" * latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 354.0 355.5 357.0 358.5\n",
"Data variables:\n",
" pr (time, latitude, longitude) float32 ...\n",
"Attributes:\n",
" source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...\n",
" source_hosting: IRIDL\n",
" source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...\n",
" created_by_software: climetlab-s2s-ai-challenge\n",
" created_by_script: tools/observations/makefile</pre>"
],
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (latitude: 121, longitude: 240, time: 8154)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 1999-01-01 1999-01-02 ... 2021-04-28\n",
" * latitude (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 354.0 355.5 357.0 358.5\n",
"Data variables:\n",
" pr (time, latitude, longitude) float32 ...\n",
"Attributes:\n",
" source_dataset_name: NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...\n",
" source_hosting: IRIDL\n",
" source_url: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...\n",
" created_by_software: climetlab-s2s-ai-challenge\n",
" created_by_script: tools/observations/makefile"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# download precipitation_flux pr\n",
"obs_ds = cml.load_dataset('s2s-ai-challenge-observations', parameter='pr').to_xarray()\n",
"\n",
"obs_ds"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# less data heavy example: reduce forecast\n",
"forecast = forecast.sel(forecast_time='2000', realization=[0, 1]).expand_dims('lead_time')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# assign valid_time to forecasts\n",
"forecast = forecast.assign_coords(valid_time=create_valid_time_from_forecast_time_and_lead_time(forecast.forecast_time, forecast.lead_time))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre><xarray.Dataset>\n",
"Dimensions: (forecast_time: 51, latitude: 121, lead_time: 30, longitude: 240, realization: 2)\n",
"Coordinates:\n",
" * forecast_time (forecast_time) datetime64[ns] 2000-01-12 ... 2000-12-27\n",
" * realization (realization) float32 0.0 1.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * lead_time (lead_time) timedelta64[ns] 13 days 14 days ... 42 days\n",
" valid_time (lead_time, forecast_time) datetime64[ns] 2000-01-26 ... 2...\n",
"Data variables:\n",
" *empty*\n",
"Attributes:\n",
" Conventions: IRIDL</pre>"
],
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (forecast_time: 51, latitude: 121, lead_time: 30, longitude: 240, realization: 2)\n",
"Coordinates:\n",
" * forecast_time (forecast_time) datetime64[ns] 2000-01-12 ... 2000-12-27\n",
" * realization (realization) float32 0.0 1.0\n",
" * longitude (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
" * latitude (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
" * lead_time (lead_time) timedelta64[ns] 13 days 14 days ... 42 days\n",
" valid_time (lead_time, forecast_time) datetime64[ns] 2000-01-26 ... 2...\n",
"Data variables:\n",
" *empty*\n",
"Attributes:\n",
" Conventions: IRIDL"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create an empty daily lead_time forecast dataset to input into forecast_like_observations, which requires continuous daily leads, first lead_time stays empty\n",
"dummy_forecast_daily = forecast.coords.to_dataset().isel(lead_time=[0]*30).assign_coords(lead_time=[pd.Timedelta(f'{i} d') for i in range(13,13+30)])\n",
"dummy_forecast_daily"
]
},
{
"cell_type": "code",