Skip to content
IRIDL.ipynb 209 KiB
Newer Older
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `iridl.ldeo.columbia.edu`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "IRI Data Library (IRIDL) hosts various subseasonal initialized forecast, hindcast simulations and observations:\n",
    "- `S2S project`:\n",
    "    - http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/\n",
    "        - hindcast/reforecast: one variable, one model:\n",
    "        - login required\n",
    "- `SubX project`:\n",
    "    - http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/\n",
    "        - hindcast/reforecast: one variable, one model:\n",
    "        - login not required\n",
    "- `NOAA CPC` observations:\n",
    "    - ground truth for `s2s-ai-challenge`\n",
    "    - login not required\n",
    "    - `pr`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/.extREALTIME/.rain/\n",
    "    - `t2m`: http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.temperature/.daily/\n",
    "---\n",
    "- Notes:\n",
    "    - Output on IRIDL is not always on the 1.5 degree grid requested for the competition. Also dimension names and coordinates differ.\n",
    "    - Beware that most models are not only initialized on thursdays. It is not forbidden to use simulations which are started on other weekdays, buy please pay attention that you may only use information available on `forecast_time`, i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.\n",
    "    - Beware that conventions (variable name, standard_name, coordinates, output format) from `SubX` and `S2S` might differ.\n",
    "    - First check whether you can find 2020 `test` and previous `training` data for a given feature.\n",
    "This notebook also provides opendap magic, i.e. commands added to the opendap URL which preprocess data server-side: select and aggregate `lead_time` `L` and `forecast_time` `S`. Regridding `X` and `Y`.\n",
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Todo:\n",
    "- [ ] how to handle `hdate` best?\n",
    "- [ ] how to not download empty `S`? specify weekly stride in `S` via ingrid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `IRIDL` cookie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are instructions for configuring xarray to open protected Data Library datasets, after you have created a Data Library account and accepted the terms and conditions for the dataset.\n",
    "1. Visit https://iridl.ldeo.columbia.edu/auth/genkey . Log in to the Data Library. Copy the key from the response.\n",
    "\n",
    "2. Create a file with the following content, substituting the key from step 1 for `\"xxxx\"`:\n",
    "`Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu`\n",
    "\n",
    "3. Put the following in `~/.daprc`, which is `/home/jovyan/.daprc` on renku, substituting the path to the above file for `/path/to/cookie/file`:\n",
    "`HTTP.COOKIEJAR=/path/to/cookie/file`. You may need to copy `.daprc` to `/home/jovyan` on renku, because `/home/jovyan` is not tracked by `git`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting /work/s2s-ai-challenge-template/.daprc\n"
     ]
    }
   ],
   "source": [
    "%%writefile /work/s2s-ai-challenge-template/.daprc\n",
    "HTTP.COOKIEJAR=/work/s2s-ai-challenge-template/.cookie_iridl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cp /work/s2s-ai-challenge-template/.daprc /home/jovyan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%writefile /work/s2s-ai-challenge-template/.cookie_iridl\n",
    "#Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting /work/s2s-ai-challenge-template/.cookie_iridl\n"
     ]
    }
   ],
   "source": [
    "%%writefile /work/s2s-ai-challenge-template/.cookie_iridl\n",
    "Set-Cookie: __dlauth_id=get_from_browser; domain=.iridl.ldeo.columbia.edu"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.8/site-packages/xarray/backends/cfgrib_.py:27: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "import xarray as xr\n",
    "xr.set_options(display_style='text')\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# S2S"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please beawre that most models are not only initialized on thursdays.\n",
    "It is not forbidden to use simulations which are started on other weekdays,\n",
    "buy please pay attention that you may only use information available on `forecast_time`,\n",
    "i.e. if the model is initialized on Mondays, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.8/site-packages/xarray/backends/plugins.py:61: RuntimeWarning: Engine 'cfgrib' loading failed:\n",
      "/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header\n",
      "  warnings.warn(f\"Engine {name!r} loading failed:\\n{ex}\", RuntimeWarning)\n"
     ]
    }
   ],
   "source": [
    "ds = xr.open_dataset('https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.control/.2m_above_ground/.2t/dods',\n",
    "                     chunks='auto', decode_times=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# calendar '360' not recognized, but '360_day'\n",
    "if ds.hdate.attrs['calendar'] == '360':\n",
    "    ds.hdate.attrs['calendar'] = '360_day'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.DataArray &#x27;t2m&#x27; (hdate: 26, forecast_time: 641, lead_time: 46, latitude: 121, longitude: 240)&gt;\n",
       "dask.array&lt;open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 641, 46, 121, 240), dtype=float32, chunksize=(7, 197, 12, 33, 60), chunktype=numpy.ndarray&gt;\n",
       "Coordinates:\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * hdate          (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
       "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "Attributes:\n",
       "    pointwidth:      0\n",
       "    gribPDSpattern:  04XXXX003D0000\n",
       "    standard_name:   air_temperature\n",
       "    long_name:       2-meter Temperature\n",
       "    units:           K</pre>"
      ],
      "text/plain": [
       "<xarray.DataArray 't2m' (hdate: 26, forecast_time: 641, lead_time: 46, latitude: 121, longitude: 240)>\n",
       "dask.array<open_dataset-f89df07098f6ce22c120a08e3f3f29a52t, shape=(26, 641, 46, 121, 240), dtype=float32, chunksize=(7, 197, 12, 33, 60), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * hdate          (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
       "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "Attributes:\n",
       "    pointwidth:      0\n",
       "    gribPDSpattern:  04XXXX003D0000\n",
       "    standard_name:   air_temperature\n",
       "    long_name:       2-meter Temperature\n",
       "    units:           K"
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'LA': 'lead_time', '2t':'t2m'})\n",
    "ds['t2m']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(89.052444908, 'GB')"
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.nbytes/1e9,'GB'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# hdate gets the privous years reforecast for that dayofyear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hindcast Availability\n",
    "\n",
    "- BOM:\tBoM POAMA Ensemble.\n",
    "- CMA:\tBeijing Climate Center (BCC) Climate Prediction System for S2S.\n",
    "- CNRM:\tCNRM Ensemble Prediction System.\n",
    "- ECCC:\tECCC Ensemble Prediction System.\n",
    "- ECMF:\tECMWF Ensemble.\n",
    "- HMCR:\tHMCR Ensemble.\n",
    "- ISAC:\tISAC-CNR Ensemble.\n",
    "- JMA:\tJMA Ensemble System.\n",
    "- KMA:\tKMA Seasonal Prediction System.\n",
    "- NCEP:\tNCEP CFSv2 Ensemble.\n",
    "- UKMO:\tUKMO Ensemble Prediction System."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BOM not on-the-fly forecast_time freq not found \n",
      " Coordinates:\n",
      "  * latitude       (latitude) float32 88.1 85.64 83.16 ... -83.16 -85.64 -88.1\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 61 days 12...\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1981-01-01 ... 2013-12-26\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0 ... 29.0 30.0 31.0 32.0\n",
      "  * longitude      (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5 \n",
      " Frozen(SortedKeysDict({'latitude': 72, 'lead_time': 62, 'forecast_time': 2376, 'realization': 32, 'longitude': 144})) 195.498364944 GB \n",
      "\n",
      "CNRM not on-the-fly forecast_time freq not found \n",
      " Coordinates:\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1993-01-01 ... 2014-12-15\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0 ... 11.0 12.0 13.0 14.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
      " Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 61, 'forecast_time': 528, 'realization': 14, 'longitude': 240})) 52.377944132 GB \n",
      "\n",
      "ECCC on-the-fly forecast_time freq:W-THU \n",
      " Coordinates:\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 2016-01-07 ... 2021-06-10\n",
      "  * realization    (realization) float32 1.0 2.0 3.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
      "  * hdate          (hdate) object 1995-07-01 00:00:00 ... 2017-07-01 00:00:00\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12... \n",
      " Frozen(SortedKeysDict({'latitude': 121, 'forecast_time': 284, 'realization': 3, 'longitude': 240, 'hdate': 23, 'lead_time': 32})) 72.840687688 GB \n",
      "\n",
      "ECMF on-the-fly forecast_time freq not found \n",
      " Coordinates:\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * hdate          (hdate) object 1995-07-01 00:00:00 ... 2020-07-01 00:00:00\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 45 days 12...\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 2015-05-14 ... 2021-07-01\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
      " Frozen(SortedKeysDict({'latitude': 121, 'hdate': 26, 'lead_time': 46, 'forecast_time': 641, 'realization': 10, 'longitude': 240})) 890.524384788 GB \n",
      "\n",
      "HMCR on-the-fly forecast_time freq not found \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...\n",
      "  * latitude       (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 2015-01-07 ... 2021-06-10\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
      "  * longitude      (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5\n",
      "  * hdate          (hdate) object 1985-07-01 00:00:00 ... 2010-07-01 00:00:00 \n",
      " Frozen(SortedKeysDict({'lead_time': 61, 'latitude': 73, 'forecast_time': 336, 'realization': 9, 'longitude': 144, 'hdate': 26})) 201.66490336 GB \n",
      "\n",
      "model=ISAC failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ISAC/.reforecast/.perturbed/.2m_above_ground/.2t/dods' \n",
      "\n",
      "JMA not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 1 days 2 days ... 32 days 33 days\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1981-01-10T12:00:00 ... 201...\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
      " Frozen(SortedKeysDict({'lead_time': 33, 'latitude': 121, 'forecast_time': 10948, 'realization': 4, 'longitude': 240})) 167.867087068 GB \n",
      "\n",
      "KMA on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * hdate          (hdate) object 1991-07-01 00:00:00 ... 2010-07-01 00:00:00\n",
      "  * latitude       (latitude) float32 90.0 87.5 85.0 82.5 ... -85.0 -87.5 -90.0\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 2016-11-01 ... 2021-06-17\n",
      "  * realization    (realization) float32 1.0 2.0\n",
      "  * longitude      (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5 \n",
      " Frozen(SortedKeysDict({'hdate': 20, 'latitude': 73, 'lead_time': 60, 'forecast_time': 1690, 'realization': 2, 'longitude': 144})) 170.546703036 GB \n",
      "\n",
      "NCEP not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-01 ... 2010-12-31\n",
      "  * realization    (realization) float32 1.0 2.0 3.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5 \n",
      " Frozen(SortedKeysDict({'latitude': 121, 'lead_time': 44, 'forecast_time': 4383, 'realization': 3, 'longitude': 240})) 67.205101832 GB \n",
      "\n",
      "UKMO on-the-fly forecast_time freq not found \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 59 days 12...\n",
      "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 2016-01-01 ... 2019-05-09\n",
      "  * realization    (realization) float32 1.0 2.0\n",
      "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
      "  * hdate          (hdate) object 1993-07-01 00:00:00 ... 2015-07-01 00:00:00 \n",
      " Frozen(SortedKeysDict({'lead_time': 60, 'latitude': 121, 'forecast_time': 162, 'realization': 2, 'longitude': 240, 'hdate': 23})) 51.937462612 GB \n",
      "\n"
     ]
    }
   ],
   "source": [
    "models = ['BOM','CNRM','ECCC','ECMF','HMCR','ISAC','JMA','KMA','NCEP','UKMO']\n",
    "for model in models:\n",
    "    try:\n",
    "        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{model}/.reforecast/.perturbed/.2m_above_ground/.2t/dods',\n",
    "                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'LA':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})\n",
    "        # calendar '360' not recognized, but '360_day'\n",
    "        for c in ['hdate','forecast_time']:\n",
    "            if c in ds.coords:\n",
    "                if ds[c].attrs['calendar'] == '360':\n",
    "                    ds[c].attrs['calendar'] = '360_day'\n",
    "        ds = xr.decode_cf(ds)\n",
    "        onthefly = True if 'hdate' in ds.coords else False\n",
    "        forecast_time_freq = xr.infer_freq(ds.forecast_time)\n",
    "        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',\n",
    "              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',\n",
    "              '\\n',ds.coords,'\\n',ds.sizes,ds.nbytes/1e9,'GB','\\n')\n",
    "    except Exception as e:\n",
    "        print(f'model={model} failed due to {type(e).__name__}: {e} \\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SubX"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The access to output from the SubX project does not require login information via cookie."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.CESM/.30LCESM1/.hindcast/.tas/dods',\n",
    "                     chunks='auto', decode_times=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# calendar '360' not recognized, but '360_day'\n",
    "if ds.S.attrs['calendar'] == '360':\n",
    "    ds.S.attrs['calendar'] = '360_day'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.DataArray &#x27;t2m&#x27; (forecast_time: 887, realization: 10, lead_time: 45, latitude: 181, longitude: 360)&gt;\n",
       "dask.array&lt;open_dataset-1bd5755a82e148fd83330ea4db46cbb8tas, shape=(887, 10, 45, 181, 360), dtype=float32, chunksize=(335, 2, 9, 61, 90), chunktype=numpy.ndarray&gt;\n",
       "Coordinates:\n",
       "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
       "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
       "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
       "Attributes:\n",
       "    pointwidth:     0.0\n",
       "    cell_methods:   time: mean\n",
       "    units:          Kelvin_scale\n",
       "    standard_name:  air_temperature\n",
       "    long_name:      2-meter Air Temperature\n",
       "    level_type:     2 meters above ground</pre>"
      ],
      "text/plain": [
       "<xarray.DataArray 't2m' (forecast_time: 887, realization: 10, lead_time: 45, latitude: 181, longitude: 360)>\n",
       "dask.array<open_dataset-1bd5755a82e148fd83330ea4db46cbb8tas, shape=(887, 10, 45, 181, 360), dtype=float32, chunksize=(335, 2, 9, 61, 90), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
       "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
       "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
       "Attributes:\n",
       "    pointwidth:     0.0\n",
       "    cell_methods:   time: mean\n",
       "    units:          Kelvin_scale\n",
       "    standard_name:  air_temperature\n",
       "    long_name:      2-meter Air Temperature\n",
       "    level_type:     2 meters above ground"
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'L': 'lead_time', 'M':'realization', 'tas':'t2m'})\n",
    "ds['t2m']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(104.03446566, 'GB')"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.nbytes/1e9,'GB'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hindcast Availability"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- center: model\n",
    "- CESM:\t30LCESM1 46LCESM1\n",
    "- ECCC:\tGEM GEPS6 GEPS5\n",
    "- EMC:\tGEFS GEFSv12\n",
    "- ESRL:\tFIMr1p1\n",
    "- GMAO:\tGEOS_V2p1\n",
    "- NCEP:\tCFSv2\n",
    "- NRL:\tNESM\n",
    "- RSMAS:\tCCSM4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "30LCESM1 not on-the-fly forecast_time freq:W-WED \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
      "  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB \n",
      "\n",
      "46LCESM1 not on-the-fly forecast_time freq:W-WED \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30\n",
      "  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB \n",
      "\n",
      "GEM not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1995-01-04 ... 2014-12-28\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
      "\n",
      "GEPS6 not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
      "\n",
      "GEPS5 not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1998-01-03 ... 2017-12-27\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 7299, 'realization': 4, 'longitude': 360})) 243.508714908 GB \n",
      "\n",
      "GEFS not on-the-fly forecast_time freq:W-WED \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 34 days 12...\n",
      "  * latitude       (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2016-12-28\n",
      "  * realization    (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 35, 'latitude': 181, 'forecast_time': 939, 'realization': 11, 'longitude': 360})) 94.2252796 GB \n",
      "\n",
      "center=EMC model=GEFSv12 failed due to OSError: [Errno -90] NetCDF: file not found: b'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFSv12/.hindcast/.tas/dods' \n",
      "\n",
      "FIMr1p1 not on-the-fly forecast_time freq:W-WED \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 31 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2017-06-28\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 32, 'latitude': 181, 'forecast_time': 965, 'realization': 4, 'longitude': 360})) 32.194262956 GB \n",
      "\n",
      "GEOS_V2p1 not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-01 ... 2016-12-27\n",
      "  * realization    (realization) float32 1.0 2.0 3.0 4.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6571, 'realization': 4, 'longitude': 360})) 308.279834308 GB \n",
      "\n",
      "CFSv2 not on-the-fly forecast_time freq:6H \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 43 days 12...\n",
      "  * latitude       (latitude) float32 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-01 ... 2017-09-30\n",
      "  * realization    (realization) int32 1\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 44, 'latitude': 181, 'forecast_time': 27389, 'realization': 1, 'longitude': 360})) 314.101655872 GB \n",
      "\n",
      "NESM not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-02T12:00:00 ... 201...\n",
      "  * realization    (realization) int32 1\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6574, 'realization': 1, 'longitude': 360})) 77.10518632 GB \n",
      "\n",
      "CCSM4 not on-the-fly forecast_time freq:D \n",
      " Coordinates:\n",
      "  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...\n",
      "  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0\n",
      "  * forecast_time  (forecast_time) datetime64[ns] 1999-01-07 ... 2016-12-31\n",
      "  * realization    (realization) float32 1.0 2.0 3.0\n",
      "  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 \n",
      " Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 6569, 'realization': 3, 'longitude': 360})) 231.139516688 GB \n",
      "\n"
     ]
    }
   ],
   "source": [
    "centers = ['CESM',   'CESM',  'ECCC', 'ECCC', 'ECCC', 'EMC', 'EMC',   'ESRL',    'GMAO'    , 'NCEP', 'NRL','RSMAS']\n",
    "models = ['30LCESM1','46LCESM1','GEM','GEPS6','GEPS5','GEFS','GEFSv12','FIMr1p1','GEOS_V2p1','CFSv2','NESM','CCSM4']\n",
    "for center,model in zip(centers,models):\n",
    "    try:\n",
    "        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{center}/.{model}/.hindcast/.tas/dods',\n",
    "                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'L':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})\n",
    "        # calendar '360' not recognized, but '360_day'\n",
    "        for c in ['hdate','forecast_time']:\n",
    "            if c in ds.coords:\n",
    "                if ds[c].attrs['calendar'] == '360':\n",
    "                    ds[c].attrs['calendar'] = '360_day'\n",
    "        ds = xr.decode_cf(ds)\n",
    "        onthefly = True if 'hdate' in ds.coords else False\n",
    "        forecast_time_freq = xr.infer_freq(ds.forecast_time)\n",
    "        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',\n",
    "              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',\n",
    "              '\\n',ds.coords,'\\n',ds.sizes,ds.nbytes/1e9,'GB','\\n')\n",
    "    except Exception as e:\n",
    "        print(f'center={center} model={model} failed due to {type(e).__name__}: {e} \\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Opendap magic\n",
    "\n",
    "Opendap URLs be appended for server-side preprocessing.\n",
    "\n",
    "- https://www.opendap.org/support\n",
    "- http://iridl.ldeo.columbia.edu/dochelp/topics/DODS/fnlist.html\n",
    "- https://iridl.ldeo.columbia.edu/dochelp/Documentation/funcindex.html?Set-Language=en"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `curl` or `wget`\n",
    "\n",
    "You can always work file-based and download from IRIDL via `curl` or `wget`. However, nicer is direct access via `opendap` and `xarray`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from subprocess import call\n",
    "fname = 'GEFS_pra_hc.nc'\n",
    "# endless magic commands selecting week 3-4 and aggregating pr to tp with unit conversion\n",
    "dset_url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%201999)/(0000%2028%20Dec%202015)/RANGEEDGES/S/(days%20since%201999-01-01)/streamgridunitconvert/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/(Jun-Aug)/VALUES/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.dc9915/.pr/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/to366daysample/%5BYR%5Daverage/S/sampleDOY/sub/c%3A/0.001/(m3%20kg-1)/%3Ac/mul/c%3A/1000/(mm%20m-1)/%3Ac/mul/c%3A/86400/(s%20day-1)/%3Ac/mul/c%3A/7.0//units//days/def/%3Ac/mul/data.nc'\n",
    "# download data with curl\n",
    "call(['curl','-k',dset_url, '-o',fname])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.Dataset&gt;\n",
       "Dimensions:        (forecast_time: 226, latitude: 20, longitude: 31, realization: 11)\n",
       "Coordinates:\n",
       "  * latitude       (latitude) float32 1.0 2.0 3.0 4.0 ... 17.0 18.0 19.0 20.0\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 1999-06-02 ... 2015-08-26\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
       "  * longitude      (longitude) float32 -20.0 -19.0 -18.0 -17.0 ... 8.0 9.0 10.0\n",
       "    lead_time      timedelta64[ns] 14 days\n",
       "Data variables:\n",
       "    tp             (realization, forecast_time, latitude, longitude) float64 ...</pre>"
      ],
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:        (forecast_time: 226, latitude: 20, longitude: 31, realization: 11)\n",
       "Coordinates:\n",
       "  * latitude       (latitude) float32 1.0 2.0 3.0 4.0 ... 17.0 18.0 19.0 20.0\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 1999-06-02 ... 2015-08-26\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
       "  * longitude      (longitude) float32 -20.0 -19.0 -18.0 -17.0 ... 8.0 9.0 10.0\n",
       "    lead_time      timedelta64[ns] 14 days\n",
       "Data variables:\n",
       "    tp             (realization, forecast_time, latitude, longitude) float64 ..."
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds = xr.open_dataset(fname).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'aprod':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## opendap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.132103944 GB\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.DataArray &#x27;tp&#x27; (forecast_time: 886, realization: 11, latitude: 121, longitude: 240)&gt;\n",
       "dask.array&lt;mul, shape=(886, 11, 121, 240), dtype=float32, chunksize=(443, 6, 102, 120), chunktype=numpy.ndarray&gt;\n",
       "Coordinates:\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2000-01-12 ... 2016-12-28\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "    lead_time      timedelta64[ns] 14 days\n",
       "Attributes:\n",
       "    units:      kg m-2\n",
       "    long_name:  total precipitation</pre>"
      ],
      "text/plain": [
       "<xarray.DataArray 'tp' (forecast_time: 886, realization: 11, latitude: 121, longitude: 240)>\n",
       "dask.array<mul, shape=(886, 11, 121, 240), dtype=float32, chunksize=(443, 6, 102, 120), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2000-01-12 ... 2016-12-28\n",
       "  * realization    (realization) float32 0.0 1.0 2.0 3.0 ... 7.0 8.0 9.0 10.0\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "    lead_time      timedelta64[ns] 14 days\n",
       "Attributes:\n",
       "    units:      kg m-2\n",
       "    long_name:  total precipitation"
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# server-side aggregate w34 precip to tp and regrid to 1.5 degree\n",
    "forecast = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%202000)/(0000%2031%20Dec%202019)/RANGEEDGES/L/(14)/(28)/RANGEEDGES/[L]average/X/0/1.5/358.5/GRID/Y/90/1.5/-90/GRID/dods',\n",
    "                     chunks='auto').rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'pr':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))\n",
    "print(forecast.nbytes/1e9, 'GB')\n",
    "\n",
    "forecast = forecast * 3600 * 24 * 14 # convert from kg m-2 s-1 to to kg m-2 14day-1\n",
    "forecast['tp'].attrs.update({'units':'kg m-2', 'long_name': 'total precipitation'})  # omitting biweekly is units\n",
    "forecast.tp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### create `forecast_like_observations` from observations with `time` dimension\n",
    "\n",
    "[climetlab_s2s_ai_challenge.extra.forecast_like_observations](https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge/blob/47539214d78251e30b79bd4b0dbb4a6a085cc9bf/climetlab_s2s_ai_challenge/extra.py#L43)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import climetlab as cml\n",
    "from climetlab_s2s_ai_challenge.extra import forecast_like_observations, create_valid_time_from_forecast_time_and_lead_time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.  This dataset has been dowloaded from IRIDL. By downloading this data you also agree to the terms and conditions defined at https://iridl.ldeo.columbia.edu.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00, 147.08it/s]\n",
      "/opt/conda/lib/python3.8/site-packages/climetlab/readers/__init__.py:86: UserWarning: Error loading wrapper grib: /opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header\n",
      "  warnings.warn(f\"Error loading wrapper {name}: {e}\")\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.Dataset&gt;\n",
       "Dimensions:    (latitude: 121, longitude: 240, time: 8154)\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 1999-01-01 1999-01-02 ... 2021-04-28\n",
       "  * latitude   (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * longitude  (longitude) float64 0.0 1.5 3.0 4.5 ... 354.0 355.5 357.0 358.5\n",
       "Data variables:\n",
       "    pr         (time, latitude, longitude) float32 ...\n",
       "Attributes:\n",
       "    source_dataset_name:  NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...\n",
       "    source_hosting:       IRIDL\n",
       "    source_url:           http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...\n",
       "    created_by_software:  climetlab-s2s-ai-challenge\n",
       "    created_by_script:    tools/observations/makefile</pre>"
      ],
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:    (latitude: 121, longitude: 240, time: 8154)\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 1999-01-01 1999-01-02 ... 2021-04-28\n",
       "  * latitude   (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * longitude  (longitude) float64 0.0 1.5 3.0 4.5 ... 354.0 355.5 357.0 358.5\n",
       "Data variables:\n",
       "    pr         (time, latitude, longitude) float32 ...\n",
       "Attributes:\n",
       "    source_dataset_name:  NOAA NCEP CPC UNIFIED_PRCP GAUGE_BASED GLOBAL v1p0 ...\n",
       "    source_hosting:       IRIDL\n",
       "    source_url:           http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/...\n",
       "    created_by_software:  climetlab-s2s-ai-challenge\n",
       "    created_by_script:    tools/observations/makefile"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# download precipitation_flux pr\n",
    "obs_ds = cml.load_dataset('s2s-ai-challenge-observations', parameter='pr').to_xarray()\n",
    "\n",
    "obs_ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# less data heavy example: reduce forecast\n",
    "forecast = forecast.sel(forecast_time='2000', realization=[0, 1]).expand_dims('lead_time')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# assign valid_time to forecasts\n",
    "forecast = forecast.assign_coords(valid_time=create_valid_time_from_forecast_time_and_lead_time(forecast.forecast_time, forecast.lead_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.Dataset&gt;\n",
       "Dimensions:        (forecast_time: 51, latitude: 121, lead_time: 30, longitude: 240, realization: 2)\n",
       "Coordinates:\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2000-01-12 ... 2000-12-27\n",
       "  * realization    (realization) float32 0.0 1.0\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * lead_time      (lead_time) timedelta64[ns] 13 days 14 days ... 42 days\n",
       "    valid_time     (lead_time, forecast_time) datetime64[ns] 2000-01-26 ... 2...\n",
       "Data variables:\n",
       "    *empty*\n",
       "Attributes:\n",
       "    Conventions:  IRIDL</pre>"
      ],
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:        (forecast_time: 51, latitude: 121, lead_time: 30, longitude: 240, realization: 2)\n",
       "Coordinates:\n",
       "  * forecast_time  (forecast_time) datetime64[ns] 2000-01-12 ... 2000-12-27\n",
       "  * realization    (realization) float32 0.0 1.0\n",
       "  * longitude      (longitude) float32 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5\n",
       "  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0\n",
       "  * lead_time      (lead_time) timedelta64[ns] 13 days 14 days ... 42 days\n",
       "    valid_time     (lead_time, forecast_time) datetime64[ns] 2000-01-26 ... 2...\n",
       "Data variables:\n",
       "    *empty*\n",
       "Attributes:\n",
       "    Conventions:  IRIDL"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create an empty daily lead_time forecast dataset to input into forecast_like_observations, which requires continuous daily leads, first lead_time stays empty\n",
    "dummy_forecast_daily = forecast.coords.to_dataset().isel(lead_time=[0]*30).assign_coords(lead_time=[pd.Timedelta(f'{i} d') for i in range(13,13+30)])\n",
    "dummy_forecast_daily"
   ]
  },
  {
   "cell_type": "code",