{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train ML model to correct predictions of weeks 3-4 & 5-6\n",
    "\n",
    "This notebook creates a machine learning model (`ML_model`) that predicts weeks 3-4 & 5-6 from `S2S` weeks 3-4 & 5-6 forecasts. Its predictions are compared to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Synopsis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method: `ML-based mean bias reduction`\n",
    "\n",
    "- calculate the ML-based bias from the 2000-2019 deterministic ensemble mean forecasts\n",
    "- remove that ML-based bias from the 2020 deterministic ensemble mean forecasts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data used\n",
    "\n",
    "type: renku datasets\n",
    "\n",
    "Training-input for Machine Learning model:\n",
    "- hindcasts of models:\n",
    "    - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`\n",
    "\n",
    "Forecast-input for Machine Learning model:\n",
    "- real-time 2020 forecasts of models:\n",
    "    - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`\n",
    "\n",
    "Compare Machine Learning model forecast against ground truth:\n",
    "- `CPC` observations:\n",
    "    - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`\n",
    "    - `forecast-like-observations_2020_biweekly_deterministic.zarr`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resources used\n",
    "for training; see reproducibility for details\n",
    "\n",
    "- platform: renku\n",
    "- memory: 8 GB\n",
    "- processors: 2 CPU\n",
    "- storage required: 10 GB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Safeguards\n",
    "\n",
    "All points have to be [x] checked. If not, your submission is invalid.\n",
    "\n",
    "Changes to the code after submission are not possible, as the `commit` before the `tag` will be reviewed.\n",
    "(Only in exceptional cases, and if previous effort in reproducibility can be found, may it be allowed to improve readability and reproducibility after November 1st 2021.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1)\n",
    "\n",
    "If the organizers suspect overfitting, your contribution can be disqualified.\n",
    "\n",
    "  - [x] We did not use 2020 observations in training (explicit overfitting and cheating)\n",
    "  - [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)\n",
    "  - [x] We provide RPSS scores for the training period with script `skill_by_year`, see section 6.3 `predict`.\n",
    "  - [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).\n",
    "  - [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.\n",
    "  - [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.\n",
    "  - [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Safeguards for Reproducibility\n",
    "Notebook/code must be independently reproducible from scratch by the organizers (after the competition). If that is not possible: no prize.\n",
    "  - [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)\n",
    "  - [x] Code is well documented, readable and reproducible.\n",
    "  - [x] Code to reproduce training and predictions is preferred to run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines that take weeks to train."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Todos to improve template\n",
    "\n",
    "This is just a demo.\n",
    "\n",
    "- [ ] use multiple predictor variables and two predicted variables\n",
    "- [ ] for both `lead_time`s in one go\n",
    "- [ ] consider seasonality, for now all `forecast_time` months are mixed\n",
    "- [ ] make probabilistic predictions with `category` dim, for now it works deterministically"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.layers import Input, Dense, Flatten\n",
    "from tensorflow.keras.models import Sequential\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import xarray as xr\n",
    "xr.set_options(display_style='text')\n",
    "import numpy as np\n",
    "\n",
    "from dask.utils import format_bytes\n",
    "import xskillscore as xs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get training data\n",
    "\n",
    "preprocessing of input data may be done in separate notebook/script"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hindcast\n",
    "\n",
    "get weekly initialized hindcasts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "v='t2m'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# preprocessed as renku dataset\n",
    "!renku storage pull ../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "hind_2000_2019 = xr.open_zarr(\"../data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr\", consolidated=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# preprocessed as renku dataset\n",
    "!renku storage pull ../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "fct_2020 = xr.open_zarr(\"../data/ecmwf_forecast-input_2020_biweekly_deterministic.zarr\", consolidated=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Observations\n",
    "corresponding to hindcasts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# preprocessed as renku dataset\n",
    "!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs_2000_2019 = xr.open_zarr(\"../data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr\", consolidated=True)#[v]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# preprocessed as renku dataset\n",
    "!renku storage pull ../data/forecast-like-observations_2020_biweekly_deterministic.zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs_2020 = xr.open_zarr(\"../data/forecast-like-observations_2020_biweekly_deterministic.zarr\", consolidated=True)#[v]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ML model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on [WeatherBench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fatal: destination path 'WeatherBench' already exists and is not an empty directory.\n"
     ]
    }
   ],
   "source": [
    "# run once only and don't commit\n",
    "!git clone https://github.com/pangeo-data/WeatherBench/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.insert(1, 'WeatherBench')\n",
    "from WeatherBench.src.train_nn import DataGenerator, PeriodicConv2D, create_predictions\n",
    "import tensorflow.keras as keras"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "bs=32\n",
    "\n",
    "import numpy as np\n",
    "class DataGenerator(keras.utils.Sequence):\n",
    "    def __init__(self, fct, verif, lead_time, batch_size=bs, shuffle=True, load=True,\n",
    "                 mean=None, std=None):\n",
    "        \"\"\"\n",
    "        Data generator for WeatherBench data.\n",
    "        Template from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly\n",
    "\n",
    "        Args:\n",
    "            fct: forecasts from S2S models: xr.DataArray (xr.Dataset doesn't work properly)\n",
    "            verif: observations with same dimensionality (xr.Dataset doesn't work properly)\n",
    "            lead_time: Lead_time as in model\n",
    "            batch_size: Batch size\n",
    "            shuffle: bool. If True, data is shuffled.\n",
    "            load: bool. If True, dataset is loaded into RAM.\n",
    "            mean: If None, compute mean from data.\n",
    "            std: If None, compute standard deviation from data.\n",
    "            \n",
    "        Todo:\n",
    "        - use number in a better way, now uses only ensemble mean forecast\n",
    "        - don't use .sel(lead_time=lead_time) to train over all lead_time at once\n",
    "        - be sensitive with forecast_time, pool a few around the weekofyear given\n",
    "        - use more variables as predictors\n",
    "        - predict more variables\n",
    "        \"\"\"\n",
    "\n",
    "        if isinstance(fct, xr.Dataset):\n",
    "            print('convert fct to array')\n",
    "            fct = fct.to_array().transpose(...,'variable')\n",
    "            self.fct_dataset=True\n",
    "        else:\n",
    "            self.fct_dataset=False\n",
    "            \n",
    "        if isinstance(verif, xr.Dataset):\n",
    "            print('convert verif to array')\n",
    "            verif = verif.to_array().transpose(...,'variable')\n",
    "            self.verif_dataset=True\n",
    "        else:\n",
    "            self.verif_dataset=False\n",
    "        \n",
    "        #self.fct = fct\n",
    "        self.batch_size = batch_size\n",
    "        self.shuffle = shuffle\n",
    "        self.lead_time = lead_time\n",
    "\n",
    "        self.fct_data = fct.transpose('forecast_time', ...).sel(lead_time=lead_time)\n",
    "        self.fct_mean = self.fct_data.mean('forecast_time').compute() if mean is None else mean\n",
    "        self.fct_std = self.fct_data.std('forecast_time').compute() if std is None else std\n",
    "        \n",
    "        self.verif_data = verif.transpose('forecast_time', ...).sel(lead_time=lead_time)\n",
    "        self.verif_mean = self.verif_data.mean('forecast_time').compute() if mean is None else mean\n",
    "        self.verif_std = self.verif_data.std('forecast_time').compute() if std is None else std\n",
    "\n",
    "        # Normalize\n",
    "        self.fct_data = (self.fct_data - self.fct_mean) / self.fct_std\n",
    "        self.verif_data = (self.verif_data - self.verif_mean) / self.verif_std\n",
    "        \n",
    "        self.n_samples = self.fct_data.forecast_time.size\n",
    "        self.forecast_time = self.fct_data.forecast_time\n",
    "\n",
    "        self.on_epoch_end()\n",
    "\n",
    "        # For some weird reason calling .load() earlier messes up the mean and std computations\n",
    "        if load:\n",
    "            # print('Loading data into RAM')\n",
    "            # load both predictors and targets, so batches are not recomputed lazily\n",
    "            self.fct_data.load()\n",
    "            self.verif_data.load()\n",
    "\n",
    "    def __len__(self):\n",
    "        'Denotes the number of batches per epoch'\n",
    "        return int(np.ceil(self.n_samples / self.batch_size))\n",
    "\n",
    "    def __getitem__(self, i):\n",
    "        'Generate one batch of data'\n",
    "        idxs = self.idxs[i * self.batch_size:(i + 1) * self.batch_size]\n",
    "        # got all nan if nans not masked\n",
    "        X = self.fct_data.isel(forecast_time=idxs).fillna(0.).values\n",
    "        y = self.verif_data.isel(forecast_time=idxs).fillna(0.).values\n",
    "        return X, y\n",
    "\n",
    "    def on_epoch_end(self):\n",
    "        'Updates indexes after each epoch'\n",
    "        self.idxs = np.arange(self.n_samples)\n",
    "        if self.shuffle:\n",
    "            np.random.shuffle(self.idxs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>&lt;xarray.DataArray &#x27;lead_time&#x27; ()&gt;\n",
       "array(1209600000000000, dtype=&#x27;timedelta64[ns]&#x27;)\n",
       "Coordinates:\n",
       "    lead_time  timedelta64[ns] 14 days\n",
       "Attributes:\n",
       "    aggregate:      The pd.Timedelta corresponds to the first day of a biweek...\n",
       "    description:    Forecast period is the time interval between the forecast...\n",
       "    long_name:      lead time\n",
       "    standard_name:  forecast_period\n",
       "    week34_t2m:     mean[14 days, 27 days]\n",
       "    week34_tp:      28 days minus 14 days\n",
       "    week56_t2m:     mean[28 days, 41 days]\n",
       "    week56_tp:      42 days minus 28 days</pre>"
      ],
      "text/plain": [
       "<xarray.DataArray 'lead_time' ()>\n",
       "array(1209600000000000, dtype='timedelta64[ns]')\n",
       "Coordinates:\n",
       "    lead_time  timedelta64[ns] 14 days\n",
       "Attributes:\n",
       "    aggregate:      The pd.Timedelta corresponds to the first day of a biweek...\n",
       "    description:    Forecast period is the time interval between the forecast...\n",
       "    long_name:      lead time\n",
       "    standard_name:  forecast_period\n",
       "    week34_t2m:     mean[14 days, 27 days]\n",
       "    week34_tp:      28 days minus 14 days\n",
       "    week56_t2m:     mean[28 days, 41 days]\n",
       "    week56_tp:      42 days minus 28 days"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# first of the 2 bi-weekly `lead_time`s: weeks 3-4\n",
    "lead = hind_2000_2019.isel(lead_time=0).lead_time\n",
    "\n",
    "lead"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# mask hindcasts where observations are missing (e.g. over the ocean); needed?\n",
    "hind_2000_2019 = hind_2000_2019.where(obs_2000_2019.isel(forecast_time=0, lead_time=0,drop=True).notnull())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## data prep: train, valid, test\n",
    "\n",
    "[Use the hindcast period to split train and valid.](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) Do not use the 2020 data for testing!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# time is the forecast_time\n",
    "time_train_start,time_train_end='2000','2017' # train\n",
    "time_valid_start,time_valid_end='2018','2019' # valid\n",
    "time_test = '2020'                            # test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n"
     ]
    }
   ],
   "source": [
    "dg_train = DataGenerator(\n",
    "    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_train_start,time_train_end))[v],\n",
    "    obs_2000_2019.sel(forecast_time=slice(time_train_start,time_train_end))[v],\n",
    "    lead_time=lead, batch_size=bs, load=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n",
      "/opt/conda/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide\n",
      "  x = np.divide(x1, x2, out)\n"
     ]
    }
   ],
   "source": [
    "dg_valid = DataGenerator(\n",
    "    hind_2000_2019.mean('realization').sel(forecast_time=slice(time_valid_start,time_valid_end))[v],\n",
    "    obs_2000_2019.sel(forecast_time=slice(time_valid_start,time_valid_end))[v],\n",
    "    lead_time=lead, batch_size=bs, shuffle=False, load=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# test generator on 2020 data: do not use for training or tuning (delete?)\n",
    "dg_test = DataGenerator(\n",
    "    fct_2020.mean('realization').sel(forecast_time=time_test)[v],\n",
    "    obs_2020.sel(forecast_time=time_test)[v],\n",
    "    lead_time=lead, batch_size=bs, load=True, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((32, 121, 240), (32, 121, 240))"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X, y = dg_valid[0]\n",
    "X.shape, y.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# short look into training data: large biases\n",
    "# any problem from normalizing?\n",
    "# i=4\n",
    "# xr.DataArray(np.vstack([X[i],y[i]])).plot(yincrease=False, robust=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `fit`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.\n",
      "Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\n",
      "Cause: module 'gast' has no attribute 'Index'\n",
      "To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert\n",
      "WARNING: AutoGraph could not transform <bound method PeriodicPadding2D.call of <WeatherBench.src.train_nn.PeriodicPadding2D object at 0x7f86042986a0>> and will run it as-is.\n",
      "Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.\n",
      "Cause: module 'gast' has no attribute 'Index'\n",
      "To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert\n"
     ]
    }
   ],
   "source": [
    "cnn = keras.models.Sequential([\n",
    "    PeriodicConv2D(filters=32, kernel_size=5, conv_kwargs={'activation':'relu'}, input_shape=(32, 64, 1)),\n",
    "    PeriodicConv2D(filters=1, kernel_size=5)\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"sequential\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "periodic_conv2d (PeriodicCon (None, 32, 64, 32)        832       \n",
      "_________________________________________________________________\n",
      "periodic_conv2d_1 (PeriodicC (None, 32, 64, 1)         801       \n",
      "=================================================================\n",
      "Total params: 1,633\n",
      "Trainable params: 1,633\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "cnn.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "cnn.compile(keras.optimizers.Adam(1e-4), 'mse')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/2\n",
      "30/30 [==============================] - 58s 2s/step - loss: 0.1472 - val_loss: 0.0742\n",
      "Epoch 2/2\n",
      "30/30 [==============================] - 45s 1s/step - loss: 0.0712 - val_loss: 0.0545\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<tensorflow.python.keras.callbacks.History at 0x7f865c2103d0>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cnn.fit(dg_train, epochs=2, validation_data=dg_valid)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `predict`\n",
    "\n",
    "Create predictions and print `mean(variable, lead_time, longitude, weighted latitude)` RPSS for all years as calculated by `skill_by_year`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts import add_valid_time_from_forecast_reference_time_and_lead_time\n",
    "\n",
    "def _create_predictions(model, dg, lead):\n",
    "    \"\"\"Create non-iterative predictions\"\"\"\n",
    "    preds = model.predict(dg).squeeze()\n",
    "    # Unnormalize with the forecast statistics stored on the generator\n",
    "    preds = preds * dg.fct_std.values + dg.fct_mean.values\n",
    "    if dg.verif_dataset:\n",
    "        da = xr.DataArray(\n",
    "                    preds,\n",
    "                    dims=['forecast_time', 'latitude', 'longitude','variable'],\n",
    "                    coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,\n",
    "                            'longitude': dg.fct_data.longitude},\n",
    "                ).to_dataset() # doesn't work yet\n",
    "    else:\n",
    "        da = xr.DataArray(\n",
    "                    preds,\n",
    "                    dims=['forecast_time', 'latitude', 'longitude'],\n",
    "                    coords={'forecast_time': dg.fct_data.forecast_time, 'latitude': dg.fct_data.latitude,\n",
    "                            'longitude': dg.fct_data.longitude},\n",
    "                )\n",
    "    da = da.assign_coords(lead_time=lead)\n",
    "    # da = add_valid_time_from_forecast_reference_time_and_lead_time(da)\n",
    "    return da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optionally masking the ocean when making probabilistic\n",
    "mask = obs_2020.std(['lead_time','forecast_time']).notnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts import make_probabilistic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "cache_path='../data'\n",
    "tercile_file = f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc'\n",
    "tercile_edges = xr.open_dataset(tercile_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this demo is not useful yet, but the results have the expected dimensions\n",
    "# the model trained on the first lead_time is applied to all lead_times; ideally train one per lead_time\n",
    "\n",
    "def create_predictions(cnn, fct, obs, time):\n",
    "    preds_test=[]\n",
    "    for lead in fct.lead_time:\n",
    "        dg = DataGenerator(fct.mean('realization').sel(forecast_time=time)[v],\n",
    "                           obs.sel(forecast_time=time)[v],\n",
    "                           lead_time=lead, batch_size=bs, mean=dg_train.fct_mean, std=dg_train.fct_std, shuffle=False)\n",
    "        preds_test.append(_create_predictions(cnn, dg, lead))\n",
    "    preds_test = xr.concat(preds_test, 'lead_time')\n",
    "    preds_test['lead_time'] = fct.lead_time\n",
    "    # add valid_time coord\n",
    "    preds_test = add_valid_time_from_forecast_reference_time_and_lead_time(preds_test)\n",
    "    preds_test = preds_test.to_dataset(name=v)\n",
    "    # add placeholder `tp` variable (a copy of `t2m`)\n",
    "    preds_test['tp'] = preds_test['t2m']\n",
    "    # make probabilistic\n",
    "    preds_test = make_probabilistic(preds_test.expand_dims('realization'), tercile_edges, mask=mask)\n",
    "    return preds_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `predict` training period in-sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!renku storage pull ../data/forecast-like-observations_2020_biweekly_terciled.nc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m\u001b[1mWarning: \u001b[0mRun CLI commands only from project's root directory.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!renku storage pull ../data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RPSS</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>-0.862483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td>-1.015485</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td>-1.101022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2003</th>\n",
       "      <td>-1.032647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2004</th>\n",
       "      <td>-1.056348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2005</th>\n",
       "      <td>-1.165675</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2006</th>\n",
       "      <td>-1.057217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2007</th>\n",
       "      <td>-1.170849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2008</th>\n",
       "      <td>-1.049785</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009</th>\n",
       "      <td>-1.169108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>-1.130845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>-1.052670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>-1.126449</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>-1.126930</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>-1.095896</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>-1.117486</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          RPSS\n",
       "year          \n",
       "2000 -0.862483\n",
       "2001 -1.015485\n",
       "2002 -1.101022\n",
       "2003 -1.032647\n",
       "2004 -1.056348\n",
       "2005 -1.165675\n",
       "2006 -1.057217\n",
       "2007 -1.170849\n",
       "2008 -1.049785\n",
       "2009 -1.169108\n",
       "2010 -1.130845\n",
       "2011 -1.052670\n",
       "2012 -1.126449\n",
       "2013 -1.126930\n",
       "2014 -1.095896\n",
       "2015 -1.117486"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scripts import skill_by_year\n",
    "import os\n",
    "import numpy as np\n",
    "if os.environ.get('HOME') == '/home/jovyan':\n",
    "    import pandas as pd\n",
    "    # assume on renku with small memory: predict in chunks of `step` years\n",
    "    step = 2\n",
    "    skill_list = []\n",
    "    # np.arange excludes its stop value, so use +1 to let the last chunk reach time_train_end\n",
    "    for year in np.arange(int(time_train_start), int(time_train_end) + 1, step):\n",
    "        last_year = min(year + step - 1, int(time_train_end))  # do not run past the training period\n",
    "        preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(str(year), str(last_year))).compute()\n",
    "        skill_list.append(skill_by_year(preds_is))\n",
    "    skill = pd.concat(skill_list)\n",
    "else: # with larger memory, simply do\n",
    "    preds_is = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_train_start, time_train_end))\n",
    "    skill = skill_by_year(preds_is)\n",
    "skill"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `predict` validation period out-of-sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RPSS</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2018</th>\n",
       "      <td>-1.099744</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019</th>\n",
       "      <td>-1.172401</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          RPSS\n",
       "year          \n",
       "2018 -1.099744\n",
       "2019 -1.172401"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preds_os = create_predictions(cnn, hind_2000_2019, obs_2000_2019, time=slice(time_valid_start, time_valid_end))\n",
    "\n",
    "skill_by_year(preds_os)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `predict` test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RPSS</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>-1.076834</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          RPSS\n",
       "year          \n",
       "2020 -1.076834"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preds_test = create_predictions(cnn, fct_2020, obs_2020, time=time_test)\n",
    "\n",
    "skill_by_year(preds_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scripts import assert_predictions_2020\n",
    "assert_predictions_2020(preds_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "preds_test.to_netcdf('../submissions/ML_prediction_2020.nc')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !git add ../submissions/ML_prediction_2020.nc\n",
    "# !git add ML_train_and_prediction.ipynb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !git commit -m \"template_test commit message\" # whatever message you want"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !git tag \"submission-template_test-0.0.1\" # tag the commit if the scorer should check it; only the last tagged (i.e. submitted) version is considered"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !git push --tags"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reproducibility"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Memory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              total        used        free      shared  buff/cache   available\n",
      "Mem:             31           7          11           0          12          24\n",
      "Swap:             0           0           0\n"
     ]
    }
   ],
   "source": [
    "# report total, used and free memory in GiB, see https://phoenixnap.com/kb/linux-commands-check-memory-usage\n",
    "!free -g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Architecture:                    x86_64\n",
      "CPU op-mode(s):                  32-bit, 64-bit\n",
      "Byte Order:                      Little Endian\n",
      "Address sizes:                   40 bits physical, 48 bits virtual\n",
      "CPU(s):                          8\n",
      "On-line CPU(s) list:             0-7\n",
      "Thread(s) per core:              1\n",
      "Core(s) per socket:              1\n",
      "Socket(s):                       8\n",
      "NUMA node(s):                    1\n",
      "Vendor ID:                       GenuineIntel\n",
      "CPU family:                      6\n",
      "Model:                           85\n",
      "Model name:                      Intel Xeon Processor (Skylake, IBRS)\n",
      "Stepping:                        4\n",
      "CPU MHz:                         2095.078\n",
      "BogoMIPS:                        4190.15\n",
      "Virtualization:                  VT-x\n",
      "Hypervisor vendor:               KVM\n",
      "Virtualization type:             full\n",
      "L1d cache:                       256 KiB\n",
      "L1i cache:                       256 KiB\n",
      "L2 cache:                        32 MiB\n",
      "L3 cache:                        128 MiB\n",
      "NUMA node0 CPU(s):               0-7\n",
      "Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages\n",
      "Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cach\n",
      "                                 e flushes, SMT disabled\n",
      "Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no mic\n",
      "                                 rocode; SMT Host state unknown\n",
      "Vulnerability Meltdown:          Mitigation; PTI\n",
      "Vulnerability Spec store bypass: Vulnerable\n",
      "Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user\n",
      "                                  pointer sanitization\n",
      "Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB condit\n",
      "                                 ional, IBRS_FW, STIBP disabled, RSB filling\n",
      "Vulnerability Srbds:             Not affected\n",
      "Vulnerability Tsx async abort:   Not affected\n",
      "Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtr\n",
      "                                 r pge mca cmov pat pse36 clflush mmx fxsr sse s\n",
      "                                 se2 syscall nx pdpe1gb rdtscp lm constant_tsc r\n",
      "                                 ep_good nopl xtopology cpuid tsc_known_freq pni\n",
      "                                  pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_\n",
      "                                 2 x2apic movbe popcnt tsc_deadline_timer aes xs\n",
      "                                 ave avx f16c rdrand hypervisor lahf_lm abm 3dno\n",
      "                                 wprefetch cpuid_fault invpcid_single pti ibrs i\n",
      "                                 bpb tpr_shadow vnmi flexpriority ept vpid ept_a\n",
      "                                 d fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx\n",
      "                                 512f avx512dq rdseed adx smap clwb avx512cd avx\n",
      "                                 512bw avx512vl xsaveopt xsavec xgetbv1 arat pku\n",
      "                                  ospke\n"
     ]
    }
   ],
   "source": [
    "!lscpu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Software"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# packages in environment at /opt/conda:\n",
      "#\n",
      "# Name                    Version                   Build  Channel\n",
      "_libgcc_mutex             0.1                 conda_forge    conda-forge\n",
      "_openmp_mutex             4.5                       1_gnu    conda-forge\n",
      "_pytorch_select           0.1                       cpu_0    defaults\n",
      "_tflow_select             2.3.0                       mkl    defaults\n",
      "absl-py                   0.13.0           py38h06a4308_0    defaults\n",
      "aiobotocore               1.4.1              pyhd3eb1b0_0    defaults\n",
      "aiohttp                   3.7.4.post0      py38h7f8727e_2    defaults\n",
      "aioitertools              0.7.1              pyhd3eb1b0_0    defaults\n",
      "alembic                   1.4.3              pyh9f0ad1d_0    conda-forge\n",
      "ansiwrap                  0.8.4                    pypi_0    pypi\n",
      "appdirs                   1.4.4                    pypi_0    pypi\n",
      "argcomplete               1.12.3                   pypi_0    pypi\n",
      "argon2-cffi               20.1.0           py38h497a2fe_2    conda-forge\n",
      "argparse                  1.4.0                    pypi_0    pypi\n",
      "asciitree                 0.3.3                      py_2    defaults\n",
      "astor                     0.8.1            py38h06a4308_0    defaults\n",
      "astunparse                1.6.3                      py_0    defaults\n",
      "async-timeout             3.0.1                    pypi_0    pypi\n",
      "async_generator           1.10                       py_0    conda-forge\n",
      "attrs                     21.2.0                   pypi_0    pypi\n",
      "backcall                  0.2.0              pyh9f0ad1d_0    conda-forge\n",
      "backports                 1.0                        py_2    conda-forge\n",
      "backports.functools_lru_cache 1.6.1                      py_0    conda-forge\n",
      "bagit                     1.8.1                    pypi_0    pypi\n",
      "beautifulsoup4            4.10.0             pyh06a4308_0    defaults\n",
      "binutils_impl_linux-64    2.35.1               h193b22a_1    conda-forge\n",
      "binutils_linux-64         2.35                h67ddf6f_30    conda-forge\n",
      "black                     20.8b1                   pypi_0    pypi\n",
      "blas                      1.0                         mkl    defaults\n",
      "bleach                    3.2.1              pyh9f0ad1d_0    conda-forge\n",
      "blinker                   1.4                        py_1    conda-forge\n",
      "bokeh                     2.3.3            py38h06a4308_0    defaults\n",
      "botocore                  1.20.106           pyhd3eb1b0_0    defaults\n",
      "bottleneck                1.3.2            py38heb32a55_1    defaults\n",
      "bracex                    2.1.1                    pypi_0    pypi\n",
      "branca                    0.3.1                    pypi_0    pypi\n",
      "brotli                    1.0.9                he6710b0_2    defaults\n",
      "brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge\n",
      "bzip2                     1.0.8                h7f98852_4    conda-forge\n",
      "c-ares                    1.17.1               h36c2ea0_0    conda-forge\n",
      "ca-certificates           2021.7.5             h06a4308_1    defaults\n",
      "cachecontrol              0.12.6                   pypi_0    pypi\n",
      "cachetools                4.2.4                    pypi_0    pypi\n",
      "calamus                   0.3.12                   pypi_0    pypi\n",
      "cdsapi                    0.5.1                    pypi_0    pypi\n",
      "certifi                   2021.5.30                pypi_0    pypi\n",
      "certipy                   0.1.3                      py_0    conda-forge\n",
      "cffi                      1.14.6                   pypi_0    pypi\n",
      "cfgrib                    0.9.9.0            pyhd8ed1ab_1    conda-forge\n",
      "cftime                    1.5.0            py38h6323ea4_0    defaults\n",
      "chardet                   3.0.4                    pypi_0    pypi\n",
      "click                     7.1.2                    pypi_0    pypi\n",
      "click-completion          0.5.2                    pypi_0    pypi\n",
      "click-option-group        0.5.3                    pypi_0    pypi\n",
      "click-plugins             1.1.1                    pypi_0    pypi\n",
      "climetlab                 0.8.31                   pypi_0    pypi\n",
      "climetlab-s2s-ai-challenge 0.8.0                    pypi_0    pypi\n",
      "cloudpickle               2.0.0              pyhd3eb1b0_0    defaults\n",
      "colorama                  0.4.4                    pypi_0    pypi\n",
      "coloredlogs               15.0.1                   pypi_0    pypi\n",
      "commonmark                0.9.1                    pypi_0    pypi\n",
      "conda                     4.9.2            py38h578d9bd_0    conda-forge\n",
      "conda-package-handling    1.7.2            py38h8df0ef7_0    conda-forge\n",
      "configargparse            1.5.2                    pypi_0    pypi\n",
      "configurable-http-proxy   1.3.0                         0    conda-forge\n",
      "coverage                  5.5              py38h27cfd23_2    defaults\n",
      "cryptography              3.4.8                    pypi_0    pypi\n",
      "curl                      7.71.1               he644dc0_8    conda-forge\n",
      "cwlgen                    0.4.2                    pypi_0    pypi\n",
      "cwltool                   3.1.20211004060744          pypi_0    pypi\n",
      "cycler                    0.10.0                   py38_0    defaults\n",
      "cython                    0.29.24          py38h295c915_0    defaults\n",
      "cytoolz                   0.11.0           py38h7b6447c_0    defaults\n",
      "dask                      2021.8.1           pyhd3eb1b0_0    defaults\n",
      "dask-core                 2021.8.1           pyhd3eb1b0_0    defaults\n",
      "dataclasses               0.8                pyh6d0b6a4_7    defaults\n",
      "decorator                 4.4.2                      py_0    conda-forge\n",
      "defusedxml                0.6.0                      py_0    conda-forge\n",
      "distributed               2021.8.1         py38h06a4308_0    defaults\n",
      "distro                    1.5.0                    pypi_0    pypi\n",
      "docopt                    0.6.2            py38h06a4308_0    defaults\n",
      "eccodes                   2.21.0               ha0e6eb6_0    conda-forge\n",
      "ecmwf-api-client          1.6.1                    pypi_0    pypi\n",
      "ecmwflibs                 0.3.14                   pypi_0    pypi\n",
      "entrypoints               0.3             pyhd8ed1ab_1003    conda-forge\n",
      "environ-config            21.2.0                   pypi_0    pypi\n",
      "fasteners                 0.16.3             pyhd3eb1b0_0    defaults\n",
      "filelock                  3.0.12                   pypi_0    pypi\n",
      "findlibs                  0.0.2                    pypi_0    pypi\n",
      "fonttools                 4.25.0             pyhd3eb1b0_0    defaults\n",
      "freetype                  2.10.4               h5ab3b9f_0    defaults\n",
      "frozendict                2.0.6                    pypi_0    pypi\n",
      "fsspec                    2021.7.0           pyhd3eb1b0_0    defaults\n",
      "gast                      0.4.0              pyhd3eb1b0_0    defaults\n",
      "gcc_impl_linux-64         9.3.0               h70c0ae5_18    conda-forge\n",
      "gcc_linux-64              9.3.0               hf25ea35_30    conda-forge\n",
      "gitdb                     4.0.7                    pypi_0    pypi\n",
      "gitpython                 3.1.14                   pypi_0    pypi\n",
      "google-auth               1.33.0             pyhd3eb1b0_0    defaults\n",
      "google-auth-oauthlib      0.4.4              pyhd3eb1b0_0    defaults\n",
      "google-pasta              0.2.0              pyhd3eb1b0_0    defaults\n",
      "grpcio                    1.36.1           py38h2157cd5_1    defaults\n",
      "gxx_impl_linux-64         9.3.0               hd87eabc_18    conda-forge\n",
      "gxx_linux-64              9.3.0               h3fbe746_30    conda-forge\n",
      "h5netcdf                  0.11.0             pyhd8ed1ab_0    conda-forge\n",
      "h5py                      2.10.0           py38hd6299e0_1    defaults\n",
      "hdf4                      4.2.13               h3ca952b_2    defaults\n",
      "hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge\n",
      "heapdict                  1.0.1              pyhd3eb1b0_0    defaults\n",
      "humanfriendly             10.0                     pypi_0    pypi\n",
      "humanize                  3.7.1                    pypi_0    pypi\n",
      "icu                       68.1                 h58526e2_0    conda-forge\n",
      "idna                      2.10               pyh9f0ad1d_0    conda-forge\n",
      "importlib-metadata        3.4.0            py38h578d9bd_0    conda-forge\n",
      "importlib_metadata        3.4.0                hd8ed1ab_0    conda-forge\n",
      "intake                    0.6.3              pyhd3eb1b0_0    defaults\n",
      "intake-xarray             0.5.0              pyhd3eb1b0_0    defaults\n",
      "intel-openmp              2019.4                      243    defaults\n",
      "ipykernel                 5.4.2            py38h81c977d_0    conda-forge\n",
      "ipython                   7.19.0           py38h81c977d_2    conda-forge\n",
      "ipython_genutils          0.2.0                      py_1    conda-forge\n",
      "isodate                   0.6.0                    pypi_0    pypi\n",
      "jasper                    1.900.1              hd497a04_4    defaults\n",
      "jedi                      0.17.2           py38h578d9bd_1    conda-forge\n",
      "jellyfish                 0.8.8                    pypi_0    pypi\n",
      "jinja2                    3.0.1                    pypi_0    pypi\n",
      "jmespath                  0.10.0             pyhd3eb1b0_0    defaults\n",
      "joblib                    1.0.1              pyhd3eb1b0_0    defaults\n",
      "jpeg                      9d                   h7f8727e_0    defaults\n",
      "json5                     0.9.5              pyh9f0ad1d_0    conda-forge\n",
      "jsonschema                3.2.0                      py_2    conda-forge\n",
      "jupyter-server-proxy      1.6.0                    pypi_0    pypi\n",
      "jupyter_client            6.1.11             pyhd8ed1ab_1    conda-forge\n",
      "jupyter_core              4.7.0            py38h578d9bd_0    conda-forge\n",
      "jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge\n",
      "jupyterhub                1.2.2                    pypi_0    pypi\n",
      "jupyterlab                2.2.9                      py_0    conda-forge\n",
      "jupyterlab-git            0.23.3                   pypi_0    pypi\n",
      "jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge\n",
      "jupyterlab_server         1.2.0                      py_0    conda-forge\n",
      "keras-preprocessing       1.1.2              pyhd3eb1b0_0    defaults\n",
      "kernel-headers_linux-64   2.6.32              h77966d4_13    conda-forge\n",
      "kiwisolver                1.3.1            py38h2531618_0    defaults\n",
      "krb5                      1.17.2               h926e7f8_0    conda-forge\n",
      "lazy-object-proxy         1.6.0                    pypi_0    pypi\n",
      "lcms2                     2.12                 h3be6417_0    defaults\n",
      "ld_impl_linux-64          2.35.1               hea4e1c9_1    conda-forge\n",
      "libaec                    1.0.4                he6710b0_1    defaults\n",
      "libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge\n",
      "libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge\n",
      "libcurl                   7.71.1               hcdd3856_8    conda-forge\n",
      "libedit                   3.1.20191231         he28a2e2_2    conda-forge\n",
      "libev                     4.33                 h516909a_1    conda-forge\n",
      "libffi                    3.3                  h58526e2_2    conda-forge\n",
      "libgcc-devel_linux-64     9.3.0               h7864c58_18    conda-forge\n",
      "libgcc-ng                 9.3.0               h2828fa1_18    conda-forge\n",
      "libgfortran-ng            9.3.0               ha5ec8a7_17    defaults\n",
      "libgfortran5              9.3.0               ha5ec8a7_17    defaults\n",
      "libgomp                   9.3.0               h2828fa1_18    conda-forge\n",
      "liblapack                 3.9.0           5_h92ddd45_netlib    conda-forge\n",
      "libllvm10                 10.0.1               hbcb73fb_5    defaults\n",
      "libmklml                  2019.0.5                      0    defaults\n",
      "libnetcdf                 4.7.4           nompi_h56d31a8_107    conda-forge\n",
      "libnghttp2                1.41.0               h8cfc5f6_2    conda-forge\n",
      "libpng                    1.6.37               hbc83047_0    defaults\n",
      "libprotobuf               3.17.2               h4ff587b_1    defaults\n",
      "libsodium                 1.0.18               h36c2ea0_1    conda-forge\n",
      "libssh2                   1.9.0                hab1572f_5    conda-forge\n",
      "libstdcxx-devel_linux-64  9.3.0               hb016644_18    conda-forge\n",
      "libstdcxx-ng              9.3.0               h6de172a_18    conda-forge\n",
      "libtiff                   4.2.0                h85742a9_0    defaults\n",
      "libuv                     1.40.0               h7f98852_0    conda-forge\n",
      "libwebp-base              1.2.0                h27cfd23_0    defaults\n",
      "llvmlite                  0.36.0           py38h612dafd_4    defaults\n",
      "locket                    0.2.1            py38h06a4308_1    defaults\n",
      "lockfile                  0.12.2                   pypi_0    pypi\n",
      "lxml                      4.6.3                    pypi_0    pypi\n",
      "lz4-c                     1.9.3                h295c915_1    defaults\n",
      "magics                    1.5.6                    pypi_0    pypi\n",
      "mako                      1.1.4              pyh44b312d_0    conda-forge\n",
      "markdown                  3.3.4            py38h06a4308_0    defaults\n",
      "markupsafe                2.0.1                    pypi_0    pypi\n",
      "marshmallow               3.13.0                   pypi_0    pypi\n",
      "matplotlib-base           3.4.2            py38hab158f2_0    defaults\n",
      "mistune                   0.8.4           py38h497a2fe_1003    conda-forge\n",
      "mkl                       2020.2                      256    defaults\n",
      "mkl-service               2.3.0            py38he904b0f_0    defaults\n",
      "mkl_fft                   1.3.0            py38h54f3939_0    defaults\n",
      "mkl_random                1.1.1            py38h0573a6f_0    defaults\n",
      "msgpack-python            1.0.2            py38hff7bd54_1    defaults\n",
      "multidict                 5.1.0            py38h27cfd23_2    defaults\n",
      "munkres                   1.1.4                      py_0    defaults\n",
      "mypy-extensions           0.4.3                    pypi_0    pypi\n",
      "nbclient                  0.5.0                    pypi_0    pypi\n",
      "nbconvert                 6.0.7            py38h578d9bd_3    conda-forge\n",
      "nbdime                    2.1.0                    pypi_0    pypi\n",
      "nbformat                  5.1.2              pyhd8ed1ab_1    conda-forge\n",
      "nbresuse                  0.4.0                    pypi_0    pypi\n",
      "nc-time-axis              1.3.1              pyhd8ed1ab_2    conda-forge\n",
      "ncurses                   6.2                  h58526e2_4    conda-forge\n",
      "ndg-httpsclient           0.5.1                    pypi_0    pypi\n",
      "nest-asyncio              1.4.3              pyhd8ed1ab_0    conda-forge\n",
      "netcdf4                   1.5.4                    pypi_0    pypi\n",
      "networkx                  2.6.3                    pypi_0    pypi\n",
      "ninja                     1.10.2               hff7bd54_1    defaults\n",
      "nodejs                    15.3.0               h25f6087_0    conda-forge\n",
      "notebook                  6.2.0            py38h578d9bd_0    conda-forge\n",
      "numba                     0.53.1           py38ha9443f7_0    defaults\n",
      "numcodecs                 0.8.0            py38h2531618_0    defaults\n",
      "numexpr                   2.7.3            py38hb2eb853_0    defaults\n",
      "numpy                     1.19.2           py38h54aff64_0    defaults\n",
      "numpy-base                1.19.2           py38hfa32c7d_0    defaults\n",
      "oauthlib                  3.0.1                      py_0    conda-forge\n",
      "olefile                   0.46               pyhd3eb1b0_0    defaults\n",
      "openjpeg                  2.4.0                h3ad879b_0    defaults\n",
      "openssl                   1.1.1l               h7f8727e_0    defaults\n",
      "opt_einsum                3.3.0              pyhd3eb1b0_1    defaults\n",
      "owlrl                     5.2.3                    pypi_0    pypi\n",
      "packaging                 20.8               pyhd3deb0d_0    conda-forge\n",
      "pamela                    1.0.0                      py_0    conda-forge\n",
      "pandas                    1.3.2            py38h8c16a72_0    defaults\n",
      "pandoc                    2.11.3.2             h7f98852_0    conda-forge\n",
      "pandocfilters             1.4.2                      py_1    conda-forge\n",
      "papermill                 2.3.1                    pypi_0    pypi\n",
      "parso                     0.7.1              pyh9f0ad1d_0    conda-forge\n",
      "partd                     1.2.0              pyhd3eb1b0_0    defaults\n",
      "pathspec                  0.9.0                    pypi_0    pypi\n",
      "patool                    1.12                     pypi_0    pypi\n",
      "pdbufr                    0.9.0                    pypi_0    pypi\n",
      "pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge\n",
      "pickleshare               0.7.5                   py_1003    conda-forge\n",
      "pillow                    8.3.1            py38h2c7a002_0    defaults\n",
      "pip                       21.0.1                   pypi_0    pypi\n",
      "pipx                      0.16.1.0                 pypi_0    pypi\n",
      "pluggy                    0.13.1                   pypi_0    pypi\n",
      "portalocker               2.3.2                    pypi_0    pypi\n",
      "powerline-shell           0.7.0                    pypi_0    pypi\n",
      "prometheus_client         0.9.0              pyhd3deb0d_0    conda-forge\n",
      "prompt-toolkit            3.0.10             pyha770c72_0    conda-forge\n",
      "properscoring             0.1                        py_0    conda-forge\n",
      "protobuf                  3.17.2           py38h295c915_0    defaults\n",
      "prov                      1.5.1                    pypi_0    pypi\n",
      "psutil                    5.8.0            py38h27cfd23_1    defaults\n",
      "ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge\n",
      "pyasn1                    0.4.8              pyhd3eb1b0_0    defaults\n",
      "pyasn1-modules            0.2.8                      py_0    defaults\n",
      "pycosat                   0.6.3           py38h497a2fe_1006    conda-forge\n",
      "pycparser                 2.20               pyh9f0ad1d_2    conda-forge\n",
      "pycurl                    7.43.0.6         py38h996a351_1    conda-forge\n",
      "pydap                     3.2.2           pyh9f0ad1d_1001    conda-forge\n",
      "pydot                     1.4.2                    pypi_0    pypi\n",
      "pygments                  2.10.0                   pypi_0    pypi\n",
      "pyjwt                     2.1.0                    pypi_0    pypi\n",
      "pyld                      2.0.3                    pypi_0    pypi\n",
      "pyodc                     1.1.1                    pypi_0    pypi\n",
      "pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge\n",
      "pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge\n",
      "pyrsistent                0.17.3           py38h497a2fe_2    conda-forge\n",
      "pyshacl                   0.17.0.post1             pypi_0    pypi\n",
      "pysocks                   1.7.1            py38h578d9bd_3    conda-forge\n",
      "python                    3.8.6           hffdb5ce_4_cpython    conda-forge\n",
      "python-dateutil           2.8.1                      py_0    conda-forge\n",
      "python-eccodes            2021.03.0        py38hb5d20a5_1    conda-forge\n",
      "python-editor             1.0.4                    pypi_0    pypi\n",
      "python-flatbuffers        1.12               pyhd3eb1b0_0    defaults\n",
      "python-json-logger        2.0.1              pyh9f0ad1d_0    conda-forge\n",
      "python-snappy             0.6.0            py38h2531618_3    defaults\n",
      "python_abi                3.8                      1_cp38    conda-forge\n",
      "pytorch                   1.8.1           cpu_py38h60491be_0    defaults\n",
      "pytz                      2021.1             pyhd3eb1b0_0    defaults\n",
      "pyyaml                    5.4.1                    pypi_0    pypi\n",
      "pyzmq                     21.0.1           py38h3d7ac18_0    conda-forge\n",
      "rdflib                    6.0.1                    pypi_0    pypi\n",
      "rdflib-jsonld             0.5.0                    pypi_0    pypi\n",
      "readline                  8.0                  he28a2e2_2    conda-forge\n",
      "regex                     2021.4.4                 pypi_0    pypi\n",
      "renku                     0.16.2                   pypi_0    pypi\n",
      "requests                  2.24.0                   pypi_0    pypi\n",
      "requests-oauthlib         1.3.0                      py_0    defaults\n",
      "rich                      10.3.0                   pypi_0    pypi\n",
      "rsa                       4.7.2              pyhd3eb1b0_1    defaults\n",
      "ruamel-yaml               0.16.5                   pypi_0    pypi\n",
      "ruamel.yaml.clib          0.2.2            py38h497a2fe_2    conda-forge\n",
      "ruamel_yaml               0.15.80         py38h497a2fe_1003    conda-forge\n",
      "s3fs                      2021.7.0           pyhd3eb1b0_0    defaults\n",
      "schema-salad              8.2.20210918131710          pypi_0    pypi\n",
      "scikit-learn              0.24.2           py38ha9443f7_0    defaults\n",
      "scipy                     1.7.0            py38h7b17777_1    conda-forge\n",
      "send2trash                1.5.0                      py_0    conda-forge\n",
      "setuptools                58.2.0                   pypi_0    pypi\n",
      "setuptools-scm            6.0.1                    pypi_0    pypi\n",
      "shellescape               3.8.1                    pypi_0    pypi\n",
      "shellingham               1.4.0                    pypi_0    pypi\n",
      "simpervisor               0.4                      pypi_0    pypi\n",
      "six                       1.16.0                   pypi_0    pypi\n",
      "smmap                     4.0.0                    pypi_0    pypi\n",
      "snappy                    1.1.8                he6710b0_0    defaults\n",
      "sortedcontainers          2.4.0              pyhd3eb1b0_0    defaults\n",
      "soupsieve                 2.2.1              pyhd3eb1b0_0    defaults\n",
      "sqlalchemy                1.3.22           py38h497a2fe_1    conda-forge\n",
      "sqlite                    3.34.0               h74cdb3f_0    conda-forge\n",
      "sysroot_linux-64          2.12                h77966d4_13    conda-forge\n",
      "tabulate                  0.8.9                    pypi_0    pypi\n",
      "tbb                       2020.3               hfd86e86_0    defaults\n",
      "tblib                     1.7.0              pyhd3eb1b0_0    defaults\n",
      "tenacity                  7.0.0                    pypi_0    pypi\n",
      "tensorboard               2.4.0              pyhc547734_0    defaults\n",
      "tensorboard-plugin-wit    1.6.0                      py_0    defaults\n",
      "tensorflow                2.4.1           mkl_py38hb2083e0_0    defaults\n",
      "tensorflow-base           2.4.1           mkl_py38h43e0292_0    defaults\n",
      "tensorflow-estimator      2.6.0              pyh7b7c402_0    defaults\n",
      "termcolor                 1.1.0            py38h06a4308_1    defaults\n",
      "terminado                 0.9.2            py38h578d9bd_0    conda-forge\n",
      "testpath                  0.4.4                      py_0    conda-forge\n",
      "textwrap3                 0.9.2                    pypi_0    pypi\n",
      "threadpoolctl             2.2.0              pyh0d69192_0    defaults\n",
      "tini                      0.18.0            h14c3975_1001    conda-forge\n",
      "tk                        8.6.10               h21135ba_1    conda-forge\n",
      "toml                      0.10.2                   pypi_0    pypi\n",
      "toolz                     0.11.1             pyhd3eb1b0_0    defaults\n",
      "tornado                   6.1              py38h497a2fe_1    conda-forge\n",
      "tqdm                      4.60.0                   pypi_0    pypi\n",
      "traitlets                 5.0.5                      py_0    conda-forge\n",
      "typed-ast                 1.4.2                    pypi_0    pypi\n",
      "typing-extensions         3.7.4.3                  pypi_0    pypi\n",
      "typing_extensions         3.10.0.2           pyh06a4308_0    defaults\n",
      "urllib3                   1.25.11                  pypi_0    pypi\n",
      "userpath                  1.4.2                    pypi_0    pypi\n",
      "wcmatch                   8.2                      pypi_0    pypi\n",
      "wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge\n",
      "webencodings              0.5.1                      py_1    conda-forge\n",
      "webob                     1.8.7              pyhd3eb1b0_0    defaults\n",
      "werkzeug                  2.0.1              pyhd3eb1b0_0    defaults\n",
      "wheel                     0.36.2             pyhd3deb0d_0    conda-forge\n",
      "wrapt                     1.12.1           py38h7b6447c_1    defaults\n",
      "xarray                    0.19.0             pyhd3eb1b0_1    defaults\n",
      "xhistogram                0.3.0              pyhd8ed1ab_0    conda-forge\n",
      "xskillscore               0.0.23             pyhd8ed1ab_0    conda-forge\n",
      "xz                        5.2.5                h516909a_1    conda-forge\n",
      "yagup                     0.1.1                    pypi_0    pypi\n",
      "yaml                      0.2.5                h516909a_0    conda-forge\n",
      "yarl                      1.6.3            py38h27cfd23_0    defaults\n",
      "zarr                      2.8.1              pyhd3eb1b0_0    defaults\n",
      "zeromq                    4.3.3                h58526e2_3    conda-forge\n",
      "zict                      2.0.0              pyhd3eb1b0_0    defaults\n",
      "zipp                      3.4.0                      py_0    conda-forge\n",
      "zlib                      1.2.11            h516909a_1010    conda-forge\n",
      "zstd                      1.4.9                haebb681_0    defaults\n"
     ]
    }
   ],
   "source": [
    "!conda list"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 4
}