Commit 3e1da264 authored by Mirko Birbaumer

Test of notebooks. Path adapted

parent 0033a614
@@ -13,7 +13,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 1,
 "metadata": {
 "nbpresent": {
 "id": "409a1ab7-fe1d-4430-b904-7694020a6223"
@@ -50,7 +50,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 2,
 "metadata": {
 "nbpresent": {
 "id": "f77bd9ec-de3b-4c56-b08d-4a65f0780408"
@@ -63,7 +63,7 @@
 "dict"
 ]
 },
-"execution_count": 3,
+"execution_count": 2,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -85,7 +85,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 3,
 "metadata": {
 "nbpresent": {
 "id": "c874a7c9-de0c-4ccd-a0f1-8f8a3265a0b6"
@@ -98,7 +98,7 @@
 "dict_keys([b'batch_label', b'labels', b'data', b'filenames'])"
 ]
 },
-"execution_count": 4,
+"execution_count": 3,
 "metadata": {},
 "output_type": "execute_result"
 }
%% Cell type:markdown id: tags:
# Exercise 1 - K-Nearest Neighbor Classifier for MNIST
%% Cell type:markdown id: tags:
In this exercise, we'll apply a k-nearest neighbor (KNN) classifier to the MNIST dataset. The aim of the exercise is to get acquainted with the dataset and with how a simple classifier behaves on it.
This guide uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow.
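To make the idea concrete before touching the real data, here is a minimal NumPy sketch of KNN classification. It is not the notebook's own implementation: the function name `knn_predict`, the toy points, and the choice of squared Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query row by majority vote among its k nearest training rows.

    Hypothetical helper for illustration; labels must be non-negative integers.
    """
    # Cast to float so that uint8 pixel data does not wrap around in arithmetic.
    X_train = np.asarray(X_train, dtype=np.float64)
    X_query = np.asarray(X_query, dtype=np.float64)
    y_train = np.asarray(y_train)

    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # which yields an (n_query, n_train) matrix without a large 3-D temporary.
    dists = (
        np.sum(X_query ** 2, axis=1)[:, None]
        + np.sum(X_train ** 2, axis=1)[None, :]
        - 2.0 * X_query @ X_train.T
    )

    # Indices of the k closest training rows for every query row.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Majority vote; argmax on the counts breaks ties toward the smaller label.
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

# Toy data (made up): two well-separated clusters in the plane.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.2, 0.3], [5.5, 5.5]])

toy_predictions = knn_predict(X_toy, y_toy, queries, k=3)
print(toy_predictions)
```

The same function would accept flattened 784-pixel image rows in place of the 2-D toy points; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` is usually the more convenient choice.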
%% Cell type:markdown id: tags:
## Install and import dependencies
We'll need [TensorFlow Datasets](https://www.tensorflow.org/datasets/), an API that simplifies downloading and accessing datasets, and provides several sample datasets to work with. We're also using a few helper libraries.
%% Cell type:code id: tags:
``` python
!pip install -U tensorflow_datasets
```
%%%% Output: stream
Requirement already up-to-date: tensorflow_datasets in /opt/conda/lib/python3.7/site-packages (4.6.0)
Requirement already satisfied, skipping upgrade: absl-py in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.2.0)
Collecting tensorflow_datasets
Downloading tensorflow_datasets-4.6.0-py3-none-any.whl (4.3 MB)
 |████████████████████████████████| 4.3 MB 5.1 MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: termcolor in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (2.0.1)
Requirement already satisfied, skipping upgrade: typing-extensions; python_version < "3.8" in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (3.7.4.3)
Requirement already satisfied, skipping upgrade: dill in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (0.3.5.1)
Requirement already satisfied, skipping upgrade: termcolor in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (2.0.1)
Requirement already satisfied, skipping upgrade: toml in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (0.10.2)
Requirement already satisfied, skipping upgrade: numpy in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.19.1)
Requirement already satisfied, skipping upgrade: protobuf>=3.12.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (3.20.2)
Requirement already satisfied, skipping upgrade: importlib-resources; python_version < "3.9" in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (5.9.0)
Requirement already satisfied, skipping upgrade: tensorflow-metadata in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.10.0)
Requirement already satisfied, skipping upgrade: etils[epath] in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (0.8.0)
Requirement already satisfied, skipping upgrade: tqdm in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (4.45.0)
Requirement already satisfied, skipping upgrade: numpy in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.19.1)
Requirement already satisfied, skipping upgrade: requests>=2.19.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (2.23.0)
Requirement already satisfied, skipping upgrade: six in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.14.0)
Collecting dill
Downloading dill-0.3.5.1-py2.py3-none-any.whl (95 kB)
 |████████████████████████████████| 95 kB 3.4 MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: six in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.14.0)
Requirement already satisfied, skipping upgrade: protobuf>=3.12.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (3.20.0)
Requirement already satisfied, skipping upgrade: toml in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (0.10.2)
Collecting importlib-resources; python_version < "3.9"
Downloading importlib_resources-5.9.0-py3-none-any.whl (33 kB)
Collecting etils[epath]
Downloading etils-0.8.0-py3-none-any.whl (127 kB)
 |████████████████████████████████| 127 kB 41.0 MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: absl-py in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (1.2.0)
Requirement already satisfied, skipping upgrade: promise in /opt/conda/lib/python3.7/site-packages (from tensorflow_datasets) (2.3)
Requirement already satisfied, skipping upgrade: zipp>=3.1.0; python_version < "3.10" in /opt/conda/lib/python3.7/site-packages (from importlib-resources; python_version < "3.9"->tensorflow_datasets) (3.1.0)
Requirement already satisfied, skipping upgrade: googleapis-common-protos<2,>=1.52.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow-metadata->tensorflow_datasets) (1.56.4)
Collecting tensorflow-metadata
Downloading tensorflow_metadata-1.10.0-py3-none-any.whl (50 kB)
 |████████████████████████████████| 50 kB 6.5 MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (2.9)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (1.25.11)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (2020.6.20)
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (3.0.4)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->tensorflow_datasets) (1.25.9)
Requirement already satisfied, skipping upgrade: zipp>=3.1.0; python_version < "3.10" in /opt/conda/lib/python3.7/site-packages (from importlib-resources; python_version < "3.9"->tensorflow_datasets) (3.1.0)
Collecting googleapis-common-protos<2,>=1.52.0
Downloading googleapis_common_protos-1.56.4-py2.py3-none-any.whl (211 kB)
 |████████████████████████████████| 211 kB 39.6 MB/s eta 0:00:01
Installing collected packages: dill, importlib-resources, etils, googleapis-common-protos, tensorflow-metadata, tensorflow-datasets
Successfully installed dill-0.3.5.1 etils-0.8.0 googleapis-common-protos-1.56.4 importlib-resources-5.9.0 tensorflow-datasets-4.6.0 tensorflow-metadata-1.10.0
WARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.
You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.
%% Cell type:code id: tags:
``` python
# Import TensorFlow
# FOR COLAB USERS:
# If you run this notebook in Colab, then execute the following line (uncomment it)
# %tensorflow_version 2.x
# If you run this notebook in your own TensorFlow 2.x environment, then
# verify you have version >= 2.0
import tensorflow as tf
print(tf.__version__)
# Now you should get version 2.x
# If you still get version 1.x, then execute (uncomment) the following lines and run the cell
#!pip uninstall tensorflow
#!pip install --upgrade pip
#!pip install --upgrade tensorflow
#!python3 -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
#try:
# import tensorflow as tf
#except Exception:
# pass
#print(tf.__version__)
```
%%%% Output: stream
2.7.1
%% Cell type:code id: tags:
``` python
from __future__ import absolute_import, division, print_function, unicode_literals
# Import TensorFlow Datasets
import tensorflow as tf
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
# Helper libraries
import math
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:code id: tags:
``` python
import logging
logger = tf.get_logger()
logger.setLevel(logging.ERROR)
!python -V
```
%%%% Output: stream
Python 3.7.6
%% Cell type:markdown id: tags:
## Import the MNIST dataset
%% Cell type:markdown id: tags:
This guide uses the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, often used as the "Hello, World" of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc.).
We will use 60,000 images to train the classifier and 10,000 images to evaluate how accurately it classifies images. You can access MNIST directly from TensorFlow, using the [Datasets](https://www.tensorflow.org/datasets) API:
%% Cell type:code id: tags:
``` python
dataset, metadata = tfds.load('mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
```
%%%% Output: stream
Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to ~/tensorflow_datasets/mnist/3.0.1...
Dataset mnist downloaded and prepared to ~/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.
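%% Cell type:markdown id: tags:
The train/test split matters because accuracy is only meaningful on images the classifier has not seen. As a small sketch of what "how accurately" means here, accuracy is the fraction of test labels predicted correctly. The helper name `accuracy` and the label arrays below are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))

# Hypothetical true and predicted digit labels for eight test images.
y_true_demo = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y_pred_demo = np.array([3, 1, 4, 7, 5, 9, 2, 0])

demo_accuracy = accuracy(y_true_demo, y_pred_demo)
print(demo_accuracy)  # 6 of the 8 labels match, so this prints 0.75
```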
%% Cell type:markdown id: tags:
Loading the dataset returns metadata as well as a *training dataset* and *test dataset*.
* The model is trained using `train_dataset`.
* The model is tested against `test_dataset`.
The images are 28 $\times$ 28 arrays, with pixel values in the range `[0, 255]`. The *labels* are an array of integers, in the range `[0, 9]`. These correspond to the handwritten numbers.
Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:
%% Cell type:code id: tags:
``` python
class_names = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five',
               'Six', 'Seven', 'Eight', 'Nine']
```
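%% Cell type:markdown id: tags:
The sketch below illustrates the data layout described above on a synthetic stand-in, since it does not load the real dataset: a `(28, 28, 1)` `uint8` image is flattened into a 784-element vector, and an integer label is mapped to its class name. The random image and the label value 7 are assumptions for illustration. It also shows why pixel data is worth casting to float before computing distances: `uint8` arithmetic wraps around.

```python
import numpy as np

class_names = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five',
               'Six', 'Seven', 'Eight', 'Nine']

# Synthetic stand-in for one (image, label) pair; not real MNIST data.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)
fake_label = 7

# Flatten the 28 x 28 x 1 array into one row of 784 pixel values.
flat = fake_image.reshape(-1)
print(flat.shape)               # (784,)
print(class_names[fake_label])  # Seven

# uint8 subtraction wraps around modulo 256 instead of going negative.
a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)
wrapped = a - b
print(wrapped)  # [246]

# Casting to float first gives the difference a distance computation needs.
correct = a.astype(np.float64) - b.astype(np.float64)
print(correct)  # [-10.]
```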
%% Cell type:markdown id: tags:
### Explore the data
Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set and 10,000 images in the test set:
%% Cell type:code id: tags:
``` python
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples: {}".format(num_test_examples))
```
%%%% Output: stream
Number of training examples: 60000
Number of test examples: 10000
%% Cell type:markdown id: tags:
Let's plot an image to see what it looks like.
%% Cell type:code id: tags:
``` python
# Take a single image, and remove the color dimension by reshaping
for image, label in test_dataset.take(1):
    break
image = image.numpy().reshape((28,28))
# Plot the image - voila an example of a handwritten digit
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
```
%%%% Output: display_data
![]()
%% Cell type:markdown id: tags:
Display the first 25 images from the *test set*, with the class name below each image. Verify that the data is in the correct format and that we're ready to build and train the classifier.
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(10,10))
i = 0
for (image, label) in test_dataset.take(25):
    image = image.numpy().reshape((28,28))
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
    i += 1
plt.show()
```
%%%% Output: display_data
![]()
%% Cell type:markdown id: tags:
## Import the Fashion MNIST dataset
%% Cell type:markdown id: tags:
If digits are not your thing, you can use the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset instead, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 $\times$ 28 pixels), as seen here:
<table>
<tr><td>
<img src="https://tensorflow.org/images/fashion-mnist-sprite.png"
alt="Fashion MNIST sprite" width="600">
</td></tr>
<tr><td align="center">
<b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;
</td></tr>
</table>
You may use Fashion MNIST for variety, and because it's a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code.
We will use 60,000 images to train the classifier and 10,000 images to evaluate how accurately it classifies images. You can access Fashion MNIST directly from TensorFlow, using the [Datasets](https://www.tensorflow.org/datasets) API:
%% Cell type:code id: tags:
``` python
dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
```
%%%% Output: stream
Downloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to ~/tensorflow_datasets/fashion_mnist/3.0.1...
Dataset fashion_mnist downloaded and prepared to ~/tensorflow_datasets/fashion_mnist/3.0.1. Subsequent calls will reuse this data.
%% Cell type:markdown id: tags:
Loading the dataset returns metadata as well as a *training dataset* and *test dataset*.
* The model is trained using `train_dataset`.
* The model is tested against `test_dataset`.
The images are 28 $\times$ 28 arrays, with pixel values in the range `[0, 255]`. The *labels* are an array of integers, in the range `[0, 9]`. These correspond to the *class* of clothing the image represents:
<table>
<tr>
<th>Label</th>
<th>Class</th>
</tr>
<tr>
<td>0</td>
<td>T-shirt/top</td>
</tr>
<tr>
<td>1</td>
<td>Trouser</td>
</tr>
<tr>
<td>2</td>
<td>Pullover</td>
</tr>
<tr>
<td>3</td>
<td>Dress</td>
</tr>
<tr>
<td>4</td>
<td>Coat</td>
</tr>
<tr>
<td>5</td>
<td>Sandal</td>
</tr>
<tr>
<td>6</td>
<td>Shirt</td>
</tr>
<tr>
<td>7</td>
<td>Sneaker</td>
</tr>
<tr>
<td>8</td>
<td>Bag</td>
</tr>
<tr>
<td>9</td>
<td>Ankle boot</td>
</tr>
</table>
Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:
%% Cell type:code id: tags:
``` python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```
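%% Cell type:markdown id: tags:
Because each image carries exactly one integer label in `[0, 9]`, the label array can be summarised per class with `np.bincount`. The sketch below uses a short, made-up label array rather than the real dataset, so the counts say nothing about the actual class distribution.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical labels for ten images; not taken from Fashion MNIST.
labels_demo = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])

# minlength ensures one count per class even if a class never occurs.
counts = np.bincount(labels_demo, minlength=len(class_names))

for name, count in zip(class_names, counts):
    print(f"{name:<12} {count}")
```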
%% Cell type:markdown id: tags:
### Explore the data
Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set and 10,000 images in the test set:
%% Cell type:code id: tags:
``` python
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples: {}".format(num_test_examples))
```
%%%% Output: stream
Number of training examples: 60000
Number of test examples: 10000
%% Cell type:markdown id: tags:
Let's plot an image to see what it looks like.
%% Cell type:code id: tags:
``` python
# Take a single image, and remove the color dimension by reshaping
for image, label in test_dataset.take(1):
    break
image = image.numpy().reshape((28,28))
# Plot the image - voila a piece of fashion clothing
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
```
%%%% Output: display_data
![]()
%% Cell type:markdown id: tags:
Display the first 25 images from the *test set*, with the class name below each image. Verify that the data is in the correct format and that we're ready to build and train the classifier.
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(10,10))
i = 0
for (image, label) in test_dataset.take(25):
    image = image.numpy().reshape((28,28))
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
    i += 1
plt.show()
```
%%%% Output: display_data
![]()
%% Cell type:markdown id: tags:
Decide whether you want to work with the traditional or the Fashion MNIST dataset, then extract 5000 training examples and 500 test examples.
%% Cell type:code id: tags:
``` python
i=0
for (image, label) in train_dataset.take(5000):
    if i==0:
        X_train = image.numpy().reshape((1,28*28))
        y_train = np.array([label])
    else:
        X_train = np.concatenate([X_train, image.numpy().reshape((1,28*28))], axis=0)
        y_train = np.concatenate([y_train, np.array([label])], axis=0)
    i+=1
print("Shape of image training data : ", X_train.shape)
print("Shape of training data labels : ", y_train.shape)
```
%%%% Output: stream
Shape of image training data : (5000, 784)
Shape of training data labels : (5000,)
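%% Cell type:markdown id: tags:
Calling `np.concatenate` inside the loop copies the whole array on every iteration. An alternative sketch is to collect the flattened rows in a Python list and stack them once at the end. The helper `collect_examples` is hypothetical, and the demo runs on synthetic `(image, label)` pairs; with the real data one could presumably feed it `tfds.as_numpy(train_dataset.take(5000))`, which yields NumPy pairs.

```python
import numpy as np

def collect_examples(pairs, limit):
    """Flatten up to `limit` (image, label) pairs into a feature matrix and a label vector."""
    images, labels = [], []
    for image, label in pairs:
        if len(images) >= limit:
            break
        # Flatten each image into a single row of pixel values.
        images.append(np.asarray(image).reshape(-1))
        labels.append(int(label))
    # One stacking step at the end instead of one array copy per iteration.
    return np.stack(images), np.array(labels)

# Synthetic stand-in for the dataset: 12 random 28 x 28 x 1 images with made-up labels.
rng = np.random.default_rng(0)
fake_pairs = [
    (rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8), i % 10)
    for i in range(12)
]

X_demo, y_demo = collect_examples(fake_pairs, limit=10)
print("Shape of image data : ", X_demo.shape)
print("Shape of labels : ", y_demo.shape)
```

For 5000 examples either approach finishes quickly; the difference becomes noticeable as the number of extracted examples grows.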
%% Cell type:code id: tags:
``` python
j=0
for (image, label) in test_dataset.take(500):
    if j==0:
        X_test = image.numpy().reshape((1,28*28))
        y_test = np.array([label])
    else:
        X_test = np.concatenate([X_test, image.