Any machine learning problem can be represented with the following three concepts:
- We will have to learn to solve a task T.
- We will need some experience E to learn to perform the task.
- We will need a measure of performance P to know how well we are solving the task, and to tell whether a modification improves our results or makes them worse.
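To make the three concepts concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the numbers are made-up toy values, and `learn_threshold`, `predict`, and `accuracy` are hypothetical helper names, not part of any library. The experience E is a short list of labelled measurements, the task T is to assign a label to a measurement, and the performance P is the fraction of unseen examples labelled correctly.

```python
# Experience E: a handful of made-up (measurement, label) pairs.
experience = [(1.0, 0), (1.5, 0), (2.0, 0), (4.0, 1), (4.5, 1), (5.0, 1)]


def learn_threshold(examples):
    """Learn from E: place a threshold midway between the two class means."""
    zeros = [x for x, y in examples if y == 0]
    ones = [x for x, y in examples if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    """Task T: assign label 1 when the measurement exceeds the threshold."""
    return 1 if x > threshold else 0


def accuracy(threshold, examples):
    """Performance P: fraction of examples labelled correctly."""
    hits = sum(predict(threshold, x) == y for x, y in examples)
    return hits / len(examples)


threshold = learn_threshold(experience)

# Previously unseen (also made-up) examples, used only to measure P.
unseen = [(1.2, 0), (4.8, 1), (3.5, 0)]

print('threshold:', threshold)
print('accuracy on unseen examples:', accuracy(threshold, unseen))
```

Measuring P on examples that were not part of E is a deliberate choice: it estimates how well the learned rule handles new instances rather than how well it memorised the ones it has already seen.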
Check everything is ready to run:
%pylab
import IPython
import platform
import sklearn as sk
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
print('Python version: ', platform.python_version())
print('IPython version: ', IPython.__version__)
print('numpy version: ', np.__version__)
print('scikit-learn version: ', sk.__version__)
print('matplotlib version: ', matplotlib.__version__)
Machine learning methods rely on previous experience, usually represented by a dataset. Scikit-learn includes some well-known datasets, such as the Iris flower dataset.
The Iris flower dataset contains 150 instances from three different Iris species, each described by its sepal and petal length and width. The natural task to solve with this dataset is to learn to predict the Iris species from the sepal and petal measurements.
from sklearn import datasets
iris = datasets.load_iris()
X_iris = iris.data
Y_iris = iris.target
The dataset includes 150 instances, with 4 attributes each. For each instance, we will also have a target class (in our case, the species). This class is a special attribute which we will aim to predict for new, previously unseen instances, given the remaining (known) attributes.
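The structure described above can be checked directly. The following is a small sketch that assumes scikit-learn is installed; it reloads the dataset so that it runs on its own, then prints the array shapes along with the attribute and class names stored in the object returned by `load_iris`.

```python
from sklearn import datasets

iris = datasets.load_iris()
X_iris, Y_iris = iris.data, iris.target

# One row per instance, one column per attribute.
print(X_iris.shape)
# One target class per instance.
print(Y_iris.shape)
# Names of the four measured attributes.
print(iris.feature_names)
# Names of the three species; targets are indices into this array.
print(iris.target_names)
# Attributes and target class of the first instance.
print(X_iris[0], Y_iris[0])
```

The targets are stored as integers rather than strings, so `iris.target_names[Y_iris[0]]` recovers the species name of the first instance.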