Learning scikit-learn
Preface
Any machine learning problem can be described in terms of three concepts:
- A task T that we want to learn to solve.
- Some experience E from which we learn to perform the task.
- A performance measure P that tells us how well we are solving the task, and whether our results improve or get worse after we change something.
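As a rough illustration of how T, E and P map onto code, the sketch below uses the Iris dataset that ships with scikit-learn. The choice of a k-nearest-neighbours classifier, the 75/25 split and accuracy as the measure are assumptions made for this example only; they are not prescribed by the text.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Task T: predict the species of an Iris flower from its four measurements.
X, y = load_iris(return_X_y=True)

# Experience E: labelled instances the model learns from.
# A quarter of the data is held out so that P is measured on unseen instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The classifier is an arbitrary choice for illustration.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Performance P: fraction of held-out instances classified correctly.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

If we later change the model or its parameters, recomputing the same measure on the same held-out instances tells us whether the change helped.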
Use scikit-learn
Check that everything is ready to run:
# Render matplotlib figures inline in the notebook
%matplotlib inline
import IPython
import platform
import sklearn as sk
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
print('Python version: ', platform.python_version())
print('IPython version: ', IPython.__version__)
print('numpy version: ', np.__version__)
print('scikit-learn version: ', sk.__version__)
print('matplotlib version: ', matplotlib.__version__)
Datasets
Machine learning methods rely on previous experience, usually represented by a dataset. scikit-learn includes some well-known datasets, such as the Iris flower dataset.
The Iris flower dataset
This dataset contains 150 instances from three Iris flower species, each described by the length and width of its sepals and petals. The natural task for this dataset is to learn to predict the Iris species from the sepal and petal measurements.
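from sklearn import datasets

# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()
X_iris = iris.data    # feature matrix: one row per instance
Y_iris = iris.target  # integer class label for each instance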
The dataset includes 150 instances, with 4 attributes each. For each instance, we will also have a target class (in our case, the species). This class is a special attribute which we will aim to predict for new, previously unseen instances, given the remaining (known) attributes.
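A quick way to check these numbers is to inspect the loaded arrays. The sketch below repeats the loading step so that it runs on its own; `feature_names` and `target_names` are attributes of the object returned by `load_iris()`.

```python
from sklearn import datasets

iris = datasets.load_iris()
X_iris, Y_iris = iris.data, iris.target

# One row per instance, one column per attribute
print(X_iris.shape)  # (150, 4)

# One target class per instance
print(Y_iris.shape)  # (150,)

# Names of the four attributes and of the three species
print(iris.feature_names)
print(iris.target_names)

# Attributes and target class of the first instance
print(X_iris[0], Y_iris[0])
```

The target is stored as an integer index into `iris.target_names`, so `iris.target_names[Y_iris[0]]` gives the species name of the first instance.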