Learning scikit-learn

Preface

Any machine learning problem can be represented with the following three concepts:

  • A task T that we want to learn to solve.
  • Some experience E from which to learn to perform the task.
  • A measure of performance P that tells us how well we are solving the task and whether our results improve or get worse after a modification.
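
As a rough illustration of how T, E, and P map onto code, here is a minimal sketch using the Iris dataset that appears later in these notes. The choice of a k-nearest-neighbors classifier, the 25% held-out split, and accuracy as the measure are assumptions made for the example, not something the text prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Task T: predict the species of an Iris flower from its measurements.
X, y = load_iris(return_X_y=True)

# Experience E: labeled examples to learn from.
# A part is held out so that performance is measured on unseen instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn from E (the classifier is an illustrative assumption).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Performance P: fraction of held-out instances classified correctly.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

Rerunning the last step after changing the model or its parameters shows whether a modification improved or worsened P.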

Use scikit-learn

Check everything is ready to run:

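# IPython magic (not plain Python): enables interactive plotting.
# It replaces the deprecated %pylab; omit this line when running as a script.
%matplotlib
import platform

import IPython
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import sklearn as sk

# Print the version of each library to confirm the environment is set up.
print('Python version:', platform.python_version())
print('IPython version:', IPython.__version__)
print('numpy version:', np.__version__)
print('scikit-learn version:', sk.__version__)
print('matplotlib version:', matplotlib.__version__)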

Datasets

Machine learning methods rely on previous experience, usually represented by a dataset. scikit-learn includes some well-known datasets, such as the Iris flower dataset.

The Iris flower dataset contains 150 instances from three Iris species, each described by the length and width of its sepals and petals. The natural task for this dataset is to learn to predict the species of an Iris flower from its sepal and petal measurements.

from sklearn import datasets  
iris = datasets.load_iris()  
X_iris = iris.data  
Y_iris = iris.target  
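
To see what the loaded object holds, the following sketch prints the attribute names, the species names, and the first instance. It relies only on the `feature_names` and `target_names` fields that `load_iris()` returns; the variable `first_species` is introduced here for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Names of the four measured attributes (one per column of iris.data).
print(iris.feature_names)

# Names of the three species; iris.target stores them as integer codes 0, 1, 2.
print(iris.target_names)

# First instance: its four measurements and the name of its species.
first_species = iris.target_names[iris.target[0]]
print(iris.data[0], first_species)
```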

The dataset includes 150 instances with four attributes each. Each instance also has a target class (in our case, the species). This class is a special attribute that we aim to predict for new, previously unseen instances, given the remaining (known) attributes.
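
These numbers can be checked directly on the arrays. The sketch below is one way to do it, using NumPy's `np.unique` to count the instances per class; the names `classes` and `counts` are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X_iris = iris.data
Y_iris = iris.target

# One row per instance, one column per attribute.
print(X_iris.shape)  # (150, 4)

# One target value per instance.
print(Y_iris.shape)  # (150,)

# Number of instances in each class (classes are coded 0, 1, 2).
classes, counts = np.unique(Y_iris, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 50, 1: 50, 2: 50}
```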