Ricardo Licona

Chapter 1. Machine Learning for "dummies"

Pattern Recognition (PR) is the scientific field that aims to classify objects into classes (often just two or three). PR has four main tasks: classification, regression, clustering, and retrieval. This blog entry focuses on the classification task.

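Classification and regression both belong to supervised learning, where a model learns from patterns whose correct answers are already known. For that reason we work with two main subsets: a training set to fit the model and a testing set to evaluate it.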

A pattern is a set of attributes. For example, in a dataset for recognizing lemons, color and texture could be two attributes. Each pattern also has a class, here 0 or 1, where 1 means "is a lemon" and 0 means "is not a lemon." The class is not an attribute!
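To make the difference between attributes and class concrete, here is a minimal Python sketch. The `Pattern` type, the attribute values, and the variable names `X` and `y` are hypothetical, invented only for illustration; they do not come from a real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """One example: its attributes plus the class it belongs to."""
    attributes: tuple  # (color, texture); a hypothetical choice of attributes
    label: int         # 1 = "is a lemon", 0 = "is not a lemon"


# Hypothetical patterns, made up for this example.
patterns = [
    Pattern(attributes=("yellow", "rough"), label=1),
    Pattern(attributes=("green", "smooth"), label=0),
]

# The classifier receives only the attributes as input;
# the class is the answer it has to predict, so it is stored separately.
X = [p.attributes for p in patterns]
y = [p.label for p in patterns]
```

Keeping `X` (attributes) and `y` (classes) in separate lists is a common convention, and it makes it harder to feed the class to the classifier as an input by accident.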

The pipeline that most researchers follow to classify a dataset:

1. Obtain a dataset, e.g., the Iris dataset.
2. Apply a technique to divide the dataset into two subsets (i.e., training and testing).
3. Train and test the classifier with the subsets from step 2.
4. Apply a performance metric, such as accuracy, to measure the final performance of the classifier.
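The four steps above can be sketched in plain Python. This is only an illustration under several assumptions: the dataset is a tiny invented one (not the real Iris data), the split is a simple 70/30 hold-out, and the classifier is a 1-nearest-neighbor rule chosen because it is short to write. The names `dataset`, `predict`, and `accuracy` are hypothetical.

```python
import random

# Step 1: a tiny invented dataset (NOT the real Iris data).
# Each pattern is (attributes, class): two numeric attributes and a class 0 or 1.
dataset = [
    ((1.0, 1.2), 0), ((0.8, 1.0), 0), ((1.1, 0.9), 0), ((0.9, 1.1), 0), ((1.2, 1.0), 0),
    ((3.0, 3.2), 1), ((3.1, 2.9), 1), ((2.8, 3.0), 1), ((3.2, 3.1), 1), ((2.9, 3.3), 1),
]

# Step 2: hold-out split, 70% training and 30% testing (an assumed ratio).
rng = random.Random(0)  # fixed seed so the shuffle is repeatable
shuffled = dataset[:]
rng.shuffle(shuffled)
cut = len(shuffled) * 7 // 10
training, testing = shuffled[:cut], shuffled[cut:]


# Step 3: a 1-nearest-neighbor classifier. "Training" just means keeping the
# training patterns; prediction copies the class of the closest one.
def predict(attributes, training_set):
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(training_set, key=lambda p: squared_distance(p[0], attributes))
    return nearest[1]


predictions = [predict(attributes, training) for attributes, _ in testing]

# Step 4: accuracy = correctly classified test patterns / all test patterns.
correct = sum(pred == label for pred, (_, label) in zip(predictions, testing))
accuracy = correct / len(testing)
print(f"accuracy = {accuracy:.2f}")
```

Because the two invented classes are far apart, every test pattern is classified correctly here and the accuracy is 1.00. Real datasets are rarely this clean, so expect lower values.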

THE TASK FOR YOU:

Find websites that offer freely available, trustworthy datasets for classification tasks.
Investigate the three main techniques that exist in Machine Learning to divide a dataset.
Give another example of a performance metric.
Identify the total number of patterns, attributes, and classes in the Iris dataset.