Chapter 2. Websites for downloading datasets for classification tasks
Before applying machine learning (ML) techniques, we need reliable datasets. In some cases,
research groups around the world keep their own datasets, which are not publicly available, and access
must be requested by email.
Fortunately, there are reliable websites that let us download datasets from different fields, such as the
UCI Machine Learning Repository and KEEL. It is essential to understand the phenomenon behind the
classification task, because this tells us which pre-processing steps to apply to the
dataset before classifying. My recommendation is to contact an expert in the area that you want to
combine with ML; this will help you later to choose the best pipeline to follow for the classification task.
THE TASK FOR YOU:
- Investigate and select seven datasets and, for each one, create a table that contains the following information: area; number of instances (a.k.a. patterns), attributes, and classes; and the associated task.
- Which datasets have missing values?
- What is your opinion about the classes of the datasets?
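
To get started on the questions above, here is a minimal Python sketch that counts instances, attributes, and classes, and reports missing values per attribute. It is only an illustration under some assumptions: the data is a comma-separated file with a header row, the class label is in a column named "label", and missing values are written as an empty field, "?", "NA", or "NaN". The toy data, the column names, and the summarize function are hypothetical; check the documentation of each dataset you download, since its format and missing-value marker may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data standing in for a downloaded file.
# Assumption: the column named "label" holds the class.
SAMPLE_CSV = """sepal_length,sepal_width,colour,label
5.1,3.5,red,A
4.9,?,blue,A
6.2,2.9,,B
5.9,3.0,red,B
6.0,?,blue,A
"""

# Assumption: these strings mark a missing value; adjust per dataset.
MISSING_MARKERS = {"", "?", "NA", "NaN"}


def summarize(csv_file, class_column):
    """Return basic facts about a labelled dataset read from an open CSV file."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    attributes = [name for name in fieldnames if name != class_column]
    rows = list(reader)

    # Count missing entries per attribute (the class column is excluded).
    missing = Counter()
    for row in rows:
        for name in attributes:
            if (row[name] or "").strip() in MISSING_MARKERS:
                missing[name] += 1

    # Count how many instances belong to each class.
    class_counts = Counter(row[class_column] for row in rows)

    return {
        "instances": len(rows),
        "attributes": len(attributes),
        "classes": len(class_counts),
        "class_counts": dict(class_counts),
        "missing_per_attribute": dict(missing),
    }


summary = summarize(io.StringIO(SAMPLE_CSV), class_column="label")
for key, value in summary.items():
    print(f"{key}: {value}")
```

To use it on a real file, replace io.StringIO(SAMPLE_CSV) with an opened file, for example open("my_dataset.csv", newline=""), where the file name is a placeholder. The class_counts entry is useful for the last question: very different counts per class suggest an imbalanced dataset.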