Chapter 4. Handling missing values and outliers
Datasets often contain missing values, which may be denoted by markers such as "?" or "Null".
Outliers, on the other hand, are extreme or erroneous values. For example, a temperature
value of 2000 degrees is an outlier when typical temperatures are between 19 and 40 degrees.
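Before either problem can be handled, it has to be detected. The following is a minimal sketch in plain Python, not a definitive implementation. The set of missing-value markers and the helper names parse_value and is_outlier are assumptions made for illustration; the 19 to 40 range comes from the temperature example above.

```python
# Markers treated as "missing". This set is an assumption; extend it for your data.
MISSING_MARKERS = {"?", "null", "na", ""}

def parse_value(raw):
    """Return a float, or None when the raw string is a missing-value marker."""
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return float(text)

def is_outlier(value, low=19.0, high=40.0):
    """Flag present values that fall outside the typical range."""
    return value is not None and not (low <= value <= high)

raw_temperatures = ["21.5", "?", "2000", "Null", "38"]
parsed = [parse_value(r) for r in raw_temperatures]
flags = [is_outlier(v) for v in parsed]
```

Here parsed holds None wherever a marker was found, and flags marks only the 2000-degree reading as an outlier. A fixed range is the simplest possible rule; real data may need a statistical criterion instead.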
Handling missing values:
- Ignore.
- Delete or replace the entire column. Be careful when there are many missing values, because this may not be an appropriate technique. A column can also be replaced with values derived from other columns that it depends on.
- Delete the entire row.
- Replace the value with a statistic computed from the observed values, such as the mean or the median.
- Work with them as they are. Whether this is possible depends on the classifier.
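Three of the options above (deleting rows, deleting a column, and replacing values with a statistic) can be sketched as follows. This is an illustrative example under stated assumptions: the rows, the column meanings, and the function names are hypothetical, and None is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical rows: (temperature, humidity). None marks a missing value.
rows = [(21.0, 40.0), (None, 55.0), (25.0, None), (29.0, 45.0)]

def drop_incomplete_rows(rows):
    """Delete every row that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]

def drop_column(rows, col):
    """Delete column `col` from every row."""
    return [tuple(v for i, v in enumerate(row) if i != col) for row in rows]

def impute_column_mean(rows, col):
    """Replace missing values in column `col` with the mean of the observed ones."""
    observed = [row[col] for row in rows if row[col] is not None]
    fill = mean(observed)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(row))
        for row in rows
    ]

complete = drop_incomplete_rows(rows)      # keeps only fully observed rows
without_humidity = drop_column(rows, 1)    # removes the humidity column
imputed = impute_column_mean(rows, 0)      # fills missing temperatures
```

Deleting rows halves this small dataset, which shows why deletion can be costly when many values are missing. Mean imputation keeps every row but reduces the variability of the column.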
Handling outliers:
- Ignore.
- Delete the entire column.
- Delete the entire row.
- Discretize them as "too high" or "too low."
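Deleting outlier rows and discretizing values can be sketched as below. This is a minimal example, assuming the typical range of 19 to 40 degrees from the temperature example defines what counts as an outlier. The function names and the "normal" label for in-range values are assumptions added for illustration.

```python
LOW, HIGH = 19.0, 40.0  # typical range from the temperature example

def drop_outliers(values, low=LOW, high=HIGH):
    """Delete every value that falls outside the typical range."""
    return [v for v in values if low <= v <= high]

def discretize(value, low=LOW, high=HIGH):
    """Map a numeric value to a category instead of deleting it."""
    if value < low:
        return "too low"
    if value > high:
        return "too high"
    return "normal"  # assumed label for in-range values

temperatures = [21.5, 2000.0, 38.0, -5.0]
kept = drop_outliers(temperatures)
labels = [discretize(t) for t in temperatures]
```

Discretizing keeps the row in the dataset and preserves the information that the reading was extreme, whereas deleting discards it.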