Chapter 4. Handling missing values and outliers

Datasets often contain missing values, which may be denoted by markers such as "?" or "Null". Outliers, on the other hand, are extreme or erroneous values. An example is a temperature value of 2000 degrees when typical temperatures are between 19 and 40 degrees.
Handling missing values:

  • Ignore.
  • Delete or replace the entire column. Be careful when there are many missing values, since this may not be an appropriate technique. A column can sometimes be replaced using other columns it depends on.
  • Delete the entire row.
  • Replace the value with a statistic computed from the observed values, such as the mean, median, or mode.
  • Work with them as they are. Whether this is possible depends on the classifier.
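
Two of the strategies above, deleting rows and replacing values with the mean, can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the marker set `MISSING_MARKERS`, the helper names, and the toy `rows` data are hypothetical and not part of the original text.

```python
from statistics import mean

# Hypothetical markers that denote a missing value in the raw data.
MISSING_MARKERS = {"?", "Null", "", None}


def is_missing(value):
    """Return True if the value is one of the assumed missing-value markers."""
    return value in MISSING_MARKERS


def drop_rows_with_missing(rows):
    """Strategy: delete every row that contains at least one missing value."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def impute_column_with_mean(rows, col):
    """Strategy: replace missing values in one column with that column's mean."""
    observed = [row[col] for row in rows if not is_missing(row[col])]
    col_mean = mean(observed)
    return [
        [col_mean if (i == col and is_missing(v)) else v for i, v in enumerate(row)]
        for row in rows
    ]


# Toy data (made up for illustration): each row is [temperature, humidity].
rows = [
    [21.0, 40],
    ["?", 55],
    [25.0, "Null"],
    [23.0, 60],
]

complete_rows = drop_rows_with_missing(rows)        # keeps only fully observed rows
imputed_rows = impute_column_with_mean(rows, col=0)  # fills temperature gaps only
```

Deleting rows discards two of the four toy records, while mean imputation keeps every record but leaves the missing humidity untouched. Which trade-off is acceptable depends on how many values are missing and on the classifier used afterwards.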

Handling outliers:
  • Ignore.
  • Delete the entire column.
  • Delete the entire row.
  • Discretize them as "too high" or "too low."
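
As a rough sketch of the last two options, the snippet below either removes outlying values or discretizes them as "too high" or "too low". It assumes the valid range is known in advance and reuses the 19 to 40 degree range from the temperature example. The function names and the sample readings are hypothetical.

```python
# Assumed valid range, taken from the temperature example in the text.
LOW, HIGH = 19, 40


def discretize_outlier(value, low=LOW, high=HIGH):
    """Replace an out-of-range value with a label; keep in-range values as-is."""
    if value < low:
        return "too low"
    if value > high:
        return "too high"
    return value


def drop_outliers(values, low=LOW, high=HIGH):
    """Delete every value that falls outside the assumed valid range."""
    return [v for v in values if low <= v <= high]


# Made-up readings for illustration, including the 2000-degree example.
temperatures = [22, 2000, 35, -5, 40]

labelled = [discretize_outlier(t) for t in temperatures]
kept = drop_outliers(temperatures)
```

Discretizing keeps the record and the information that something unusual was measured, whereas deleting loses both. In practice the range is often not known beforehand and has to be estimated from the data.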