Data Cleaning in Machine Learning

All you need to know about cleaning data

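data.isnull().sum()  # count missing values in each column
data.dropna(axis=1)  # drop every column that contains a missing value
data.drop(features_list, axis=1)  # drop the columns named in features_list
data.select_dtypes(exclude=['object'])  # exclude columns by dtype (exclude takes dtypes, not column names)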
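A minimal sketch of these calls on a toy DataFrame. The column names and values are made up for illustration, and "number" is used as the dtype selector so the result does not depend on how pandas stores text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "age" has one missing value, "city" holds text.
data = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "rooms": [3, 2, 4],
    "city": ["Oslo", "Lima", "Pune"],
})

# Count missing values in each column.
missing_per_column = data.isnull().sum()

# Drop every column that contains at least one missing value ("age").
no_missing_cols = data.dropna(axis=1)

# Drop columns by name; axis=1 means the labels are column names.
features_list = ["rooms"]
without_rooms = data.drop(features_list, axis=1)

# Exclude columns by dtype: here every numeric column is removed.
non_numeric = data.select_dtypes(exclude=["number"])
```

Each call returns a new DataFrame and leaves `data` unchanged, so assign the result if you want to keep it.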

Imputer

from sklearn.impute import SimpleImputer
my_imputer = SimpleImputer()
filled_train = my_imputer.fit_transform(train_data)  # fit on the training data, then fill it
filled_test = my_imputer.transform(test_data)  # reuse the fitted imputer on the test data
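To see why the imputer is fitted on the training data only, here is a small sketch with made-up arrays. It relies on the default `SimpleImputer` strategy, which fills each column with its mean; `X_train` and `X_test` are illustrative names.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training and test arrays with missing entries.
X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [3.0, np.nan]])
X_test = np.array([[np.nan, np.nan]])

my_imputer = SimpleImputer()  # default strategy is "mean"

# Learn the column means from the training data and fill its gaps.
filled_train = my_imputer.fit_transform(X_train)

# Fill the test data with the means learned from the training data.
filled_test = my_imputer.transform(X_test)
```

The test row is filled with the training means (2.0 and 3.0), so no information from the test set leaks into the preprocessing step.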

Use low cardinality (a small number of unique values) to select categorical columns.

# Columns with fewer than 10 unique values
low_cardinality_cols = [feature for feature in data.columns if data[feature].nunique() < 10]
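A short sketch of that selection on made-up data. Restricting the candidates to non-numeric columns first is an assumption added here, so that a numeric column with few distinct values is not picked up as categorical.

```python
import pandas as pd

# Hypothetical data: "color" has 3 unique values, "id_code" has 12,
# and "size" is numeric with only 2 unique values.
data = pd.DataFrame({
    "color": ["red", "green", "blue"] * 4,
    "id_code": [f"id{i}" for i in range(12)],
    "size": [1, 2] * 6,
})

# Only consider non-numeric columns as categorical candidates.
candidates = data.select_dtypes(exclude=["number"]).columns

# Keep the candidates with fewer than 10 unique values.
low_cardinality_cols = [col for col in candidates if data[col].nunique() < 10]
```

Here only "color" is selected: "id_code" has too many unique values and "size" is numeric.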

Cross-validation

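from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
# Scores are negative MAE values, so a mean closer to 0 is better
cross_val_score(RandomForestRegressor(50), X, y, scoring='neg_mean_absolute_error').mean()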
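A runnable sketch of the same call on synthetic data. The dataset from `make_regression` and the `random_state` values are assumptions for illustration only. With the default settings, `cross_val_score` uses 5 folds and returns one score per fold.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used only to make the example self-contained.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# One negative-MAE score per fold (5 folds by default).
scores = cross_val_score(model, X, y, scoring="neg_mean_absolute_error")

# Flip the sign to report the usual, positive mean absolute error.
mean_mae = -scores.mean()
```

scikit-learn negates the error so that higher scores are always better, which is why the sign is flipped before reporting.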

Terminology
