Machine Learning - Finding Missing Data in Datasets and Replacing It with the Mean - Python
For Professionals and Freshers
This is my dataset (a simple Excel file).
Our agenda is to load the dataset and replace its missing values with the column mean.
Step 1: Importing libraries
import numpy as np
# NumPy contains the mathematical tools; we need it for any kind of mathematics in our code.
import matplotlib.pyplot as plt
# This library helps us plot nice charts; if you want to plot charts in Python, use this library.
import pandas as pd
# pandas is the best library for importing and managing datasets.
import os
# os is used below to change the working directory.
Step 2: Importing the dataset
os.chdir('your data source file path')
dataset = pd.read_csv('dataset filename')
x = dataset.iloc[:, :-1].values
# [rows, columns]: all rows and every column except the last one, which loads the independent columns
y = dataset.iloc[:, 3].values
# [rows, column]: all rows of the column at index 3 (the last column in this dataset), which loads the dependent column
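To make the slicing concrete, here is a minimal sketch on a tiny in-memory table. The column names (Country, Age, Salary, Purchased) and the values are assumptions made up for illustration; they stand in for the real file, which is not reproduced here.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; names and values are assumptions.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,,Yes
Germany,,54000,No
"""
dataset = pd.read_csv(io.StringIO(csv_text))

x = dataset.iloc[:, :-1].values  # all rows, every column except the last
y = dataset.iloc[:, 3].values    # all rows, only the column at index 3

print(x.shape)  # (3, 3)
print(x)        # empty cells show up as nan
print(y)
```

Empty cells in the CSV are read as NaN, which is the marker the imputer looks for later.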
The dataset has two missing values: one in the Age column and one in the Salary column. How can we handle this problem? One solution is to remove the rows (observations) that contain missing data, but that is quite dangerous, because if those rows contain important data, it is lost as well.
So we need a better idea. The most common one is to replace each missing value with the mean of its column: the mean age for Age, and the same for Salary.
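The trade-off between the two options can be sketched with plain pandas. The table below is hypothetical (the numbers are made up), and `dropna` / `fillna` are used here only to illustrate the idea; the tutorial itself does the replacement with scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the values are made up for illustration.
df = pd.DataFrame({
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
})

# Option 1: drop every row that has a missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill each gap with the mean of its column.
filled = df.fillna(df.mean())

print(len(dropped), len(filled))  # 2 4
print(filled)
```

Dropping loses half of the rows in this small example, while filling with the column mean keeps all of them.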
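# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# missing_values tells the imputer how a missing entry is marked (NaN); strategy='mean' replaces it with the mean.
# Older scikit-learn versions used sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0),
# where axis=0 meant the mean of the column and axis=1 the mean of the row.
# Imputer was removed in scikit-learn 0.22; SimpleImputer always works column by column, so it has no axis argument.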
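To see the difference between the mean of a column and the mean of a row, here is a small NumPy sketch. The table is hypothetical, and `np.nanmean` is used only to show the two directions; it is not what the imputer calls internally as far as this tutorial is concerned.

```python
import numpy as np

# Hypothetical 3x2 table: rows are observations, columns are Age and Salary.
data = np.array([
    [44.0, 72000.0],
    [27.0, np.nan],
    [np.nan, 54000.0],
])

col_means = np.nanmean(data, axis=0)  # one mean per column, ignoring NaN
row_means = np.nanmean(data, axis=1)  # one mean per row, ignoring NaN

print(col_means)  # mean Age and mean Salary
print(row_means)  # each row mixes an age with a salary
```

A row mean mixes age and salary in one number, which is rarely meaningful, so the column mean is the usual choice for this kind of data.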
imputer = imputer.fit(x[:, 1:3])
# fit on all rows of columns 1 and 2 (Age and Salary); indexes start at 0 and the upper bound 3 is excluded
x[:, 1:3] = imputer.transform(x[:, 1:3])
# transform replaces the missing data with the column means learned by fit
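The split between fit and transform can be checked on a small array. This is a sketch that assumes a current scikit-learn version, where the class is `SimpleImputer` in `sklearn.impute`; the Age and Salary numbers are made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical Age and Salary columns; the numbers are made up.
train = np.array([
    [44.0, 72000.0],
    [27.0, np.nan],
    [np.nan, 54000.0],
])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(train)                 # learns one mean per column
print(imputer.statistics_)         # the learned column means

filled = imputer.transform(train)  # NaN entries replaced by those means
print(filled)

# The same learned means can be applied to rows the imputer has not seen.
new_rows = np.array([[np.nan, 48000.0]])
new_filled = imputer.transform(new_rows)
print(new_filled)
```

fit only computes and stores the means; nothing in the data changes until transform is called.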
print(x)
NOTE: Use Anaconda Python for this. If you prefer working in an IDE, use the Spyder IDE.