0. Data Preprocessing

0.1 Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

0.2 Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
dataset
      User ID  Gender  Age  EstimatedSalary  Purchased
0    15624510    Male   19            19000          0
1    15810944    Male   35            20000          0
2    15668575  Female   26            43000          0
3    15603246  Female   27            57000          0
4    15804002    Male   19            76000          0
..        ...     ...  ...              ...        ...
395  15691863  Female   46            41000          1
396  15706071    Male   51            23000          1
397  15654296  Female   50            20000          1
398  15755018    Male   36            33000          0
399  15594041  Female   49            36000          1

400 rows × 5 columns

0.3 Checking for null values

dataset.isna().sum()
User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   User ID          400 non-null    int64 
 1   Gender           400 non-null    object
 2   Age              400 non-null    int64 
 3   EstimatedSalary  400 non-null    int64 
 4   Purchased        400 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 15.8+ KB

Drop the User ID column (a row identifier, not a feature to learn from)

dataset.drop('User ID', axis=1, inplace=True)
dataset.head()
   Gender  Age  EstimatedSalary  Purchased
0    Male   19            19000          0
1    Male   35            20000          0
2  Female   26            43000          0
3  Female   27            57000          0
4    Male   19            76000          0

0.4 Split into X & y

X = dataset.drop('Purchased', axis=1)
X.head()
   Gender  Age  EstimatedSalary
0    Male   19            19000
1    Male   35            20000
2  Female   26            43000
3  Female   27            57000
4    Male   19            76000
y = dataset['Purchased']
y.head()
0    0
1    0
2    0
3    0
4    0
Name: Purchased, dtype: int64

0.5 Convert categories into numbers

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
 
categorical_feature = ["Gender"]
one_hot = OneHotEncoder()
transformer = ColumnTransformer([("one_hot",
                                  one_hot,
                                  categorical_feature)],
                                 remainder="passthrough")

transformed_X = transformer.fit_transform(X)
pd.DataFrame(transformed_X).head()
     0    1     2        3
0  0.0  1.0  19.0  19000.0
1  0.0  1.0  35.0  20000.0
2  1.0  0.0  26.0  43000.0
3  1.0  0.0  27.0  57000.0
4  0.0  1.0  19.0  76000.0

0.6 Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    transformed_X, y, test_size=0.25, random_state=2509
)
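With 400 rows and `test_size` of 0.25, the split should give 300 training rows and 100 test rows, and fixing `random_state` makes it repeatable. The sketch below checks this on placeholder arrays. `X_demo`, `y_demo`, and the 3:1 class ratio are hypothetical, and `stratify` is an optional argument that the split above does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 400 rows, the row count reported by dataset.info();
# the values and the 3:1 class ratio are hypothetical stand-ins.
X_demo = np.arange(400 * 4).reshape(400, 4)
y_demo = (np.arange(400) % 4 == 0).astype(int)  # 100 ones, 300 zeros

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=2509
)
print(X_tr.shape, X_te.shape)

# The same seed reproduces the same split
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=2509
)
print(np.array_equal(X_te, X_te_again))

# Optional: stratify keeps the class ratio equal in both parts
_, _, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=2509, stratify=y_demo
)
print(y_tr_s.mean(), y_te_s.mean())
```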

0.7 Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
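The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation. A small sketch of what that means numerically, using the Age and EstimatedSalary values from a few rows printed in section 0.2. Treating them as a tiny train set and test set is an assumption made only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and EstimatedSalary of rows 0-4 (used here as a tiny training set)
train = np.array(
    [[19, 19000], [35, 20000], [26, 43000], [27, 57000], [19, 76000]],
    dtype=float,
)
# Rows 395 and 396 (used here as a tiny test set)
test = np.array([[46, 41000], [51, 23000]], dtype=float)

sc = StandardScaler()
train_scaled = sc.fit_transform(train)  # learns mean and std from train
test_scaled = sc.transform(test)        # reuses the train statistics

# Manual equivalent: z = (x - train mean) / train std (population std, ddof=0)
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

The scaled training columns have mean 0 and standard deviation 1. The scaled test columns generally do not, since they are measured against the training statistics.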

1. Training the model on the Training set

from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf')
classifier.fit(X_train, y_train)
SVC()
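The encoding, scaling, and training steps can also be chained in a single `Pipeline`, so that the same preprocessing is applied at fit time and at prediction time. This is one possible arrangement, not the approach used above. The data is synthetic, and the labelling rule for `y_demo` is hypothetical and exists only so the model has something to learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for Social_Network_Ads.csv (same column names, made-up values)
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 61, size=n),
    "EstimatedSalary": rng.integers(15000, 150001, size=n),
})
# Hypothetical labelling rule, not derived from the real data
y_demo = ((X_demo["Age"] > 40) | (X_demo["EstimatedSalary"] > 90000)).astype(int)

encoder = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Gender"])],
    remainder="passthrough",
)
model = Pipeline([
    ("encode", encoder),          # one-hot encode Gender, pass the rest through
    ("scale", StandardScaler()),  # fitted on the training portion only
    ("svc", SVC(kernel="rbf")),
])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=2509
)
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here says nothing about the real dataset; it only confirms that the chained steps run end to end.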

1.1 Accuracy on the Test set

classifier.score(X_test,y_test)
0.92

2. Predicting the Test set results

y_pred = classifier.predict(X_test)

2.1 Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
[[65  5]
 [ 3 27]]
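The confusion matrix can be read as a cross-check on the score. A short sketch of the arithmetic, assuming scikit-learn's convention that rows are true classes and columns are predicted classes, so that `ravel()` yields TN, FP, FN, TP in that order.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predictions
cm = np.array([[65, 5],
               [3, 27]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (27 + 65) / 100
precision = tp / (tp + fp)        # 27 / 32
recall = tp / (tp + fn)           # 27 / 30
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

The accuracy computed this way matches the 0.92 returned by `classifier.score`. The 5 false positives and 3 false negatives account for the 8 misclassified test rows.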