Photo by Robina Weermeijer on Unsplash

HEART DISEASE PREDICTION USING MACHINE LEARNING WITH PYTHON

Tharanginimohan

--

My project is based on heart disease prediction. The prediction is based on various attributes such as age, gender, and so on. Thanks to advances in technology, we can now predict heart disease using machine learning algorithms with Python.

STEP: 1

First of all, we should import all the libraries we need for this project. Why import everything first? It lets us run our algorithms smoothly, without hitting errors like ImportError or ModuleNotFoundError halfway through (a sketch of the full import cell follows the list below).

>>NumPy: to work with arrays

>>Pandas: to work with CSV files and data frames

>>Matplotlib: to create charts with pyplot, set figure parameters with rcParams, and pick colours with cm.rainbow

>>Warnings: to suppress the warnings that would otherwise clutter the output

>>train_test_split: to split the dataset into training and testing data

>>StandardScaler: to scale all the features so the model adapts better to the dataset
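Here is a minimal sketch of that import cell, assuming the standard NumPy, pandas, Matplotlib and scikit-learn packages are installed:

# Core libraries for arrays, data frames and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib.cm import rainbow

# Suppress warnings so the output stays readable
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn helpers for splitting and scaling the data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler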

STEP: 2

In this step, we import our dataset by pointing pandas at the file path and take a first look at the data.
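A minimal sketch of this step, assuming the dataset sits in a CSV file called heart.csv in the working directory (the file name and path are assumptions):

# Load the CSV into a pandas DataFrame; 'heart.csv' is an assumed file name
df = pd.read_csv('heart.csv')

# Take a first look at the data
df.info()              # column types and missing values
print(df.head())       # first five rows
print(df.describe())   # basic statistics for the numeric columns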

STEP: 3

UNDERSTANDING THE DATA

#Correlation matrix: it shows how strongly each pair of variables is related, which helps us spot features that move together with the target.

#Histograms: they show how each feature is distributed and give a lot of information at a glance.

#Bar plots: it is essential that the dataset is approximately balanced. An extremely imbalanced dataset can render the whole model training useless, so we use a bar plot of the target to check the class balance.
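A rough sketch of these three plots, assuming the DataFrame is called df and the label column is named 'target' (the column name is an assumption):

# Correlation matrix: how strongly each pair of features is related
rcParams['figure.figsize'] = (12, 8)
plt.matshow(df.corr(), fignum=1)
plt.colorbar()
plt.xticks(range(len(df.columns)), df.columns, rotation=90)
plt.yticks(range(len(df.columns)), df.columns)
plt.show()

# Histograms: the distribution of every feature
df.hist(figsize=(12, 10))
plt.show()

# Bar plot of the target: check that the classes are roughly balanced
df['target'].value_counts().plot(kind='bar')
plt.title('Class balance of the target variable')
plt.show()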

STEP: 4

PROCESSING THE DATA

First, we convert our categorical variables into dummy variables. This is an essential part of data pre-processing and an integral part of machine learning. Dummy variables act as indicators of the presence or absence of a category in a categorical variable: 0 means absence and 1 means presence. A categorical variable can be converted into dummy variables in several ways; here we used the get_dummies() function of the pandas library. We then scale the numeric features with the StandardScaler imported in Step 1. Now our dataset is ready to train the model.
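A sketch of this pre-processing step, combining get_dummies() with the StandardScaler from Step 1; the column names listed here are assumptions and should be replaced with the ones in your own dataset:

# Convert the categorical columns into 0/1 indicator (dummy) columns
categorical_cols = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal']  # assumed names
df = pd.get_dummies(df, columns=categorical_cols)

# Scale the numeric columns so they share a comparable range
numeric_cols = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']  # assumed names
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])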

STEP: 5

Now we come to the final step of this project. I took three algorithms to compare as models. I split my dataset into training and testing sets, holding out 33% of the data for testing.

X = df[ ]

Y = df[ ]
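The column names inside the brackets were left blank above, so here is a hedged sketch of the split, assuming the label lives in a column named 'target' (an assumed name):

# Separate features and label; 'target' is an assumed column name
X = df.drop('target', axis=1)
y = df['target']

# Hold out 33% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)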

#>> K NEIGHBOURS CLASSIFIER

I took the K-neighbours classifier as my first algorithm. It can be used for both classification and regression, and it is a simple, lazy algorithm: it belongs to supervised machine learning, and the 'K' in its name stands for the K nearest neighbours used to make a prediction. K is usually chosen to be an odd number so that votes cannot tie. The number of neighbours can be varied, so I varied it from 1 to 20 and calculated the test score in each case. After plotting the scores, the maximum score of 87% was achieved when the number of neighbours was chosen to be 8.
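A sketch of this experiment, assuming the train/test split from Step 5:

from sklearn.neighbors import KNeighborsClassifier

# Try every K from 1 to 20 and record the test accuracy
knn_scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    knn_scores.append(knn.score(X_test, y_test))

# Plot accuracy against K to spot the best number of neighbours
plt.plot(range(1, 21), knn_scores, marker='o')
plt.xticks(range(1, 21))
plt.xlabel('Number of neighbours (K)')
plt.ylabel('Test accuracy')
plt.title('K Neighbours Classifier scores')
plt.show()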

#>> SVM ALGORITHM

It is a powerful, flexible supervised machine learning method for classification, regression and outlier detection. It works well in high-dimensional spaces and is widely used for classification problems. Scikit-learn offers three SVM classifiers; of these we use SVC (C-Support Vector Classification), which is based on libsvm and is controlled by a handful of parameters and attributes. This classifier aims to form a hyperplane that separates the classes as well as possible by maximising the distance between the data points and the hyperplane. There are several kernels on which the hyperplane can be based; in this project we used four kernels, namely linear, poly, rbf and sigmoid. Here I used cm.rainbow to give each bar of the comparison chart its own colour. Finally, the linear kernel performed best for this dataset and achieved 83%.
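A sketch of the kernel comparison, assuming the same split as before; cm.rainbow supplies one colour per bar as mentioned above:

from sklearn.svm import SVC

# Train one SVC per kernel and keep the test accuracy of each
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
svc_scores = []
for kernel in kernels:
    svc = SVC(kernel=kernel)
    svc.fit(X_train, y_train)
    svc_scores.append(svc.score(X_test, y_test))

# Bar chart of the four kernels, one rainbow colour per bar
colors = rainbow(np.linspace(0, 1, len(kernels)))
plt.bar(kernels, svc_scores, color=colors)
plt.xlabel('Kernel')
plt.ylabel('Test accuracy')
plt.title('Support Vector Classifier scores')
plt.show()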

#>>DECISION TREE CLASSIFIER

The general motive of using a decision tree is to create a training model that can predict the class or value of the target variable by learning simple decision rules inferred from the training data. Here we can vary the maximum number of features the model is allowed to consider at each split. I took the range from 1 to 30. After the plot, we can clearly see that the maximum score is 79%, and it is achieved when the maximum number of features is 2, 4 or 18.
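A sketch of this experiment; note that max_features cannot exceed the number of columns in X, so the upper bound is capped at the column count (the range of 1 to 30 assumes at least that many features after the dummy-variable step):

from sklearn.tree import DecisionTreeClassifier

# Vary the maximum number of features the tree may consider at each split
max_range = min(30, X.shape[1])          # max_features cannot exceed the column count
dt_scores = []
for n in range(1, max_range + 1):
    tree = DecisionTreeClassifier(max_features=n, random_state=0)
    tree.fit(X_train, y_train)
    dt_scores.append(tree.score(X_test, y_test))

# Plot the scores to find the best setting
plt.plot(range(1, max_range + 1), dt_scores, marker='o')
plt.xlabel('Maximum number of features')
plt.ylabel('Test accuracy')
plt.title('Decision Tree Classifier scores')
plt.show()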

The K Neighbours Classifier achieved the best result, scoring 87% with 8 neighbours.

Photo by Tim Mossholder on Unsplash
