Website-Based Application for Classification of Diabetes Using Logistic Regression Method

Written by Muhamad Soleh, Naufal Ammar, Indrati Sukmadi
on April 29, 2021

JURNAL ILMIAH MERPATI VOL. 9, NO. 1 APRIL 2021

p-ISSN: 2252-3006

e-ISSN: 2685-2411

Muhamad Soleh^a1, Naufal Ammar^a2, Indrati Sukmadi^a3
^a Program Studi Informatika, Institut Teknologi Indonesia, Indonesia
e-mail: ¹muhamad.soleh@iti.ac.id, ²ammarhtr@gmail.com, ³indrati.sukmadi@iti.ac.id

Abstract

Machine learning is a field of computer science that studies how computers learn from data to improve their intelligence. Machine learning includes many classification methods, such as Neural Networks, Support Vector Machines, and Logistic Regression. In this study, classification was carried out using the Logistic Regression method for diabetes cases. Diabetes is an increase in glucose in the bloodstream due to a lack of insulin, which is responsible for the transfer of glucose from the blood to tissues or cells. This study aims to improve on a previous study. The data used are the same as in the previous study, namely the Pima Indian Diabetes Dataset. The study proceeds in several stages: pre-processing, processing, evaluation, and website-based application development. The data were divided into 75% training data and 25% testing data. The evaluation yielded an accuracy of 80%, which is better than the 75.97% obtained in the previous study.

Keywords: Machine Learning, Logistic Regression, Diabetes, Application, Website.

1. Introduction

Machine learning is defined as computer applications and mathematical algorithms that learn from data and produce predictions about the future. The learning process that yields this intelligence goes through two stages, training and testing. The goal is to build computer programs whose intelligence increases automatically based on experience [1].

Previous researchers have studied the use of machine learning in the health sector. The study in [2] used the Convolutional Neural Network (CNN) method to classify lung patterns and evaluated 120 cases from two different hospitals. The method outperformed other methods and classified lung diseases faster, with larger and clearer image results. The study in [3] used the forward chaining method to develop a machine learning prototype for the prognosis of dementia, using literature data on the examination of patients diagnosed with dementia, including blood pressure, blood lipid levels, blood sugar levels, and vesicular and abdominal inspection. The researchers obtained a prognosis solution for dementia according to the optimal examination results. The study in [4] used the KNN algorithm and reached an accuracy of 80% in distinguishing healthy control (HC) and multiple sclerosis (MS) subjects. It included 111 subjects: 71 healthy controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and 40 patients with multiple sclerosis [5].

Health is valuable and desired by everyone. Therefore, every human being wants to maintain their health in order to enjoy life comfortably. In its constitution, the World Health Organization defines health as "a state of complete physical, mental and social well-being". Today, this continues to serve as an all-encompassing definition of health and as the standard ideal condition for every human being [6]. Unfortunately, many people cannot maintain their health. Many diseases have emerged that harm the body, and even in today's modern era many people suffer from diseases that carry a risk of death.

Diabetes is one of the diseases most feared by humans because it can harm parts of the body and can even cause death. Diabetes is defined as a disease caused by the inability of the pancreas to produce insulin, which controls sugar in the blood. If the disease is ignored and left unchecked, it becomes dangerous because complications such as heart disease, stroke, glaucoma or cancer can occur. In Indonesia, based on data released by the World Health Organization (WHO) in 2016, there were 99,400 deaths caused by diabetes, of which 48,300 occurred among people of productive age [7]. Diabetes ranks as one of the major chronic diseases in the world. Statistics show that in 2013 about 382.8 million people between the ages of 20 and 79 were diagnosed with diabetes worldwide. During 2013, 4.6 million deaths occurred, and diabetes cost $548 billion in medical expenses [8].

The research conducted by [7] used the Logistic Regression method. Logistic regression, also known as logistic regression analysis, is often used in various fields such as data mining, automatic disease diagnosis, and economic prediction [8]. The data used come from the Pima Indians Diabetes Dataset, which originates from the National Institute of Diabetes and Digestive and Kidney Diseases and samples 768 women of Pima Indian descent aged 21 years and over. Of the 768 samples, 500 were not diagnosed with diabetes and 268 were diagnosed positive for diabetes. The data in that study were divided into 75% training data and 25% testing data, and the study achieved an accuracy of 75.97%. Based on these problems, the authors developed a website-based application for the classification of diabetes using the Logistic Regression method.

In this research, data oversampling was carried out using the One Point Crossover technique. Data were added to balance the number of patients with and without detected diabetes. The addition was done by taking 232 samples of patients with detected diabetes and generating new samples from them through the One Point Crossover process, so that the number of samples became 1000: 500 with detected diabetes and 500 without. This oversampling technique is expected to increase the accuracy of the logistic regression model, as was done in [10].

2. Research Methodology

The research methodology used in completing this work consists of:

1. Data Collection / Acquiring

Retrieve and collect diabetes classification data. The data are secondary data taken from the internet.

2. Data Pre-Processing

The pre-processing stage consists of several steps: missing value handling, correlation coefficient analysis, data cleaning, data addition, and data partitioning.

3. Requirement Analysis and Software Design

Analyze system requirements and identify information needs based on observations. The system is implemented using the Python programming language.

4. Implementation

Implement the logistic regression algorithm.

5. Testing and Evaluation

Test the diabetes classification using the Logistic Regression method. The test results are analyzed to obtain conclusions from the entire research series.

2.1. Dataset

The dataset was obtained by retrieving diabetes medical examination data from the internet [11]. The data, called the "Pima Indians Diabetes Database", were published by the National Institute of Diabetes and Digestive and Kidney Diseases. The data have nine attributes and 768 samples. Of the 768 samples, 500 are without detected diabetes and 268 are with detected diabetes.

The Pima Indian data sample 768 women of Pima Indian descent aged 21 years and over. The nine attributes are Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, Body Mass Index (BMI), Diabetes Pedigree Function, Age, and Outcome.

2.2. Data Pre Processing

The pre-processing stage consists of several steps: missing value handling, correlation coefficient analysis, data cleaning, data addition, and data partitioning. Missing values can be handled in several ways. In the previous research, detected missing values were replaced with zero; in this research, missing values were replaced with the average value of each attribute. The data cleaning process deletes the columns (attributes) with the lowest correlation, based on the correlation of each independent variable with the dependent variable. There are several types of correlation coefficient analysis; this research uses the Pearson correlation coefficient, as in equation 1:

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^{2} - \left(\sum X\right)^{2}}\,\sqrt{n\sum Y^{2} - \left(\sum Y\right)^{2}}} \tag{1} $$

r = Pearson Correlation

n = Number of Data

X = Independent Variable

Y = Dependent Variable

Table 1. Pearson Correlation Results

	Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
Pregnancies	1	0.149021347	0.24633527	0.033918637	-0.01811	0.08054	-0.01615	0.5382	0.24547
Glucose	0.14902135	1	0.219765132	0.158060093	0.396137	0.23146	0.13716	0.2667	0.49288
BloodPressure	0.24633527	0.219765132	1	0.130403192	0.010492	0.28122	0.00047	0.3268	0.16288
SkinThickness	0.03391864	0.158060093	0.130403192	1	0.24541	0.53255	0.1572	0.0206	0.17186
Insulin	-0.0181124	0.396136563	0.010491597	0.245410498	1	0.18992	0.15824	0.0377	0.1787
BMI	0.08053802	0.231463748	0.281221732	0.532551906	0.189919	1	0.15351	0.0257	0.31225
DiabetesPedigreeFunction	-0.01615088	0.137158309	0.000471125	0.157196074	0.158243	0.15351	1	0.0336	0.17384
Age	0.53816885	0.266673434	0.326791303	0.020582297	0.037676	0.02575	0.03356	1	0.23836
Outcome	0.24546646	0.492884103	0.162879099	0.171856814	0.178696	0.31225	0.17384	0.2384	1

Table 1 shows the order of correlation of each variable with the dependent variable. In this research, the two independent variables with the lowest correlation values, Blood Pressure and Skin Thickness, were removed. Feature selection is a method used to optimize the performance of a classifier. It works by substantially reducing the feature space, eliminating the less relevant attributes through a feature selection algorithm in order to increase accuracy [12].
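The correlation-based cleaning step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names and the toy values are hypothetical, and ranking by the absolute value of the coefficient is an assumption (the text only says "lowest correlation").

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient, term by term as in equation 1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def drop_weakest_features(columns, target, n_drop=2):
    """Return (kept, dropped) feature names after removing the n_drop
    features whose absolute correlation with the target is lowest."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    dropped = sorted(scores, key=scores.get)[:n_drop]
    kept = [name for name in columns if name not in dropped]
    return kept, dropped


# Toy values for illustration only (hypothetical, not the real dataset).
toy_columns = {
    "Glucose": [85, 89, 116, 148, 183, 197],
    "BMI": [26.6, 28.1, 25.6, 33.6, 23.3, 30.5],
    "BloodPressure": [66, 66, 74, 72, 64, 70],
    "SkinThickness": [29, 23, 20, 35, 20, 45],
}
toy_outcome = [0, 0, 0, 1, 1, 1]

kept, dropped = drop_weakest_features(toy_columns, toy_outcome)
print("dropped:", dropped)
print("kept:", kept)
```

Applied to the Outcome column of Table 1, the same ranking gives Blood Pressure (0.16288) and Skin Thickness (0.17186) as the two weakest attributes.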

In this research, the One Point Crossover technique was used to increase the amount of data as needed. There were 500 samples with a target value of 0 and 268 samples with a target value of 1. To equalize the classes at 500 samples of class 0 and 500 samples of class 1, the One Point Crossover technique was applied. One Point Crossover is a data exchange technique that exchanges genes between one chromosome and another to produce new chromosomes through one intersection point [13]. The cut point is obtained by generating a random number between 1 and n (the chromosome length), and this number is used as the chromosome intersection point. For example, if two chromosomes have a length of 6 and the resulting random number is 4, each chromosome is cut between genes 1 to 4 and genes 5 to 6. Genes 1 to 4 of the first chromosome are then combined with genes 5 and 6 of the second chromosome, and vice versa.

Figure 1. One Point Crossover
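A minimal sketch of one-point crossover used for oversampling is given below. It is not the authors' implementation. How parent rows are paired is not described in the text, so the random pairing here is an assumption, as are the function names and the toy rows. The cut point is drawn from 1 to n - 1 rather than 1 to n, so that every child takes at least one gene from each parent.

```python
import random


def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length rows after position `cut`."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def oversample(minority_rows, n_new, rng):
    """Create n_new synthetic rows by crossing random pairs of minority rows."""
    n_features = len(minority_rows[0])
    synthetic = []
    while len(synthetic) < n_new:
        parent_a, parent_b = rng.sample(minority_rows, 2)
        cut = rng.randint(1, n_features - 1)  # assumption: never cut at the very end
        synthetic.extend(one_point_crossover(parent_a, parent_b, cut))
    return synthetic[:n_new]


# The example from the text: chromosomes of length 6, cut point 4.
a = [1, 1, 0, 1, 1, 1]
b = [0, 0, 1, 0, 0, 0]
print(one_point_crossover(a, b, 4))  # ([1, 1, 0, 1, 0, 0], [0, 0, 1, 0, 1, 1])

# Hypothetical minority-class rows with six features each.
minority = [
    [6, 148, 79, 33.6, 0.63, 50],
    [8, 183, 79, 23.3, 0.67, 32],
    [0, 137, 168, 43.1, 2.29, 33],
    [3, 78, 88, 31.0, 0.25, 26],
]
new_rows = oversample(minority, 232, random.Random(0))
print(len(new_rows))  # 232, so that 268 + 232 = 500 samples of class 1
```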

Data partitioning in this research was carried out in two stages. First, the data were divided into the independent variables (feature data), usually denoted X, and the dependent variable (target data), usually denoted Y. Second, both X and Y were partitioned into training data and testing data. This research used 75% of the data for training and 25% for testing, giving four parts: X training, Y training, X testing, and Y testing.
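The two partitioning stages can be sketched as follows. This is an illustrative sketch under stated assumptions: the target is taken to be the last column, the rows are shuffled with a fixed seed before splitting, and the function names and toy rows are hypothetical. Whether the authors shuffled or stratified the split is not stated.

```python
import random


def split_features_target(rows):
    """Stage 1: separate features X (all but the last column) from target Y."""
    X = [row[:-1] for row in rows]
    Y = [row[-1] for row in rows]
    return X, Y


def train_test_split(X, Y, test_ratio=0.25, seed=0):
    """Stage 2: shuffle the row indices and hold out test_ratio of them."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    Y_train = [Y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    Y_test = [Y[i] for i in test_idx]
    return X_train, X_test, Y_train, Y_test


# 1000 toy rows stand in for the balanced dataset (two features plus a target).
rows = [[i, i * 2, i % 2] for i in range(1000)]
X, Y = split_features_target(rows)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
print(len(X_train), len(X_test))  # 750 250
```

With 1000 samples, a 75/25 split leaves 750 rows for training and 250 for testing.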

2.3. Logistic Regression Process

The process of Logistic Regression is divided into two stages, the first is to find the value of weights and constants and the second stage is to evaluate the Logistic Regression model with the weights and constants that have been found.

2.3.1. Finding the Weight Value and Constant Value

The essence of machine learning algorithms based on linear separability is to find the weight and constant values in the equation of the line. The weight and constant values are obtained from the training data using equation 2:

$$ f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n \tag{2} $$

f(x) = logistic regression function

w_0 = constant value

x_i = independent variable

w_i = weight value

The weights and constant are then initialized, and the logistic regression equation is evaluated. The result is passed through the sigmoid function in equation 3:

$$ y' = \frac{1}{1 + e^{-f(x)}} \tag{3} $$

y' = sigmoid function as predicted value

f(x) = logistic regression function

Then calculate the error value with formula 4 as follows:

Error = y - y' (4)

y : dependent variable as true value

y' : sigmoid function as predicted value

The weight and constant values are then updated using equations 5 to 7:

$$ \text{coef\_update} = (y - y') \cdot y' \cdot (1 - y') \tag{5} $$

$$ w_0^{\text{new}} = w_0 + \alpha \cdot \text{coef\_update} \tag{6} $$

w_0^new : updated constant value

α : learning rate (set to 0.3)

w_0 : initial constant value

coef_update : coefficient update

$$ w^{\text{new}} = w + \alpha \cdot \text{coef\_update} \cdot x \tag{7} $$

w^new : updated weight value

w : initial weight value

x : independent variable
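One way to read equations 5 to 7, offered as an interpretation rather than something the source states, is as a gradient descent step on the squared error of a single sample. Writing the learning rate as \alpha and the sigmoid output as y', the derivation is:

```latex
E = \tfrac{1}{2}\,(y - y')^{2}, \qquad y' = \frac{1}{1 + e^{-f(x)}}, \qquad
\frac{\partial y'}{\partial f} = y'\,(1 - y')

\frac{\partial E}{\partial w_i} = -(y - y')\; y'\,(1 - y')\; x_i, \qquad
\frac{\partial E}{\partial w_0} = -(y - y')\; y'\,(1 - y')

w_i \leftarrow w_i - \alpha\,\frac{\partial E}{\partial w_i}
    = w_i + \alpha \cdot \text{coef\_update} \cdot x_i, \qquad
w_0 \leftarrow w_0 - \alpha\,\frac{\partial E}{\partial w_0}
    = w_0 + \alpha \cdot \text{coef\_update}
```

Under this reading, the factor y'(1 - y') in equation 5 is the derivative of the sigmoid, and the update differs from the more common cross-entropy update, which omits that factor.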

The process of finding the weight and constant values is repeated until a stopping criterion is met: the error value reaches 0.0001, or the maximum number of iterations, set to 1000 in this research, is reached.
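The training loop described by equations 2 to 7 can be sketched as below. This is a minimal sketch, not the authors' code: updating after every sample, starting from zero weights, and measuring the stopping error as the mean absolute error over one pass are all assumptions, and the toy data (already scaled to the range 0 to 1) are hypothetical.

```python
import math


def sigmoid(z):
    """Equation 3."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w0, w):
    """Equations 2 and 3: linear combination followed by the sigmoid."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))


def train(X, Y, learning_rate=0.3, max_iter=1000, tol=0.0001):
    """Fit the constant w0 and the weights w with the update rule of equations 4 to 7."""
    w0 = 0.0
    w = [0.0] * len(X[0])
    for _ in range(max_iter):
        total_abs_error = 0.0
        for x, y in zip(X, Y):
            y_pred = predict_proba(x, w0, w)
            error = y - y_pred                             # equation 4
            coef_update = error * y_pred * (1.0 - y_pred)  # equation 5
            w0 = w0 + learning_rate * coef_update          # equation 6
            w = [wi + learning_rate * coef_update * xi     # equation 7
                 for wi, xi in zip(w, x)]
            total_abs_error += abs(error)
        if total_abs_error / len(X) <= tol:                # assumed stopping test
            break
    return w0, w


# Hypothetical, already scaled toy data with two features.
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
Y_toy = [0, 0, 0, 1, 1, 1]

w0, w = train(X_toy, Y_toy)
predictions = [1 if predict_proba(x, w0, w) >= 0.5 else 0 for x in X_toy]
print(predictions)
```

With a learning rate as large as 0.3, unscaled attributes such as Glucose or Insulin would drive the sigmoid into saturation quickly, which is why the toy features here are kept small; whether the authors scaled their inputs is not stated.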

2.3.2. Logistic Regression Evaluation

The evaluation of the logistic regression performance in this research uses a Confusion matrix. Confusion matrix is a tabular representation of the actual and predicted values of data as shown in Table 2 [14].

Table 2. Confusion Matrix

True Value	Predicted Class 1	Predicted Class 0
Class 1	TP	FN
Class 0	FP	TN

Each column of the matrix represents a predicted class, while each row represents an actual class [15]. From the four cells, the evaluation is reported as accuracy, precision, recall, F1-score, and support. Precision measures the exactness of the classifier, while recall measures its completeness. The F1-score, the harmonic mean of precision and recall, balances the two, and support is the number of samples in each class.
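The metrics derived from the four cells of Table 2 can be computed as in the sketch below. The function name is hypothetical, and the counts are an assumption: they are read from the partly illegible confusion matrix in Figure 7 and are used because their rounded results agree with the classification report in Table 3.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, F1-score and support for the positive class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,
    }


# Class 1 treated as the positive class (assumed counts, see the note above).
class_1 = classification_metrics(tp=94, fn=31, fp=18, tn=107)
# Class 0 as the positive class: the same matrix with the roles swapped.
class_0 = classification_metrics(tp=107, fn=18, fp=31, tn=94)

for name, report in (("class 1", class_1), ("class 0", class_0)):
    print(name, {key: round(value, 2) for key, value in report.items()})
```

For class 1 this gives precision 0.84, recall 0.75 and F1-score 0.79; for class 0 it gives 0.78, 0.86 and 0.81; the accuracy is 201/250, or about 0.80.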

3. Results and Discussion

This research implements the Logistic Regression method for diabetes classification using Streamlit, a Python library for web development [16]. This section describes the user interface of the application, which was built to predict whether or not a person has diabetes.

3.1. Performance Comparison of Logistic Regression Model

The performance of the logistic regression model in this research was measured using four parameters: accuracy, precision, recall, and F1-score. The higher the values of these four parameters, the better the resulting model. The evaluation used the same treatment as the previous research. Table 3 shows the classification report obtained in this research.

Table 3. Classification Report

Class	Precision	Recall	F1-Score	Support
0	0.78	0.86	0.81	125
1	0.84	0.75	0.79	125
Accuracy			0.80	250

From Table 3, the precision is 78% for class 0 and 84% for class 1, the recall is 86% for class 0 and 75% for class 1, and the F1-score is 81% for class 0 and 79% for class 1. This research obtained an accuracy of 80%, which is better than the previous research.

Table 4. Comparison of Logistic Regression Performance Results

Evaluation	Previous research [7]	This research
Accuracy	75.97%	80%
Precision	76.92%	84%
Recall	51.72%	75%
F1-score	61.86%	79%

Table 4 compares this research with the previous research [7]. The table shows that this research achieves better classification results than the previous research. Several feature engineering steps were carried out in this research, including missing value handling, feature selection, and data oversampling, all of which affect the results.

3.2. Application view

The main display contains several menus: the data retrieval menu, the data pre-processing menu, the train test split menu, the Logistic Regression process menu, the Logistic Regression evaluation menu, and the data prediction menu, as shown in Figure 2. In Figure 3, the data retrieval menu displays the initial data used in this research, which are the same data as in the previous research, the "Pima Indian Diabetes Dataset".

Figure 2. Application View

[Screenshot: the "Data Pima Indian Diabetes Datasets" table showing the first rows of the dataset, followed by the "Informasi Data" (data information) panel with the count, mean, standard deviation, minimum, quartiles, and maximum of each attribute.]

Figure 3. Data Retrieval Menu

3.3. Pre-Processing Menu

The Pre-Processing menu contains the steps that prepare the data for the Logistic Regression process. Its sub-menus are the Missing Value menu, the Correlation Coefficient menu, the Data Cleaning menu, and the Oversampling menu. The Missing Value menu shows the corrected independent variable data, in which zero values are replaced with the average value of each attribute. The Correlation Coefficient menu contains the correlation data for each attribute and also shows the correlation order between each independent variable and the dependent variable. The Data Cleaning menu shows the data after the cleaning process, which removes the attributes no longer needed in this research. Data cleaning is one of the most popular steps in data mining; for example, [17] uses data cleaning in gathering data on tourism objects in Bali. The Oversampling menu displays the data added with the One Point Crossover technique, which in this research aims to balance the number of class 0 and class 1 samples. Figure 4 shows one of the pre-processing menus.

[Screenshot: the "Data Hasil Mising Value" (data after missing value handling) table showing rows in which zero values have been replaced by attribute averages, followed by the "Lihat Informasi Data Mising Value" (view missing value data information) panel with the count, mean, standard deviation, minimum, quartiles, and maximum of each attribute.]

Figure 4. One of the pre-processing menus
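The missing value step of this menu, replacing zeros with the attribute average, can be sketched as below. The function name and the toy column are hypothetical. Whether the average is taken over all values or only over the non-zero ones is not stated in the text, so both variants are shown and the default is an assumption.

```python
def impute_zeros_with_mean(column, exclude_zeros=False):
    """Replace zero entries with the column mean.

    exclude_zeros=False averages over every value, zeros included (assumed default);
    exclude_zeros=True averages over the non-zero values only.
    """
    basis = [value for value in column if value != 0] if exclude_zeros else column
    mean_value = sum(basis) / len(basis)
    return [mean_value if value == 0 else value for value in column]


# Hypothetical attribute column containing zeros that stand for missing readings.
toy_column = [0, 10, 20]
print(impute_zeros_with_mean(toy_column))                      # [10.0, 10, 20]
print(impute_zeros_with_mean(toy_column, exclude_zeros=True))  # [15.0, 10, 20]
```

Averaging over the non-zero values only gives a larger replacement value whenever zeros are frequent, so the choice can shift the distribution of attributes with many missing readings.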

3.4. Train Test Split menu

The train test split menu shows the division of the data into training and testing sets. The training data are used to train the machine learning model, while the testing data are used to test the Logistic Regression model. In this menu, 75% of the data are used for training and 25% for testing. The train test split menu is shown in Figure 5.

[Screenshot: the "Tabel Data Training" (training data table) showing the first rows of the training set with the columns Pregnancies, Glucose, Insulin, BMI, DiabetesPedigreeFunction, Age, and Outcome, followed by an "Informasi Train Data" (training data information) option.]

Figure 5. Train Test Split menu

3.5. Logistic Regression Process Menu

The Logistic Regression process menu contains the Logistic Regression prediction results, which are obtained from the sigmoid function in the Logistic Regression process. The Logistic Regression process menu is shown in Figure 6.

[Screenshot: the "Tabel Prediksi Data" (data prediction table) listing test rows with their attribute values, the actual Outcome, and the predicted Outcome, followed by a "Keluar Dari Menu Proses" (exit the process menu) option.]

Figure 6. Logistic Regression Process Menu

3.6. Logistic Regression Method Evaluation Menu

The Logistic Regression Evaluation Menu contains information about the evaluation of the Logistic Regression method in the form of Confusion Matrix and Classification Report tables. Figure 7 is a display image of the Logistic Regression Method Evaluation Menu.

[Screenshot: confusion matrix and classification report]

Confusion matrix (as displayed):

0	94	31
1	18	107

	precision	recall	f1-score	support
0	0.78	0.86	0.81	125
1	0.84	0.75	0.79	125
accuracy			0.80	250
macro avg	0.81	0.80	0.80	250
weighted avg	0.81	0.80	0.80	250

Figure 7. Logistic Regression Method Evaluation Menu

3.7. Data Prediction Menu

The data prediction menu lets the user enter a value for each variable manually and see the prediction produced by the system. The data prediction menu is shown in Figure 8. Prediction is central to machine learning, and several algorithms can be used for it. Logistic regression is one of them; support vector regression is another. Machine learning can solve real-life problems, such as forecasting the number of traffic accidents using the Support Vector Regression method [18], among many others.

Figure 8. Data Prediction Menu
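The prediction behind this menu can be sketched as a single function that turns one set of manual inputs into a class label. This is only an illustration: the weights and constant below are hypothetical placeholders rather than the trained values, the 0.5 decision threshold is an assumption, and the wiring of the inputs to Streamlit widgets is omitted.

```python
import math

# The six attributes kept after data cleaning, as listed in the training table.
FEATURES = ["Pregnancies", "Glucose", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def predict_diabetes(inputs, weights, constant, threshold=0.5):
    """Return (label, probability) for one set of manually entered values."""
    z = constant + sum(weights[name] * inputs[name] for name in FEATURES)
    probability = 1.0 / (1.0 + math.exp(-z))
    label = 1 if probability >= threshold else 0
    return label, probability


# Hypothetical model parameters, for illustration only.
example_weights = {
    "Pregnancies": 0.1,
    "Glucose": 0.03,
    "Insulin": 0.0,
    "BMI": 0.05,
    "DiabetesPedigreeFunction": 0.5,
    "Age": 0.01,
}
example_constant = -6.0

high_risk = {"Pregnancies": 2, "Glucose": 150, "Insulin": 80, "BMI": 30,
             "DiabetesPedigreeFunction": 0.5, "Age": 40}
low_risk = {"Pregnancies": 0, "Glucose": 80, "Insulin": 80, "BMI": 22,
            "DiabetesPedigreeFunction": 0.2, "Age": 25}

print(predict_diabetes(high_risk, example_weights, example_constant))
print(predict_diabetes(low_risk, example_weights, example_constant))
```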

4. Conclusion

From this whole series of research, several conclusions can be drawn, including the following:

1. In this research, the Logistic Regression method was used for the classification of diabetes, and the software was built as a website using Streamlit.

2. Making predictions with the Logistic Regression method consists of several stages: data pre-processing, the Logistic Regression process, and the evaluation of the Logistic Regression method.

3. The pre-processing in this research differs from that of the previous research in that it includes data cleaning and data oversampling. The correlation coefficient is used for the data cleaning process, and oversampling uses the One Point Crossover technique.

4. The evaluation of the Logistic Regression method in this research obtained a predictive accuracy of 80%, which is better than the previous research with a predictive accuracy of 75.97%.

References

[1] A. Roihan, P. A. Sunarya and A. S. Rafika, "Pemanfaatan Machine Learning dalam Berbagai Bidang," Indonesian Journal on Computer and Information Technology, vol. 5, no. 1, pp. 75-82, 2020.

[2] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe and S. Mougiakakou, "Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1207-1216, May 2016. doi: 10.1109/TMI.2016.2535865.

[3] R. Hammad, J. Kurniasih, N. F. Hasan, C. N. Dengen and Kusrini, "Prototipe Machine Learning untuk Prognosis Penyakit Demensia," IPTEK-KOM, vol. 21, no. 1, pp. 17-29, June 2019.

[4] S. Rebbah, D. Delahaye, S. Puechmorel, P. Marechal and F. Nicol, "Classification of Multiple Sclerosis Patients Using a Histogram based KNN Algorithm," OHBM 2019, 25th Annual Meeting of the Organization for Human Brain Mapping, Rome, Italy, June 2019.

[5] F. D. Telaumbanua, P. Hulu, T. Z. Nadeak, R. R. Lumbantong and A. Dharma, "Penggunaan Machine Learning Di Bidang Kesehatan," Jurnal Teknologi dan Ilmu Komputer Prima, vol. 3, no. 1, pp. 57-64, 2018.

[6] D. Margaret F. Schulte, Healthcare Delivery in the U.S.A, New York: CRC Press, 2013.

[7] F. I. Kurniadi and P. K. Vinnia, "Perbandingan Regresi Linear dengan Heaviside Activation Function dengan Logistic Regression untuk Klasifikasi Diabetes," ULTIMATICS, vol. 1, no. 1, pp. 7-10, 2018.

[8] M. Alotaibi and M. Albalawi, "A Mobile Gestational Diabetes Management and," 2018 9th IEEE Control and System Graduate Research Colloquium (ICSGRC 2018), Shah Alam, Malaysia, 3-4 August 2018, pp. 193-196.

[9] Y.-h. Wang, Y. Ou, X.-d. Deng, L.-r. Zhao and C.-y. Zhang, "The Ship Collision Accidents Based on Logistic Regression and Big Data," The 31th Chinese Control and Decision Conference (2019 CCDC), pp. 4438-4440, 2019.

[10] M. Soleh, E. R. Djuwitaningrum, M. Ramli and M. Indriasari, "Feature engineering strategies based on a One-point Crossover for fraud detection on Big Data Analytics," Journal of Physics: Conference Series, vol. 1566, 4th International Conference on Computing and Applied Informatics 2019 (ICCAI 2019), 26-27 November 2019, Medan, Indonesia.

[11] https://www.kaggle.com/uciml/pima-indians-diabetes-database

[12] O. Somantri and M. Khambali, "Feature Selection Klasifikasi Kategori Cerita Pendek Menggunakan Naïve Bayes dan Algoritme Genetika," JNTETI, vol. 6, no. 3, pp. 301-306, 2017.

[13] J. Suryaputra, C. Lubis and T. Sutrisno, "Pemilihan Crossover pada Algoritma Genetika untuk Program Aplikasi Pengenalan Karakter Tulisan Tangan," Jurnal Ilmu Komputer dan Sistem Informasi, pp. 69-72.

[14] A. Chowdhury, S. Tejas and T. K, "Predicting whether songs will be hit using Logistic Regression," International Journal Of Engineering And Computer Science, vol. 6, p. 22434, 2017.

[15] M. Nawawi and R. Marliansyah, "Klasifikasi Tingkat Popularitas Siswa Berdasarkan Aktifitas Komunikasi Siswa Menggunakan Smartphone dengan Teknik Logistic Regression," Prosiding Annual Research Seminar 2018 Computer Science and ICT, vol. 4, no. 1, pp. 251-254, 2018.

[16] www.streamlit.io

[17] Ni Putu Ayu Widiari, I Made Agus Dwi Suarjaya and Dwi Putra Githa, "Teknik Data Cleaning Menggunakan Snowflake untuk Studi Kasus Objek Pariwisata di Bali," Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi), pp. 137-145, July 2020. ISSN 2685-2411.

[18] Ni Putu Ratindia Apriyanti, I Ketut Gede Darma Putra and I Made Suwija Putra, "Peramalan Jumlah Kecelakaan Lalu Lintas Menggunakan Metode Support Vector Regression," Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi), pp. 72-80, June 2020. ISSN 2685-2411.
