Website-Based Application for Classification of Diabetes Using Logistic Regression Method
on
JURNAL ILMIAH MERPATI VOL. 9, NO. 1 APRIL 2021
p-ISSN: 2252-3006
e-ISSN: 2685-2411
Website-Based Application for Classification of Diabetes Using Logistic Regression Method
Muhamad Soleha1, Naufal Ammara2, Indrati Sukmadia3 aProgram Studi Informatika, Institut Teknologi Indonesia, Indonesia e-mail: 1[email protected], 2[email protected], 3[email protected]
Abstrak
Pembelajaran mesin merupakan salah satu ilmu yang mempelajari tentang bagaimana komputer mampu belajar dari data untuk meningkatkan kecerdasannya. Machine learning terdiri dari banyak metode klasifikasi, antara lain Neural Network, Support Vector Machine, Logistics Regression, dan lain – lain. Pada penelitian ini dilakukan proses klasifikasi dengan menggunakan metode Logistics Regression untuk kasus penyakit Diabetes. Diabetes adalah kenaikan glukosa dalam aliran darah karena kekurangan insulin yang bertanggung jawab untuk transfer glukosa dari darah ke jaringan atau sel. Penelitian ini dibuat dengan tujuan untuk memperbaiki penelitian sebelumnya. Data yang digunakan pada penelitian ini yaitu data yang sama dengan penelitian sebelumnya yang diterbitkan oleh Pima Indian Diabetes Dataset. Pada penelitian ini digunakan beberapa tahapan yaitu pre processing, proses, evaluasi, serta pengembangan aplikasi berbasis website. Data pada penelitian ini dibagi menjadi dua yaitu 75% untuk data training, dan 25% untuk data testing. Penelitian ini menghasilkan evaluasi dengan nilai akurasi sebesar 80%, yang itu berarti lebih baik dengan penelitian sebelumnya yaitu sebesar 75, 97%.
Kata kunci: Pembelajaran Mesin, Logistics Regression, Diabetes, Aplikasi, Website.
Abstract
Machine learning is a one of computer science field, machine-learning studies how computers are able to learn from data to improve their intelligence. Machine learning consists of many classification methods, including Neural Networks, Support Vector Machines, Logistics Regression, and others. In this study, a classification process carried out using the Logistics Regression method for cases of Diabetes. Diabetes is an increase in glucose in the bloodstream due to a lack of insulin, which is responsible for the transfer of glucose from the blood to tissues or cells. This study created with the aim of improving previous paper. The data used in this study are the same data as previous studies published by the Pima Indian Diabetes Dataset. In this study, several stages used, those are pre-processing, processing, evaluation, and website-based application development. The data in this study divided into two, 75% for training data, and 25% for testing data. This study produces an evaluation with an accuracy 80%, which means it is better than the previous paper, which is 75, 97%.
Keywords: Machine Learning, Logistics Regression, Diabetes, Application, Website.
Machine learning defined as computer applications and mathematical algorithms that adopted by learning derives from data and produces predictions in the future. The learning process to acquire intelligence that goes through two stages, training and testing. Build computer programs with the intelligence increases automatically based on experience [1].
Previous paperers have carried out paper on the use of machine learning in the health sector. As paper conducted by [2], this paper used Convolution Neural Network (CNN) method in classifying patterns and evaluated 120 cases from two different hospitals. This method produces a pattern more superior to other methods and faster in classifying diseases present in the lung with larger and clearer image result. In other paper conducted by [3] This study uses the forward chaining method used to develop a machine learning prototype for the prognosis of dementia, using literature data on examining patients who have been diagnosed with dementia, including examination of blood pressure, blood lipid levels, blood sugar levels, vesicular and
abdominal inspection. In this paper, successfully carried out and obtained by paperer, a prognosis solution for dementia according to optimal examination results. In another paper conducted by [4] in this paper using the KNN algorithm in solving problems, this study got an accuracy for distinguishing HC and MS subjects reaching 80%. In this paper included 111 subjects, 71 healthy controls from the Alzheimer's disease neuroimaging initiative (ADNI) database and used 40 patients with multiple sclerosis [5].
Health is something expensive and desired by humans. Therefore, every human being wants to maintain his health in order to enjoy life comfortably. According to the World Health Organization, in its constitution, it defines health as "a state of complete physical, mental and social health". Today, it continues as an all-encompassing definition of health. This becomes the standard ideal condition for every human being [6]. Unfortunately, many humans cannot maintain their health, many diseases have emerged and harm the condition of the body, even in today's modern era, and many people suffer from diseases that are at risk of causing death.
Diabetes is one of the diseases most feared by humans, because diabetes can harm parts of the body and can even cause death. Diabetes defined as a disease caused by the inability of the pancreas to produce insulin, which controls sugar in the blood. Ignoring and unchecked this desease, it will be dangerous because complications can occur such as heart disease, stroke, glaucoma or cancer. In Indonesia, based on data released by the World Health Organization (WHO) in 2016, there were 99,400 deaths caused by diabetes, of which 48,300 deaths occurred in productive age [7]. Diabetes ranks as one of the most chronic diseases in the world. Statistics show that in 2013 about 382.8 million people between the ages of 20 and 79 diagnosed with diabetes worldwide. During 2013, 4.6 million deaths occurred and cost $ 548 billion in medical expenses [8].
Paper conducted by [7] Logistic Regression Method, also known as Logistic Regression analysis, this method often used in various fields, such as data mining, automatic disease diagnosis, economic prediction and other fields. [8]. Data used comes from the Pima Indians Diabetes Dataset. Pima Indian data itself comes from the National Institute of Diabetes, Digestive, and Kidney Disease by taking samples of 768 women of Pima Indian descent aged 21 years and over. Of the 768 data samples, 500 samples not diagnosed with diabetes while 268 samples diagnosed positive for diabetes. The data in the study divided into 75% for training data and 25% for testing data. In this paper, it resulted in an accuracy of 75.97%. Based on those problems, this paper conduct Website-Based Application for Classification of Diabetes Using Logistic Regression Method was conduct by the author based on those problems.
In this paper, the process of oversampling data carried out using the One Point Cross over technique. The addition of data carried out with the aim of balancing the number of patients detected with diabetes and non-diabetes. Data addition was done by taking 232 samples of patients with detected diabetes, then from these 232 samples a new sample was generated through the One Point Cross Over process, so that the number of samples became 1000, with 500 samples detected diabetes and 500 samples not diabetes. This oversampling technique expected to increase the accuracy of the logistic regression model, as done in this paper [10].
The reseach methodology used in completing the paper, which consists of:
-
1. Data collection / Acquaring
Retrieve and collect diabetes classification data. The data is secondary data taken from the internet.
-
2. Data Pre-porcessing
In the pre-processing stage, carried out several steps. Missing value process, correlation coefficient analysis, data cleaning process, data addition process and data partitioning process.
-
3. Requirement Analysis and Software Design
Analyze system requirements and identify information needs based on observations. The system implemented using the Python programming language.
-
4. Implementation
Implement logistic regression algorithms
-
5. Testing and Evaluation
The test includes testing the diabetes classification using Logistic Regression method. The testing results analyzed to obtain conclusions from the entire paper series.
The dataset process by retrieving diabetes medical examination data from the internet. Data obtained from [11]. The data called "Pima Indian Diabetes Databases" published by the "National Institute of Diabetes and Digestive and Kidney Diseases". The data nine attributes and 768 samples. Of the 768 sample data, there are 500 samples of undetectable diabetes and 268 samples of detected diabetes.
Pima Indian data took samples of 768 people of Pima Indian descent between the ages of 21 years and over. The Pima Indian data has nine attributes. These are Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, Body Mass Index (BMI), Diabetes Pedigree Function, Age, and Outcome.
In the pre-processing stage, carried out several steps. Missing value process, correlation coefficient analysis, data cleaning process, data addition process and data partitioning process. Mising value can be handle in several ways, in previous paper the detected data for the mising value replaced with a value of zero, in this paper, mising value was handle by replacing the data with the average value of each attribute. The data cleaning process carried out with the aim of deleting columns / attributes with the least correlation, the deleted attributes based on the correlation value of each data attribute of the independent variable to the dependent variable. Analysis of the correlation coefficient divided into several types, in this paper the Pearson correlation coefficient analysis used as in equation 1:
β∑xr-∑j∑ γ
F — —— ■
√∣2∑xz-(∑x∣≡ √n∑Y"-(∑Y∣≡ (1)
r = Pearson Correlation
n = Number of Data
X = Independent Variable
Y = Dependent Variable
Table 1. Pearsons Correlation Result
Pregnancies |
Glucose |
BloodPressure |
SkinThickness |
Insulin |
BMI |
Diabetes Pedigree Function |
Age |
Outcome | |
Pregnancies |
1 |
0,149021347 |
0,24633527 |
0,033918637 |
-0,01811 |
0,08054 |
-0,01615 |
0,5382 |
0,24547 |
Glucose |
0,14902135 |
1 |
0,219765132 |
0,158060093 |
0,396137 |
0,23146 |
0,13716 |
0,2667 |
0,49288 |
BloodPressure |
0,24633527 |
0,219765132 |
1 |
0,130403192 |
0,010492 |
0,28122 |
0,00047 |
0,3268 |
0,16288 |
SkinThickness |
0,03391864 |
0,158060093 |
0,130403192 |
1 |
0,24541 |
0,53255 |
0,1572 |
0,0206 |
0,17186 |
Insulin |
-0,0181124 |
0,396136563 |
0,010491597 |
0,245410498 |
1 |
0,18992 |
0,15824 |
0,0377 |
0,1787 |
BMI |
0,08053802 |
0,231463748 |
0,281221732 |
0,532551906 |
0,189919 |
1 |
0,15351 |
0,0257 |
0,31225 |
DiabetesPedigree Function |
-0,01615088 |
0,137158309 |
0,000471125 |
0,157196074 |
0,158243 |
0,15351 |
1 |
0,0336 |
0,17384 |
Age |
0,53816885 |
0,266673434 |
0,326791303 |
0,020582297 |
0,037676 |
0,02575 |
0,03356 |
1 |
0,23836 |
Outcome |
0,24546646 |
0,492884103 |
0,162879099 |
0,171856814 |
0,178696 |
0,31225 |
0,17384 |
0,2384 |
1 |
From table 1, it known the lowest order of correlation for each variable on the dependent variable. In this paper, two independent variables with the lowest correlation value removed, therefore the Blood Pressure and Skin Thickness variables removed because they had low correlation values. Feature Selection is a method used to optimize the performance of the classifier. The way it works based on a large reduction in feature space, namely by eliminating the less relevant attributes and by using the feature selection algorithm to increase accuracy [12].
In this paper, "One Point Cross Over" technique was used to increase the amount of data as needed. There were 500 target data samples that had a value of 0 and 268 samples that had a value of 1. In order for the number of target data values to be the same, 500 data is 0
and 500 data is 1, it is necessary to do the One Point Cross Over technique. One Point Crossover is a data exchange technique that carried out by exchanging genes from one chromosome with another chromosome to produce new chromosomes through one intersection point [13]. The cut point is obtained by generating a random number with a limit of 1 to n (chromosome length. The resulting random number used as the chromosome intersection point. For example, two chromosomes have a length of 6 and the resulting random number is 4, then genes 1 to 4 will be cut with genes 5 to 6. Then, genes 1 to 4 on the first chromosome will be crossovered with genes 5 and 6 on the second chromosome, and vice versa.
∣ι∣ι∣o∣ι∣ι∣ι∣ -i I 11 11 o I ι⅛⅝1
IFWI J W∣⅞∣ i [F
-
Figure 1. One Point Crossover
The process of partitioning data in this paper carried out in two stages. The first was dividing the data into two parts, the independent variable or feature data, which usually symbolized by X, and the dependent variable or target data, which usually symbolized by Y. After dividing the independent and dependent variables, then the dependent and independent variable data are partitioned into two parts, training data and testing data, in this study 75% training data and 25% testing data were used, so that the data distribution became X training, Y training, and X testing, Y testing.
The process of Logistic Regression is divided into two stages, the first is to find the value of weights and constants and the second stage is to evaluate the Logistic Regression model with the weights and constants that have been found.
The essence of linearly separable based machine learning algorithms is to find the weight and constant values in the equation of the line. To get the weight and constant values from training data used with equation 2
f(x) = wθ + wl * xl + w2 * x2 + ••■ ⅛7i * xn (2)
f(x) = logistic regression function
wQ = constant value
XI = independent variable
wi = wight value
Then initialize the weights and constants, perform mathematical operations according to the logistix regression equation. After getting the results, perform the sigmoid function operation with the formula 3
yl=^^:(3)
y = sigmoid function as predicted value
f(x) = logistic regression function
Then calculate the error value with formula 4 as follows:
Error = y - y' (4)
y : dependent variable as true value
y' : sigmoid function as predicted value
update weights and constants value using equation 5 - 7:
coeff_update = (y — y') * y' * (1 — y') (5)
wθbaru = wθ+oc* coef_update (6)
wθbaru : updated constants value
K : learning rate (set as 0,3)
wd : initial constants value
coef_update : coefficient update
wbaru = w+κ* coef_update * x (7)
wbaru : updated weight value
w : initial weight value
x : independent variable
Repeat the process of finding the value of weights and constants and the iteration process will stop at the stoping criteria if the error value has reached 0.0001 and the maximum number of iterations in this research is 1000 iterations.
The evaluation of the logistic regression performance in this research uses a Confusion matrix. Confusion matrix is a tabular representation of the actual and predicted values of data as shown in Table 2 [14].
Tabel 2. Confusion Matrix
True Value Predicted Value
Class 1 Class 0
Class 1 TP FN
Class 0 FP TN
Each column in the matrix represents the prediction class, while each row represents the events in the actual class [15]. From the four sections, an evaluation will be generated in the form of accuracy, F1-score, precision, support and recall. Precision is used to measure the probability of classifier exactness, while recall is used to measure the probability of classifier completeness. Unlike precision and recall, the F1-score tries to compare the balance between precision and recall, while support is used to calculate the amount of data.
The implementation of this research is the application of the Logistic Regression method for diabetes classification techniques using streamlit as a library in python-based web development [16]. This section will explain the user interface of the application, the application made with the aim of making a diagnostic prediction whether a person has diabetes or not.
The performance of the logistic regression model in this research measured using four parameters, accuracy, precision, recall, and F1-Score. The higher the value of the four
evaluation parameters, the better the resulting model will be. With the same treatment as previous studies. Table 3 is a table of classification report results obtained from this research.
Table 3. Classification Report | |
class |
Precissi Rec F1- Supp on all Score ort |
0 1 |
0,78 0,86 0,81 125 0,84 0,75 0,79 125 |
Accuracy |
0,80 125 |
From the table 3, the precission value in class 0 is 78%, the precission value in class 1 is 84%, the recall value in class 0 is 86%, the f1-score value in class 0 is 81% and the f1-score value in class 1 by 79%. From this research, an accuracy of 80% is obtained, which means it is better than previous research.
Table 4. Comparison of logistic regression performance results
Evaluation |
Result [7] |
Result |
Accuracy |
75,97 % |
80 % |
Precision |
76,92 % |
84 % |
Recall |
51,72 % |
75 % |
F1-score |
61,86 % |
79 % |
Table 4 is the result of a comparison between this research and previous research [7]. Based on the table, it`s known that this research has better classification reports than previous research. There are several process-engineering features carried out in this research, including handling misising values, feature selection, and data oversampling which in fact have an effect on the research results.
The main display contains several menus, including the data retrieval menu, the preprocessing data menu, the train test split menu, the Logistic Regression process menu, the Logistic Regression method evaluation menu, and the data prediction feature menu as shown in Figure 2. In Figure 3, the data collection menu displays the initial data used in this research, while the data in this menu is the same data as the previous research, "Pima Indian Diabetes Dataset".
Figure 2. Application View
Data Pima Indian Diabetes Datasets
Pregnancies |
Glucose |
BloodPressure |
SkinThickness |
Insulin |
BHI |
Diab | |
0 |
6 |
148 |
72 |
35 |
O |
33.6000 |
I |
1 |
1 |
85 |
66 |
29 |
O |
26.6000 | |
2 |
S |
183 |
64 |
0 |
O |
23.3000 | |
3 |
1 |
89 |
66 |
23 |
94 |
28.1000 | |
4 |
0 |
137 |
40 |
35 |
168 |
43.1000 | |
5 |
5 |
116 |
74 |
0 |
O |
25.6000 | |
6 |
3 |
78 |
50 |
32 |
88 |
31 | |
7 |
10 |
115 |
O |
0 |
O |
35.3Θ0Θ | |
S |
2 |
197 |
70 |
45 |
543 |
30.5000 | |
9 |
S |
125 |
96 |
0 |
0 |
0 | |
10 |
- |
110 |
92 |
0 |
37.6000 | ||
Informasi Data | |||||||
Pregnancies |
Glucose |
ElLoodPressuie |
SkinThickness |
Insulin |
BMI | ||
count |
763 |
763 |
768 |
768 |
763 |
768 | |
mean |
3.8451 |
120.8945 |
69.1055 |
20.5365 |
79. |
7995 31. |
9926 |
std |
3.3696 |
31.9726 |
19.3553 |
15.9522 |
115. |
2440 7. |
8842 |
min |
0 |
0 |
0 |
O |
0 |
0 | |
25« |
1 |
99 |
62 |
O |
0 27. |
300'3 | |
5D⅛ |
3 |
117 |
72 |
23 |
30. |
5000 |
32 |
75S |
6 |
140.2500 |
80 |
32 |
127. |
2500 36. |
5500 |
max |
17 |
199 |
122 |
99 |
846 67. |
1000 | |
Figure 3. Data Retrieval Menu |
The Pre Processing menu contains the steps that are prepared for the Logistic Regression method process, as for the sub menu on the pre-processing menu, namely the mising value menu, the correlation coefficient menu, the data-cleaning menu, and oversampling data menu. The Mising Value menu contains improvements to the independent variable data, which is zero and replaced with the average value of each attribute. The Correlation Coefficient menu contains correlation data for each attribute; in the Correlation Coefficient Menu, you can also see the correlation order between each independent variable and the dependent variable. The Data Cleaning menu contains data through several stages, the data cleaning process with removing attributes no longer needed in this research. Data cleaning is one of most popular step in data mining, such [17] use data cleaning for gathering Tourism object in Bali.The next menu is oversampling, this menu displays additional data with the one point cross over technique, while additional data in this research made with the aim of balancing the number of class 0 and class 1. Figure 4 shows the display of one of the preprocessing menus.
Data Hasil MisingValue
Pregnancies Glucose EloodPressuie |
SkinThickness |
Insulin |
BMI Diab | |||
2 |
S |
183 |
64 |
20 |
79 |
23.3000 |
3 |
1 |
89 |
66 |
23 |
94 |
28.1000 |
4 |
3 |
137 |
4θ |
35 |
168 |
43.1B00 |
5 |
5 |
116 |
74 |
20 |
79 |
25.6000 |
6 |
3 |
78 |
50 |
32 |
88 |
31 |
7 |
10 |
115 |
69 |
20 |
'79 |
35.3000 |
8 |
2 |
197 |
70 |
45 |
543 |
30.5000 |
9 |
8 |
125 |
96 |
20 |
79 |
31.9926 |
lθ |
4 |
llθ |
92 |
20 |
79 |
37.6000 |
11 |
10 |
168 |
74 |
20 |
'79 |
38 |
Q Lihat Informasi Data M |
ising Value | |||||
Pregnancies |
Glucose EloodPresEure |
SkinThickness |
Insulin EMI | |||
count |
768 |
768 |
768 |
768 |
768 768 | |
mean |
4.2786 |
121.6758 |
72 ..2500 |
26.4479 |
118.: |
2708 32.4508 |
std |
3.0215 |
30.4363 |
12.. 1172 |
9.7339 |
93.: |
2438 6.8754 |
min |
1 |
44 |
24 |
7 |
14 18.2000 | |
25S |
2 |
99.7500 |
64 |
20 |
79 27.5000 | |
50% |
3 |
117 |
72 |
23 |
79 32 | |
75% |
6 |
140.2500 |
80 |
32 |
127.: |
2500 36.6000 |
max |
17 |
199 |
122 |
99 |
846 67.1000 | |
Figure 4. |
One of the pre-processing |
menus |
The train test split menu contains the distribution of train and test data, the training data functions to conduct training on machine learning, while the testing data functions to test the Logistic Regression model, in this menu, 75% of training data is used and 25% testing data. The display of the train test spit menu as shown in Figure 5.
Tabel DataTraining
Pregnancies |
Glucose |
Insulin |
BMI |
DiabetesPedigreeFuncti- |
Age |
Outc | |
0 |
3 |
109 |
79 |
32.5000 |
0.2580 |
38 |
I |
1 |
8 |
108 |
79 |
30.5000 |
0.3800 |
33 | |
2 |
2 |
56 |
45 |
24.2Θ00 |
θ.3320 |
22 | |
3 |
4 |
97 |
79 |
28.2000 |
0.4430 |
22 | |
4 |
8 |
120 |
79 |
28.4000 |
0.2590 |
22 | |
5 |
5 |
88 |
23 |
24.4000 |
θ.3420 |
30 | |
6 |
3 |
125 |
79 |
31.6000 |
0.1510 |
24 | |
7 |
8 |
151 |
210 |
42.9Θ00 |
0.5160 |
36 | |
8 |
3 |
67 |
79 |
45.3000 |
Θ.1940 |
46 | |
9 |
7 |
196 |
96 |
39.8000 |
0.5290 |
41 | |
10 |
8 |
176 |
300 |
33.7Θ00 |
θ.4670 |
58 |
I ] Informasi Train Data
-
Figure 5. Train Test Split menu
The Logistic Regression process menu contains the Logistic Regression prediction results; the prediction obtained from the sigmoid function in the Logistics Regression process. The Logistic Regression menu display design shown in Figure 6
Tabel Prediksi Data
Insulin |
BMI |
DiabstesPedigieeFuncti... |
Outcome |
Outcome Piedict | ||
0 |
94 |
33.3000 |
0.2610 |
23 |
0 |
0 I |
1 |
79 |
25 |
0.2530 |
22 |
0 |
0 |
2 |
205 |
30.5000 |
0.8750 |
25 |
1 |
0 |
3 |
105 |
39.7000 |
0..2150 |
29 |
0 |
0 |
4 |
82 |
30.8000 |
0.8210 |
24 |
0 |
O |
5 |
41 |
19.5000 |
0.4820 |
25 |
0 |
0 |
6 |
79 |
21 |
0.2070 |
37 |
0 |
0 |
7 |
130 |
32.7000 |
0.7190 |
36 |
1 |
1 |
8 |
145 |
34.5000 |
0.4030 |
40 |
1 |
O |
9 |
275 |
27.7000 |
1.6000 |
25 |
0 |
1 |
10 |
176 |
30 |
1.3180 |
49 |
1 |
1 |
Keluar Dari Menu Proses
-
Figure 6. Logistic Regression Process Menu
The Logistic Regression Evaluation Menu contains information about the evaluation of the Logistic Regression method in the form of Confusion Matrix and Classification Report tables. Figure 7 is a display image of the Logistic Regression Method Evaluation Menu.
precision recall fl-score support
© |
0.78 |
Θ.86 |
0.81 |
125 |
1 |
0.84 |
Θ.75 |
0.79 |
125 |
accuracy |
0.80 |
250 |
macro avgθ.81 0.80 0.80 250 weighted avg 0.81 0.80 0.80 250
-
Figure 7. Logistic Regression Method Evaluation Menu
The data prediction made with the aim that the user can provide manual input for each variable, and the user can find out the prediction results on the system. The data prediction menu display shown in Figure 8. Data prediction is the main key of machine learning. There are some algorithm can be used in machine learning. One of them is logistic regression. The other one such as support vector regression. Machine learning can solve problem in real life, such as
forecasting the Number of Traffic Accidents Using the Support Vector Regression Method [18] and many others.
Figure 8. Data Prediction Menu
1.
2.
Conclusion
From this whole series of research, several conclusions can, including the following: In this research, the Logistics Regression method used for the classification of diabetes, and software made based on website using streamlit.
To make predictions using the Logistic Regression method consists of several stages, the first is the pre-processing of data, the Logistic Regression Process and the evaluation of the Logistic Regression method.
-
3. There are differences in pre-processing in this research with previous research, in previous reseach including data cleansing and oversampling data. Correlation coefficient process using for data cleaning process. Oversampling data using the One Point Crossover technique.
-
4. The results of the evaluation of the Logistic Regression method in this study obtained a predictive accuracy value of 80%, which means that it is better than previous studies with a predictive accuracy of 75.97%.
[1].
[2].
[3]
[4]
[5].
A. Roihan, P. A. Sunarya and A. S. Rafika, "Pemanfaatan Machine Learning dalam Berbagai Bidang," Indonesian Journal on Computer and Information Technology, vol. Vol. 5 No.1, pp. 75-82, 2020.
Marios Anthimopoulos, stargios chistodoulidis, Lucas ebner, Adreas christe dan Stavroula mougiakakou, “Lung Pattern Classification for Interstitial Luang Disiases Using a Deep Convulutional Neural Network”, IEEE Trans Med Imaging. 2016 May;35(5):1207-1216. doi: 10.1109/TMI.2016.2535865. Epub 2016 Feb 29.
Rifqi Hammad, Julia Kurniasih, Nur Fitrianingsih Hasan, Christin Nandari Dengen, Kusrini. “Prototipe Machine Learning untuk Prognosis Penyakit Demensia” IPTEK-KOM, Vol. 21 No. 1, Juni 2019: 17 - 29
Sana Rebbah, Daniel Delahaye, Stepane Puechmorel, Pierre Marechal, Florenco Nicol. “Classification of Multiple Sclerosis Patients Using a Histogram based KNN Algorithm”. OHBM 2019, pertemuan tahunan ke 25 organisasi pemetaan otak manusia, juni 2019, Roma, Italia.
F. D. Telaumbanua, P. Hulu, T. Z. Nadeak, R. R. Lumbantong and A. Dharma,
"Penggunaan Machine Learning Di Bidang Kesehatan," Jurnal Teknologi dan Ilmu Komputer Prima, vol. 3 No. 1, pp. 57-64, 2018.
-
[6] . D. Margaret F. Schulte, Healthcare Delivery in the U.S.A, New York: CRC Press, 2013.
-
[7] . F. I. Kurniadi dan P. K. Vinnia , “Perbandingan Regresi Linear dengan Heaviside
Activation Function dengan Logistic Regression untuk Klasifikasi Diabetes,” ULTIMATICS, Vol. 1, No. 1, pp. 7-10, 2018.
-
[8] M. Alotaibi and M. Albalawi, "A Mobile Gestational Diabetes Management and," 2018
9th IEEE Control and System Graduate Paper Colloquium (ICSGRC 2018), 3 - 4
August 2018, Shah Alam, Malaysia, pp. 193-196, 2018.
-
[9] . Y.-h. Wang, Y. Ou, X.-d. Deng, L.-r. Zhao dan C.-y. Zhang, “The Ship Collision
Accidents Based on Logistic Regression and Big Data,” The 31th Chinese Control and Decision Conference (2019 CCDC), pp. 4438-4440, 2019.
-
[10] M Soleh1, E R Djuwitaningrum1, M Ramli1 and M Indriasari, ”Feature engineering
strategies based on a One-p oint Crossover for fraud detection on Big Data Analytics” Published under licence by IOP Publishing Ltd, Journal of Physics: Conference Series, Volume 1566, 4th International Conference on Computing and Applied Informatics 2019 (ICCAI 2019) 26-27 November 2019, Medan, Indonesia
-
[11] https://www.kaggle.com/uciml/pima-indians-diabetes-database
-
[12] . O. Somantri dan M. Khambali, “Feature Selection Klasifikasi Kategori Cerita Pendek Menggunakan Naïve Bayes dan Algoritme Genetika,” J NTETI, Vol. %1 dari %26, No.3, pp. 301-306, 2017.
-
[13] J. Suryaputra, C. Lubis and T. Sutrisno, "PEMILIHAN CROSSOVER PADA
ALGORITMA GENETIKA UNTUK PROGRAM APLIKASI PENGENALAN KARAKTER TULISAN TANGAN," Jurnal Ilmu Komputer dan Sistem Informasi, pp. 69-72.
-
[14] A. Chowdhury, . S. Tejas and T. K, "Predicting whether songs will be hit using Logistic Regression," International Journal Of Engineering And Computer Science, vol. 6, p. 22434, 2017.
-
[15] M. Nawawi dan R. Marliansyah, “Klasifikasi Tingkat Popularitas Siswa Berdasarkan
Aktifitas Komunikasi Siswa Menggunakan Smartphone dengan Teknik Logistic
Regression,” Prosiding Annual Paper Seminar 2018 Computer Science and ICT, Vol. %1 dari %24, No.1, pp. 251-254, 2018.
-
[16] www.streamlit.io
-
[17] WIDIARI, Ni Putu Ayu; SUARJAYA, I Made Agus Dwi; GITHA, Dwi Putra. Teknik Data
Cleaning Menggunakan Snowflake untuk Studi Kasus Objek Pariwisata di Bali. Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi), [S.l.], p. 137-145, july 2020. ISSN 2685-2411
-
[18] APRIYANTI, Ni Putu Ratindia; PUTRA, I Ketut Gede Darma; PUTRA, I Made Suwija. Peramalan Jumlah Kecelakaan Lalu Lintas Menggunakan Metode Support Vector Regression. Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi), [S.l.], p. 72-80, june 2020. ISSN 2685-2411
Website-Based Application for Classification of Diabetes Using Logistic Regression Method 33
(Muhamad Soleh)
Discussion and feedback