Efficiency Comparison in Prediction of Normalization with Data Mining Classification
Volume 6, Issue 4, Page No 130–137, 2021
Adv. Sci. Technol. Eng. Syst. J. 6(4), 130–137 (2021)
DOI: 10.25046/aj060415
Keywords: Artificial Neural Network, Binary Logistic Regression, Decimal Scaling Normalization, Decision Tree, K-Nearest Neighbor, Naïve Bayes, Statistical Column Normalization, Support Vector Machine, Z-Score Normalization
This research compares the prediction efficiency of normalization methods combined with data mining classification. The purpose was to compare three normalization methods, Z-Score, Decimal Scaling and Statistical Column, in terms of the classification accuracy that the normalized data provided. Six well-known classification methods were used to evaluate the normalization methods: K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes, and Binary Logistic Regression. The six data sets studied were divided into two groups: White wine quality, Pima Indians diabetes, and Vertebral column, which had 1-5 variables of the outlier coefficient of variation, and Indian liver disease, Working hours, and Avocado, which had 6-10 variables of the outlier coefficient of variation. For White wine quality and Vertebral column, several methods tied for the best efficiency with no systematic pattern. For Pima Indians diabetes and Indian liver disease, Statistical Column with K-Nearest Neighbor classification was the most efficient. For Working hours, Decimal Scaling with K-Nearest Neighbor classification was the most efficient. For Avocado, Statistical Column with K-Nearest Neighbor classification, and Z-Score or Decimal Scaling with Binary Logistic Regression classification, were the most efficient. Across all normalization and classification methods, Statistical Column with K-Nearest Neighbor classification was the most efficient in terms of accuracy.
1. Introduction
Advances in information technology have led to the storage of large amounts of data. However, most data usage still consists of extracting data from databases. The knowledge gained from analyzing these data can be of great use in organizational operations and decision making. Data mining can take many forms depending on its objective. Classification builds a model for categorical data from pre-classified data and uses that model to classify new data that have not previously been classified [1]. More generally, data mining is a method of extracting knowledge from different data in order to use that knowledge in decision making. Such knowledge may be used to predict, or to create models for classifying or displaying relationships between different units, and data mining can be applied in many sectors, for example finance, insurance and medicine. In much current research, researchers use statistical methods to analyze data and draw conclusions for further revision or development. The data collected can be disorganized, which often causes problems: the variables can differ in scale, from slightly to very widely. If such data were analyzed directly, the results would deviate from the truth, the assumptions would not be met, and the data could not be used to best effect. One way to manage this problem is transformation or normalization, which uses a simple mathematical method to adapt the collected data to a new, standardized form, for example Z-Score, Median, Min-Max, Decimal Scaling and Statistical Column [2].
In the first study reviewed, from Malaysia, three normalization methods, Min-Max, Z-Score and Decimal Scaling, were investigated for the prediction of dengue outbreaks. The prediction models consisted of Support Vector Machine (SVM) and Artificial Neural Network (ANN), and the comparison considered prediction accuracy and mean square error (MSE). The results showed that SVM and ANN achieved the maximum accuracy and the minimum MSE with Decimal Scaling, followed by Min-Max and Z-Score, respectively. Nevertheless, SVM was a better predictor than ANN [3]. The second study was a comparative analysis of K-Nearest Neighbor (KNN) with various values of k using Min-Max and Z-Score in R. The average accuracy was about 88% for Min-Max and 79% for Z-Score [4]. Finally, a third study compared four normalization methods, Min-Max, Z-Score, Decimal Scaling and Median, in terms of the classification accuracy that the normalized data provided. Four data sets and three classification methods, K-NN, Naïve Bayes and ANN, were used to evaluate the normalization methods. For the White wine quality data set, normalization by Decimal Scaling and classification by K-NN were the best combination. For the Pima Indians diabetes data set, normalization by Decimal Scaling and classification by ANN were the best combination. For the Vertebral column data set, normalization by Decimal Scaling and classification by K-NN were the best combination. For the Indian liver patient data set, normalization by Decimal Scaling and classification by Naïve Bayes were the best combination. That study concluded that the best combination overall was Decimal Scaling normalization with K-NN classification [5].
In this research, three normalization methods were studied: Z-Score, Decimal Scaling, and Statistical Column. They were combined with four regularly used classification methods: K-NN, Decision Tree, ANN and SVM. Two further classification methods, Naïve Bayes and Binary Logistic Regression, were also proposed. The aim was to compare the prediction accuracy of each combination of normalization and classification using R programming.
2. Experimental Methods
The experimental methods follow a systematic and scientific approach to research. Here, they consisted of data collection and research procedures [5].
2.1. Data Collection
Data collection comprises three steps: gathering, measuring and analyzing the accuracy of the data for research by standard, validated methods [5]. Six secondary data sets were collected from the UCI, Kaggle and Mldata websites as follows:
- White wine quality, total number of data 1,500 values with 1-5 variables of the outlier coefficient of variation [6].
- Pima Indians diabetes, total number of data 768 values with 1-5 variables of the outlier coefficient of variation [7].
- Vertebral column, total number of data 310 values with 1-5 variables of the outlier coefficient of variation [8].
- Indian liver patient, total number of data 575 values with 6-10 variables of the outlier coefficient of variation [9].
- Working hours, total number of data 956 values with 6-10 variables of the outlier coefficient of variation [10].
- Avocado, total number of data 1,149 values with 6-10 variables of the outlier coefficient of variation [11].
The data sets fell into two groups: data sets 1-3 contained 1-5 variables of the outlier coefficient of variation, and data sets 4-6 contained 6-10 variables of the outlier coefficient of variation.
2.2. Research Procedures
Research procedures are the specific methods or techniques used to identify, select, process, and analyze information [5]. Here, they consisted of normalization, data set partitioning, data analysis and efficiency comparison in prediction of classification.
2.2.1. Normalization
Z-Score normalization was performed in R, while Decimal Scaling and Statistical Column normalization were performed in Excel.
2.2.2. Data Sets Partitioning Method
Each data set was divided at random into two parts in a ratio of 70:30, which is commonly used in data mining research. The split was repeated for 5 rounds with the random seed set to 10, 20, 30, 40 and 50. Part 1, the training data set (70 percent), was used to build a model. Part 2, the testing data set (30 percent), was used to test the model [12]-[16], as shown in Table 1.
Table 1: Result of six data sets partition.
| Data set | Total number of data set | Total number of training data set (70 percent) | Total number of testing data set (30 percent) |
| White wine quality | 1,500 | 1,050 | 450 |
| Pima Indians diabetes | 768 | 537 | 231 |
| Vertebral column | 310 | 217 | 93 |
| Indian liver patient | 575 | 402 | 173 |
| Working hours | 956 | 669 | 287 |
| Avocado | 1,149 | 804 | 345 |
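The partitioning step can be illustrated with a minimal Python sketch. It is not the authors' R procedure: the function name `split_70_30` is hypothetical, Python's random generator produces different shuffles than R for the same seed, and rounding the training size down is an assumption chosen because it reproduces the sizes in Table 1.

```python
import random

SEEDS = [10, 20, 30, 40, 50]  # the five random seeds stated in the text

# Total number of records per data set, taken from Table 1.
DATASET_SIZES = {
    "White wine quality": 1500,
    "Pima Indians diabetes": 768,
    "Vertebral column": 310,
    "Indian liver patient": 575,
    "Working hours": 956,
    "Avocado": 1149,
}


def split_70_30(n_rows, seed):
    """Return (train_indices, test_indices) for a random 70:30 split."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Assumption: the 70 percent share is rounded down (integer arithmetic).
    n_train = n_rows * 7 // 10
    return indices[:n_train], indices[n_train:]


# One (train size, test size) pair per data set; the sizes do not depend on the seed.
partition_sizes = {}
for name, n_rows in DATASET_SIZES.items():
    for seed in SEEDS:
        train, test = split_70_30(n_rows, seed)
        partition_sizes[name] = (len(train), len(test))
```

Fixing the seed makes each of the five rounds reproducible, so the same training and testing rows can be reused for every normalization and classification method.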
2.2.3. Data Analysis
Data analysis is the process of applying statistical methods to describe, explain and appraise data [5].
2.2.3.1. Normalization
Normalization is the method of rescaling values measured on different scales to a common scale. It permits comparison of related values from different data. There are many normalization methods, for example Z-Score, Median, Min-Max, Decimal Scaling and Statistical Column. In this research, we were interested in the following three [4].
1) Z-Score Normalization
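In this method, the sample mean is subtracted from each value and the result is divided by the sample standard deviation, both computed for every variable on the training data, to transform each input value $v$ into the new value $v'$. The normalization formula is as follows [17]:

$$v' = \frac{v - \bar{x}}{s}$$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation of the variable.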
2) Decimal Scaling Normalization
The decimal scaling normalization method transforms the original value of the data into a decimal number. The position of the decimal point is determined by the maximum absolute value as follows [18]:

$$v' = \frac{v}{10^{j}}$$

where $j$ is the number of digit positions of the largest absolute value, that is, the smallest integer such that $\max(|v'|) < 1$.
3) Statistical Column Normalization
The statistical column normalization method transforms every column using a normalized column value, $n(c_a)$, obtained by normalizing the column to a length of one. Each value $v$ in the column is then reduced by the normalized column value, divided by the normalized column value, and multiplied by a bias of 0.1, as follows [17]:

$$v' = \frac{v - n(c_a)}{n(c_a)} \times 0.1$$
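The three normalization methods described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the R and Excel procedures used in the study. All function names are hypothetical. For statistical column normalization, the normalized column value is passed in as a parameter because the text does not define it precisely; using the column's Euclidean norm, as in `euclidean_norm` below, is only one possible reading.

```python
import math
import statistics


def z_score(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(v - mean) / sd for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def statistical_column(values, column_value):
    """Subtract the column value, divide by it, and multiply by the 0.1 bias."""
    return [(v - column_value) / column_value * 0.1 for v in values]


def euclidean_norm(values):
    """Assumed choice of normalized column value: the column's Euclidean norm."""
    return math.sqrt(sum(v * v for v in values))


# Hypothetical column of measurements, for illustration only.
column = [120.0, 85.0, 986.0]
scaled = decimal_scaling(column)  # the largest value has three digits, so j = 3
```

In practice the mean, standard deviation, scaling exponent and column value would be computed on the training part only and then reused for the testing part.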
2.2.3.2. Classification
Classification is the method of identifying individual values and assigning them to classes. A model is built from the training data and then applied to predict the classes of the testing data, as follows [4].
1) K-Nearest Neighbor
K-Nearest Neighbor (KNN) is a very popular method because it is simple and effective and can be used for many tasks, such as classification and missing value replacement. It uses the IBk algorithm [19]. First, the data set must be prepared and scaled to a normalized scale. Then, the Euclidean distance between two points is computed [4].
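The two steps above, computing Euclidean distances on normalized data and then classifying from the nearest points, can be sketched as follows. This is a plain majority-vote illustration, not the IBk implementation; the function names, the toy points and the choice of k = 3 are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already normalized training points with two classes.
train_points = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.85, 0.95)]
train_labels = ["a", "a", "b", "b"]
prediction = knn_predict(train_points, train_labels, (0.20, 0.20), k=3)
```

Because the distance sums squared differences over all variables, a variable on a much larger scale would dominate the result, which is why normalization comes first.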
2) Decision Tree
The tree used in decision support is an upside-down tree structure with the root at the top and the leaves at the bottom. Within the tree there are nodes, each of which represents a decision based on an attribute. The branches of the tree represent the values or results obtained from the test, and the leaves at the bottom of the decision tree represent the classes or results. The top node is called the root node. Here, the J48 (C4.5) algorithm was used [20].
3) Artificial Neural Network
Artificial Neural Network (ANN) is a technology developed from artificial intelligence research for calculating function values from groups of data. With ANN, a machine learns from examples, and the system is trained to solve broader problems. The structure of an ANN consists of input and output nodes. Processing is distributed over a layered structure, namely the input, hidden and output layers, and relies on the transmission of signals through the nodes of these layers. Here, the Multilayer Perceptron algorithm was used [21], [22].
4) Support Vector Machine
Support Vector Machine (SVM) is a supervised learning method whose goal is to build a classifier that generalizes well, that is, one that works well on unseen data. The data are mapped from a low-dimensional input space to a high-dimensional feature space by a function known as the kernel function. This capability makes it easier to construct a quadratic classifier on the feature space for classification. In addition, a good classifier should have a linear structure and should maximize the distance between the classifier and the closest values of each group, so that it separates each type of data from the others effectively. The appropriate boundary is called the optimal separating hyperplane. Here, the Sequential Minimal Optimization (SMO) algorithm was used [18].
5) Naïve Bayes
The first proposed classification method was Naïve Bayes. It estimates the probability of events that have not yet occurred from what has occurred before. A simple form of the relationship is as follows [23]:

$$P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)}$$

From the Bayes equation, to predict the class $C$ when the attribute $A$ is known, $P(C \mid A)$ is calculated from the probability of attribute $A$ given class $C$ in the training data set, $P(A \mid C)$, together with the probability of attribute $A$, $P(A)$, and the probability of class $C$, $P(C)$.
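The Bayes calculation described above can be illustrated with a small Python sketch that estimates the probabilities by counting in a training set. The function name and the toy records are hypothetical, and only a single categorical attribute is used. With several attributes, Naïve Bayes multiplies the per-attribute likelihoods under an independence assumption, which this sketch does not show.

```python
from collections import Counter


def posterior(records, attribute_value, target_class):
    """Estimate P(C | A) = P(A | C) * P(C) / P(A) from (attribute, class) pairs."""
    n = len(records)
    class_counts = Counter(c for _, c in records)
    attribute_counts = Counter(a for a, _ in records)
    joint = sum(
        1 for a, c in records if a == attribute_value and c == target_class
    )
    p_class = class_counts[target_class] / n                 # P(C)
    p_attribute = attribute_counts[attribute_value] / n      # P(A)
    p_attribute_given_class = joint / class_counts[target_class]  # P(A | C)
    return p_attribute_given_class * p_class / p_attribute


# Hypothetical training records: (attribute value, class label).
records = [
    ("high", "pos"), ("high", "pos"), ("high", "neg"),
    ("low", "neg"), ("low", "neg"), ("low", "pos"),
]
p_pos_given_high = posterior(records, "high", "pos")  # 2 of the 3 "high" records are "pos"
```

The predicted class for a new record is the class with the largest posterior probability.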
6) Binary Logistic Regression
The second proposed classification method was Binary Logistic Regression. It is a regression analysis in which the dependent variable is a qualitative variable with only two values, while the independent variables can be quantitative, qualitative, or a mixture of both. The binary logistic regression analysis method places no distributional conditions on the independent variables and no conditions on the variance and covariance matrix of each group, and it predicts the probability that each unit belongs to a specific group [24]:

$$P(Y = 1) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$$

From the above equation, the relationship between the independent and the dependent variables is nonlinear. Therefore, the relationship is adjusted to a linear form as follows:

$$\frac{P(Y = 1)}{1 - P(Y = 1)} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}$$

If the odds are greater than 1, then the probability of success is greater than the probability of failure. Taking the natural logarithm of the odds gives:

$$\ln\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

The right-hand side of the above equation is in a linear form, called the logit response function.
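The relationship between the predicted probability, the odds and the logit can be checked numerically with a short Python sketch. The coefficients and input values are hypothetical, and the function names are chosen for illustration only. No model fitting is performed here.

```python
import math


def linear_predictor(intercept, coefficients, x):
    """Linear combination: intercept plus the sum of coefficient * variable."""
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))


def probability(intercept, coefficients, x):
    """Logistic function of the linear predictor, giving P(Y = 1)."""
    z = linear_predictor(intercept, coefficients, x)
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of success: P(Y = 1) / (1 - P(Y = 1))."""
    return p / (1.0 - p)


def logit(p):
    """Natural logarithm of the odds, which is linear in the predictors."""
    return math.log(odds(p))


# Hypothetical coefficients and one observation with two predictors.
intercept, coefficients, x = 0.5, [1.0, -2.0], [2.0, 0.5]
p = probability(intercept, coefficients, x)  # linear predictor is 0.5 + 2.0 - 1.0 = 1.5
```

Here `logit(p)` recovers the linear predictor, and because the odds exceed 1, `p` is greater than 0.5, so the unit would be assigned to the success group.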
2.2.4. Efficiency Comparison in Prediction of Classifications
The results of the three normalization methods were used with the six classifications to compare prediction efficiency by accuracy, in percent, as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$

where True Positive (TP) is the number of cases correctly classified as positive (the real value is positive), True Negative (TN) is the number of cases correctly classified as negative (the real value is negative), False Positive (FP) is the number of cases mistakenly classified as positive (the real value is negative), and False Negative (FN) is the number of cases mistakenly classified as negative (the real value is positive) [25].
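The accuracy measure built from these four counts can be sketched in Python as follows. The function names and the toy label lists are hypothetical. The result is expressed in percent to match the tables in Section 3.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # classified positive, really positive
        elif p != positive and a != positive:
            tn += 1  # classified negative, really negative
        elif p == positive:
            fp += 1  # classified positive, really negative
        else:
            fn += 1  # classified negative, really positive
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of correctly classified cases, in percent."""
    return 100 * (tp + tn) / (tp + tn + fp + fn)


# Hypothetical testing labels and model predictions.
actual = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg", "neg", "pos"]
counts = confusion_counts(actual, predicted, positive="pos")
result = accuracy(*counts)
```

In this toy example six of the eight cases are classified correctly.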
Figure 1 shows the steps of the experimental methods as a flowchart. The process started with six secondary data sets collected from websites. Next, the data were normalized, that is, values measured on different scales were brought to the same scale, using three methods: Z-Score, Decimal Scaling and Statistical Column. After that, each data set was divided at random into two parts in a ratio of 70:30, repeated for 5 rounds with the random seed set to 10, 20, 30, 40 and 50. Part 1, the training data (70 percent), was used to build a model. Part 2, the testing data (30 percent), was used to test the model. Then, classification was applied to predict the classes of the testing data after training. Six classification methods were used: K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression. Finally, the results of the three normalization methods with the six classifications were compared for prediction efficiency by accuracy.
Figure 1: Flowchart of Experimental Methods
K-NN = K-Nearest Neighbor
DT = Decision Tree
ANN = Artificial Neural Network
SVM = Support Vector Machine
NB = Naïve Bayes
BLR = Binary Logistic Regression
3. Results and Discussions
3.1. White Wine Quality Data Set
As shown in Table 2, with Z-Score, classification by Decision Tree, Artificial Neural Network, Support Vector Machine and Binary Logistic Regression had the maximum accuracy of 100 percent. With Decimal Scaling, classification by K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine and Binary Logistic Regression had the maximum accuracy of 100 percent. With Statistical Column, classification by K-Nearest Neighbor, Support Vector Machine and Binary Logistic Regression had the maximum accuracy of 100 percent.
Table 2: The results of efficiency comparison in white wine quality data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 93.4222 | 100 | 100 |
| Decision Tree | 100 | 100 | 92 |
| Artificial Neural Network | 100 | 100 | 56.1231 |
| Support Vector Machine | 100 | 100 | 100 |
| Naïve Bayes | 98.8446 | 99.0235 | 69.8728 |
| Binary Logistic Regression | 100 | 100 | 100 |
3.2. Pima Indians Diabetes Data Set
As shown in Table 3, with Z-Score, classification by Binary Logistic Regression had the maximum accuracy of 77.7320 percent. With Decimal Scaling, classification by Decision Tree had the maximum accuracy of 79.2208 percent. With Statistical Column, classification by K-Nearest Neighbor had the maximum accuracy of 81.7316 percent. When all normalization and classification methods are compared, Statistical Column normalization with K-Nearest Neighbor classification had the maximum accuracy.
Table 3: The results of efficiency comparison in Pima Indians diabetes data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 68.4848 | 69.4373 | 81.7316 |
| Decision Tree | 74.4589 | 79.2208 | 69.6969 |
| Artificial Neural Network | 77.0043 | 77.2824 | 65.3877 |
| Support Vector Machine | 76.9500 | 76.9500 | 67.8400 |
| Naïve Bayes | 73.6111 | 64.4787 | 66.3375 |
| Binary Logistic Regression | 77.7320 | 72.2247 | 69.0573 |
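The comparison made for each data set amounts to taking the maximum accuracy within each normalization column and then over the whole table. The following Python sketch applies that to the Table 3 values. The dictionary layout and function names are assumptions made for illustration.

```python
# Accuracy (percent) for the Pima Indians diabetes data set, copied from Table 3.
TABLE_3 = {
    "K-Nearest Neighbor": {"Z-Score": 68.4848, "Decimal Scaling": 69.4373, "Statistical Column": 81.7316},
    "Decision Tree": {"Z-Score": 74.4589, "Decimal Scaling": 79.2208, "Statistical Column": 69.6969},
    "Artificial Neural Network": {"Z-Score": 77.0043, "Decimal Scaling": 77.2824, "Statistical Column": 65.3877},
    "Support Vector Machine": {"Z-Score": 76.9500, "Decimal Scaling": 76.9500, "Statistical Column": 67.8400},
    "Naïve Bayes": {"Z-Score": 73.6111, "Decimal Scaling": 64.4787, "Statistical Column": 66.3375},
    "Binary Logistic Regression": {"Z-Score": 77.7320, "Decimal Scaling": 72.2247, "Statistical Column": 69.0573},
}


def best_per_normalization(table):
    """Map each normalization to its (best classification, accuracy) pair."""
    normalizations = next(iter(table.values())).keys()
    best = {}
    for normalization in normalizations:
        # max() keeps the first classification in case of a tie.
        classifier = max(table, key=lambda c: table[c][normalization])
        best[normalization] = (classifier, table[classifier][normalization])
    return best


def best_overall(table):
    """Return the (normalization, classification, accuracy) with the top accuracy."""
    return max(
        (
            (normalization, classifier, acc)
            for classifier, row in table.items()
            for normalization, acc in row.items()
        ),
        key=lambda item: item[2],
    )


per_column = best_per_normalization(TABLE_3)
overall = best_overall(TABLE_3)
```

For data sets where several combinations reach the same maximum, such as White wine quality, every tied combination would need to be reported rather than only the first one found.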
3.3. Vertebral Column Data Set
As shown in Table 4, with Z-Score, classification by Binary Logistic Regression had the maximum accuracy of 86.5807 percent. With Decimal Scaling and with Statistical Column, classification by K-Nearest Neighbor and by Decision Tree had the maximum accuracy of 100 percent. When all normalization and classification methods are compared, Decimal Scaling or Statistical Column normalization with K-Nearest Neighbor or Decision Tree classification had the maximum accuracy.
Table 4: The results of efficiency comparison in Vertebral column data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 81.9355 | 100 | 100 |
| Decision Tree | 67.7419 | 100 | 100 |
| Artificial Neural Network | 83.4513 | 83.9367 | 80.5782 |
| Support Vector Machine | 76.9500 | 76.9500 | 67.8400 |
| Naïve Bayes | 74.5348 | 76.6055 | 81.9705 |
| Binary Logistic Regression | 86.5807 | 73.9175 | 83.6670 |
3.4. Indian Liver Disease Data Set
As shown in Table 5, with Z-Score and Decimal Scaling, classification by Binary Logistic Regression had the maximum accuracy of 73.1029 and 73.1054 percent, respectively. With Statistical Column, classification by K-Nearest Neighbor had the maximum accuracy of 99.6531 percent. When all normalization and classification methods are compared, Statistical Column normalization with K-Nearest Neighbor classification had the maximum accuracy.
Table 5: The results of efficiency comparison in Indian liver disease data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 64.1619 | 61.3341 | 99.6531 |
| Decision Tree | 68.2080 | 67.0520 | 69.3641 |
| Artificial Neural Network | 65.7435 | 70.2721 | 73.7657 |
| Support Vector Machine | 70.9700 | 70.9700 | 70.9700 |
| Naïve Bayes | 63.2700 | 71.1385 | 60.9860 |
| Binary Logistic Regression | 73.1029 | 73.1054 | 72.9447 |
3.5. Working Hours Data Set
As shown in Table 6, with Z-Score and Statistical Column, classification by Naïve Bayes had the maximum accuracy of 79.5518 and 99.7138 percent, respectively. With Decimal Scaling, classification by K-Nearest Neighbor had the maximum accuracy of 100 percent. When all normalization and classification methods are compared, Decimal Scaling normalization with K-Nearest Neighbor classification had the maximum accuracy.
Table 6: The results of efficiency comparison in Working hours data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 71.6376 | 100 | 99.5818 |
| Decision Tree | 73.5191 | 73.5191 | 72.8223 |
| Artificial Neural Network | 78.1346 | 78.0755 | 54.1547 |
| Support Vector Machine | 74.6300 | 74.4800 | 65.5200 |
| Naïve Bayes | 79.5518 | 78.8154 | 99.7138 |
| Binary Logistic Regression | 74.6093 | 74.9622 | 73.5478 |
3.6. Avocado Data Set
As shown in Table 7, with Z-Score and Decimal Scaling, classification by Binary Logistic Regression had the maximum accuracy of 100 percent in both cases. With Statistical Column, classification by K-Nearest Neighbor had the maximum accuracy of 100 percent. When all normalization and classification methods are compared, Statistical Column normalization with K-Nearest Neighbor classification, and Z-Score or Decimal Scaling normalization with Binary Logistic Regression classification, had the maximum accuracy.
Table 7: The results of efficiency comparison in Avocado data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Classification | Z-Score | Decimal Scaling | Statistical Column |
| K-Nearest Neighbor | 99.7101 | 85.3333 | 100 |
| Decision Tree | 66.3768 | 66.3768 | 66.3768 |
| Artificial Neural Network | 99.2691 | 99.3303 | 99.4265 |
| Support Vector Machine | 96.0300 | 96.0300 | 99.6300 |
| Naïve Bayes | 90.0744 | 89.4525 | 99.5136 |
| Binary Logistic Regression | 100 | 100 | 99.9748 |
The data sets with 1-5 variables of the outlier coefficient of variation were White wine quality, Pima Indians diabetes and Vertebral column. For this group, the most efficient methods were Statistical Column normalization with K-Nearest Neighbor classification, and Decimal Scaling normalization with Decision Tree and K-Nearest Neighbor classification, respectively. The data sets with 6-10 variables of the outlier coefficient of variation were Indian liver disease, Working hours and Avocado. For this group, the most efficient methods were Statistical Column normalization with K-Nearest Neighbor classification and Decimal Scaling normalization with K-Nearest Neighbor classification, respectively. This is similar to the research of T. Malai et al. (2021), who found that the best method was Decimal Scaling normalization with K-Nearest Neighbor classification [5].
Table 8: The results of efficiency comparison for all data using Z-Score, Decimal Scaling and Statistical Column with Classification for K-Nearest Neighbor, Decision Tree, Artificial Neural Network, Support Vector Machine, Naïve Bayes and Binary Logistic Regression.
| Data | Classification | Z-Score | Decimal Scaling | Statistical Column |
| White Wine Quality | K-Nearest Neighbor | | ✓ | ✓ |
| | Decision Tree | ✓ | ✓ | |
| | Artificial Neural Network | ✓ | ✓ | |
| | Support Vector Machine | ✓ | ✓ | ✓ |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | ✓ | ✓ | ✓ |
| Pima Indians Diabetes | K-Nearest Neighbor | | | ✓ |
| | Decision Tree | | | |
| | Artificial Neural Network | | | |
| | Support Vector Machine | | | |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | | | |
| Vertebral Column | K-Nearest Neighbor | | ✓ | ✓ |
| | Decision Tree | | ✓ | ✓ |
| | Artificial Neural Network | | | |
| | Support Vector Machine | | | |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | | | |
| Indian Liver Disease | K-Nearest Neighbor | | | ✓ |
| | Decision Tree | | | |
| | Artificial Neural Network | | | |
| | Support Vector Machine | | | |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | | | |
| Working Hours | K-Nearest Neighbor | | ✓ | |
| | Decision Tree | | | |
| | Artificial Neural Network | | | |
| | Support Vector Machine | | | |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | | | |
| Avocado | K-Nearest Neighbor | | | ✓ |
| | Decision Tree | | | |
| | Artificial Neural Network | | | |
| | Support Vector Machine | | | |
| | Naïve Bayes | | | |
| | Binary Logistic Regression | ✓ | ✓ | |
✓ = the best accuracy for each data set
As shown in Table 8, for the White wine quality data set, the most efficient methods were Decimal Scaling or Statistical Column with K-Nearest Neighbor classification; Z-Score or Decimal Scaling with Decision Tree or Artificial Neural Network classification; and Z-Score, Decimal Scaling or Statistical Column with Support Vector Machine or Binary Logistic Regression classification. For the Pima Indians diabetes data set, the most efficient method was Statistical Column with K-Nearest Neighbor classification. For the Vertebral column data set, the most efficient methods were Decimal Scaling or Statistical Column with K-Nearest Neighbor or Decision Tree classification. For the Indian liver disease data set, the most efficient method was Statistical Column with K-Nearest Neighbor classification. For the Working hours data set, the most efficient method was Decimal Scaling with K-Nearest Neighbor classification. For the Avocado data set, the most efficient methods were Statistical Column with K-Nearest Neighbor classification, and Z-Score or Decimal Scaling with Binary Logistic Regression classification.
4. Conclusion
In summary, for the White wine quality and Vertebral column data, several methods tied for the maximum efficiency with no systematic pattern. For the Pima Indians diabetes and Indian liver disease data, the most efficient method was Statistical Column with K-Nearest Neighbor classification. For the Vertebral column data, the most efficient methods were Decimal Scaling or Statistical Column with K-Nearest Neighbor or Decision Tree classification. For the Working hours data, the most efficient method was Decimal Scaling with K-Nearest Neighbor classification. For the Avocado data, the most efficient methods were Statistical Column with K-Nearest Neighbor classification, and Z-Score or Decimal Scaling with Binary Logistic Regression classification. Across all normalization and classification methods, Statistical Column with K-Nearest Neighbor classification was the most efficient in terms of accuracy. This finding can be applied to real-world problems in many fields, such as medicine, public health and science.
Conflict of Interest
The author declares no conflict of interest.
Acknowledgment
I thank the School of Science, King Mongkut’s Institute of Technology Ladkrabang, for funding this research project and for its help and support on efficiency comparison in prediction of normalization with data mining classification.
References
[1] S. Euawattanamongkol, Data mining, National Institute of Development Administration Publisher, 2016.
[2] N. Kratethong, Transformation to normal distribution, Master’s Degree Thesis in Statistics, Department of Statistics, Faculty of Commerce and Account, Chulalongkorn University, 1999.
[3] Z. Mustaffa, Y.A. Yusof, “Comparison of normalization techniques in predicting dengue outbreak,” in 2010 International Conference on Business and Economics Research, 1, 345-349, IACSIT Press, Kuala Lumpur, Malaysia, 2011.
[4] A. Pandey, A. Jain, “Comparative analysis of KNN algorithm using various normalization techniques,” International Journal Computer Network and Information Security, 11, 36-42, 2017, doi:10.5815/ijcnis.2017.11.04.
[5] T. Malai, P. Ninthanom, S. Sinsomboonthong, “Performance comparison of transformation methods in data mining classification technique,” Thai Journal of Science and Technology, 10(1), 510-522, 2021.
[6] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, J. Reis, Wine quality data set, [Online], Available: https://archive.ics.uci.edu/ml/datasets/Wine+Quality, 2009.
[7] J.W. Smith, J.E. Everhart, W.C. Dickson, W.C. Knowler, R.S. Johannes, Pima Indians diabetes database, [Online], Available: https://www.kaggle.com/uciml/pima-indians-diabetes-database, 1988.
[8] H.D. Mota, Vertebral column data set, [Online], Available: https://www.kaggle.com/caesarlupum/vertebralcolumndataset, 2011.
[9] B.V. Ramana, Indian liver patient, [Online], Available: https://www.mldata.io/dataset-details/indian_liver_patient/, 2012.
[10] L. Myoung, Working hours, [Online], Available: https://rdrr.io/rforge/Ecdat/man/Workinghours.html, 1995.
[11] J. Kiggins, Avocado prices, [Online], Available: https://www.kaggle.com/neuromusic/avocado-prices, 2018.
[12] R. Shams, Creating training, validation and test sets (data preprocessing), [Online], Available: https://www.youtube.com/watch?v=uiDFa7iY9yo, 2014.
[13] P. Thongpool, P. Jamrueng, R. Boonrit, S. Sinsomboonthong, “Performance comparison in prediction of imbalanced data in data mining classification,” Thai Journal of Science and Technology, 8(6), 565-584, 2019.
[14] S. Sinsomboonthong, “An efficiency comparison in prediction of imbalanced data classification with data mining techniques,” Thai Journal of Science and Technology, 8(3), 383-393, 2019.
[15] N. Phonchan, P. Jaimeetham, S. Sinsomboonthong, “Clustering efficiency comparison of outliers data in data mining,” Thai Journal of Science and Technology, 9(5), 589-602, 2020.
[16] S. Sinsomboonthong, “An efficiency comparison in prediction of outlier six classifications,” Thai Journal of Science and Technology, 9(3), 255-268, 2020.
[17] T. Jayalakshmi, A. Santhakumaran, “Statistical normalization and back propagation for classification,” International Journal of Computer Theory and Engineering, 3(1), 89-93, 2011.
[18] J. Han, M. Kamber, Data mining concepts and techniques, 2nd ed, Morgan Kaufmann, 2006.
[19] O.G. Troyanskaya, M. Cantor, G. Sherlock, P.O. Brown, “Missing value estimation methods for DNA microarrays,” Bioinformatics, 17(6), 520-525, 2001.
[20] R. Thammasombat, Decision support system for mobile internet package selection using decision tree, Ph.D. Thesis, Business Computer, Faculty of Business Administration, Ratchapruek College, 2012.
[21] K. Waiyamai, C. Songsiri, T. Rakthammanon, “Using data mining techniques to improve the quality of education for students of the faculty of engineering,” The NECTEC Technical Journal, 11(3), 134-142, 2011.
[22] D.T. Larose, Discovering knowledge in data: an introduction to data mining, John Wiley & Sons, 2005.
[23] D.T. Larose, Data mining methods and models, John Wiley & Sons, 2005.
[24] K. Wanichbancha, Multivariate data analysis, Thammasarn Co Ltd, 2009.
[25] S. Sripaaraya, S. Sinsomboonthong, “Efficiency comparison of classifications for chronic kidney disease: a case study hospital in India,” Journal of Science and Technology, 25(5), 839-853, 2017.