Enhancing Decision Trees for Data Stream Mining

Mostafa Yacoub; Amira Rezk; Mohamed Senousy

doi:10.25046/aj060537

Open AccessArticle

Enhancing Decision Trees for Data Stream Mining

Volume 6, Issue 5, Page No 330–334, 2021

Author’s Name: Mostafa Yacoub^* ¹, Amira Rezk ², Mohamed Senousy ³

¹ Faculty of Computers and Information, Information System Department, Mansoura University, Mansoura, 35511, Egypt

² Faculty of Computers and Information, Information System Department, Mansoura University, Manso

³ Faculty of Management Sciences, Computer and Information system Departments, Sadat Academy for Management Sciences, Cairo, 00202, Egypt

^*whom correspondence should be addressed. E-mail: mostafayacoub3@gmail.com

Adv. Sci. Technol. Eng. Syst. J. 6(5), 330–334 (2021); DOI: 10.25046/aj060537

Keywords: Data Stream Mining, Classification, Decision Trees, VFDT

Received: 29 July 2021, Accepted: 15 October 2021, Published Online: 23 October 2021

(This article belongs to Section Artificial Intelligence in Computer Science (CAI))

Download Now!

371 Downloads

Export Citations

Abstract

Data stream gained obvious attention by research for years. Mining this type of data generates special challenges because of their unusual nature. Data streams flows are continuous, infinite and with unbounded size. Because of its accuracy, decision tree is one of the most common methods in classifying data streams. The aim of classification is to find a set of models that can be used to differentiate and label different classes of objects. The discovered models are used to predict the class membership of objects in a data set. Although many efforts were done to classify the stream data using decision trees, it still needs a special attention to enhance its performance, especially regarding time which is an important factor for data streams. This fast type of data requires the shortest possible processing time. This paper presents VFDT-S1.0 as an extension of VFDT (Very Fast Decision Trees). Bagging and sampling techniques are used for enhancing the algorithm time and maintaining accuracy. The experimental result proves that the proposed modification reduces time of the classification by more than 20% in more than one dataset. Effect on accuracy was less than 1% in some datasets. Time results proved the suitability of the algorithm for handling fast stream mining.

Full Text

1. Introduction

Recently, information played a major role in our world. Subsequently, the process of extracting knowledge is becoming very important. New applications that depend on data streams became more popular with time. Stream data are clear in sensors, telephone call records, click streams, social media, and stock market.

Contrary to traditional data mining, which analyses a stored data set, the stream mining analyses a data stream which cannot be saved as it’s infinite and needs expensive storage capabilities. Data streams arrive continuously and with fast pace, this prevents multiple passes of the data. So, processing time is more constrained in data streams.

Classification is a mining technique used to build a classification model based on the training data set which used to predict the class label of a new undefined data. Decision trees, neural networks, Bayesian networks, and Support Vector machines (SVM) are considered the most effective methods of classification. Decision trees are data structures organized hierarchically by splitting input space into local zones to predict the dependent variable.

Decision trees are hierarchical data structures for supervised learning by which the input space is split into local regions to predict the dependent variable [1]. It is classified as greedy algorithms which try to find a decision at each step of small steps. Decision trees consist of nodes and edges (branches). Root node has no incoming edge. Leaves or terminal nodes have no outgoing edges. All other nodes – besides root – have exactly one ingoing edge. Internal or test nodes are the nodes with outgoing edges. Each internal node splits the instance space into two or more instance sub-space. These splits are done according to a specific splitting discrete function of attribute values (inputs). Classes are assigned to leaf nodes.

Decision trees are characterized by simplicity, understandability, flexibility, adaptability and higher accuracy [2], [3]. The ability to handle both categorical and continuous data is an important advantage of decision trees. So, there is no need to normalize the data before running the decision tree model, that means fewer preprocessing processes. Being easier to construct and understand is another important factor for preferring decision trees over other data mining techniques. In addition, decision trees are interpretable as it can be expressed as a logical expression. Missing values in data are considered issues need to be handled before running data mining techniques in order not to affect the results. Decision trees can handle data with missing values successfully.

Traditional decision tree learners like ID3 (Iterative Dichotomiser 3) and C4.5 (Classification 4.5) have problems in handling data streams. It presumes that the whole training examples can be stored concurrently in main memory, which is not valid in data streams [4].

Very Fast Decision Trees (VFDT) was introduced by Domingos and Hulten in 2000[5]. VFDT uses the Hoeffding bound for node splitting and creating Hoeffding trees. The basis of Hoeffding trees is “a small sample can often be enough to choose an optimal splitting attribute”. Hoeffding bound gives a mathematical support to that basis quantifying the number of examples needed to estimate some statistics within a prescribed accuracy [6].

According to Hoeffding bounds, with probability 1 – δ, the true mean of r is at least r ̅ – ε, where

In equation (1), r represents continuous random variables whose range is R. is the observed mean of the samples after n independent observations. [7]. The VFDT defines the two attributes t₁, t₂ with highest information gain G_t1 and G_t2. If rG = G_t1 – G_t2 is higher than Ԑ (equation 1), then G_t1 is the best split attribute with probability of 1 – δ and the split is done. (Algorithm 1: VFDT)

In VFDT, leaves are replaced with decision nodes recursively. Statistics about attributes values are saved in each leaf. Based on these statistics, a heuristic function calculates the value of split tests. Each new instance passes from root to a leaf. At each leaf, attribute evaluation is done and follow the branch according to evaluation result. An important step must be done, which is updating the enough statistics of the leaf [8].

VFDT can address the research issues of data streams such as ties of attributes, bounded memory, efficiency and accuracy[9]. VFDT is known for having decent memory management. It can save memory by deactivating less promising leaves when memory reaches a limit then it turns back to normal when memory is free[10]. Also, it monitors the available memory and prunes leaves (where sufficient statistics are stored) depending on recent accuracy [11], [12].

The rest of this paper will discuss the related work in section two, the proposed modification on VFDT in section three, the evaluation of the proposed modification in section four and finally the conclusion and future work in section five.

Algorithm 1: VFDT
Result: very fast decision tree
begin
Let T be a tree with a one leaf (the root)
for all training examples do
	Update sufficient statistics in l
	Increment n₁, the number of examples seen at l
	if n₁ mod n_min = 0 and all examples seen at l not all same class then
		Compute Ḡ_l(Xi) for each attribute
		Let X_a be the attribute with highest Ḡ_l
		Let X_b be the attribute with the second highest Ḡ_l
		Compute Hoeffding bound Ԑ=
		if X_a ≠ X_Φ and (Ḡ_l(X_a) – Ḡ_l(X_b) > Ԑ or Ԑ < ) then
				Replace l with an internal node that splits on X_a
				for all branches of the split do
					Add new leaf with initialized sufficient statistics
				end
			end
	end
end

2. Related Work

Although decision trees have more than accepted results in data stream mining, there have been many trials of modification to enhance results. For being one of the noticeable algorithms in decision trees, VFDT has share in these studies. Following studies present VFDT modifications to achieve higher accuracy, less time, or both. Next section summarizes these studies followed by a table to show impact on time and accuracy.

2.1. Bagging

In [13], the author proposed VFDTc and VFDTcNB, which can include and classify new data online with a one scan of the data for medium and large size datasets. VFDTc can deal with numerical attributes heterogeneous data, while VFDTcNB can apply naive Bayes classifiers in tree leaves and reinforces the anytime characteristic. In [14], the authors presented GVFDT, an employment of the VFDT used for creating random forests that use VFDTs for GPUs data streams. This technique takes advantage of the huge parallel architecture of GPUs. Furthermore, GVFDT algorithm reduces the communication between CPU and GPU by constructing the trees inside the GPU.

2.2. Adaptability

In [15], the authors proposed Strict VFDT in two versions; SVFDT-I and SVFDT-II. Both are seeking reducing tree growth and decreasing memory usage. Both algorithms produce trees much smaller than those produced by the original VFDT algorithm. Testing them on eleven datasets, SVFDT-II produced better accuracy than the SVFDT-I, together with significantly reducing tree size.

In [16], the authors presented ODR-ioVFDT (Outlier Detection incremental optimized VFDT) as an extension of VFDT to handle outliers in continuous data learning. The new algorithm was applied onto bioinformatics data streams–loaded by sliding windows – to diagnose and treat disease more efficiently. The ODR model chooses the outlier, which is stored into misclassified database. Clean data will be passed through ioVFDT classifier for decision tree building. The lower performance will send response to outlier and classifier model, the model update will be needed. In [17], the authors proposed an optimization of VFDT algorithm to decrease the effect of concept drift by utilizing sliding windows and fuzzy technology. Results showed improvements in accuracy results.

Table 1: Summary of related work

Title	Year	Algorithm Name	Algorithm Idea	Time Results	Accuracy Results
Speeding up Very Fast Decision Tree with Low Computational Cost	2020	IMAC (Incremental Measure Algorithm Based on Candidate Attributes)	The algorithm calculates the heuristic measure of an attribute with lower computational cost. Possible split timing is found by selecting subset of attributes precisely.	Decreased in most datasets except two with minor increase	No loss in some datasets and minor loss of accuracy in few datasets
A VFDT algorithm optimization and application thereof in data stream classification	2020	Optimized VFDT	an optimization of VFDT algorithm to decrease the effect of concept drift by utilizing sliding windows and fuzzy technology	Lower Time	Higher Accuracy
Enhancing Very Fast Decision Trees with Local Split-Time Predictions	2018	OSM (One-sided minimum)	replaced the global splitting scheme with local statistics to predict the split time which leads to lower computational cost by avoiding excessive split tries.	Decreased run-time	Same accuracy
Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining Victor	2018	Strict VFDT: SVFDT-I & SVFDT-II	Both are seeking reducing tree growth and decreasing memory usage. Both algorithms produce trees much smaller than those produced by the original VFDT algorithm.	Decreased in 3 datasets, and higher in the other 8 datasets	Decreased in 5 datasets, same accuracy in 3 datasets, and higher accuracy in 3 more
Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT	2017	ODR-ioVFDT	The ODR model chooses the outlier, which is stored into misclassified database. Clean data will be passed through ioVFDT classifier for decision tree building. The lower performance will send response to outlier and classifier model, the model update will be needed.	Higher in all datasets	Higher in all datasets with small percentage
Random Forests of Very Fast Decision Trees on GPU for Mining Evolving Big Data Streams	2014	GVFDT: Very Fast Decision Trees for GPU	This technique takes advantage of the huge parallel architecture of GPUs. Furthermore, GVFDT algorithm reduces the communication between CPU and GPU by constructing the trees inside the GPU.	Lower time in the three datasets	Lower Accuracy in two datasets and same accuracy in one.
Accurate Decision Trees for Mining High-speed Data Streams	2003	VFDTc & VFDTcNB	VFDTc: can deal with numerical attributes. VFDTcNB: apply naive Bayes classifiers in tree leaves	Decrease with more than 50%	Increase by 2% (average)

2.3. Split Function

In [18], the authors replaced the global splitting scheme with local statistics to predict the split time which leads to lower computational cost by avoiding excessive split tries. Results showed decreased run-time with no loss in accuracy. In [19], the authors introduced IMAC (Incremental Measure Algorithm Based on Candidate Attributes) an online incremental algorithm with a much lower computational cost. The algorithm calculates the heuristic measure of an attribute with lower computational cost. Possible split timing is found by selecting subset of attributes precisely. The algorithm showed faster and more accurate results by decreasing split attempts with much lower split delay.

Table 1 summarizes efforts in this area, but the time still a challenge that face the algorithms that applied to the stream data. All mentioned studies achieved better time results except on research. From accuracy side, only three studies achieved higher accuracy and another two achieved less accuracy. So, this paper will try to propose a modification to reduce the time of the decision tree in stream data.

3. The proposed VFDT-S1.0

The proposed VFDT-S1.0 aims to modify the original VFDT algorithm to reduce the time of classification. The idea of the modification is based on two main factors. First is bagging more than one algorithm to improve performance and second factor is using random sampling with fixed percentage from the whole data.

Algorithm 2: VFDT-S1.0
Result: M: Model with the highest accuracy
Begin
Load Data Stream S
For every record in S:
Delete record if contains null value
Let S_train = S * 0.8 S_test = S – S_train
S_train = SimpleRandomSample(S_train)
HT=HoeffdingTree(S_train)
HT_Pred=Predict(HT,S_test)
HT_Acc=mean(HT_Pred , S_testClass)*100
HOT=HoeffdingOptionTree(S_train)
HOT_Pred=Predict(HOT,S_test)
HOT_Acc=mean(HOT_Pred , S_testClass)*100
HAT=HoeffdingAdaptiveTree(S_train)
if (HTAcc > HOTAcc and HTAcc > HATAcc) then
	M = HT
Else
	if (HOT_Acc > HT_Acc & HOT_Acc > HAT_Acc) then
		M = HOT
	Else
		if (HAT_Acc > HT_Acc & HAT_Acc > HOT_Acc) then
		M = HAT
		end
	End
End
End

The three algorithms are run sequentially to find the one with more accurate results. Accuracy is measured for the three models generated by the three algorithms. The algorithm with highest accuracy is used on the rest of data.

Figure 1: VFDT-S1.0 Framework

Sampling is used to compensate using three different algorithms sequentially. Using sampling in data streams has been discussed in many studies. Three sampling techniques related to data streams are reservoir sampling, AMS-sampling, and Sliding window sampling. In [20], random sampling was used to challenge time constraint. As shown in figure 1, the three algorithms were trained using the same sample. As we choose the best accuracy of the three to use and compare with original VFDT algorithm. Figure 1 displays VFDT-S01 framework, explaining the four basic stages of it.

4. Implementation and Evaluation

To examine the proposed algorithm, it is tested and compared to the original VFDT algorithm. Coding and evaluation were done using Java and R languages working on Microsoft Windows 10 environment on core i5-5200U processor machine. Source code of algorithms is written in Java in Massive Online Analytics (MOA) tool, employing MOA codes in R is done by using RMOA package. RMOA is connecting R with MOA to build classification and regression models on streaming data.

The test is done using 7 different real classification datasets; covType[21], Airlines[22], KDD99[23], Elecnorm[24], MplsStops[25], Chess[26], and Income[27]. Table 2 summarizes the seven datasets and comparing them according to number of instances, attributes, and classes.

Table 2: Sample Table

Dataset	Number of Instances	Number of attributes	Number of Classes
covType	581,012	55	7
Airlines	539,383	8	2
KDD99	494,020	42	23
Electricity	45,312	9	2
MplsStops	51,920	15	2
Chess	28,056	7	18
Income	48,842	15	2

Each dataset was divided into training and test set. Training set is 80% the whole data and the reminder was the test set for prediction. Both algorithms were tested using the same test set to get more accurate comparison results. Accuracy was calculated as number of true predictions divided by test set size.

Time was calculated by using built-in time function in R at the start and end of code. Both accuracy and time were calculated as an average of three runs of both algorithms on every dataset.

Table 3 compares the proposed VFDT-S1.0 and VFDT based on the accuracy and time. Also shows that the original algorithm achieves higher accuracy in all seven datasets.

Table 3: Algorithms Comparison

	VFDT			VFDT-S1.0			Difference Percentage
Data set	Accuracy%	Time (sec)	Accuracy%		Time (sec)	Accuracy%		Time %
CovType	72.86 %	816.00	69.95 %		620.74	-4.00 %		-23.93 %
Airline	65.06 %	635.10	60.93 %		539.86	-6.34 %		-15.00 %
KDD99	99.79 %	638.39	99.55 %		492.65	-0.24 %		-22.83 %
Elec.	77.11 %	52.70	76.38 %		45.11	-0.94 %		-14.40 %
MplsStop	79.53 %	20.12	77.91 %		18.60	-2.04 %		-7.54 %
Chess	33.70 %	29.71	32.29 %		27.06	-4.18 %		-8.92 %
Income	83.94 %	53.18	81.92 %		46.51	-2.40 %		-12.53 %

Differences between VFDT accuracy and VFDT-S1.0 accuracy varies from 0.24% at KDD99 dataset to 4.13% at Airline dataset. Figure 2 displays the accuracy between the two algorithms.

Figure 2: Accuracy Comparison on all datasets

Figure 3: Time Comparison on datasets (covType, Airline and KDD99)

Figure 4: Time Comparison on datasets (elec, MplsStops, Chess, and Income)

Figure 3 represents processing time of both algorithms on largest three datasets and figure 4 displays time on the reminder datasets. Time was always better with VFDT-S1.0 at all datasets. 1.52 seconds was the minimum difference between two algorithms on MplsStops dataset. CovType dataset had the major difference with 195.26 seconds. At KDD99 dataset, which had the highest accuracy difference, the time was less by 145.74 seconds.

5. Conclusion

This paper proposed the VFDT-S1.0; a modified VFDT algorithm that uses bagging techniques to achieve most possible accuracy. In time factor, we used random sampling to achieve better processing time. We tested the new algorithm using seven real classification datasets and compared results with VFDT algorithm. Improvements have been noticed in time as VFDT-S1.0 took much less time with all datasets. Biggest time difference was 24% in CovType dataset. In KDD dataset the time dropped by 23% with -0.2% in accuracy. This time difference shows potential for scaling VFDT. As it can be processed by much lower processing resources. Also, the ability to handle very fast data streams with dependable accuracy.

6. Future Work

In future work, tree size, Kappa, sensitivity, and specificity will be measured for both algorithms. Accuracy can be enhanced with bagging more models and choosing a sample with the same class representation in dataset. Also, parallel processing is considered for much time improvement. Change detection techniques are going to be added to deal with concept drifts.

Conflict of Interest

The authors declare no conflict of interest.

References (27)

E. Alpayd?n, “Introduction to machine learning,” Methods in Molecular Biology, 1107, 105–128, 2014, doi:10.1007/978-1-62703-748-8-7.
Z. Çetinkaya, F. Horasan, “Decision Trees in Large Data Sets,” International Journal of Engineering Research and Development, 13(1), 140–151, 2021, doi:10.29137.
S. Moral-garcía, J.G. Castellano, C.J. Mantas, A. Montella, J. Abellán, “Decision Tree Ensemble Method for Analyzing Traffic Accidents of Novice Drivers in Urban Areas,” 1–15, 2019, doi:10.3390/e21040360.
F.M.J.M. Shamrat, R. Ranjan, A. Yadav, A.H. Siddique, S. Engineering, C. Neusoft, C.C. Officer, “Performance Evaluation among ID3 , C4 . 5 , and CART Decision Tree Algorithms,” International Conference on Pervasive Computing and Social Networking, 2021.
P. Domingos, G. Hulten, “Mining high-speed data streams,” Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’00, 71–80, 2000, doi:10.1145/347090.347107.
M. Yacoub, A. Rezk, M. Senousy, “Adaptive classification in data stream mining,” Journal of Theoretical and Applied Information Technology, 98(13), 2637–2645, 2020.
W. Zang, P. Zhang, C. Zhou, L. Guo, “Comparative study between incremental and ensemble learning on data streams: Case study,” Journal of Big Data, 1(1), 1–16, 2014, doi:10.1186/2196-1115-1-5.
J. Gama, P.P. Rodrigues, An Overview on Mining Data Streams, Springer-Verlag Berlin Heidelberg: 38–54, 2009, doi:10.1007/978-3-642-01091-0.
C.C. Aggarwal, Data streams: Models and Algorithms, 1st ed., Springer-Verlag US, 2010, doi:10.1007/978-0-387-47534-9.
E. Ikonomovska, J. Gama, S. Džeroski, “Learning model trees from evolving data streams,” Data Mining and Knowledge Discovery, 23(1), 128–168, 2011, doi:10.1007/s10618-010-0201-y.
A. Muallem, S. Shetty, J.W. Pan, J. Zhao, B. Biswal, “Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets?: A Survey,” Journal of Information Security, 8(4), 339–361, 2017, doi:10.4236/jis.2017.84022.
D.H. Han, X. Zhang, G.R. Wang, “Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine,” Journal of Computer Science and Technology, 30(4), 874–887, 2015, doi:10.1007/s11390-015-1566-6.
J. Gama, R. Rocha, P. Medas, “Accurate Decision Trees for Mining High-speed Data Streams,” Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 523–528, 2003, doi:10.1145/956750.956813.
D. Marron, A. Bifet, G. De Francisci Morales, “Random forests of very fast decision trees on GPU for mining evolving big data streams,” Frontiers in Artificial Intelligence and Applications, 263, 615–620, 2014, doi:10.3233/978-1-61499-419-0-615.
V. Guilherme, A. Carvalho, S. Barbon, “Strict Very Fast Decision Tree?: a memory conservative algorithm for data stream mining,” Pattern Recognition Letters, 1–7, 2018.
D. Wang, S. Fong, R.K. Wong, S. Mohammed, J. Fiaidhi, K.K.L. Wong, “Robust high-dimensional bioinformatics data streams mining by ODR-ioVFDT,” Scientific Reports, 7, 1–12, 2017, doi:10.1038/srep43167.
S. Jia, “A VFDT algorithm optimization and application thereof in data stream classification A VFDT algorithm optimization and application thereof in data stream classification,” Journal of Physics: Conference Series, 1–7, 2020, doi:10.1088/1742-6596/1629/1/012027.
V. Losing, H. Wersing, B. Hammer, “Enhancing Very Fast Decision Trees with Local Split-Time Predictions,” IEEE International Conference on Data Mining (ICDM), 287–296, 2018, doi:10.1109/ICDM.2018.00044.
J. Sun, H. Jia, B. Hu, X. Huang, H. Zhang, H. Wan, X. Zhao, “Speeding up Very Fast Decision Tree with Low Computational Cost,” International Joint Conferences on Artificial Intelligence, 1272–1278, 2020.
E. Ikonomovska, M. Zelke, Algorithmic Techniques for Processing Data Streams, 2013.
J A Blackard D J Dean, “Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables,” Computers and Electronics in Agriculture, 24, 131–151, 1999.
E. Ikonomovska, Airline, 2009.
S.D. Hettich, S. and Bay, The UCI KDD Archive, 1999.
M. Harries, Electricity, Aug. 2019.
M. GIS, Police Stop Data, 2017.
M. J, Chess Game Dataset, 2017.
W. Liu, Adult income dataset, 2016.

Cited By

Citations by Dimensions

Citations by PlumX

Google Scholar

(Click to view)

Crossref Citations

Mertics

No. of Downloads Per Month

No. of Downloads Per Country

Surapol Vorapatratorn, Nontawat Thongsibsong, "AI-Based Photography Assessment System using Convolutional Neural Networks", Advances in Science, Technology and Engineering Systems Journal, vol. 10, no. 2, pp. 28–34, 2025. doi: 10.25046/aj100203
Joshua Carberry, Haiping Xu, "GPT-Enhanced Hierarchical Deep Learning Model for Automated ICD Coding", Advances in Science, Technology and Engineering Systems Journal, vol. 9, no. 4, pp. 21–34, 2024. doi: 10.25046/aj090404
Pui Ching Wong, Shahrum Shah Abdullah, Mohd Ibrahim Shapiai, "Double-Enhanced Convolutional Neural Network for Multi-Stage Classification of Alzheimer’s Disease", Advances in Science, Technology and Engineering Systems Journal, vol. 9, no. 2, pp. 09–16, 2024. doi: 10.25046/aj090202
John Tsiligaridis, "Tree-Based Ensemble Models, Algorithms and Performance Measures for Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 8, no. 6, pp. 19–25, 2023. doi: 10.25046/aj080603
Sutham Satthamsakul, Ari Kuswantori, Witsarut Sriratana, Worapong Tangsrirat, Taweepol Suesut, "Landmarking Technique for Improving YOLOv4 Fish Recognition in Various Background Conditions", Advances in Science, Technology and Engineering Systems Journal, vol. 8, no. 3, pp. 100–107, 2023. doi: 10.25046/aj080312
Sathyabama Kaliyapillai, Saruladha Krishnamurthy, Thiagarajan Murugasamy, "An Ensemble of Voting- based Deep Learning Models with Regularization Functions for Sleep Stage Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 8, no. 1, pp. 84–94, 2023. doi: 10.25046/aj080110
Fatima-Zahra Elbouni, Aziza EL Ouaazizi, "Birds Images Prediction with Watson Visual Recognition Services from IBM-Cloud and Conventional Neural Network", Advances in Science, Technology and Engineering Systems Journal, vol. 7, no. 6, pp. 181–188, 2022. doi: 10.25046/aj070619
Bougar Marieme, Ziyati El Houssaine, "Analysis Methods and Classification Algorithms with a Novel Sentiment Classification for Arabic Text using the Lexicon-Based Approach", Advances in Science, Technology and Engineering Systems Journal, vol. 7, no. 3, pp. 12–18, 2022. doi: 10.25046/aj070302
Valerii Dmitrienko, Serhii Leonov, Aleksandr Zakovorotniy, "New Neural Networks for the Affinity Functions of Binary Images with Binary and Bipolar Components Determining", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 4, pp. 91–99, 2021. doi: 10.25046/aj060411
Susanto Kumar Ghosh, Mohammad Rafiqul Islam, "Convolutional Neural Network Based on HOG Feature for Bird Species Detection and Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 2, pp. 733–745, 2021. doi: 10.25046/aj060285
Alisson Steffens Henrique, Anita Maria da Rocha Fernandes, Rodrigo Lyra, Valderi Reis Quietinho Leithardt, Sérgio D. Correia, Paul Crocker, Rudimar Luis Scaranto Dazzi, "Classifying Garments from Fashion-MNIST Dataset Through CNNs", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 989–994, 2021. doi: 10.25046/aj0601109
Mandlenkosi Shezi, Abejide Ade-Ibijola, "Deaf Chat: A Speech-to-Text Communication Aid for Hearing Deficiency", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 826–833, 2020. doi: 10.25046/aj0505100
Mohammed Hamim, Ismail El Moudden, Hicham Moutachaouik, Mustapha Hain, "Gene Selection for Cancer Classification: A New Hybrid Filter-C5.0 Approach for Breast Cancer Risk Prediction", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 871–878, 2021. doi: 10.25046/aj060196
Reem Bayari, Ameur Bensefia, "Text Mining Techniques for Cyberbullying Detection: State of the Art", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 783–790, 2021. doi: 10.25046/aj060187
Anass Barodi, Abderrahim Bajit, Taoufiq El Harrouti, Ahmed Tamtaoui, Mohammed Benbrahim, "An Enhanced Artificial Intelligence-Based Approach Applied to Vehicular Traffic Signs Detection and Road Safety Enhancement", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 672–683, 2021. doi: 10.25046/aj060173
Inna Valieva, Iurii Voitenko, Mats Björkman, Johan Åkerberg, Mikael Ekström, "Multiple Machine Learning Algorithms Comparison for Modulation Type Classification Based on Instantaneous Values of the Time Domain Signal and Time Series Statistics Derived from Wavelet Transform", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 658–671, 2021. doi: 10.25046/aj060172
Hendro Arieyanto, Andry Chowanda, "Classification of Wing Chun Basic Hand Movement using Virtual Reality for Wing Chun Training Simulation System", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 250–256, 2021. doi: 10.25046/aj060128
Ndiatenda Ndou, Ritesh Ajoodha, Ashwini Jadhav, "A Case Study to Enhance Student Support Initiatives Through Forecasting Student Success in Higher-Education", Advances in Science, Technology and Engineering Systems Journal, vol. 6, no. 1, pp. 230–241, 2021. doi: 10.25046/aj060126
Arwa Alshamsi, Reem Bayari, Said Salloum, "Sentiment Analysis in English Texts", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 1683–1689, 2020. doi: 10.25046/aj0506200
Rewan Kumar Dahal, Ganesh Bhattarai, Dipendra Karki, "Determinants of Technological and Innovation Performance of the Nepalese Cellular Telecommunications Industry from the Customers’ Perspective", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 1013–1020, 2020. doi: 10.25046/aj0506122
Jinwon Cheon, Sunwoong Choi, "Hand Gesture Classification using Inaudible Sound with Ensemble Method", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 967–971, 2020. doi: 10.25046/aj0506115
Hao Tuan Huynh, Nghia Duong-Trung, Dinh Quoc Truong, Hiep Xuan Huynh, "Vietnamese Text Classification with TextRank and Jaccard Similarity Coefficient", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 363–369, 2020. doi: 10.25046/aj050644
Munaf Salim Najim Al-Din, "Real-Time Identification and Classification of Driving Maneuvers using Smartphone", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 193–205, 2020. doi: 10.25046/aj050623
Fei Gao, Jiangjiang Liu, "Effective Segmented Face Recognition (SFR) for IoT", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 6, pp. 36–44, 2020. doi: 10.25046/aj050605
Daniyar Nurseitov, Kairat Bostanbekov, Maksat Kanatov, Anel Alimova, Abdelrahman Abdallah, Galymzhan Abdimanap, "Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 934–943, 2020. doi: 10.25046/aj0505114
Gökalp Çınarer, Bülent Gürsel Emiroğlu, Recep Sinan Arslan, Ahmet Haşim Yurttakal, "Brain Tumor Classification Using Deep Neural Network", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 765–769, 2020. doi: 10.25046/aj050593
Lana Abdulrazaq Abdullah, Muzhir Shaban Al-Ani, "CNN-LSTM Based Model for ECG Arrhythmias and Myocardial Infarction Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 601–606, 2020. doi: 10.25046/aj050573
Alami Hamza, Noureddine En-Nahnahi, Said El Alaoui Ouatik, "Contextual Word Representation and Deep Neural Networks-based Method for Arabic Question Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 478–484, 2020. doi: 10.25046/aj050559
Nittaya Kerdprasop, Kittisak Kerdprasop, Paradee Chuaybamroong, "Economic and Environmental Analysis of Life Expectancy in China and India: A Data Driven Approach", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 308–313, 2020. doi: 10.25046/aj050539
Martin Marinov, Alexander Efremov, "Four-Dimensional Sparse Data Structures for Representing Text Data", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 154–166, 2020. doi: 10.25046/aj050521
Tran Thanh Dien, Nguyen Thanh-Hai, Nguyen Thai-Nghe, "Deep Learning Approach for Automatic Topic Classification in an Online Submission System", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, pp. 700–709, 2020. doi: 10.25046/aj050483
Panida Lorwongtrakool, Phayung Meesad, "Correlation-Based Incremental Learning Network for Gas Sensors Drift Compensation Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, pp. 660–666, 2020. doi: 10.25046/aj050479
Anouar Bachar, Noureddine El Makhfi, Omar EL Bannay, "Machine Learning for Network Intrusion Detection Based on SVM Binary Classification Model", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, pp. 638–644, 2020. doi: 10.25046/aj050476
Roberta Avanzato, Francesco Beritelli, "A CNN-based Differential Image Processing Approach for Rainfall Classification", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, pp. 438–444, 2020. doi: 10.25046/aj050452
Rizki Jaka Maulana, Gede Putra Kusuma, "Malware Classification Based on System Call Sequences Using Deep Learning", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 4, pp. 207–216, 2020. doi: 10.25046/aj050426
Krina B. Gabani, Mayuri A. Mehta, Stephanie Noronha, "Racial Categorization Methods: A Survey", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 3, pp. 388–401, 2020. doi: 10.25046/aj050350
Jan Sikora, David Fojtík, "Classification of Timber Load on Trucks", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 2, pp. 683–687, 2020. doi: 10.25046/aj050284
Bokyoon Na, Geoffrey C Fox, "Object Classifications by Image Super-Resolution Preprocessing for Convolutional Neural Networks", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 2, pp. 476–483, 2020. doi: 10.25046/aj050261
Lenin G. Falconi, Maria Perez, Wilbert G. Aguilar, Aura Conci, "Transfer Learning and Fine Tuning in Breast Mammogram Abnormalities Classification on CBIS-DDSM Database", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 2, pp. 154–165, 2020. doi: 10.25046/aj050220
Michael Wenceslaus Putong, Suharjito, "Classification Model of Contact Center Customers Emails Using Machine Learning", Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 1, pp. 174–182, 2020. doi: 10.25046/aj050123
Bui Thanh Hung, "Integrating Diacritics Restoration and Question Classification into Vietnamese Question Answering System", Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 207–212, 2019. doi: 10.25046/aj040526
Michael Santacroce, Daniel Koranek, Rashmi Jha, "Detecting Malicious Assembly using Convolutional, Recurrent Neural Networks", Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 46–52, 2019. doi: 10.25046/aj040506
Antonio Fuduli, Pierangelo Veltri, Eugenio Vocaturo, Ester Zumpano, "Melanoma detection using color and texture features in computer vision systems", Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 16–22, 2019. doi: 10.25046/aj040502
Tlija Amira, Istrate Dan, Badii Atta, Gattoufi Said, Bennani Az-eddine, Wegrzyn-Wolska Katarzyna, "Stress Level Classification Using Heart Rate Variability", Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 3, pp. 38–46, 2019. doi: 10.25046/aj040306
Abba Suganda Girsang, Andi Setiadi Manalu, Ko-Wei Huang, "Feature Selection for Musical Genre Classification Using a Genetic Algorithm", Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 2, pp. 162–169, 2019. doi: 10.25046/aj040221
Bayan AlSaaidah, Waleed Al-Nuaimy, Mohammed Rasoul Al-Hadidi, Iain Young, "Zebrafish Larvae Classification based on Decision Tree Model: A Comparative Analysis", Advances in Science, Technology and Engineering Systems Journal, vol. 3, no. 4, pp. 347–353, 2018. doi: 10.25046/aj030435
Sehla Loussaief, Afef Abdelkrim, "Machine Learning framework for image classification", Advances in Science, Technology and Engineering Systems Journal, vol. 3, no. 1, pp. 1–10, 2018. doi: 10.25046/aj030101
Ruijian Zhang, Deren Li, "Applying Machine Learning and High Performance Computing to Water Quality Assessment and Prediction", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 6, pp. 285–289, 2017. doi: 10.25046/aj020635
Mohamed El Beqqal, Mostafa Azizi, "Review on security issues in RFID systems", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 6, pp. 194–202, 2017. doi: 10.25046/aj020624
Muhammad Asif Manzoor, Yasser Morgan, "Support Vector Machine based Vehicle Make and Model Recognition System", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 3, pp. 1080–1085, 2017. doi: 10.25046/aj0203137
Mohamed Salim El Bazzi, Driss Mammass, Abdelatif Ennaji, Taher Zaki, "Features based approach for indexation and representation of unstructured Arabic documents", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 3, pp. 900–905, 2017. doi: 10.25046/aj0203112
Zulfiqar Ali, Waseem Shahzad, "Performance Evaluation of Associative Classifiers in Perspective of Discretization Methods", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 3, pp. 845–854, 2017. doi: 10.25046/aj0203105
Su-Ping Deng, Wenxing Hu, Vince D. Calhoun, Yu-Ping Wang, "Schizophrenia Prediction Using Integrated Imaging Genomic Networks", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 3, pp. 702–710, 2017. doi: 10.25046/aj020390
Turgay Yalcin, Muammer Ozdemir, "Computational Intelligence Methods for Identifying Voltage Sag in Smart Grid", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 3, pp. 412–419, 2017. doi: 10.25046/aj020353
Adewale Opeoluwa Ogunde, Ajibola Rasaq Olanbo, "A Web-Based Decision Support System for Evaluating Soil Suitability for Cassava Cultivation", Advances in Science, Technology and Engineering Systems Journal, vol. 2, no. 1, pp. 42–50, 2017. doi: 10.25046/aj020105
Hocine Chebi, Dalila Acheli, Mohamed Kesraoui, "Dynamic detection of abnormalities in video analysis of crowd behavior with DBSCAN and neural networks", Advances in Science, Technology and Engineering Systems Journal, vol. 1, no. 5, pp. 56–63, 2016. doi: 10.25046/aj010510

Enhancing Decision Trees for Data Stream Mining

Enhancing Decision Trees for Data Stream Mining

Abstract

Full Text

1. Introduction

2. Related Work

2.1. Bagging

2.2. Adaptability

2.3. Split Function

3. The proposed VFDT-S1.0

4. Implementation and Evaluation

5. Conclusion

6. Future Work

Conflict of Interest

References (27)

Cited By

Citations by Dimensions

Citations by PlumX

Google Scholar

Crossref Citations

Mertics

Related Articles