Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods



Figure 4. Educational Topics Visualization 
The data set also includes a school attendance feature. As shown in Figure 5, the 
students are divided into two categories based on their absence days: 191 students 
exceeded 7 absence days, and 289 students had under 7 absence days. 


International Journal of Database Theory and Application 
Vol.9, No.8 (2016) 
126 
Copyright ⓒ 2016 SERSC 
Figure 5. Students’ Absence Days’ Feature Visualization 
This research uses the “student absence days” feature to show its influence on 
student performance. It also introduces a new category of features: parent 
participation in the educational process. The parent participation category has 
two sub-features: Parent Answering Survey and Parent School Satisfaction. Of the 
parents, 270 answered the survey and 210 did not; 292 are satisfied with the 
school and 188 are not. Data preprocessing is used in this research to study the nature of 
the student performance features and to estimate the influence ratio of each feature from 
its percentage value. The influence ratio of the features is then determined more accurately 
by the feature selection process. 
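The percentage values mentioned above can be illustrated with a minimal sketch. The counts are the ones reported in the text; the function name `category_shares` and the dictionary keys are assumptions made for this example, not names from the paper.

```python
from typing import Dict


def category_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each category's share of the total, as a percentage rounded to 2 places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts as reported in the text (480 students in total).
survey = category_shares({"answered": 270, "not_answered": 210})
satisfaction = category_shares({"satisfied": 292, "not_satisfied": 188})
absence = category_shares({"above_7": 191, "under_7": 289})

print(survey)        # share of parents who answered the survey
print(satisfaction)  # share of parents satisfied with the school
print(absence)       # share of students in each absence category
```

Percentages of this kind describe only how the data are distributed. They do not measure a feature's effect on the class label, which is why the text defers that question to feature selection.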
 
3.2.2. Data Cleaning
Data cleaning, one of the main preprocessing tasks, is applied to this data set to 
remove irrelevant items and missing values. The data set contains 20 missing values in 
various features across 500 records. The records with missing values are removed, 
leaving 480 records after cleaning.
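The record-removal step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `drop_incomplete`, the feature names, and the toy records are all hypothetical, and it assumes that a missing value appears as `None` or an empty string.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Keep only the records in which every feature has a non-empty value."""
    return [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]


# Hypothetical toy records; the second and third each have one missing value.
records: List[Record] = [
    {"absence_days": "Under-7", "parent_answered_survey": "Yes"},
    {"absence_days": None, "parent_answered_survey": "No"},
    {"absence_days": "Above-7", "parent_answered_survey": ""},
    {"absence_days": "Above-7", "parent_answered_survey": "No"},
]

cleaned = drop_incomplete(records)
print(len(records), "->", len(cleaned))
```

Dropping whole records is the simplest strategy and matches the text. It is reasonable when, as here, only a small fraction of records is affected.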
3.2.3. Feature Selection
Feature selection is a fundamental task in data preprocessing. The objective of 
the feature selection process is to select an appropriate subset of features that 
efficiently describes the input data, reduces the dimensionality of the feature space, and 
removes redundant and irrelevant data [24]. This process can play an important role in 
improving data quality and therefore the performance of the learning algorithm. 
Feature selection methods are categorized into wrapper-based and filter-based methods. 
A filter method searches for the minimum set of relevant features while ignoring the rest. 
It uses variable ranking techniques to rank the features; the highly ranked features are 
selected and passed to the learning algorithm. Different feature ranking techniques have 
been proposed for feature evaluation, such as information gain and gain ratio. 
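For reference, the two ranking criteria named above are commonly defined as follows. The notation (class variable C, feature A, entropy H) is introduced here for illustration and is not taken from the paper.

```latex
\begin{align}
H(C) &= -\sum_{c} p(c)\,\log_2 p(c) \\
H(C \mid A) &= \sum_{a} p(a)\, H(C \mid A = a) \\
\mathrm{IG}(C, A) &= H(C) - H(C \mid A) \\
\mathrm{GainRatio}(C, A) &= \frac{\mathrm{IG}(C, A)}{H(A)}
\end{align}
```

Gain ratio divides information gain by the entropy of the feature itself, which offsets the bias of information gain toward features with many distinct values.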
In this research, we applied a filter method using an information-gain-based selection 
algorithm to evaluate the feature ranks and check which features are most important for 
building the student performance model. Figure 6 shows the feature ranks after filter-based 
evaluation. During feature selection, each feature is assigned a rank value according to its 
influence on data classification. The highly ranked features are selected while the others are 
excluded. 
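A filter-based ranking of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes categorical features and uses the standard entropy-based definition of information gain. The function names, feature names, and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values: Sequence[str], labels: Sequence[str]) -> float:
    """Reduction in label entropy obtained by splitting on one categorical feature."""
    groups: Dict[str, List[str]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows: List[Dict[str, str]], labels: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every feature by information gain and sort from highest to lowest."""
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: absence_days determines the label, the other feature does not.
rows = [
    {"absence_days": "Under-7", "parent_answered_survey": "Yes"},
    {"absence_days": "Under-7", "parent_answered_survey": "No"},
    {"absence_days": "Above-7", "parent_answered_survey": "Yes"},
    {"absence_days": "Above-7", "parent_answered_survey": "No"},
]
labels = ["High", "High", "Low", "Low"]

ranked = rank_features(rows, labels)
top_k = [name for name, _ in ranked[:1]]  # keep only the highest-ranked feature
print(ranked)
print(top_k)
```

The cut-off (`[:1]` here) is an arbitrary choice for the example. In practice the number of features to keep, or a threshold on the score, has to be chosen for the data at hand.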


