Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods

Figure 4. Educational Topics Visualization
The data set also includes a school attendance feature. As shown in Figure 5, the
students are divided into two categories based on their absence days: 191 students
have more than 7 absence days and 289 students have fewer than 7.

International Journal of Database Theory and Application
Vol.9, No.8 (2016)
126
Copyright ⓒ 2016 SERSC
Figure 5. Students’ Absence Days’ Feature Visualization
This research uses the “student absence days” feature to show the influence of this
feature on students’ performance. It also uses a new category of features: parent
participation in the educational process. The parent participation feature has two
sub-features, Parent Answering Survey and Parent School Satisfaction: 270 parents answered
the survey and 210 did not, and 292 parents are satisfied with the school and 188 are not.
Data preprocessing is used in this research to study the nature of the students’
performance features and to obtain the influence ratio of each feature by computing its
percentage value. The influence ratio of features is then determined more accurately
through the feature selection process.
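The percentage values mentioned above can be illustrated with simple arithmetic on the counts reported in this section. The sketch below assumes that the "percentage value" of a feature means each category's share of the 480 cleaned records; that reading, the category labels, and the helper name `percentages` are assumptions made for illustration only.

```python
# Category counts as reported in this section for the 480 cleaned records.
# The dictionary layout and category labels are illustrative, not the paper's encoding.
FEATURE_COUNTS = {
    "Student Absence Days": {"more than 7": 191, "fewer than 7": 289},
    "Parent Answering Survey": {"answered": 270, "not answered": 210},
    "Parent School Satisfaction": {"satisfied": 292, "not satisfied": 188},
}


def percentages(counts):
    """Return each category's share of the feature's total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


for feature, counts in FEATURE_COUNTS.items():
    # Every feature should account for all 480 records left after cleaning.
    assert sum(counts.values()) == 480, feature
    print(feature, percentages(counts))
```

With these counts the split is roughly 39.8% / 60.2% for absence days, 56.25% / 43.75% for survey answering, and 60.8% / 39.2% for school satisfaction.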

3.2.2. Data Cleaning
Data cleaning, one of the main preprocessing tasks, is applied to this data set to
remove irrelevant items and missing values. The data set contains 20 missing values
spread across various features of its 500 records; the records with missing values are
removed, leaving 480 records after cleaning.
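A minimal sketch of the cleaning rule described above (drop every record that has at least one missing value), assuming records are held as Python dictionaries with missing values stored as `None`. The toy records, field names, and the helper `drop_incomplete` are hypothetical.

```python
from typing import Any


def drop_incomplete(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Listwise deletion: keep only records in which no field is missing (None)."""
    return [r for r in records if all(value is not None for value in r.values())]


# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"gender": "M", "raised_hands": 15, "absence_days": "under 7", "class": "M"},
    {"gender": "F", "raised_hands": None, "absence_days": "above 7", "class": "L"},
    {"gender": None, "raised_hands": 70, "absence_days": "under 7", "class": "H"},
    {"gender": "F", "raised_hands": 90, "absence_days": "under 7", "class": "H"},
]

cleaned = drop_incomplete(records)
print(len(records), "->", len(cleaned))  # 4 -> 2
```

Applied to the paper's data, this rule removes 20 of the 500 records, which implies that each of the 20 missing values fell in a different record. In pandas, `DataFrame.dropna()` performs the same row-wise deletion with its default arguments.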
3.2.3. Feature Selection
Feature selection is a fundamental task in data preprocessing. The objective of the
feature selection process is to select an appropriate subset of features that can
efficiently describe the input data, reducing the dimensionality of the feature space and
removing redundant and irrelevant data [24]. This process can play an important role in
improving data quality and, therefore, the performance of the learning algorithm.
Feature selection methods are categorized into wrapper-based and filter-based methods. The
filter method searches for the minimum set of relevant features while ignoring the rest. It
uses variable ranking techniques to rank the features; the highly ranked features are
selected and passed to the learning algorithm. Different feature ranking techniques have
been proposed for feature evaluation, such as information gain and gain ratio.
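For reference, the standard (Quinlan-style) definitions of the two ranking criteria named above are given below, for a data set S with class labels C, where S_c is the subset of S with class c, and a categorical feature A whose values v partition S into subsets S_v. The notation is ours and may differ from the tool used in the paper.

```latex
\begin{align}
H(S) &= -\sum_{c \in C} p_c \log_2 p_c, \qquad p_c = \frac{|S_c|}{|S|} \\
\mathrm{IG}(S, A) &= H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GR}(S, A) &= \frac{\mathrm{IG}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{align}
```

Gain ratio divides information gain by the split information, which penalizes features with many distinct values that information gain alone tends to favour.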
In this research, we applied the filter method using an information-gain-based selection
algorithm to evaluate the feature ranks, determining which features are most important for
building the students’ performance model. Figure 6 shows the feature ranks after the
filter-based evaluation. During feature selection, each feature is assigned a rank value
according to its influence on data classification. The highly ranked features are selected
while the others are excluded.
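The ranking-and-selection step can be sketched in Python as follows. This is a minimal illustration, not the tool used in the paper (which this section does not name): it computes information gain for each categorical feature on a hypothetical toy data set, sorts the features by that score, and keeps those at or above an assumed threshold. The feature names, data, and `threshold` value are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Class entropy minus the weighted class entropy after splitting on `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_and_select(rows, features, target, threshold):
    """Rank features by information gain (highest first); keep those >= threshold."""
    scores = {f: information_gain(rows, f, target) for f in features}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = [f for f, score in ranked if score >= threshold]
    return ranked, selected


# Hypothetical toy data: feature names, values and the class label are illustrative.
rows = [
    {"absence": "under_7", "parent_survey": "yes", "gender": "M", "level": "H"},
    {"absence": "under_7", "parent_survey": "no", "gender": "F", "level": "H"},
    {"absence": "above_7", "parent_survey": "no", "gender": "M", "level": "L"},
    {"absence": "above_7", "parent_survey": "no", "gender": "F", "level": "L"},
]

ranked, selected = rank_and_select(
    rows, ["absence", "parent_survey", "gender"], target="level", threshold=0.1
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")  # absence: 1.000, parent_survey: 0.311, gender: 0.000
print("selected:", selected)  # selected: ['absence', 'parent_survey']
```

A cut-off (or a fixed number of top-ranked features) is needed to turn the ranking into a selection; the value used in the paper is not given in this section.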