Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods
Figure 4. Educational Topics Visualization

The data set also includes a school attendance feature. As shown in Figure 5, the students fall into two categories based on their absence days: 191 students were absent for more than 7 days and 289 students were absent for fewer than 7 days.

International Journal of Database Theory and Application, Vol. 9, No. 8 (2016), p. 126. Copyright ⓒ 2016 SERSC

Figure 5. Students’ Absence Days’ Feature Visualization

This research uses the “student absence days” feature to show its influence on students’ performance. It also introduces a new category of features: parent participation in the educational process. The parent participation category has two sub-features: Parent Answering Survey and Parent School Satisfaction. Of the parents, 270 answered the survey and 210 did not; 292 are satisfied with the school and 188 are not.

Data preprocessing is used in this research to study the nature of the students’ performance features and to estimate the influence ratio of each feature by computing its percentage value. The influence ratio of the features is then determined more accurately by the feature selection process.

3.2.2. Data Cleaning

Data cleaning, one of the main preprocessing tasks, is applied to this data set to remove irrelevant items and missing values. The data set contains 20 missing values in various features across 500 records. The records with missing values are removed, leaving 480 records after cleaning.

3.2.3. Feature Selection

Feature selection is a fundamental task in data preprocessing. Its objective is to select an appropriate subset of features that efficiently describes the input data, reduces the dimensionality of the feature space, and removes redundant and irrelevant data [24].
This process can play an important role in improving data quality and, therefore, the performance of the learning algorithm. Feature selection methods are categorized into wrapper-based and filter-based methods. A filter method searches for the minimum set of relevant features while ignoring the rest. It uses variable ranking techniques to rank the features; the highly ranked features are selected and passed to the learning algorithm. Different feature ranking techniques have been proposed for feature evaluation, such as information gain and gain ratio.

In this research, we applied a filter method using an information-gain-based selection algorithm to evaluate the feature ranks and to check which features are most important for building the students’ performance model. Figure 6 shows the feature ranks after filter-based evaluation. During feature selection, each feature is assigned a rank value according to its influence on data classification. The highly ranked features are selected while the others are excluded.
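The filter-based ranking described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the paper does not state which tool computed the ranks. The sketch assumes every feature is categorical (or already discretized) and scores each one by information gain, i.e. the entropy of the class label minus its expected entropy after splitting on the feature. The function names (`entropy`, `information_gain`, `rank_features`, `select_top`), the column names, and the six toy records are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, feature_names, label_name):
    """Return (feature, information gain) pairs, highest gain first."""
    labels = [row[label_name] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top(ranked, k):
    """Keep the names of the k highest-ranked features and drop the rest."""
    return [name for name, _ in ranked[:k]]


# Hypothetical toy records, invented for illustration only.
rows = [
    {"absence": "above_7", "survey": "no", "level": "low"},
    {"absence": "above_7", "survey": "no", "level": "low"},
    {"absence": "under_7", "survey": "yes", "level": "high"},
    {"absence": "under_7", "survey": "yes", "level": "high"},
    {"absence": "under_7", "survey": "no", "level": "high"},
    {"absence": "above_7", "survey": "yes", "level": "low"},
]

ranked = rank_features(rows, ["survey", "absence"], "level")
for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
print("selected:", select_top(ranked, 1))
```

In this toy data the absence category separates the two classes perfectly, so it receives the highest rank. With real data, the cut-off `k` (or a minimum gain threshold) is a design choice that this excerpt does not specify.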