Fig. 8. Process of dataset splitting into training and testing sets.
This approach is realistic for real-world models, because the model has no
knowledge of the test outcomes during training. It therefore gives a better
estimate of the out-of-sample accuracy, which is the accuracy of predictions
for observations that were not part of the training data set. However, the
result of a train/test split depends heavily on which observations fall into
each set, so once the evaluation is complete the test part should be added
back to the training part and the model retrained, so that no potentially
valuable data is lost.
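The splitting step in Fig. 8 can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The function name `train_test_split`, the `test_size` fraction, the `seed` parameter, and the placeholder data are assumptions made for this example and are not taken from the original work.

```python
import random


def train_test_split(samples, test_size=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists.

    test_size is the fraction of samples reserved for testing
    (an illustrative parameter, not a value from the source).
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder standing in for the original collected data.
data = list(range(10))
train, test = train_test_split(data, test_size=0.3, seed=42)
print(len(train), "training samples,", len(test), "test samples")
```

Shuffling before the split keeps the test set from simply being the last observations collected, and the fixed seed makes the split repeatable between runs.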
In the evaluation phase, the predicted values produced by the algorithm are
compared with the corresponding actual values. Two parameters are used:
● The training accuracy is the percentage of correct predictions obtained
when the model is tested on the same dataset that was used to train it.
A very high training accuracy can indicate an over-fit model, which
does not correspond to general conditions.
● The out-of-sample accuracy is the percentage of correct predictions
on data from outside the training dataset. This rate should be high
enough for the model to generalize to real-time scenarios.
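The difference between the two parameters can be illustrated with a small sketch. It assumes a 1-nearest-neighbour classifier, chosen here only because it memorizes its training data, and a hypothetical one-dimensional toy dataset with one noisy training label. None of the names or values below come from the original work.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Hypothetical toy data: the true label is 1 when x >= 5.
# The training point 4.0 carries a deliberately wrong (noisy) label.
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 1, 1, 1, 1, 1]
test_x = [0.5, 2.4, 3.8, 4.4, 5.5, 8.6]
test_y = [0, 0, 0, 0, 1, 1]

# Training accuracy: evaluate on the same data the model was built from.
train_pred = [predict_1nn(train_x, train_y, x) for x in train_x]
train_acc = accuracy(train_y, train_pred)

# Out-of-sample accuracy: evaluate on data the model has not seen.
test_pred = [predict_1nn(train_x, train_y, x) for x in test_x]
test_acc = accuracy(test_y, test_pred)

print(f"training accuracy:      {train_acc:.2f}")
print(f"out-of-sample accuracy: {test_acc:.2f}")
```

On this toy data the classifier reproduces every training label, including the noisy one, so its training accuracy is perfect while its out-of-sample accuracy is lower (4 of 6 test points). This is the over-fitting pattern described above.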
[Fig. 8 labels: Original Collected Data; Training Set; Test Set; Testing Size]
3.2. Algorithms
3.2.1. Supervised Learning
Several algorithms for different applications are available in ML libraries.
Accordingly, the algorithm should be chosen carefully to obtain the best
results with minimum complexity. Commonly used algorithms include: