Cross Validation
One of the central challenges of machine learning is training models that generalize well to unseen data (avoiding both overfitting and underfitting, i.e., the bias-variance trade-off). The main idea behind cross validation is to split the data, once or several times, so that each split is used once as a validation set while the remainder serves as a training set: part of the data (the training sample) is used to train the algorithm, and the remaining part (the validation sample) is used to estimate the risk of the algorithm. Cross validation thus yields reliable estimates of the model’s generalization error. It is easiest to understand with an example. In k-fold cross validation, we randomly split the training data into k folds. We then train the model using k-1 folds and evaluate its performance on the kth fold. We repeat this process k times and average the resulting scores.
Figure 4-6 shows an example of cross validation, where the data is split into five sets and in each round one of the sets is used for validation.
Figure 4-6. Cross validation
A potential drawback of cross validation is the computational cost, especially when paired with a grid search for hyperparameter tuning. Cross validation can be performed in a couple of lines using the sklearn package; we will perform cross validation in the supervised learning case studies.
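As a brief sketch of what those couple of lines might look like (the synthetic dataset, the KNeighborsClassifier model, and the accuracy scoring below are illustrative placeholders, not the models or data used in the case studies):

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split the training data into k=5 folds; each fold is used once for validation
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
model = KNeighborsClassifier()

# Train on k-1 folds, evaluate on the held-out fold, and repeat k times
scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())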
In the next section, we cover the evaluation metrics for the supervised learning models that are used to measure and compare the models’ performance.
Evaluation Metrics
The metrics used to evaluate machine learning algorithms are very important. The choice of metric determines how the performance of machine learning algorithms is measured and compared, influences how you weight the importance of different characteristics in the results, and ultimately shapes your choice of algorithm.
The main evaluation metrics for regression and classification are illustrated in Figure 4-7.
Figure 4-7. Evaluation metrics for supervised learning: regression metrics include mean absolute error (MAE), mean squared error (MSE), R squared (R2), and adjusted R squared (Adj-R2); classification metrics include accuracy, precision, recall, area under curve (AUC), and the confusion matrix.
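As a hedged illustration of how these metrics can be computed (the y_true and y_pred arrays below are made-up placeholder values, not results from any model in this book), most of them are available directly in sklearn.metrics; adjusted R squared is not built in and is derived from R squared:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error, r2_score,
                             accuracy_score, precision_score, recall_score,
                             roc_auc_score, confusion_matrix)

# Regression metrics on placeholder predictions
y_true_reg = np.array([3.0, 2.5, 4.0, 5.1])
y_pred_reg = np.array([2.8, 2.7, 4.2, 4.9])
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
r2 = r2_score(y_true_reg, y_pred_reg)
print("R2:", r2)

# Adjusted R2 computed from R2, the number of samples n, and predictors p (assumed p=1 here)
n, p = len(y_true_reg), 1
print("Adjusted R2:", 1 - (1 - r2) * (n - 1) / (n - p - 1))

# Classification metrics on placeholder labels and predicted probabilities
y_true_clf = np.array([0, 1, 1, 0, 1])
y_pred_clf = np.array([0, 1, 0, 0, 1])
y_prob_clf = np.array([0.2, 0.8, 0.4, 0.3, 0.9])  # predicted probability of class 1
print("Accuracy:", accuracy_score(y_true_clf, y_pred_clf))
print("Precision:", precision_score(y_true_clf, y_pred_clf))
print("Recall:", recall_score(y_true_clf, y_pred_clf))
print("AUC:", roc_auc_score(y_true_clf, y_prob_clf))
print("Confusion matrix:\n", confusion_matrix(y_true_clf, y_pred_clf))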
