Software engineering


Date: 20.12.2022

                      Predicted
                      Positive (1)   Negative (0)
Actual  Positive (1)  TP             FN
        Negative (0)  FP             TN

Figure 4-9. Confusion matrix
The confusion matrix is a compact summary of how well a model classifies data with two or more classes. The table places predictions on the x-axis and actual outcomes on the y-axis, and each cell holds the number of predictions that fall into that combination. For example, a model can predict zero or one, and each prediction may actually have been a zero or a one. Predictions of zero that were actually zero appear in the cell for prediction = 0 and actual = 0, whereas predictions of zero that were actually one appear in the cell for prediction = 0 and actual = 1.
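As a minimal sketch of how the four cells can be counted for a binary problem, the snippet below tallies (actual, predicted) pairs. The function name `confusion_counts` and the sample labels are illustrative assumptions, not part of the original text.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each key is an (actual, predicted) pair; missing pairs count as 0.
    pairs = Counter(zip(actual, predicted))
    return {
        "TP": pairs[(1, 1)],  # actual 1, predicted 1
        "FN": pairs[(1, 0)],  # actual 1, predicted 0
        "FP": pairs[(0, 1)],  # actual 0, predicted 1
        "TN": pairs[(0, 0)],  # actual 0, predicted 0
    }


# Made-up labels for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```

The four counts always sum to the number of predictions, which is a quick sanity check on any confusion matrix.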
Selecting an evaluation metric for supervised classification
The choice of evaluation metric for classification depends heavily on the task at hand. For example, recall is a good measure when false negatives carry a high cost, as in fraud detection. We will examine these evaluation metrics further in the case studies.
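To illustrate why the metric matters, the sketch below computes accuracy, precision, and recall from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the fraud-detection counts are hypothetical, chosen only to show a model that looks strong on accuracy while missing most positive cases.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts: 1,000 transactions, 20 of them fraudulent.
metrics = classification_metrics(tp=5, fn=15, fp=5, tn=975)
print(metrics)  # {'accuracy': 0.98, 'precision': 0.5, 'recall': 0.25}
```

In this made-up case accuracy is 98%, yet recall is only 25%, meaning three out of four fraudulent transactions go undetected. That is the situation in which recall is the more informative measure.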
Model Selection
Selecting the perfect machine learning model is both an art and a science. No single solution or approach fits every problem, and several factors can affect the choice of model. In most cases the main criterion is model performance, which we discussed in the previous section. However, there are many other factors to consider during model selection. In the following section, we go over these factors, followed by a discussion of model trade-offs.
Factors for Model Selection
The factors considered for the model selection process are as follows:
Simplicity
The degree of simplicity of the model. Simpler models are usually quicker, more scalable, and easier to understand, as are their results.
Training time
Speed, performance, memory usage and overall time taken for model training.
Handle nonlinearity in the data
The ability of the model to handle nonlinear relationships between the variables.
Robustness to overfitting
The ability of the model to handle overfitting.
Size of the dataset
The ability of the model to handle a large number of training examples in the dataset.
Number of features
The ability of the model to handle high dimensionality of the feature space.
Model interpretation
How explainable is the model? Model interpretability is important because it allows us to take concrete actions to solve the underlying problem.
Feature scaling
Does the model require variables to be scaled or normally distributed?
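As a small illustration of the feature scaling factor, the sketch below standardizes a variable to zero mean and unit standard deviation (z-scores), one common form of scaling. The function name `standardize` and the sample values are assumptions made for this example. Note that this rescales the variable but does not make it normally distributed.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up feature values on a large scale (e.g. annual income).
incomes = [30_000, 45_000, 60_000, 75_000, 90_000]
scaled = standardize(incomes)
print([round(s, 3) for s in scaled])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After scaling, features measured in very different units sit on a comparable range, which matters for models that are sensitive to the magnitude of their inputs.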
Figure 4-10 compares the supervised learning models on the factors mentioned previously and outlines a general rule of thumb to narrow down the search for the best machine learning algorithm for a given problem. The table is based on the advantages and disadvantages of the different models discussed in the individual model sections of this chapter.

[Figure 4-10: table comparing supervised learning models (Linear regression, Logistic regression, SVM, ...) on the factors above (Simplicity, Training Time, ...)]

