Classification
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X, Y)
Regression
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
model.fit(X, Y)
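For completeness, here is a minimal, self-contained sketch of both variants. The synthetic data below is an illustrative assumption, since the snippets above take the feature matrix X and target Y as given:

# Illustrative sketch: KNN classification and regression on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 2)                            # 100 samples, 2 features
Y_class = (X[:, 0] + X[:, 1] > 1).astype(int)   # binary labels from a simple rule
Y_reg = X[:, 0] + X[:, 1]                       # continuous target

clf = KNeighborsClassifier(n_neighbors=5).fit(X, Y_class)
reg = KNeighborsRegressor(n_neighbors=5).fit(X, Y_reg)

print(clf.predict([[0.9, 0.8]]))   # predicted class for a new point
print(reg.predict([[0.9, 0.8]]))   # predicted value for a new point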
Hyperparameters
The following key parameters are present in the sklearn implementation of KNN and can be tuned during a grid search (a sketch of such a search follows this list):
Number of neighbors (n_neighbors in sklearn)
The most important hyperparameter for KNN is the number of neighbors (n_neighbors). Good values are between 1 and 20.
Distance metric (metric in sklearn)
It can also be worthwhile to test different distance metrics for defining the composition of the neighborhood. Good values are euclidean and manhattan.
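A minimal sketch of such a grid search with sklearn's GridSearchCV; the synthetic data and parameter ranges below are illustrative assumptions:

# Illustrative sketch: grid search over the two key KNN hyperparameters.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = (X[:, 0] + X[:, 1] > 1).astype(int)

param_grid = {
    "n_neighbors": list(range(1, 21)),         # number of neighbors: 1 to 20
    "metric": ["euclidean", "manhattan"],      # candidate distance metrics
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, Y)
print(search.best_params_)   # best combination found by cross-validation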
Advantages and disadvantages
In terms of advantages, no training is involved, so there is no learning phase. Because the algorithm requires no training before making predictions, new data can be added seamlessly without affecting its accuracy. KNN is intuitive and easy to understand. The model naturally handles multiclass classification and can learn complex decision boundaries. KNN is effective when the training data is large, and with enough data it is relatively robust to noisy observations, reducing the need to filter outliers.
In terms of disadvantages, the choice of distance metric is not obvious and is difficult to justify in many cases. KNN performs poorly on high-dimensional datasets. Prediction is expensive and slow for new instances because the distances to all training points must be computed. KNN is also sensitive to noisy features and outliers, so missing values must be imputed and outliers removed manually. Finally, feature scaling (standardization or normalization) is required before applying KNN to any dataset; otherwise, features on larger scales dominate the distance computation and KNN may produce wrong predictions.
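To illustrate the scaling requirement, a minimal sketch that standardizes features in a sklearn Pipeline before fitting KNN; the data here, with one feature on a much larger scale, is again an illustrative assumption:

# Illustrative sketch: standardize features before KNN so that no single
# feature dominates the distance computation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 2) * [1.0, 1000.0]              # second feature on a far larger scale
Y = (X[:, 0] + X[:, 1] / 1000.0 > 1).astype(int)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, Y)   # scaler and classifier are fit together; predictions are scaled automatically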
Linear Discriminant Analysis
The objective of the linear discriminant analysis (LDA) algorithm is to project the data onto a lower-dimensional space in a way that the class separability is maximized and the variance within a class is minimized.4
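As a brief formal sketch (in standard textbook notation, not defined in this text): LDA seeks a projection direction $w$ that maximizes Fisher's criterion, the ratio of between-class scatter $S_B$ to within-class scatter $S_W$:

$$J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}$$

Maximizing $J(w)$ simultaneously pushes the class means apart and shrinks the spread within each class, which is exactly the objective stated above.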
During the training of the LDA model, the statistical properties (i.e., mean and covariance matrix) of each class are computed. The statistical properties are estimated on the basis of the following assumptions about the data:
• Data is normally distributed, so that each variable is shaped like a bell curve when plotted.
• Each attribute has the same variance, and the values of each variable vary around the mean by the same amount on average.
To make a prediction, LDA estimates the probability that a new set of inputs belongs to each class; the output class is the one with the highest probability.
Implementation in Python and hyperparameters
The LDA classification model can be constructed using the sklearn package of Python, as shown in the following code snippet:
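# Minimal sketch using sklearn's LinearDiscriminantAnalysis;
# X and Y are the feature matrix and labels assumed in the earlier examples.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

model = LinearDiscriminantAnalysis()
model.fit(X, Y)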
