Regression
from sklearn.svm import SVR
model = SVR()
model.fit(X, Y)
Classification
from sklearn.svm import SVC
model = SVC()
model.fit(X, Y)
Hyperparameters
The following key parameters are present in the sklearn implementation of SVM and can be tweaked while performing a grid search:
Kernels (kernel in sklearn)
The choice of kernel controls the manner in which the input variables will be projected. There are many kernels to choose from, but linear and RBF are the most common.
Penalty (C in sklearn)
The penalty parameter tells the SVM optimization how much you want to avoid misclassifying each training example. For large values of the penalty parameter, the optimization will choose a smaller-margin hyperplane. Good values might lie on a log scale from 10 to 1,000.
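As an illustration of how these parameters might be tuned together, the following sketch runs a grid search over the kernel and penalty values discussed above using sklearn's GridSearchCV. The grid values and the synthetic dataset are assumptions for demonstration, not recommendations from the text.
# Hedged sketch: grid search over the SVM kernel and penalty parameters.
# The grid values and the synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, Y = make_classification(n_samples=200, random_state=0)

param_grid = {
    "kernel": ["linear", "rbf"],  # the two most common kernels
    "C": [10, 100, 1000],         # penalty values on a log scale from 10 to 1,000
}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, Y)
print(grid.best_params_)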
Advantages and disadvantages
In terms of advantages, SVM is fairly robust against overfitting, especially in higher dimensional space. It handles nonlinear relationships quite well, with many kernels to choose from. Also, there is no distributional requirement for the data.
In terms of disadvantages, SVM can be inefficient to train and memory-intensive to run and tune. It doesn't perform well with large datasets. It requires feature scaling of the data. There are also many hyperparameters, and their meanings are often not intuitive.
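Since SVM requires feature scaling, a common pattern is to standardize the inputs before fitting. The following is a minimal sketch using sklearn's StandardScaler in a pipeline; the pipeline itself is an illustration, not code from the text.
# Minimal sketch: standardize features before fitting the SVM,
# since SVM requires feature scaling. Illustrative only.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC())
model.fit(X, Y)  # X, Y as in the earlier snippets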
K-Nearest Neighbors
K-nearest neighbors (KNN) is considered a “lazy learner,” as there is no learning required in the model. For a new data point, predictions are made by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances.
To determine which of the K instances in the training dataset are most similar to a new input, a distance measure is used. The most popular distance measure is Euclidean distance, which is calculated as the square root of the sum of the squared differences between a point \(a\) and a point \(b\) across all input attributes \(i\), and which is represented as \( d(a,b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \). Euclidean distance is a good distance measure to use if the input variables are similar in type.
Another distance metric is Manhattan distance, in which the distance between point \(a\) and point \(b\) is represented as \( d(a,b) = \sum_{i=1}^{n} |a_i - b_i| \). Manhattan distance is a good measure to use if the input variables are not similar in type.
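To make the two measures concrete, the following computes both distances for a pair of sample points with NumPy; the points themselves are arbitrary.
# Euclidean and Manhattan distances between two sample points,
# following the formulas above. The points are arbitrary.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # square root of the sum of squared differences
manhattan = np.sum(np.abs(a - b))          # sum of absolute differences

print(euclidean)  # 5.0
print(manhattan)  # 7.0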
The steps of KNN can be summarized as follows (a minimal sketch appears after the list):
1. Choose the value of K and a distance metric.
2. Find the K-nearest neighbors of the sample that we want to classify.
3. Assign the class label by majority vote.
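A minimal from-scratch sketch of these three steps, assuming Euclidean distance; this is an illustration of the algorithm, not the sklearn implementation shown next.
# Hedged sketch of the three KNN steps above, assuming Euclidean distance.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: K and the distance metric (Euclidean here) are fixed up front.
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Step 2: find the indices of the K nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 3: assign the class label by majority vote.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example: the point (0.5, 0.5) is closest to the two class-0 points.
X_train = np.array([[0, 0], [1, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 0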
KNN regression and classification models can be constructed using the sklearn package of Python, as shown in the following code:
Regression
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
model.fit(X, Y)
Classification
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X, Y)