Regression

from sklearn.svm import SVR
model = SVR()
model.fit(X, Y)

Classification

from sklearn.svm import SVC
model = SVC()
model.fit(X, Y)

Hyperparameters

The following key parameters are present in the sklearn implementation of SVM and can be tweaked while performing a grid search (a minimal grid-search sketch appears at the end of this section):

Kernels (kernel in sklearn)
The choice of kernel controls the manner in which the input variables are projected. There are many kernels to choose from, but linear and RBF are the most common.

Penalty (C in sklearn)
The penalty parameter tells the SVM optimization how much you want to avoid misclassifying each training example. For large values of the penalty parameter, the optimization will choose a smaller-margin hyperplane. Good values might be on a log scale from 10 to 1,000.

Advantages and disadvantages

In terms of advantages, SVM is fairly robust against overfitting, especially in higher dimensional space. It handles nonlinear relationships quite well, with many kernels to choose from. Also, there is no distributional requirement for the data.

In terms of disadvantages, SVM can be inefficient to train and memory-intensive to run and tune. It doesn't perform well with large datasets. It requires feature scaling of the data. There are also many hyperparameters, and their meanings are often not intuitive.

K-Nearest Neighbors

K-nearest neighbors (KNN) is considered a "lazy learner," as there is no learning required in the model. For a new data point, predictions are made by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances.

To determine which of the K instances in the training dataset are most similar to a new input, a distance measure is used. The most popular distance measure is Euclidean distance, which is calculated as the square root of the sum of the squared differences between a point a and a point b across all input attributes i:

d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}

Euclidean distance is a good distance measure to use if the input variables are similar in type. Another distance metric is Manhattan distance, in which the distance between point a and point b is:

d(a, b) = \sum_{i=1}^{n} |a_i - b_i|

Manhattan distance is a good measure to use if the input variables are not similar in type (a short numeric sketch of both metrics follows the KNN code below).

The steps of KNN can be summarized as follows:

1. Choose the number of neighbors, K, and a distance metric.
2. Find the K-nearest neighbors of the sample that we want to classify.
3. Assign the class label by majority vote.

KNN regression and classification models can be constructed using the sklearn package of Python, as shown in the following code:
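The KNN snippets were truncated in this copy of the text; what follows is a minimal sketch mirroring the SVM snippets above, using sklearn's KNeighborsClassifier and KNeighborsRegressor. The number of neighbors K is the n_neighbors parameter, which defaults to 5, and the default distance metric (minkowski with p=2) is Euclidean.

Classification

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()  # e.g., KNeighborsClassifier(n_neighbors=5)
model.fit(X, Y)  # X, Y as in the SVM snippets above

Regression

from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
model.fit(X, Y)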
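To make the two distance metrics above concrete, here is a small numeric sketch; the points a and b are made up purely for illustration:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean: sqrt((1-4)^2 + (2-6)^2 + (3-3)^2) = sqrt(25) = 5.0
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan: |1-4| + |2-6| + |3-3| = 7.0
manhattan = np.sum(np.abs(a - b))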
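Finally, returning to the SVM hyperparameters discussed earlier, here is a minimal grid-search sketch using sklearn's GridSearchCV. The parameter grid (linear and RBF kernels, C on a log scale from 10 to 1,000) follows the guidance above; the synthetic dataset is only there to make the example self-contained.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for X, Y (illustrative only)
X, Y = make_classification(n_samples=200, n_features=4, random_state=0)

# Candidate values for the two key hyperparameters discussed above
param_grid = {"kernel": ["linear", "rbf"], "C": [10, 100, 1000]}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, Y)
print(grid.best_params_)  # best kernel/C combination found by cross-validation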