Implementation in Python
Classification
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, Y)
Regression
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X, Y)
Hyperparameters
Some of the main hyperparameters in the sklearn implementation of random forest that can be tuned during a grid search are listed below (a grid-search sketch follows the list):
Maximum number of features (max_features in sklearn)
This is the most important parameter. It is the number of random features to sample at each split point. You could try a range of integer values, such as 1 to 20, or 1 to half the number of input features.
Number of estimators (n_estimators in sklearn)
This parameter represents the number of trees. Ideally, this should be increased until no further improvement is seen in the model. Good values might be a log scale from 10 to 1,000.
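To make this concrete, below is a minimal grid-search sketch over these two hyperparameters, assuming X and Y are the feature matrix and labels from the snippets above; the candidate values are illustrative only, and max_features must not exceed the number of input features:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_features": [1, 2, 4, 8],     # features sampled at each split point
    "n_estimators": [10, 100, 1000],  # number of trees, on a log scale
}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid.fit(X, Y)
print(grid.best_params_)  # best combination found by cross-validation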
Advantages and disadvantages
The random forest algorithm (or model) has gained huge popularity in ML applications during the last decade due to its good performance, scalability, and ease of use. It is flexible and naturally assigns feature importance scores, so it can handle redundant feature columns. It scales to large datasets and is generally robust to overfitting. The algorithm doesn't need the data to be scaled and can model nonlinear relationships.
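As one illustration of the feature-importance point, a fitted sklearn forest exposes the feature_importances_ attribute; a minimal sketch, assuming the model fit in the classification snippet above:
print(model.feature_importances_)  # one score per column of X; the scores sum to 1.0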
In terms of disadvantages, random forest can feel like a black-box approach, as we have very little control over what the model does, and the results may be difficult to interpret. Although random forest does a good job at classification, it may not be well suited to regression problems, as it does not produce precise continuous predictions. In the case of regression, it cannot predict beyond the range of the training data and may overfit datasets that are particularly noisy.
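The extrapolation limit is easy to demonstrate. The sketch below uses synthetic data, purely for illustration: the forest's prediction at an input beyond the training range stays near the largest target seen in training rather than following the underlying trend:
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_train = np.arange(0, 10, 0.1).reshape(-1, 1)  # inputs in [0, 10)
y_train = 2.0 * X_train.ravel()                 # linear target, max value 19.8
model = RandomForestRegressor().fit(X_train, y_train)
print(model.predict([[20.0]]))  # near 19.8, not the extrapolated 40.0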
Extra trees
Extra trees, otherwise known as extremely randomized trees, is a variant of random forest: like random forest, it builds multiple trees and splits nodes using random subsets of features. However, unlike random forest, where observations are drawn with replacement, in extra trees observations are drawn without replacement, so there is no repetition of observations.
Additionally, random forest selects the best split to convert the parent into the two most homogeneous child nodes. However, extra trees selects a random split to divide the parent node into two random child nodes. In extra trees, randomness doesn't come from bootstrapping the data; it comes from the random splits of all observations.
In real-world cases, performance is comparable to an ordinary random forest, sometimes a bit better. The advantages and disadvantages of extra trees are similar to those of random forest.
Implementation in Python
Extra trees regression and classification models can be constructed using the sklearn package of Python, as shown in the following code snippet. The hyperparameters of extra trees are similar to those of random forest, as shown in the previous section:
Classification
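from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(X, Y)
Regression
from sklearn.ensemble import ExtraTreesRegressor
model = ExtraTreesRegressor()
model.fit(X, Y)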