Saturday, September 12, 2020

XGBoost

import xgboost as xgb
from sklearn.metrics import f1_score, accuracy_score

# Create XGB Classifier object
xgb_clf = xgb.XGBClassifier(objective = "multi:softmax")

# Fit model
xgb_model = xgb_clf.fit(X_train, target_train)

# Predictions
y_train_preds = xgb_model.predict(X_train)
y_test_preds = xgb_model.predict(X_test)

# Print F1 scores and Accuracy
print("Training F1 Micro Average: ", f1_score(target_train, y_train_preds, average = "micro"))
print("Test F1 Micro Average: ", f1_score(target_test, y_test_preds, average = "micro"))
print("Test Accuracy: ", accuracy_score(target_test, y_test_preds))
from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb

# Create XGB Classifier object
xgb_clf = xgb.XGBClassifier(tree_method = "gpu_exact", predictor = "gpu_predictor", verbosity = True
eval_metric = ["merror", "map", "auc"], objective = "multi:softmax")
# Create parameter grid
parameters = {"learning_rate": [0.1, 0.01, 0.001],
"gamma" : [0.01, 0.1, 0.3, 0.5, 1, 1.5, 2],
"max_depth": [2, 4, 7, 10],
"colsample_bytree": [0.3, 0.6, 0.8, 1.0],
"subsample": [0.2, 0.4, 0.5, 0.6, 0.7],
"reg_alpha": [0, 0.5, 1],
"reg_lambda": [1, 1.5, 2, 3, 4.5],
"min_child_weight": [1, 3, 5, 7],
"n_estimators": [100, 250, 500, 1000]}
# Create RandomizedSearchCV Object
xgb_rscv = RandomizedSearchCV(xgb_clf, param_distributions = parameters, scoring = "f1_micro",
cv = 7, verbose = 3, random_state = 40)
# Fit the model
model_xgboost = xgb_rscv.fit(X_train, target_train)

Now, let’s take a look at each hyperparameter individually.

  • learning_rate: to start, let’s clarify that this learning rate is not the same as the one in gradient descent. In gradient boosting, the learning rate lessens the effect of each additional tree on the model. In their paper, A Scalable Tree Boosting System, Tianqi Chen and Carlos Guestrin refer to this regularization technique as shrinkage, and it is an additional method to prevent overfitting. The lower the learning rate, the more robust the model will be against overfitting.
  • gamma: mathematically, this is known as the Lagrangian multiplier, and its purpose is complexity control. It is a pseudo-regularization term for the loss function, and it represents the minimum reduction in loss required at a candidate split in order for that split to happen.
  • max_depth: refers to the depth of a tree. It sets the maximum number of nodes that can exist between the root and the farthest leaf. Remember that deeper trees are prone to overfitting.
  • colsample_bytree: represents the fraction of columns (features) to be considered for each tree built, so the sampling occurs once for every tree constructed. It is referred to in the paper A Scalable Tree Boosting System by Tianqi Chen and Carlos Guestrin as another of the main techniques to prevent overfitting and to improve computational speed.
  • subsample: represents a fraction of the rows (observations) to be considered when building each subtree. Tianqi Chen and Carlos Guestrin in their paper A Scalable Tree Boosting System recommend colsample_bytree over subsample to prevent overfitting, as they found that the former is more effective for this purpose.
  • reg_alpha: L1 regularization term. L1 regularization encourages sparsity (meaning pulling weights to 0). It can be more useful when the objective is logistic regression since you might need help with feature selection.
  • reg_lambda: L2 regularization term. L2 encourages smaller weights; this approach can be more useful in tree models, where zeroing out features might not make much sense.
  • min_child_weight: similar to gamma, as it performs regularization at the splitting step. It is the minimum Hessian weight required to create a new node. The Hessian is the second derivative.
  • n_estimators: the number of trees to fit.
  • booster: allows you to choose which booster to use: gbtree, gblinear, or dart. We’ve been using gbtree, but dart and gblinear also have their own additional hyperparameters to explore.
  • scale_pos_weight: balances between negative and positive weights, and should definitely be used in cases where the data presents a high class imbalance (see the sketch after this list).
  • importance_type: refers to the feature importance type to be used by the feature_importances_ method. gain calculates the relative contribution of a feature to all the trees in a model (the higher the relative gain, the more relevant the feature). cover calculates the relative number of observations related to a feature when used to decide the leaf node. weight measures the relative number of times a feature is used to split the data across all the trees in a model.
  • base_score: global bias. This parameter is useful when dealing with high class imbalance.
  • max_delta_step: sets the maximum absolute value possible for the weights. Also useful when dealing with unbalanced classes.
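To make the last few parameters more concrete, here is a minimal sketch (not part of the original example) that computes scale_pos_weight from the class counts of a synthetic, imbalanced binary dataset and uses importance_type to control what feature_importances_ reports. The dataset and the negative-to-positive ratio heuristic are illustrative assumptions:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic, imbalanced binary dataset (illustrative only)
X_bin, y_bin = make_classification(n_samples = 2000, n_features = 10, weights = [0.9, 0.1], random_state = 40)

# A common heuristic: scale_pos_weight = (# negative samples) / (# positive samples)
ratio = float(np.sum(y_bin == 0)) / np.sum(y_bin == 1)

clf = xgb.XGBClassifier(objective = "binary:logistic", scale_pos_weight = ratio, importance_type = "gain")
clf.fit(X_bin, y_bin)

# feature_importances_ now reports the relative gain of each feature
print(clf.feature_importances_)

Switching importance_type to "weight" or "cover" only changes what feature_importances_ reports; it does not affect training.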
# Create XGB Classifier object
xgb_clf = xgb.XGBClassifier(tree_method = "exact", predictor = "cpu_predictor", verbosity = True,
objective = "multi:softmax")
# Create parameter grid
parameters = {"learning_rate": [0.1, 0.01, 0.001],
"gamma" : [0.01, 0.1, 0.3, 0.5, 1, 1.5, 2],
"max_depth": [2, 4, 7, 10],
"colsample_bytree": [0.3, 0.6, 0.8, 1.0],
"subsample": [0.2, 0.4, 0.5, 0.6, 0.7],
"reg_alpha": [0, 0.5, 1],
"reg_lambda": [1, 1.5, 2, 3, 4.5],
"min_child_weight": [1, 3, 5, 7],
"n_estimators": [100, 250, 500, 1000]}
from sklearn.model_selection import RandomizedSearchCV
# Create RandomizedSearchCV Object
xgb_rscv = RandomizedSearchCV(xgb_clf, param_distributions = parameters, scoring = "f1_micro",
cv = 10, verbose = 3, random_state = 40 )
# Fit the model
model_xgboost = xgb_rscv.fit(X_train, target_train)
# Model best estimators
print("Learning Rate: ", model_xgboost.best_estimator_.get_params()["learning_rate"])
print("Gamma: ", model_xgboost.best_estimator_.get_params()["gamma"])
print("Max Depth: ", model_xgboost.best_estimator_.get_params()["max_depth"])
print("Subsample: ", model_xgboost.best_estimator_.get_params()["subsample"])
print("Max Features at Split: ", model_xgboost.best_estimator_.get_params()["colsample_bytree"])
print("Alpha: ", model_xgboost.best_estimator_.get_params()["reg_alpha"])
print("Lamda: ", model_xgboost.best_estimator_.get_params()["reg_lambda"])
print("Minimum Sum of the Instance Weight Hessian to Make a Child: ",
model_xgboost.best_estimator_.get_params()["min_child_weight"])
print("Number of Trees: ", model_xgboost.best_estimator_.get_params()["n_estimators"])
