SHAP (6): Credit Card Fraud Detection with XGBoost and HyperOpt
This notebook demonstrates an XGBoost classifier applied in the financial industry, specifically to credit card fraud detection. After the XGBoost classifier is built, its parameters are tuned with the HyperOpt library (an alternative to sklearn's GridSearchCV and RandomizedSearchCV algorithms), with the goal of maximizing the f1 score for classifying normal versus fraudulent transactions. As part of model evaluation, the f1 score is computed, a confusion matrix is built, a classification report is generated, and a precision-recall curve is plotted. Finally, feature importances are computed and plotted both with XGBoost's built-in algorithm and with the SHAP implementation of feature importance.
Source: https://github.com/albazahm/Credit_Card_Fraud_Detection_with_XGBoost_and_HyperOpt/tree/master
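Since the f1 score is the optimization target throughout, a quick illustration of the metric may help (a minimal sketch, not part of the original notebook): f1 is the harmonic mean of precision and recall, which makes it far more informative than accuracy when fraud is a tiny minority class.

#toy labels: 3 actual frauds, 3 predicted frauds, 2 of the predictions correct
from sklearn.metrics import f1_score, precision_score, recall_score
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2/3
print(2 * p * r / (p + r))           # harmonic mean = 0.667, identical to f1_score(y_true, y_pred)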
1. Loading Libraries and Data
#loading libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score, make_scorer, confusion_matrix, classification_report, precision_recall_curve, plot_precision_recall_curve, average_precision_score, auc
from sklearn.model_selection import train_test_split
import seaborn as sns
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK
import xgboost as xgb
import shap
# Any results you write to the current directory are saved as output.

/kaggle/input/creditcardfraud/creditcard.csv

#loading the data into a dataframe
credit_df = pd.read_csv('./creditcard.csv')

2. Data Overview
#preview of the first 10 rows of data
credit_df.head(10)

[Output: the first 10 rows of the dataframe — 10 rows × 31 columns: Time, V1–V28, Amount, Class]
#displaying descriptive statistics
credit_df.describe()

[Output: summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for all 31 columns — 8 rows × 31 columns. Notably, the mean of Class is about 0.0017, i.e. only roughly 0.17% of transactions are fraudulent, so the classes are heavily imbalanced.]
#exploring datatypes and count of non-NULL rows for each feature
credit_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
Time 284807 non-null float64
V1 284807 non-null float64
V2 284807 non-null float64
V3 284807 non-null float64
V4 284807 non-null float64
V5 284807 non-null float64
V6 284807 non-null float64
V7 284807 non-null float64
V8 284807 non-null float64
V9 284807 non-null float64
V10 284807 non-null float64
V11 284807 non-null float64
V12 284807 non-null float64
V13 284807 non-null float64
V14 284807 non-null float64
V15 284807 non-null float64
V16 284807 non-null float64
V17 284807 non-null float64
V18 284807 non-null float64
V19 284807 non-null float64
V20 284807 non-null float64
V21 284807 non-null float64
V22 284807 non-null float64
V23 284807 non-null float64
V24 284807 non-null float64
V25 284807 non-null float64
V26 284807 non-null float64
V27 284807 non-null float64
V28 284807 non-null float64
Amount 284807 non-null float64
Class 284807 non-null int64
dtypes: float64(30), int64(1)
memory usage: 67.4 MB

3. Data Preparation
Here we find and remove duplicate observations in the data, define the independent (X) and dependent (Y) variables for classification, and split off a validation set and a test set.
#checking for duplicated observations
credit_df.duplicated().value_counts()

False    283726
True       1081
dtype: int64

#dropping duplicated observations
credit_df = credit_df.drop_duplicates()

#defining independent (X) and dependent (Y) variables from dataframe
X = credit_df.drop(columns = 'Class')
Y = credit_df['Class'].values

#splitting a testing set from the data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, stratify = Y, random_state = 42)
#splitting a validation set from the training set to tune parameters
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.20, stratify = Y_train, random_state = 42)
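Because fraud makes up only about 0.17% of the rows, the stratify argument matters here; a quick sanity check (a small sketch that is not part of the original notebook) confirms that each split preserves the fraud rate:

#verifying that the stratified splits preserve the fraud rate (illustrative check, not in the original)
for name, y in [('train', Y_train), ('val', Y_val), ('test', Y_test)]:
    print('{} fraud rate: {:.4%}'.format(name, y.mean()))

4. Model Set-Up and Training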
In this section, we create a scorer based on the f1 metric and define the parameter search space for the XGBoost model. We also define a function that wraps the classifier, extracts its predictions, computes the loss, and feeds it to the optimizer. Finally, we initialize the optimizer with the desired settings, run it, and inspect the parameters and scores across trials.
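Before the full objective function below, here is a minimal sketch of the HyperOpt pattern on a toy problem, using the hyperopt imports from Section 1 (the names toy_objective, toy_trials, and toy_best are illustrative, not from the original notebook). fmin minimizes whatever is returned under the 'loss' key, which is why a score that should be maximized is returned negated:

#toy HyperOpt run: maximize -(x - 3)^2 by minimizing its negation
def toy_objective(params):
    score = -(params['x'] - 3) ** 2           #stand-in for a model metric to maximize
    return {'loss': -score, 'status': STATUS_OK}

toy_trials = Trials()
toy_best = fmin(fn = toy_objective,
                space = {'x': hp.uniform('x', 0, 10)},
                algo = tpe.suggest,
                max_evals = 50,
                trials = toy_trials)
print(toy_best)                               #expected to land near {'x': 3.0}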
#creating a scorer from the f1-score metric
f1_scorer = make_scorer(f1_score)

# defining the space for hyperparameter tuning
space = {
    'eta': hp.uniform('eta', 0.1, 1),
    'max_depth': hp.quniform('max_depth', 3, 18, 1),
    'gamma': hp.uniform('gamma', 1, 9),
    'reg_alpha': hp.quniform('reg_alpha', 50, 200, 1),
    'reg_lambda': hp.uniform('reg_lambda', 0, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1),
    'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
    'n_estimators': hp.quniform('n_estimators', 100, 200, 10)
}

#defining function to optimize
def hyperparameter_tuning(space):
    clf = xgb.XGBClassifier(
        n_estimators = int(space['n_estimators']),      #number of trees to use
        eta = space['eta'],                             #learning rate
        max_depth = int(space['max_depth']),            #depth of trees
        gamma = space['gamma'],                         #loss reduction required to further partition tree
        reg_alpha = int(space['reg_alpha']),            #L1 regularization for weights
        reg_lambda = space['reg_lambda'],               #L2 regularization for weights
        min_child_weight = space['min_child_weight'],   #minimum sum of instance weight needed in child
        colsample_bytree = space['colsample_bytree'],   #ratio of column sampling for each tree
        nthread = -1)                                   #number of parallel threads used

    evaluation = [(X_train, Y_train), (X_val, Y_val)]

    clf.fit(X_train, Y_train,
            eval_set = evaluation,
            early_stopping_rounds = 10,                 #stop if the validation metric does not improve for 10 rounds
            verbose = False)

    pred = clf.predict(X_val)
    pred = [1 if i > 0.5 else 0 for i in pred]
    f1 = f1_score(Y_val, pred)
    print('SCORE:', f1)
    return {'loss': -f1, 'status': STATUS_OK}

# run the hyperparameter tuning
trials = Trials()
best = fmin(fn = hyperparameter_tuning,
            space = space,
            algo = tpe.suggest,
            max_evals = 100,
            trials = trials)

print(best)

SCORE: 0.7552447552447553
SCORE: 0.0
SCORE: 0.0
... (one SCORE line per trial, 100 trials in total; abridged here) ...
SCORE: 0.8169014084507042
SCORE: 0.7910447761194029
100%|██████████| 100/100 [11:24<00:00, 6.84s/trial, best loss: -0.8201438848920864]
{'colsample_bytree': 0.9999995803500363, 'eta': 0.1316102455832729, 'gamma': 1.6313395777817137, 'max_depth': 5.0, 'min_child_weight': 3.0, 'n_estimators': 100.0, 'reg_alpha': 47.0, 'reg_lambda': 0.4901343161108276}

#plotting feature space and f1-scores for the different trials
parameters = space.keys()
cols = len(parameters)
f, axes = plt.subplots(nrows = 1, ncols = cols, figsize = (20, 5))
cmap = plt.cm.jet
for i, val in enumerate(parameters):
    xs = np.array([t['misc']['vals'][val] for t in trials.trials]).ravel()
    ys = [-t['result']['loss'] for t in trials.trials]
    xs, ys = zip(*sorted(zip(xs, ys)))
    axes[i].scatter(xs, ys, s = 20, linewidth = 0.01, alpha = 0.25, c = cmap(float(i)/len(parameters)))
    axes[i].set_title(val)
    axes[i].grid()

#printing best model parameters
print(best)

{'colsample_bytree': 0.9999995803500363, 'eta': 0.1316102455832729, 'gamma': 1.6313395777817137, 'max_depth': 5.0, 'min_child_weight': 3.0, 'n_estimators': 100.0, 'reg_alpha': 47.0, 'reg_lambda': 0.4901343161108276}

5. Model Test and Evaluation
This section explores and visualizes how the model performs on the test data.
#initializing XGBoost Classifier with best model parameters
best_clf = xgb.XGBClassifier(n_estimators = int(best['n_estimators']),
                             eta = best['eta'],
                             max_depth = int(best['max_depth']),
                             gamma = best['gamma'],
                             reg_alpha = int(best['reg_alpha']),
                             min_child_weight = best['min_child_weight'],
                             colsample_bytree = best['colsample_bytree'],
                             nthread = -1)

#fitting XGBoost Classifier with best model parameters to training data
best_clf.fit(X_train, Y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.9999995803500363,
              eta=0.1316102455832729, gamma=1.6313395777817137,
              learning_rate=0.1, max_delta_step=0, max_depth=5,
              min_child_weight=3.0, missing=None, n_estimators=100, n_jobs=1,
              nthread=-1, objective='binary:logistic', random_state=0,
              reg_alpha=47, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

#using the model to predict on the test set
Y_pred = best_clf.predict(X_test)

#printing f1 score of test set predictions
print('The f1-score on the test data is: {0:.2f}'.format(f1_score(Y_test, Y_pred)))

The f1-score on the test data is: 0.74

#creating a confusion matrix and labels
cm = confusion_matrix(Y_test, Y_pred)
labels = ['Normal', 'Fraud']

#plotting the confusion matrix
sns.heatmap(cm, annot = True, xticklabels = labels, yticklabels = labels, fmt = 'd')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix for Credit Card Fraud Detection')

Text(0.5, 1.0, 'Confusion Matrix for Credit Card Fraud Detection')

#printing classification report
print(classification_report(Y_test, Y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     56651
           1       0.87      0.64      0.74        95

    accuracy                           1.00     56746
   macro avg       0.94      0.82      0.87     56746
weighted avg       1.00      1.00      1.00     56746

Y_score = best_clf.predict_proba(X_test)[:, 1]
average_precision = average_precision_score(Y_test, Y_score)
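One compatibility note before the plot below: plot_precision_recall_curve was deprecated in scikit-learn 1.0 and removed in 1.2. On newer scikit-learn versions, an equivalent plot can be drawn with PrecisionRecallDisplay (a sketch assuming scikit-learn >= 1.0):

#equivalent on scikit-learn >= 1.0, where plot_precision_recall_curve no longer exists
from sklearn.metrics import PrecisionRecallDisplay
disp = PrecisionRecallDisplay.from_estimator(best_clf, X_test, Y_test)
disp.ax_.set_title('Precision-Recall Curve: AP={0:.2f}'.format(average_precision))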
fig = plot_precision_recall_curve(best_clf, X_test, Y_test)
fig.ax_.set_title('Precision-Recall Curve: AP={0:.2f}'.format(average_precision))

Text(0.5, 1.0, 'Precision-Recall Curve: AP=0.74')

6. Feature Importances
This section presents two algorithms, one built into XGBoost and one from SHAP, for visualizing feature importances. Unfortunately, because the features in this dataset were encoded with principal component analysis (PCA), we cannot draw intuitive conclusions about how the model distinguishes normal from fraudulent transactions in real-world terms.
#extracting the booster from model
booster = best_clf.get_booster()

# scoring features based on information gain
importance = booster.get_score(importance_type = 'gain')

#rounding importances to 2 decimal places
for key in importance.keys():
    importance[key] = round(importance[key], 2)

# plotting feature importances
ax = xgb.plot_importance(importance, importance_type = 'gain', show_values = True)
plt.title('Feature Importances (Gain)')
plt.show()
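Gain is only one of the importance definitions the booster can report; per the XGBoost documentation, get_score also accepts 'weight', 'cover', 'total_gain', and 'total_cover'. A small illustrative sketch, not part of the original notebook, for comparing the top features under a few of these definitions:

#comparing XGBoost's built-in importance types for the fitted model (illustrative sketch)
for imp_type in ['weight', 'gain', 'cover']:
    scores = booster.get_score(importance_type = imp_type)
    top5 = sorted(scores.items(), key = lambda kv: kv[1], reverse = True)[:5]
    print(imp_type, '->', top5)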
#obtaining SHAP values for XGBoost Model
explainer = shap.TreeExplainer(best_clf)
shap_values = explainer.shap_values(X_train)

#plotting SHAP Values of Feature Importances
shap.summary_plot(shap_values, X_train)
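For a more compact view, the same SHAP values can be summarized as a bar chart of the mean absolute SHAP value per feature; this one-line variant uses the same shap API (an optional extra, not in the original notebook):

#bar-chart variant: mean(|SHAP value|) per feature
shap.summary_plot(shap_values, X_train, plot_type = 'bar')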