added Gradient Boosting Tree Classifier

2024-06-11 20:55:26 +02:00 · 2024-06-11 20:55:26 +02:00 · 642431e484
parent 9924e1675d
commit 642431e484
4 changed files with 513 additions and 12 deletions
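
The commit adds a Gradient Boosting Tree notebook and a saved model named `best_gbt_model_20240611203442.joblib`. The notebook contents are not shown in this diff, so the following is only a minimal sketch of how such a model might be tuned and saved. It assumes scikit-learn's `GradientBoostingClassifier`, a small illustrative parameter grid, and a synthetic four-class dataset standing in for the ECG features. The timestamp pattern in the file name is inferred from the committed file name; it is not confirmed by the source.

```python
from datetime import datetime
from pathlib import Path
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the ECG feature matrix (assumption: four classes,
# mirroring the four diagnostic groupings named in the README).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical, deliberately small grid; the notebook's real grid is unknown.
param_grid = {"n_estimators": [20, 40], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy on the held-out split, the metric used in Hypothesis 1.
accuracy = accuracy_score(y_test, search.predict(X_test))
print(f"best params: {search.best_params_}, test accuracy: {accuracy:.3f}")

# Save the best estimator under a timestamped name (pattern assumed from
# the committed file name), then reload it to check the round trip.
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
model_path = Path(tempfile.mkdtemp()) / f"best_gbt_model_{timestamp}.joblib"
joblib.dump(search.best_estimator_, model_path)
reloaded = joblib.load(model_path)
```

The sketch writes to a temporary directory rather than `ml_models/` so that it can run anywhere. The printed accuracy describes the synthetic data only and says nothing about the 82 % reported in the README.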
--- a/README.md
+++ b/README.md
@@ -152,11 +152,16 @@ The exact procedure for creating the matrix can be found in the notebook [demogr
 The following two hypotheses were applied in this project:
-1. Using ECG data, a classifier can classify the four disease groupings with an accuracy of 80%.
+**Hypothesis 1**:
 1. Using ECG data, a classifier can classify the four diagnostic groupings with an accuracy of at least 80%.
    Result: 
 - For the first hypothesis, an accuracy of 83 % was achieved with the XGBoost classifier. The detailed procedure can be found in the following notebook: [ml_xgboost.ipynb](notebooks/ml_xgboost.ipynb)
 - An accuracy of 82 % was also achieved with a Gradient Boosting Tree classifier. The detailed procedure can be found in the following notebook: [ml_grad_boost_tree.ipynb](notebooks/ml_grad_boost_tree.ipynb)
 With these classifiers, the hypothesis is confirmed: a classifier can classify the diagnostic groups with an accuracy of at least 80%.
 **Hypothesis 2**:
 2. Sinus bradycardia occurs significantly more frequently in the 60 to 70 age group than in other age groups.
--- a/ml_models/best_gbt_model_20240611203442.joblib
+++ b/ml_models/best_gbt_model_20240611203442.joblib
--- a/notebooks/ml_grad_boost_tree.ipynb
+++ b/notebooks/ml_grad_boost_tree.ipynb
--- a/notebooks/ml_xgboost.ipynb
+++ b/notebooks/ml_xgboost.ipynb
@@ -1,8 +1,15 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extreme Gradient Boosting (XGBoost) Training and Analysis"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -14,6 +21,8 @@
    "import xgboost as xgb\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "import seaborn as sns"
   ]
  },
@@ -26,7 +35,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -50,7 +59,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
@@ -323,12 +332,12 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "load the best model"
+    "load the best model to get the best hyperparameters from it"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
@@ -353,14 +362,14 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "[23:05:40] WARNING: C:/Users/administrator/workspace/xgboost-win64_release_1.6.0/src/learner.cc:627: \n",
+      "[20:16:51] WARNING: C:/Users/administrator/workspace/xgboost-win64_release_1.6.0/src/learner.cc:627: \n",
      "Parameters: { \"best_iteration\", \"best_ntree_limit\", \"scikit_learn\" } might not be used.\n",
      "\n",
      "  This could be a false alarm, with some parameters getting used by language bindings but\n",
@@ -474,8 +483,8 @@
      "[97]\ttrain-merror:0.00029\teval-merror:0.18265\n",
      "[98]\ttrain-merror:0.00029\teval-merror:0.18265\n",
      "[99]\ttrain-merror:0.00029\teval-merror:0.18265\n",
-      "CPU times: total: 15.5 s\n",
+      "CPU times: total: 17.6 s\n",
-      "Wall time: 1.2 s\n"
+      "Wall time: 1.36 s\n"
     ]
    }
   ],
@@ -497,7 +506,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
@@ -537,7 +546,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {