added plot to readme

2024-06-26 18:44:50 +02:00 · c29c65e958
parent 3ac4880237
commit c29c65e958
4 changed files with 100 additions and 18 deletions
--- a/README.md
+++ b/README.md
@@ -255,7 +255,13 @@ The ARI and NMI scores indicate that the clustering algorithm has a moderate lev
 The Silhouette Score suggests that the clusters identified are internally cohesive and distinct from each other, indicating that the clustering algorithm has been somewhat successful in identifying meaningful structures within the data, even if these structures do not align perfectly with the true labels.
-Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others. Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
+Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others.
+Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
 ![Alt-Text](readme_data/Cluster_analysis.png)
 <br>The detailed procedures can be found in the following notebook:
 <br>[cluster_features.ipynb](notebooks/cluster_features.ipynb)
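
The README hunk above refers to ARI, NMI, the Silhouette Score, and a Euclidean distance matrix used to spot outliers. The following is a minimal sketch of how such numbers could be computed with scikit-learn. It is not taken from `cluster_features.ipynb`: the synthetic data, the use of `KMeans`, the helper names `evaluate_clustering` and `mean_distance_to_others`, and the three-standard-deviation outlier threshold are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    pairwise_distances,
    silhouette_score,
)


def evaluate_clustering(X, true_labels, cluster_labels):
    """Return external (ARI, NMI) and internal (silhouette) scores.

    Hypothetical helper, not part of the repository.
    """
    return {
        "ari": adjusted_rand_score(true_labels, cluster_labels),
        "nmi": normalized_mutual_info_score(true_labels, cluster_labels),
        "silhouette": silhouette_score(X, cluster_labels, metric="euclidean"),
    }


def mean_distance_to_others(X):
    """Mean Euclidean distance from each point to every other point."""
    dist = pairwise_distances(X, metric="euclidean")
    n = dist.shape[0]
    # The diagonal is zero, so divide by n - 1 rather than n.
    return dist.sum(axis=1) / (n - 1)


# Synthetic stand-in data (assumption: the real features live in the notebook).
X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)

# Assumption: KMeans is used here only as an example clustering algorithm.
y_pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

scores = evaluate_clustering(X, y_true, y_pred)
mean_dist = mean_distance_to_others(X)

# Hypothetical outlier rule: mean distance more than 3 std above the average.
threshold = mean_dist.mean() + 3 * mean_dist.std()
outlier_idx = np.where(mean_dist > threshold)[0]

print(scores)
print("candidate outliers:", outlier_idx)
```

ARI and NMI compare the clusters against the true labels, while the Silhouette Score only uses the feature space. This is why a clustering can look cohesive by silhouette and still agree only moderately with the labels.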
--- a/notebooks/cluster_features.ipynb
+++ b/notebooks/cluster_features.ipynb
--- a/notebooks/demographic_plots.ipynb
+++ b/notebooks/demographic_plots.ipynb
@@ -11,7 +11,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -61,9 +61,42 @@
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
   "metadata": {},
-   "outputs": [],
+   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average age:  59.58733889924617\n",
      "Std Dev age:  18.29087120360519\n",
      "Average age group:  age_group\n",
      "(0, 10]      6.715503\n",
      "(10, 20]    16.360606\n",
      "(20, 30]    26.066710\n",
      "(30, 40]    35.847409\n",
      "(40, 50]    46.229902\n",
      "(50, 60]    55.403579\n",
      "(60, 70]    65.557701\n",
      "(70, 80]    75.208785\n",
      "(80, 90]    84.706091\n",
      "Name: age, dtype: float64\n",
      "Std Dev age group:  age_group\n",
      "(0, 10]     1.883777\n",
      "(10, 20]    2.817185\n",
      "(20, 30]    2.968634\n",
      "(30, 40]    2.878519\n",
      "(40, 50]    2.749121\n",
      "(50, 60]    2.936383\n",
      "(60, 70]    2.884971\n",
      "(70, 80]    2.945118\n",
      "(80, 90]    2.749137\n",
      "Name: age, dtype: float64\n",
      "Male Ratio:  0.5733970981600065\n",
      "Female Ratio: 0.42657588284564046\n"
     ]
    }
   ],
   "source": [
    "# avg age and std dev overall and for each group\n",
    "avg_age = df_dgc['age'].mean()\n",
--- a/readme_data/cluster_analysis.png
+++ b/readme_data/cluster_analysis.png
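
The notebook output in the `demographic_plots.ipynb` hunk reports the overall mean and standard deviation of age, the same statistics per ten-year `age_group`, and male/female ratios. A minimal sketch of how such a summary could be produced with pandas is shown below. Only `df_dgc`, `age`, and `age_group` appear in the diff. The sample rows, the `sex` column name, and its `"M"`/`"F"` coding are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df_dgc is loaded in the notebook.
df_dgc = pd.DataFrame({
    "age": [5, 15, 25, 35, 45, 55, 65, 75, 85, 62],
    "sex": ["M", "F", "M", "M", "F", "M", "F", "M", "F", "M"],  # assumed column
})

# Right-closed ten-year bins: (0, 10], (10, 20], ..., (80, 90].
bins = list(range(0, 100, 10))
df_dgc["age_group"] = pd.cut(df_dgc["age"], bins=bins)

# Overall mean and standard deviation of age.
avg_age = df_dgc["age"].mean()
std_age = df_dgc["age"].std()

# The same statistics within each age group.
by_group = df_dgc.groupby("age_group", observed=True)["age"]
avg_age_group = by_group.mean()
std_age_group = by_group.std()  # NaN for groups with a single row

# Share of rows per sex (assumed coding "M"/"F").
male_ratio = (df_dgc["sex"] == "M").mean()
female_ratio = (df_dgc["sex"] == "F").mean()

print("Average age: ", avg_age)
print("Std Dev age: ", std_age)
print("Average age group: ", avg_age_group)
print("Std Dev age group: ", std_age_group)
print("Male Ratio: ", male_ratio)
print("Female Ratio:", female_ratio)
```

Because `pd.cut` uses right-closed intervals by default, an age of exactly 10 falls into `(0, 10]`, which matches the interval labels shown in the committed output.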