added plot to readme

main
Felix Jan Michael Mucha 2024-06-26 18:44:50 +02:00
parent 3ac4880237
commit c29c65e958
4 changed files with 100 additions and 18 deletions

@@ -255,7 +255,13 @@ The ARI and NMI scores indicate that the clustering algorithm has a moderate lev
 The Silhouette Score suggests that the clusters identified are internally cohesive and distinct from each other, indicating that the clustering algorithm has been somewhat successful in identifying meaningful structures within the data, even if these structures do not align perfectly with the true labels.
-Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others. Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
+Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others.
+Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
+![Alt-Text](readme_data/Cluster_analysis.png)
 <br>The detailed procedures can be found in the following notebook:
 <br>[cluster_features.ipynb](notebooks/cluster_features.ipynb)
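
To illustrate the distance-matrix step described in this README hunk, here is a minimal sketch; it is not the code from `cluster_features.ipynb`. It assumes each data point is a plain tuple of numeric features. The names `distance_matrix` and `flag_outliers`, the `factor` parameter, and the median-based threshold are all hypothetical choices made for this example, and other outlier rules would be equally reasonable.

```python
import math
from statistics import median


def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]


def flag_outliers(points, factor=2.0):
    """Return indices of points that lie unusually far from the rest.

    Assumption for illustration only: a point counts as an outlier when
    its mean distance to all other points exceeds `factor` times the
    median of those mean distances. Requires at least two points.
    """
    matrix = distance_matrix(points)
    others = len(points) - 1
    # The diagonal entry is 0, so summing a row and dividing by the
    # number of other points gives the mean distance to the rest.
    mean_dist = [sum(row) / others for row in matrix]
    threshold = factor * median(mean_dist)
    return [i for i, d in enumerate(mean_dist) if d > threshold]


if __name__ == "__main__":
    # Four points in a tight square plus one distant point.
    sample = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
    print(flag_outliers(sample))
```

With real feature data the columns would normally be scaled first, since Euclidean distance is dominated by whichever feature has the largest numeric range.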

File diff suppressed because one or more lines are too long

@@ -11,7 +11,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -61,9 +61,42 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Average age: 59.58733889924617\n",
+"Std Dev age: 18.29087120360519\n",
+"Average age group: age_group\n",
+"(0, 10] 6.715503\n",
+"(10, 20] 16.360606\n",
+"(20, 30] 26.066710\n",
+"(30, 40] 35.847409\n",
+"(40, 50] 46.229902\n",
+"(50, 60] 55.403579\n",
+"(60, 70] 65.557701\n",
+"(70, 80] 75.208785\n",
+"(80, 90] 84.706091\n",
+"Name: age, dtype: float64\n",
+"Std Dev age group: age_group\n",
+"(0, 10] 1.883777\n",
+"(10, 20] 2.817185\n",
+"(20, 30] 2.968634\n",
+"(30, 40] 2.878519\n",
+"(40, 50] 2.749121\n",
+"(50, 60] 2.936383\n",
+"(60, 70] 2.884971\n",
+"(70, 80] 2.945118\n",
+"(80, 90] 2.749137\n",
+"Name: age, dtype: float64\n",
+"Male Ratio: 0.5733970981600065\n",
+"Female Ratio: 0.42657588284564046\n"
+]
+}
+],
 "source": [
 "# avg age and std dev overall and for each group\n",
 "avg_age = df_dgc['age'].mean()\n",

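The cell output in the hunk above reports the mean and standard deviation of age per right-closed decade bin, plus male and female ratios. The notebook itself uses pandas (`df_dgc['age'].mean()`), and the rest of its source is cut off in this diff, so the following is only a dependency-free sketch of the same kind of summary. The function names `age_bin`, `age_summary`, and `label_ratios`, as well as the label values in the usage lines, are hypothetical and not taken from the notebook.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def age_bin(age, width=10):
    """Return the right-closed bin (lo, hi] that contains `age`.

    Assumes age > 0, so that 10 falls in (0, 10] and 10.5 in (10, 20].
    """
    hi = math.ceil(age / width) * width
    return (hi - width, hi)


def age_summary(ages, width=10):
    """Map each bin to (mean, sample standard deviation) of its ages.

    The standard deviation is NaN for bins holding a single value.
    """
    groups = defaultdict(list)
    for age in ages:
        groups[age_bin(age, width)].append(age)
    return {
        bounds: (mean(vals), stdev(vals) if len(vals) > 1 else float("nan"))
        for bounds, vals in sorted(groups.items())
    }


def label_ratios(labels):
    """Return the share of each distinct label in `labels`."""
    total = len(labels)
    return {label: labels.count(label) / total for label in set(labels)}


if __name__ == "__main__":
    # Made-up values purely for demonstration, not project data.
    print(age_summary([5, 10, 15, 25, 30]))
    print(label_ratios(["male", "male", "female", "male"]))
```

`statistics.stdev` computes the sample standard deviation, which matches the default behaviour of pandas' `std()`.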
Binary file not shown (size: 732 KiB).