added plot to readme
parent 3ac4880237
commit c29c65e958
@@ -255,7 +255,13 @@ The ARI and NMI scores indicate that the clustering algorithm has a moderate lev
 
 The Silhouette Score suggests that the clusters identified are internally cohesive and distinct from each other, indicating that the clustering algorithm has been somewhat successful in identifying meaningful structures within the data, even if these structures do not align perfectly with the true labels.
 
-Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others. Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
+Further analysis included the creation of a Euclidean distance matrix plot to visualize patterns of data point separation. This analysis revealed the presence of outliers, as some data points were significantly more distant from others.
+
+Finally, a parallel axis plot was generated to examine the relationship between data features and the clusters. Notably, this plot highlighted the ventricular rate feature as a significant separator in the original labels, underscoring its importance as identified by our machine learning models in predicting the labels.
+
+
+![Alt-Text](readme_data/Cluster_analysis.png)
+
 
 <br>The detailed procedures can be found in the following notebook:
 <br>[cluster_features.ipynb](notebooks/cluster_features.ipynb)
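Reviewer note: the README text in this hunk mentions a Euclidean distance matrix used to spot outliers, but the computation itself is not visible in this diff. The following is a minimal sketch of how such a matrix could be built and scanned, not the notebook's actual code. The function names, the synthetic data, and the z-score rule on mean pairwise distance are all assumptions made for illustration.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Return the (n, n) matrix of pairwise Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def flag_outliers(D, z_thresh=2.0):
    """Flag points whose mean distance to all other points is unusually large.

    The z-score threshold is an illustrative assumption, not a rule from the README.
    """
    n = D.shape[0]
    mean_dist = D.sum(axis=1) / (n - 1)  # the diagonal is zero, so divide by n - 1
    z = (mean_dist - mean_dist.mean()) / mean_dist.std()
    return np.where(z > z_thresh)[0]


# Synthetic data: 20 points near the origin plus one distant point (index 20).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)), [[15.0, 15.0, 15.0]]])

D = euclidean_distance_matrix(X)
outliers = flag_outliers(D)
print("Flagged outlier indices:", outliers)
```

In a heatmap of `D`, such a point would show up as a row and column of uniformly large values, which is presumably the pattern the README describes.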
File diff suppressed because one or more lines are too long
@@ -11,7 +11,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -61,9 +61,42 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Average age: 59.58733889924617\n",
+"Std Dev age: 18.29087120360519\n",
+"Average age group: age_group\n",
+"(0, 10] 6.715503\n",
+"(10, 20] 16.360606\n",
+"(20, 30] 26.066710\n",
+"(30, 40] 35.847409\n",
+"(40, 50] 46.229902\n",
+"(50, 60] 55.403579\n",
+"(60, 70] 65.557701\n",
+"(70, 80] 75.208785\n",
+"(80, 90] 84.706091\n",
+"Name: age, dtype: float64\n",
+"Std Dev age group: age_group\n",
+"(0, 10] 1.883777\n",
+"(10, 20] 2.817185\n",
+"(20, 30] 2.968634\n",
+"(30, 40] 2.878519\n",
+"(40, 50] 2.749121\n",
+"(50, 60] 2.936383\n",
+"(60, 70] 2.884971\n",
+"(70, 80] 2.945118\n",
+"(80, 90] 2.749137\n",
+"Name: age, dtype: float64\n",
+"Male Ratio: 0.5733970981600065\n",
+"Female Ratio: 0.42657588284564046\n"
+]
+}
+],
 "source": [
 "# avg age and std dev overall and for each group\n",
 "avg_age = df_dgc['age'].mean()\n",
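Reviewer note: the hunk above shows only the first source line of the cell, while its output reports per-decade age statistics and sex ratios. The following is a hedged sketch of how output of that shape could be produced with pandas. Only `df_dgc`, `age`, `age_group`, and `avg_age` appear in the diff; the `sex` column, its `"M"`/`"F"` coding, the other variable names, and the sample values are assumptions for illustration.

```python
import pandas as pd

# Synthetic stand-in for df_dgc; the values are made up for illustration only.
df_dgc = pd.DataFrame({
    "age": [5, 8, 15, 27, 34, 41, 58, 63, 77, 85],
    "sex": ["M", "F", "M", "M", "F", "M", "F", "M", "M", "F"],  # hypothetical column
})

# Overall mean and sample standard deviation of age.
avg_age = df_dgc["age"].mean()
std_age = df_dgc["age"].std()

# Bin ages into decades: (0, 10], (10, 20], ..., (80, 90].
df_dgc["age_group"] = pd.cut(df_dgc["age"], bins=range(0, 100, 10))

# Per-bin mean and standard deviation; observed=False keeps empty bins in the result.
grouped = df_dgc.groupby("age_group", observed=False)["age"]
avg_age_group = grouped.mean()
std_age_group = grouped.std()

# Share of rows per sex (assumes the hypothetical "M"/"F" coding above).
male_ratio = (df_dgc["sex"] == "M").mean()
female_ratio = (df_dgc["sex"] == "F").mean()

print("Average age:", avg_age)
print("Std Dev age:", std_age)
print("Average age group:", avg_age_group)
print("Std Dev age group:", std_age_group)
print("Male Ratio:", male_ratio)
print("Female Ratio:", female_ratio)
```

Note that the two ratios in the committed output sum to slightly less than 1, which would be consistent with a few rows carrying a missing or other sex value, though the diff does not confirm this.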
Binary file not shown.
After Width: | Height: | Size: 732 KiB |