added first biases

main
Felix Jan Michael Mucha 2024-06-26 18:33:03 +02:00
parent 8631cc43d7
commit 3ac4880237
2 changed files with 43 additions and 3 deletions

@ -260,11 +260,21 @@ Further analysis included the creation of a Euclidean distance matrix plot to vi
<br>The detailed procedures can be found in the following notebook: <br>The detailed procedures can be found in the following notebook:
<br>[cluster_features.ipynb](notebooks/cluster_features.ipynb) <br>[cluster_features.ipynb](notebooks/cluster_features.ipynb)
## Legal basis ## Legal Basis and Data Biases
(version 03.07) (version 03.07)
- The data used all come from one hospital ### Local Bias
- Most of the data are from people of older age, predominantly from the 60-70 age group - The dataset originates from a single source, with contributions from Chapman University, Shaoxing People's Hospital (affiliated with Shaoxing Hospital Zhejiang University School of Medicine), and Ningbo First Hospital. This may introduce a local bias, because all data were collected in one specific geographic and institutional context.
### Demographic Bias
- The dataset is skewed towards older individuals, with most participants falling within the 60-70 age group. The skew is reflected in the following summary statistics:
- Average age: 59.59 years
- Standard deviation of age: 18.29 years
- Male ratio: 57.34%
- Female ratio: 42.66%
These figures indicate a potential demographic bias towards older age groups and a gender imbalance in favour of male participants.
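
The summary statistics above can be reproduced in a few lines of pandas. The following is a minimal sketch, not the notebook's exact code: the helper name `summarize_demographics` and the toy data are hypothetical, while the column names `age` and `gender` and the labels `Male` and `Female` are taken from the notebook cell in this commit.

```python
import pandas as pd


def summarize_demographics(df: pd.DataFrame) -> dict:
    """Return overall age statistics and gender shares as plain floats.

    Hypothetical helper for illustration; assumes columns 'age' and 'gender'.
    """
    # Shares are computed over non-missing gender values only.
    gender_share = df["gender"].value_counts(normalize=True)
    return {
        "avg_age": float(df["age"].mean()),
        # pandas uses the sample standard deviation (ddof=1) by default.
        "std_age": float(df["age"].std()),
        "male_ratio": float(gender_share.get("Male", 0.0)),
        "female_ratio": float(gender_share.get("Female", 0.0)),
    }


# Toy data for illustration only; not taken from the real dataset.
toy = pd.DataFrame(
    {
        "age": [50, 60, 70, 80],
        "gender": ["Male", "Male", "Female", "Male"],
    }
)
summary = summarize_demographics(toy)
print(summary)
```

Note that `value_counts(normalize=True)` ignores rows with a missing gender, whereas dividing by the total row count treats them as part of the denominator. The two approaches agree only when the `gender` column has no missing values.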
# TODO
- Consent and anonymity: - Consent and anonymity:
- Data protection and ethics: - Data protection and ethics:

@ -59,6 +59,36 @@
"df_dgc['age_group'] = pd.cut(df_dgc['age'], bins=age_categories)" "df_dgc['age_group'] = pd.cut(df_dgc['age'], bins=age_categories)"
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# avg age and std dev overall and for each group\n",
"avg_age = df_dgc['age'].mean()\n",
"std_age = df_dgc['age'].std()\n",
"avg_age_group = df_dgc.groupby('age_group')['age'].mean()\n",
"std_age_group = df_dgc.groupby('age_group')['age'].std()\n",
"\n",
"# print \n",
"print(\"Average age: \", avg_age)\n",
"print(\"Std Dev age: \", std_age)\n",
"print(\"Average age group: \", avg_age_group)\n",
"print(\"Std Dev age group: \", std_age_group)\n",
"\n",
"# female and male ratio\n",
"count_male = df_dgc[df_dgc['gender'] == 'Male'].shape[0]\n",
"count_female = df_dgc[df_dgc['gender'] == 'Female'].shape[0]\n",
"count_total = df_dgc.shape[0]\n",
"male_ratio = count_male / count_total\n",
"female_ratio = count_female / count_total\n",
"\n",
"# print\n",
"print('Male Ratio: ', male_ratio)\n",
"print('Female Ratio:', female_ratio)\n"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},