contribution, usage added

main
Felix Jan Michael Mucha 2024-06-11 19:42:25 +02:00
parent c22da6d352
commit 9924e1675d
1 changed files with 81 additions and 11 deletions

@ -40,17 +40,71 @@ The data provision provides for the following points, which can be taken from th
## Getting Started
This project was implemented in Python. To use the project, all packages listed in the requirements.txt file need to be installed first. After that, you can interact with the project as follows:
This project is implemented in Python. Follow these steps to set up and use the project:
1. Ensure you have 10GB of available space.
2. First, visit the website and download the dataset (https://doi.org/10.13026/wgex-er52, last visit: 15.05.2024).
3. Extract the data.
4. Open the generate_data.py script and adjust the "project_dir" path to point to the downloaded data.
5. Run the generate_data.py script as the main file. This will generate several pickle files, which may take some time.
6. You can now use the notebooks by adjusting the "path" variable in the top lines of each notebook to point to the pickle files.
### Prerequisites
- Ensure you have Python 3.8 or newer installed on your system.
- At least `10 GB` of available disk space and `32 GB` of RAM are recommended for optimal performance.
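
The prerequisites above can be checked before downloading anything. The following is a minimal sketch, not part of the project itself: the function name `check_prerequisites` and the `data_dir` argument are hypothetical, and the RAM recommendation is not checked because the standard library offers no portable way to query it.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)          # minimum Python version from the prerequisites
MIN_FREE_BYTES = 10 * 10**9  # 10 GB of disk space from the prerequisites


def check_prerequisites(data_dir="."):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
    # disk_usage reports total, used and free bytes for the drive holding data_dir
    free = shutil.disk_usage(data_dir).free
    if free < MIN_FREE_BYTES:
        problems.append(
            f"only {free / 10**9:.1f} GB free in {data_dir!r}, 10 GB recommended"
        )
    return problems


for problem in check_prerequisites():
    print("warning:", problem)
```

Pass the directory where you plan to extract the dataset as `data_dir` so that the free space is measured on the correct drive.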
### Installation
1. **Download the Dataset:**
- Visit [the dataset page](https://doi.org/10.13026/wgex-er52) (last visited: 15.05.2024) and download the dataset.
- Extract the dataset to a known directory on your system.
2. **Install Dependencies:**
- Open a terminal and navigate to the project directory.
- Run `pip install -r requirements.txt` to install the required Python packages.
3. **Configure the Project:**
- Open the `settings.json` file in the project directory.
- Adjust the parameters as needed, especially the path variables to match where you extracted the dataset.
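
If you prefer to set a path from a script instead of editing the file by hand, something like the following could work. This is a sketch under assumptions: the key name `data_dir` is hypothetical (check your `settings.json` for the actual key names), and the helper `update_setting` is not part of the project. The demonstration runs on a throwaway file so that it does not touch your real settings.

```python
import json
import tempfile
from pathlib import Path


def update_setting(settings_file, key, value):
    """Set one top-level entry in a JSON settings file and return all settings."""
    settings_file = Path(settings_file)
    settings = json.loads(settings_file.read_text(encoding="utf-8"))
    settings[key] = value
    settings_file.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings


# Demonstration on a temporary file; "data_dir" is a hypothetical key name.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "settings.json"
    demo_file.write_text('{"data_dir": ""}', encoding="utf-8")
    demo_settings = update_setting(demo_file, "data_dir", "/path/to/extracted/dataset")
    print(demo_settings)
```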
### Generating Data
1. **Generate Basic Data Files:**
- In the terminal, ensure you are in the project directory.
- Run the `main` function of `generate_data.py` with the parameters `gen_data=True` and `gen_features=False` to generate several pickle files. This process may take some time.
2. **Generate Machine Learning Features (Optional):**
- Run the `main` function of `generate_data.py` with the parameters `gen_data=False` and `gen_features=True` to generate a database file (`.db`) containing the machine learning features. This may also take some time.
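
Instead of editing the script for each stage, the two parameter combinations can be selected from the command line with a small wrapper. This is only a sketch: it assumes that `generate_data.py` is importable from the working directory and that its `main` function accepts `gen_data` and `gen_features` as keyword arguments. The `--stage` option and the helper names are hypothetical.

```python
import argparse


def stage_kwargs(argv):
    """Translate a --stage choice into the two keyword arguments for main."""
    parser = argparse.ArgumentParser(description="Run one data generation stage")
    parser.add_argument("--stage", choices=["data", "features"], default="data")
    args = parser.parse_args(argv)
    return {
        "gen_data": args.stage == "data",
        "gen_features": args.stage == "features",
    }


def run(argv):
    # Assumption: main(gen_data=..., gen_features=...) exists in generate_data.py.
    # The import is deferred so this sketch loads even outside the project directory.
    from generate_data import main
    main(**stage_kwargs(argv))


print(stage_kwargs(["--stage", "features"]))
```

Inside the project directory, `run(["--stage", "data"])` would generate the pickle files and `run(["--stage", "features"])` the feature database, provided the assumptions above hold.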
### Using the Project
- With the data generated, you can now proceed to use the notebooks and other data as intended in the project.
Please refer to the individual notebook files for specific instructions on running analyses or models.
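
Since the basic data files are pickle files, they can also be loaded outside the notebooks. The sketch below assumes the files use the `.pkl` extension and sit in one directory; both are assumptions, and `load_pickles` is a hypothetical helper. The demonstration writes and reads a throwaway file rather than real project data.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickles(directory):
    """Load every *.pkl file in a directory into a dict keyed by file name stem."""
    data = {}
    for file in sorted(Path(directory).glob("*.pkl")):
        with file.open("rb") as handle:
            data[file.stem] = pickle.load(handle)
    return data


# Demonstration with a temporary file standing in for a generated data file.
with tempfile.TemporaryDirectory() as tmp:
    with (Path(tmp) / "example.pkl").open("wb") as handle:
        pickle.dump({"records": [1, 2, 3]}, handle)
    loaded = load_pickles(tmp)
    print(sorted(loaded))
```

Only unpickle files you generated yourself or otherwise trust, because loading a pickle can execute arbitrary code.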
## Usage
- coming at the end of the Project...
The following user story illustrates how to use the project, following the "Getting Started" instructions above:
### User Story: Analyzing Health Data with Emma
**Emma**, a health data analyst, is keen on exploring the relationship between ECG signals and health outcomes. She decides to use our project for her analysis. Here's how she proceeds:
1. **Preparation:**
- Emma checks that her computer has at least 10 GB of free disk space and 32 GB of RAM.
- She visits the dataset page (https://doi.org/10.13026/wgex-er52, last visited: 15.05.2024) and downloads the dataset.
- After downloading, Emma extracts the data to a specific directory on her computer.
2. **Setting Up:**
- Emma opens a terminal, navigates to the project directory, and runs `pip install -r requirements.txt` to install the required Python packages.
- She opens the `settings.json` file in the project directory and adjusts the parameters, especially the path variables to match the directory where she extracted the dataset.
3. **Generating Data:**
- To generate the basic data files, Emma makes sure she is in the project directory in the terminal. She first adjusts `generate_data.py` so that it calls the `main` function with `gen_data=True` and `gen_features=False`, then runs the script. This process generates several pickle files and may take some time.
- For generating machine learning features (optional), Emma adjusts the script to call the `main` function with `gen_data=False` and `gen_features=True` to generate a database file `.db`. This also may take some time.
4. **Analysis:**
- With the data and features generated, Emma is now ready to dive into the analysis. She opens the provided Jupyter notebooks and can see the demographic plots as well as the methods of feature detection and noise reduction. With the `filter_params.json` file, she can also adjust parameters to see how they change the noise reduction.
5. **Deep Dive:**
- Interested in the features and the resulting machine learning accuracies, Emma uses the signal processing notebooks to analyze patterns in the health data.
- She adjusts parameters and runs different analyses, noting interesting trends and correlations.
- After training her own models, she can also compare her results with the included models in the `ml_models` directory to evaluate the performance of her models.
6. **Sharing Insights:**
- Emma compiles her findings into a report, using plots and insights generated from our project.
- She shares her report with her team, highlighting how features like the R Axis can influence health outcomes.
Through this process, Emma was able to leverage our project to generate meaningful insights into health data, demonstrating the project's utility in real-world analysis.
## Progress
- Data was searched and found at: (https://doi.org/10.13026/wgex-er52, last visit: 15.05.2024)
@ -94,14 +148,14 @@ The exact procedure for creating the matrix can be found in the notebook [demogr
#### Hypotheses
## Hypotheses
The following two hypotheses were applied in this project:
1. Using ECG data, a classifier can classify the four disease groupings with an accuracy of 80%.
Result:
- For the first hypothesis, an accuracy of 83 % was achieved with the XGBoost classifier.
- For the first hypothesis, an accuracy of 83 % was achieved with the XGBoost classifier. The detailed procedure can be found in the following notebook: [ml_xgboost.ipynb](notebooks/ml_xgboost.ipynb)
2. Sinus bradycardia occurs significantly more frequently in the 60 to 70 age group than in other age groups.
@ -128,7 +182,23 @@ The following two hypotheses were applied in this project:
The sample size in the study conducted may also play a role in the significance of the frequency.
## Contributing
- coming at the end of the Project...
Thank you for your interest in contributing to our project! As an open-source project, we welcome contributions from everyone. Here are some ways you can contribute:
- **Reporting Bugs:** If you find a bug, please open an issue on our GitHub page with a detailed description of the bug, steps to reproduce it, and any other relevant information that could help us fix it.
- **Suggesting Enhancements:** Have ideas on how to make this project better? Open an issue on our GitHub page with your suggestions.
- **Pull Requests:** Ready to contribute code or documentation? Great! Please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or fix.
3. Commit your changes with clear, descriptive commit messages.
4. Push your changes to your branch.
5. Submit a pull request to our repository. Include a clear description of your changes and their purpose.
Please note that by contributing to this project, you agree that your contributions will be licensed under the MIT License.
We look forward to your contributions. Thank you for helping us improve this project!
## License
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).