update
parent
d3a8e1dd1d
commit
3a738f277f
42
README.md
42
README.md
|
|
@ -15,7 +15,7 @@ Leverage advanced NLP techniques (LSTM, CNN, BERT, and Transformer) to analyze t
|
|||
The data is sourced from the SemEval-2021 Task 7:
|
||||
It contains a dataset of humor and offense ratings for jokes. The jokes are annotated with a humor rating on a scale from 0 to 4.
|
||||
|
||||
- Training data: HaHackathon (https://homepages.inf.ed.ac.uk/s1573290/data.html)
|
||||
- Training data: HaHackathon (https://homepages.inf.ed.ac.uk/s1573290/data.html); associated paper: https://aclanthology.org/2021.semeval-1.9.pdf#:~:text=HaHackathon%20is%20the%20first%20shared%20task%20to%20combine,its%20average%20ratings%20for%20both%20humor%20and%20offense
|
||||
- Test data: No separate test set was available, so the training data was split into training, validation, and test sets.
|
||||
|
||||
|
||||
|
|
@ -24,14 +24,13 @@ It contains a dataset of humor and offense ratings for jokes. The jokes are anno
|
|||
|
||||
|
||||
### Preprocessing Steps
|
||||
**1. Load and clean data:** The dataset is loaded and all rows with missing humor_rating values are removed. The target variable for the humor rating is also extracted.
|
||||
**1. Load and clean data:** The data set is loaded and all rows with missing humor_rating values are removed. In addition, the target variable for the humor rating is extracted.
|
||||
|
||||
**2. Text embeddings:** Pre-trained GloVe embeddings are loaded and converted into a matrix that can be used for modeling.
|
||||
**2. Text embeddings:** Pre-trained GloVe embeddings are loaded and converted into a matrix that can be used for modeling.
|
||||
|
||||
**3. Data splitting:** The dataset is split into training, test, and validation data so that the models can later be trained and evaluated.
|
||||
|
||||
**4. Ensemble data indices:** Several methods for creating data indices are provided to prepare the training data for ensemble methods.
|
||||
**3. Data splitting:** The data set is split into training, test, and validation data so that the models can later be trained and evaluated.
|
||||
|
||||
**4. Ensemble data indices:** Several methods for creating data indices are provided to prepare the training data for ensemble methods.
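The cleaning, splitting, and ensemble-index steps could look roughly like the sketch below. It is a dependency-free illustration, not the project's actual code: the `text` column name, the split fractions, the seed, and the bagging-style `bootstrap_indices` helper are all assumptions. Only `humor_rating` comes from the description above.

```python
import csv
import io
import random


def load_and_clean(csv_file, text_col="text", target_col="humor_rating"):
    """Read rows, drop those with a missing target, return (texts, ratings)."""
    texts, ratings = [], []
    for row in csv.DictReader(csv_file):
        value = (row.get(target_col) or "").strip()
        if not value:  # missing humor_rating: skip the row
            continue
        texts.append(row[text_col])
        ratings.append(float(value))
    return texts, ratings


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def bootstrap_indices(train_idx, n_models, seed=42):
    """Assumed ensemble scheme: one sample with replacement per member."""
    rng = random.Random(seed)
    return [[rng.choice(train_idx) for _ in train_idx] for _ in range(n_models)]


# Tiny in-memory stand-in for the real CSV file (hypothetical rows).
sample = io.StringIO(
    "id,text,humor_rating\n"
    "1,joke a,2.4\n"
    "2,not rated,\n"
    "3,joke b,1.1\n"
    "4,joke c,3.0\n"
    "5,joke d,0.5\n"
    "6,joke e,2.2\n"
    "7,also not rated,\n"
    "8,joke f,1.9\n"
    "9,joke g,2.7\n"
    "10,joke h,3.3\n"
)

texts, ratings = load_and_clean(sample)
train, val, test = split_indices(len(texts), val_frac=0.25, test_frac=0.25)
bags = bootstrap_indices(train, n_models=3)
```

Fixing the seed keeps the three partitions reproducible across runs, which matters here because all of them are drawn from the same original training file.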
|
||||
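For the text-embeddings step, one possible way to turn GloVe vectors into a matrix is sketched here. It assumes the standard GloVe text format (a token followed by its space-separated vector on each line) and a hypothetical `word_index` mapping from tokens to row numbers, with row 0 reserved for padding. The helper names are illustrative and not taken from the repository.

```python
import io


def load_glove(lines):
    """Parse GloVe text format: one token followed by its vector per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip empty or malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]  # row 0: padding
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[i] = vec
    return matrix


# Toy 3-dimensional vectors standing in for a real GloVe file.
glove_text = io.StringIO("joke 0.1 0.2 0.3\nfunny 0.4 0.5 0.6\n")
word_index = {"joke": 1, "funny": 2, "unseen": 3}  # hypothetical vocabulary
matrix = build_embedding_matrix(word_index, load_glove(glove_text), dim=3)
```

Words without a pre-trained vector keep an all-zero row in this sketch; random initialization is a common alternative.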
---
|
||||
|
||||
|
||||
|
|
@ -63,7 +62,7 @@ The text data is cleaned and transformed into formats suitable for analysis. The
|
|||
Various machine learning models, including Convolutional Neural Networks (CNNs), Long Short-Term Memory Networks (LSTMs), BERT, and Transformers, are trained to predict the humor rating of jokes based on their linguistic features.
|
||||
|
||||
### 3. Model Evaluation
|
||||
The trained models are evaluated to determine their performance in predicting humor ratings. Metrics such as Mean Squared Error (MSE) and R² scores are used to assess the models.
|
||||
The trained models are evaluated to determine their performance in predicting humor ratings. Metrics such as Root Mean Squared Error (RMSE) and R² scores are used to assess the models.
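As a reference for how these two metrics are defined, here is a minimal plain-Python sketch. The function names and the toy values are illustrative assumptions; the project itself may rely on a library implementation instead.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy humor ratings on the 0 to 4 scale (made-up values).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"R2:   {r2_score(y_true, y_pred):.4f}")
```

RMSE is expressed in the same units as the humor rating, so it reads directly as a typical prediction error on the 0 to 4 scale, while R² reports the share of variance in the ratings that the model explains.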
|
||||
|
||||
### 4. Classification and Regression
|
||||
While the primary goal of the project is to predict the numerical humor rating (a regression task), we also experiment with classification models for humor detection (e.g., humor vs. non-humor).
|
||||
|
|
@ -80,34 +79,9 @@ While the primary goal of the project is to predict the numerical humor rating (
|
|||
|
||||
3. **Humor Detection: A Transformer Gets the Last Laugh** (https://aclanthology.org/D19-1372/)
|
||||
|
||||
|
||||
|
||||
---
|
||||
## Summary
|
||||
|
||||
|
||||
|
||||
|
||||
# Master MDS Use NLP techniques to analyse texts or to build an application. Document your approach.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Data
|
||||
|
||||
|
||||
https://competitions.codalab.org/competitions/27446
|
||||
|
||||
https://aclanthology.org/2021.semeval-1.9.pdf#:~:text=HaHackathon%20is%20the%20first%20shared%20task%20to%20combine,its%20average%20ratings%20for%20both%20humor%20and%20offense.
|
||||
|
||||
|
||||
- Hackathon: https://homepages.inf.ed.ac.uk/s1573290/data.html
|
||||
|
||||
|
||||
|
||||
#### Not Prioritised (Pun data)
|
||||
- Challenge https://alt.qcri.org/semeval2017/task7/
|
||||
- Pun Annotated Amazon (joke not included ...): https://github.com/amazon-science/expunations/tree/main/data
|
||||
|
||||
|
||||
|
||||
|
|
|
|||