# Master MDS Use NLP techniques to analyse texts or to build an application. Document your approach.

Felix Jan Michael Mucha c444b0d451 added helpfull functionality		2025-02-09 15:33:01 +01:00
data	fixed splitting bug	2025-02-07 23:08:24 +01:00
puns	init structure, added data exploration hack, added init transformer	2025-01-23 21:28:45 +01:00
.gitignore	added helpfull functionality	2025-02-09 15:33:01 +01:00
BalancedCELoss.py	custom loss function	2025-02-09 11:10:34 +01:00
EarlyStopping.py	added helpfull functionality	2025-02-09 15:33:01 +01:00
HumorDataset.py	added helpfull functionality	2025-02-09 15:33:01 +01:00
LICENSE	Initial commit	2025-01-17 20:26:51 +01:00
README.md	added glove embeddings	2025-01-27 20:55:22 +01:00
cnn.py	update	2025-01-27 13:56:00 +01:00
cnn_1b.ipynb	CNN 1b executed	2025-02-07 14:01:32 +01:00
data_explore_hack.ipynb	added analysis for humor rating	2025-01-27 07:09:29 +01:00
data_explore_hack_rating.ipynb	added analysis for humor rating	2025-01-27 07:09:29 +01:00
dataset_generator.py	added helpfull functionality	2025-02-09 15:33:01 +01:00
lstm_1b.py	lstm update	2025-02-09 11:35:41 +01:00
ml_evaluation.py	added helpfull functionality	2025-02-09 15:33:01 +01:00
ml_helper.py	added machine learning helper	2025-01-27 07:10:07 +01:00
ml_history.py	added helpfull functionality	2025-02-09 15:33:01 +01:00
transformer.ipynb	updated transformer models	2025-02-09 15:31:47 +01:00
transformer_reg.ipynb	updated transformer models	2025-02-09 15:31:47 +01:00

ANLP_WS24_CA2

TODOS

data

maybe add a buffer zone between good and bad jokes (the trade-off would be less data)
maybe not binary classification
maybe switch to humor detection (more data available)
dataset shape doesn't work correctly
history: integrate validation loss
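
The buffer-zone idea from the data TODOs can be sketched as follows. This is only an illustration, not the repository's implementation: the threshold of 2.5, the margin of 0.5, and the `(text, rating)` row layout are assumptions chosen for the example. Jokes whose rating falls inside the buffer are dropped, which is where the loss of data comes from.

```python
from typing import Iterable, List, Optional, Tuple


def label_with_buffer(
    rating: float, threshold: float = 2.5, margin: float = 0.5
) -> Optional[int]:
    """Return 1 (good joke), 0 (bad joke), or None inside the buffer zone.

    threshold and margin are hypothetical values, not taken from the project.
    """
    if rating >= threshold + margin:
        return 1
    if rating <= threshold - margin:
        return 0
    return None  # too close to the threshold to label confidently


def build_binary_dataset(
    rows: Iterable[Tuple[str, float]], threshold: float = 2.5, margin: float = 0.5
) -> List[Tuple[str, int]]:
    """Keep only rows outside the buffer zone and attach a binary label."""
    kept = []
    for text, rating in rows:
        label = label_with_buffer(rating, threshold, margin)
        if label is not None:
            kept.append((text, label))
    return kept


# Toy rows standing in for (joke text, average humor rating).
rows = [("joke a", 3.4), ("joke b", 2.6), ("joke c", 1.2), ("joke d", 2.0)]
dataset = build_binary_dataset(rows)
print(f"kept {len(dataset)} of {len(rows)} rows")
```

A wider margin gives cleaner class separation but discards more examples, so the margin is worth treating as a tunable parameter.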

Data

https://competitions.codalab.org/competitions/27446

https://aclanthology.org/2021.semeval-1.9.pdf#:~:text=HaHackathon%20is%20the%20first%20shared%20task%20to%20combine,its%20average%20ratings%20for%20both%20humor%20and%20offense.

Hackathon: https://homepages.inf.ed.ac.uk/s1573290/data.html

Data embeddings

GloVe 6B tokens: https://nlp.stanford.edu/projects/glove/
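
A minimal sketch of reading GloVe vectors, assuming the plain-text format of the downloaded files (one token per line, followed by its space-separated vector components). The function names `parse_glove_lines` and `load_glove` are hypothetical, and the file name in the usage note is an assumption about which file from the 6B download is used.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the form '<token> <v1> <v2> ...' into a dict."""
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


def load_glove(path: str) -> Dict[str, List[float]]:
    """Read a GloVe text file from disk and parse it line by line."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle)


# Tiny in-memory sample so the sketch runs without the real download.
sample = ["the 0.1 -0.2 0.3", "joke 0.5 0.0 -1.0"]
vectors = parse_glove_lines(sample)
print(len(vectors), len(vectors["joke"]))
```

With the real data this would be called as, for example, `load_glove("glove.6B.100d.txt")`. Tokens missing from the vocabulary still need separate handling, such as a zero or randomly initialised vector.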

Not Prioritised (Pun data)

Challenge: https://alt.qcri.org/semeval2017/task7/
Pun-annotated Amazon data (jokes not included ...): https://github.com/amazon-science/expunations/tree/main/data