# ANLP_WS24_CA2

Master MDS: Use NLP techniques to analyse texts or to build an application. Document your approach.
## TODOs

### Data

- Maybe add a buffer zone between good and bad jokes (trade-off: less data)
- Maybe move away from binary classification
- Maybe switch to humor detection (more data available)
- Dataset shape doesn't work correctly
- History: integrate validation loss
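The buffer-zone idea from the TODO list can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes each joke has a numeric rating and that "good" vs "bad" is decided by a threshold. The function name `label_with_buffer`, the `threshold` and `margin` parameters, and the example ratings are all hypothetical. Jokes whose rating falls strictly inside `(threshold - margin, threshold + margin)` are dropped, which is exactly the "less data" trade-off.

```python
from typing import List, Sequence, Tuple


def label_with_buffer(
    ratings: Sequence[float],
    threshold: float,
    margin: float,
) -> List[Tuple[int, int]]:
    """Binarise ratings into good (1) / bad (0) jokes, skipping a buffer zone.

    Hypothetical helper: samples whose rating lies strictly inside
    (threshold - margin, threshold + margin) are dropped because they are
    the most ambiguous ones. Returns (index, label) pairs so the kept
    samples can be looked up in the original dataset.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    kept: List[Tuple[int, int]] = []
    for i, rating in enumerate(ratings):
        if rating >= threshold + margin:
            kept.append((i, 1))  # clearly a "good" joke
        elif rating <= threshold - margin:
            kept.append((i, 0))  # clearly a "bad" joke
        # otherwise: inside the buffer zone, so the sample is discarded
    return kept


if __name__ == "__main__":
    ratings = [0.5, 1.9, 2.0, 2.1, 3.5]  # made-up ratings for illustration
    for margin in (0.0, 0.5):
        kept = label_with_buffer(ratings, threshold=2.0, margin=margin)
        print(f"margin={margin}: kept {len(kept)}/{len(ratings)} -> {kept}")
```

With `margin=0.0` every sample is kept and the split is a plain threshold; with `margin=0.5` only the first and last of the five made-up ratings survive. A larger margin gives cleaner labels at the cost of a smaller training set.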
## Data

- https://competitions.codalab.org/competitions/27446
## Data embeddings

- GloVe 6B tokens: https://nlp.stanford.edu/projects/glove/
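A possible way to turn the GloVe vectors into an embedding matrix for a model vocabulary is sketched below. It relies only on GloVe's plain-text format (one token per line followed by its space-separated vector components). The helpers `load_glove` and `build_embedding_matrix`, the lowercase lookup, and the zero-vector handling of unknown tokens are assumptions for illustration, not taken from this repository.

```python
from typing import Dict, Iterable, List, Tuple


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's plain-text format: `token v1 v2 ... vd` per line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    vocab: Dict[str, int],
    vectors: Dict[str, List[float]],
    dim: int,
) -> Tuple[List[List[float]], int]:
    """Create a len(vocab) x dim matrix where row vocab[token] is the GloVe vector.

    Assumption: vocab indices run from 0 to len(vocab) - 1. Tokens missing
    from GloVe (e.g. a padding token) keep an all-zero row. Returns the
    matrix and the number of tokens that were found in GloVe.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    hits = 0
    for token, idx in vocab.items():
        # The 6B vectors are uncased, so tokens are looked up in lowercase.
        vec = vectors.get(token.lower())
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
            hits += 1
    return matrix, hits
```

In practice `load_glove` could be given an open file handle, e.g. `open("glove.6B.100d.txt", encoding="utf-8")`, with `dim=100`. The hit count is a quick check of vocabulary coverage. The resulting matrix could then initialise an embedding layer, for example via PyTorch's `nn.Embedding.from_pretrained` after conversion to a tensor, if that is the framework in use.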
## Not Prioritised (Pun Data)

- SemEval-2017 Task 7 challenge: https://alt.qcri.org/semeval2017/task7/
- Pun-annotated Amazon data (jokes not included ...): https://github.com/amazon-science/expunations/tree/main/data