{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook Fine-Tuning Bert\n",
"In diesem Notebook wird Bert bzw. 'BertForSequenceClassification' feingetuned.
\n",
"Funktionen werden aus diesem [Skript](bert_no_ernie.py) geladen."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from bert_no_ernie import *\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rohdaten einlesen\n",
"An dieser Stelle, wird der Hackathon Datensatz eingelesen welcher Annotierte Daten enthält.\n",
"Die wichtigsten Attribute dieses Datensatzes in diesem sind *Text* (welcher den \"Witz\" als String enthält) und *is_humor* (ein durch 0 und 1 dargestellter Wahrheitswert) welcher angibt ob der entsprechende Text in der Zeile ein Witz ist oder nicht."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
|   | id | text | is_humor | humor_rating | humor_controversy | offense_rating |
|---|---|---|---|---|---|---|
| 0 | 1 | TENNESSEE: We're the best state. Nobody even c... | 1 | 2.42 | 1.0 | 0.2 |
| 1 | 2 | A man inserted an advertisement in the classif... | 1 | 2.50 | 1.0 | 1.1 |
| 2 | 3 | How many men does it take to open a can of bee... | 1 | 1.95 | 0.0 | 2.4 |
| 3 | 4 | Told my mom I hit 1200 Twitter followers. She ... | 1 | 2.11 | 1.0 | 0.0 |
| 4 | 5 | Roses are dead. Love is fake. Weddings are bas... | 1 | 2.78 | 0.0 | 0.1 |