initial
commit
fabe6c7048
|
@ -0,0 +1,12 @@
|
|||
/models
|
||||
models/
|
||||
.rasa/
|
||||
__pycache__/
|
||||
volumes/
|
||||
.keras/
|
||||
.config/
|
||||
*.db
|
||||
backend/faiss_db.json
|
||||
backend/faiss_db
|
||||
**/*.db
|
||||
elasticsearch-7.9.2.tar
|
|
@ -0,0 +1,6 @@
|
|||
{
|
||||
"[python]": {
|
||||
"editor.defaultFormatter": "ms-python.python"
|
||||
},
|
||||
"python.formatting.provider": "none"
|
||||
}
|
|
@ -0,0 +1,190 @@
|
|||
# Project Setup Guide
|
||||
|
||||
This guide outlines the steps to set up and run the project. Please follow the instructions in the order they are provided.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ensure you have Conda installed on your system.
|
||||
- Docker should be installed and running.
|
||||
- NVIDIA Drivers installed
|
||||
|
||||
## Requirements
|
||||
|
||||
- If you run all compose services, including GROBID and LLaMA7B, you need up to 48 GB of VRAM to generate LLaMA2 embeddings. If you use LLaMA only for text generation and not for generating embeddings, you can first run the data_service scripts, then shut down GROBID and load the LLaMA model; in that case, ~28 GB of VRAM is enough to load LLaMA7B.
|
||||
|
||||
## Installation Steps
|
||||
|
||||
## 1. Create and Activate Conda Environment
|
||||
|
||||
Create a new Conda environment using the provided `environment.yml` file:
|
||||
|
||||
```bash
|
||||
conda env create -f environment.yml
|
||||
```
|
||||
|
||||
Activate the newly created environment:
|
||||
|
||||
```bash
|
||||
conda activate chatbot_env
|
||||
```
|
||||
|
||||
## 2. Train Rasa Model
|
||||
|
||||
Navigate to the chatbot directory and train the Rasa model:
|
||||
|
||||
```bash
|
||||
cd chatbot
|
||||
rasa train
|
||||
cd ..
|
||||
```
|
||||
|
||||
## 3. Start Docker Services
|
||||
|
||||
Start all required services using Docker Compose:
|
||||
|
||||
```bash
|
||||
docker-compose up
|
||||
```
|
||||
|
||||
### Note on Using OpenAI Models:
|
||||
|
||||
If you want to use OpenAI models for embedding generation or text generation, you need to provide an API key in the following files:
|
||||
|
||||
- `backend/app.py`
|
||||
- `data_service/data_indexer.py`
|
||||
- `model_service/openai_models.py`
|
||||
|
||||
To provide the API key, insert your OpenAI API key in the appropriate places in these files.
|
||||
|
||||
If you do not plan to use OpenAI models, you must adjust the configuration template (`config1`) so it does not use GPT, and change the indexing script so it does not use the `ada` embedding model. This involves switching the settings to alternative models or disabling the features that rely on OpenAI models.
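Rather than pasting the key into each of the three files, one option is to read it from the environment in a single place. This is a minimal sketch, not part of the project: the variable name `OPENAI_API_KEY` and the helper `load_openai_key` are assumptions.

```python
import os

# Hypothetical helper: read the OpenAI key from an environment variable
# instead of hardcoding it in backend/app.py, data_service/data_indexer.py,
# and model_service/openai_models.py. OPENAI_API_KEY is an assumed name.
def load_openai_key():
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before starting the services.")
    return key
```

Each file would then call `load_openai_key()` where it currently expects the pasted key.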
|
||||
|
||||
### Note on Using LLaMA Models:
|
||||
|
||||
If you want to use Hugging Face Transformer models such as LLaMA2, download the model and save it in `model_service/models`.
|
||||
|
||||
## 4. Data Indexing
|
||||
|
||||
### 4.1 Create and Activate Conda Environment
|
||||
|
||||
Create a new Conda environment using the provided `data_service.yml` file:
|
||||
|
||||
```bash
|
||||
conda env create -f data_service.yml
|
||||
```
|
||||
|
||||
Activate the newly created environment:
|
||||
|
||||
```bash
|
||||
conda activate data_service
|
||||
```
|
||||
|
||||
### 4.2 Run the Indexing Script
|
||||
|
||||
After all services are up and running, navigate to the `/data_service` directory:
|
||||
|
||||
```bash
|
||||
cd data_service
|
||||
```
|
||||
|
||||
Run the `data_indexing.py` script to index your data:
|
||||
|
||||
```bash
|
||||
python data_indexing.py
|
||||
```
|
||||
|
||||
If the `/data_service/data` directory is empty, you need to manually download the necessary documents and place them in the appropriate directories as outlined below:
|
||||
|
||||
```
|
||||
/data_service/data
|
||||
├── modulhandbuch-ib.pdf
|
||||
├── Juni_2023_SPO_Bachelor.pdf
|
||||
├── papers
|
||||
│ ├── Wolf
|
||||
│ │ ├── paper_title.pdf
|
||||
│ │ └── ...
|
||||
│ ├── Hummel
|
||||
│ │ ├── paper_title.pdf
|
||||
│ │ └── ...
|
||||
│ └── ...
|
||||
└── other_documents
|
||||
└── ...
|
||||
```
|
||||
|
||||
### Notes on Data Structure
|
||||
|
||||
- For papers, the structure should be `/data_service/data/papers/{AUTHOR}/{PAPER_NAME}.pdf`.
|
||||
- Make sure to follow the same structure for other documents like module handbooks and study regulations.
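The layout rule above can be sketched as a small path helper. This is illustrative only, assuming the `papers` directory name shown in the tree; `paper_path` is a hypothetical function, not part of the project.

```python
import os

# Hypothetical helper mirroring the documented layout:
# {data_dir}/papers/{AUTHOR}/{PAPER_NAME}.pdf
def paper_path(data_dir: str, author: str, paper_name: str) -> str:
    return os.path.join(data_dir, "papers", author, f"{paper_name}.pdf")
```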
|
||||
|
||||
### Author Mapping in `expert_search.py` and `reader.py`
|
||||
|
||||
In the `expert_search.py` and `reader.py` scripts, there is an author mapping that associates short names with full names. This is crucial for correctly identifying authors in the data processing steps.
|
||||
|
||||
#### Current Author Mapping
|
||||
|
||||
The current mapping is as follows:
|
||||
|
||||
```python
|
||||
AUTHOR_MAPPING = {
|
||||
"Wolf": "Prof. Dr. Ivo Wolf",
|
||||
"Hummel": "Prof. Dr. Oliver Hummel",
|
||||
"Fimmel": "Prof. Dr. Elena Fimmel",
|
||||
"Eckert": "Prof. Dr. rer. nat. Kai Eckert",
|
||||
"Fischer": "Prof. Dr. Jörn Fischer",
|
||||
"Gröschel": "Prof. Dr. Michael Gröschel",
|
||||
"Gumbel": "Prof. Dr. Markus Gumbel",
|
||||
"Nagel": "Prof. Dr. Till Nagel",
|
||||
"Specht": "Prof. Dr. Thomas Specht",
|
||||
"Steinberger": "Prof. Dr. Jessica Steinberger",
|
||||
"Dietrich": "Prof. Dr. Gabriele Roth-Dietrich",
|
||||
"Dopatka": "Prof. Dr. rer. nat. Frank Dopatka",
|
||||
"Kraus": "Prof. Dr. Stefan Kraus",
|
||||
"Leuchter": "Prof. Dr.-Ing. Sandro Leuchter",
|
||||
"Paulus": "Prof. Dr. Sachar Paulus",
|
||||
}
|
||||
```
|
||||
|
||||
#### Updating the Author Mapping
|
||||
|
||||
- If new authors are added to the `data/paper` directory, you will need to update the `AUTHOR_MAPPING` in both `expert_search.py` and `reader.py` to reflect these changes.
|
||||
- Ensure that the short name used in the directory structure matches the key used in the `AUTHOR_MAPPING`.
|
||||
|
||||
**Note:** Keeping the author mapping updated is essential for the accuracy of the expert search and data processing functionalities.
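A quick way to catch a stale mapping before indexing is to compare the author directories against the mapping keys. This is a hedged sketch: `missing_authors` is a hypothetical helper, and the mapping is abbreviated to two entries for illustration.

```python
import os

# Abbreviated for illustration; the real mapping lives in
# expert_search.py and reader.py.
AUTHOR_MAPPING = {
    "Wolf": "Prof. Dr. Ivo Wolf",
    "Hummel": "Prof. Dr. Oliver Hummel",
}

def missing_authors(papers_dir, mapping=AUTHOR_MAPPING):
    """Return author directories under papers_dir with no mapping entry."""
    dirs = (d for d in os.listdir(papers_dir)
            if os.path.isdir(os.path.join(papers_dir, d)))
    return sorted(d for d in dirs if d not in mapping)
```

Running this against `data/papers` before `data_indexing.py` surfaces any newly added author whose short name still needs a full-name entry.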
|
||||
|
||||
### Running the Web Crawler
|
||||
|
||||
The project includes web crawlers for collecting data from specific sources. Follow these steps to run the crawlers:
|
||||
|
||||
#### 1. Crawling Available URLs
|
||||
|
||||
To gather all crawlable URLs from the HS Mannheim domain, use the following command:
|
||||
|
||||
```bash
|
||||
scrapy runspider hsma_url_crawler.py
|
||||
```
|
||||
|
||||
This command runs the `hsma_url_crawler.py` script, which gathers URLs from the specified domain.
|
||||
|
||||
#### 2. Crawling Content from URLs
|
||||
|
||||
After gathering URLs, you can crawl the content from these URLs:
|
||||
|
||||
- First, make sure you have executed `hsma_url_crawler.py` as described above.
|
||||
- Then, run the content crawler with the following command:
|
||||
|
||||
```bash
|
||||
scrapy runspider hsma_content_crawler.py
|
||||
```
|
||||
|
||||
- This command runs the `hsma_content_crawler.py` script, which collects content from the list of URLs obtained in the previous step.
|
||||
|
||||
#### 3. Post-Crawling Steps
|
||||
|
||||
After crawling, move the generated `url_texts.json` file into the `/data` directory and rename it to `crawled_hsma_web.json`.
|
||||
|
||||
```bash
|
||||
mv url_texts.json /path/to/data_service/data/crawled_hsma_web.json
|
||||
```
|
||||
|
||||
Replace `/path/to/data_service/data` with the actual path to your `data_service/data` directory.
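The move-and-rename step can also be done from Python, which is convenient if it runs as part of a pipeline. A minimal sketch, with paths as parameters because the actual locations depend on where the crawler was run:

```python
import os
import shutil

# Hypothetical helper equivalent to the mv command above: move the crawler
# output into data_service/data and rename it to crawled_hsma_web.json.
def move_crawl_output(src="url_texts.json", data_dir="data_service/data"):
    dest = os.path.join(data_dir, "crawled_hsma_web.json")
    shutil.move(src, dest)
    return dest
```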
|
||||
|
||||
---
|
|
@ -0,0 +1,2 @@
|
|||
models/
|
||||
models
|
|
@ -0,0 +1 @@
|
|||
SYS_PATH=/home/alibabaoglu/BA_QA_HSMA/data_service/
|
|
@ -0,0 +1,17 @@
|
|||
# syntax=docker/dockerfile:1
|
||||
FROM python:3.10.12-slim
|
||||
WORKDIR /home/user/
|
||||
ENV FLASK_APP=app.py
|
||||
COPY requirements.txt requirements.txt
|
||||
RUN apt-get update && apt-get install -y \
|
||||
git \
|
||||
wget \
|
||||
curl \
|
||||
pkg-config \
|
||||
cmake \
|
||||
g++ \
|
||||
poppler-utils && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
RUN pip install --upgrade pip
|
||||
RUN pip install -r requirements.txt
|
||||
COPY . .
|
|
@ -0,0 +1,57 @@
|
|||
import requests
|
||||
import json
|
||||
import os
|
||||
host = os.environ.get("MODEL_SERVICE_HOST", "127.0.0.1")
|
||||
BASE_URL= f"http://{host}:5000/"
|
||||
ANSWER_URL= f"{BASE_URL}generate_answer"
|
||||
EMBEDDINGS_URL= f"{BASE_URL}generate_embeddings"
|
||||
RERANKING_URL= f"{BASE_URL}rerank_documents"
|
||||
class EmbeddingServiceCaller:
|
||||
def __init__(self) -> None:
|
||||
pass
|
||||
def get_embeddings(self, text, embedding_type="input_embeddings", operation="mean", embedding_model="llama", layer=-1):
|
||||
headers = {
|
||||
'Content-Type': 'application/json'
|
||||
}
|
||||
payload=json.dumps({
|
||||
"query":text,
|
||||
"embedding_type":embedding_type,
|
||||
"operation":operation,
|
||||
"embedding_model":embedding_model,
|
||||
"layer":layer
|
||||
|
||||
})
|
||||
response = requests.request("POST", EMBEDDINGS_URL, headers=headers, data=payload)
|
||||
return response.json()
|
||||
def get_answer(self,payload="", prompt="", llama=False):
|
||||
headers = {
|
||||
'Content-Type': 'application/json'
|
||||
}
|
||||
if payload:
|
||||
print("PAYLOAD: ",payload, flush=True)
|
||||
response = requests.request("POST", f"{ANSWER_URL}", headers=headers, data=payload)
|
||||
print(response)
|
||||
else:
|
||||
payload = json.dumps({
|
||||
"prompt": prompt
|
||||
})
|
||||
response = requests.request("POST", f"{ANSWER_URL}", headers=headers, data=payload)
|
||||
print("PROMPT: ",prompt)
|
||||
print(response.text)
|
||||
if llama:
|
||||
return response.text
|
||||
return response.json()
|
||||
def rerank_documents_gpt(self, payload):
|
||||
headers = {
|
||||
'Content-Type': 'application/json'
|
||||
}
|
||||
response= requests.request("POST", f"{RERANKING_URL}", headers=headers, data=payload)
|
||||
return response.json()
|
||||
|
||||
def _call(self, url, method):
|
||||
response = requests.request(method, url)
|
||||
return response.json()
|
||||
|
||||
if __name__ == "__main__":
|
||||
caller= EmbeddingServiceCaller()
|
||||
print(caller.get_embeddings("Hallsdfasdf Hallsdfasdf Hallsdfasdf"))
|
|
@ -0,0 +1,222 @@
|
|||
import time
|
||||
|
||||
time.sleep(30)  # wait 30 seconds because Elasticsearch and Weaviate need time to start up
|
||||
from flask import Flask, request, abort, render_template, jsonify
|
||||
from api.embeddingsServiceCaller import EmbeddingServiceCaller
|
||||
from retriever.retriever_pipeline import CustomPipeline
|
||||
from question_answering import QuestionAnswering
|
||||
from elasticsearch import Elasticsearch
|
||||
import os
|
||||
|
||||
es_host = os.environ.get("ELASTIC_HOST", "localhost")
|
||||
es = Elasticsearch([{"host": es_host, "port": 9200}])
|
||||
|
||||
server = Flask(__name__, static_folder="static")
|
||||
caller = EmbeddingServiceCaller()
|
||||
pipeline = CustomPipeline(api_key=os.environ.get("OPENAI_API_KEY", ""))  # read the key from the environment; never hardcode secrets
|
||||
question_answering = QuestionAnswering(pipeline=pipeline, embedder=caller)
|
||||
|
||||
|
||||
@server.route("/feedback", methods=["POST", "GET"])
|
||||
def feedback():
|
||||
if request.method == "POST":
|
||||
request_data = request.get_json()
|
||||
type = request_data.get("type")
|
||||
user_queston = request_data.get("user_queston")
|
||||
provided_answer = request_data.get("provided_answer")
|
||||
retrieval_method_or_model = request_data.get("retrieval_method_or_model")
|
||||
reader_model = request_data.get("reader_model")
|
||||
feedback = request_data.get("feedback")
|
||||
last_searched_index = request_data.get("last_searched_index")
|
||||
document = {
|
||||
"type": type,
|
||||
"user_queston": user_queston,
|
||||
"provided_answer": provided_answer,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": reader_model,
|
||||
"feedback": feedback,
|
||||
"last_searched_index": last_searched_index,
|
||||
}
|
||||
response = es.index(index="feedback", body=document)
|
||||
print(response, "response")
|
||||
return jsonify(response)
|
||||
elif request.method == "GET":
|
||||
res = es.search(index="feedback", body={"query": {"match_all": {}}})
|
||||
feedbacks = []
|
||||
for hit in res["hits"]["hits"]:
|
||||
feedbacks.append(hit["_source"])
|
||||
return jsonify(feedbacks)
|
||||
|
||||
|
||||
@server.route("/get_module_credits", methods=["POST"])
|
||||
def get_credits():
|
||||
request_data = request.get_json()
|
||||
module = request_data["module"]
|
||||
print(module, flush=True)
|
||||
result = question_answering.get_module_credits(module=module, index="ib")
|
||||
print(result, flush=True)
|
||||
return result
|
||||
|
||||
|
||||
@server.route("/get_relevant_documents", methods=["POST"])
|
||||
def get_relevant_documents():
|
||||
request_data = request.get_json()
|
||||
query = request_data["query"]
|
||||
index = request_data["index"]
|
||||
retrieval_method_or_model = request_data.get("retrieval_method_or_model", "mpnet")
|
||||
meta = request_data.get("meta", {})
|
||||
result = question_answering.get_top_k(
|
||||
query=query,
|
||||
index=index,
|
||||
retrieval_method_or_model=retrieval_method_or_model,
|
||||
meta=meta,
|
||||
)
|
||||
print(result, flush=True)
|
||||
return result
|
||||
|
||||
|
||||
@server.route("/get_answer", methods=["POST"])
|
||||
def get_answer():
|
||||
request_data = request.get_json()
|
||||
query = request_data["query"]
|
||||
index = request_data["index"]
|
||||
retrieval_method_or_model = request_data.get("retrieval_method_or_model", "mpnet")
|
||||
reader_model = request_data.get("reader_model", "GPT")
|
||||
rerank_documents = request_data.get("rerank_documents", True)
|
||||
result = question_answering.get_answers(
|
||||
query=query,
|
||||
index=index,
|
||||
retrieval_method_or_model=retrieval_method_or_model,
|
||||
reader_model=reader_model,
|
||||
rerank_documents=rerank_documents,
|
||||
)
|
||||
if isinstance(result, tuple) and len(result or []) > 0:
|
||||
return {"answer": result[0], "documents": result[1]}
|
||||
else:
|
||||
return result
|
||||
|
||||
|
||||
@server.route("/search_experts", methods=["POST"])
|
||||
def search_experts():
|
||||
request_data = request.get_json()
|
||||
if not request_data:
|
||||
abort(400, description="Bad Request: Expecting JSON data")
|
||||
query = request_data.get("query")
|
||||
if not query:
|
||||
abort(400, description="Missing parameter 'query'")
|
||||
retriever_model = request_data.get("retriever_model", "mpnet")
|
||||
reader_model = request_data.get("reader_model", "GPT")
|
||||
search_method = request_data.get("search_method", "classic_retriever_reader")
|
||||
generate_answer = request_data.get("generate_answer", False)
|
||||
rerank_retrieved_results = request_data.get("rerank_retrieved_results", True)
|
||||
result = question_answering.search_experts(
|
||||
query=query,
|
||||
search_method=search_method,
|
||||
retriever_model=retriever_model,
|
||||
generate_answer=generate_answer,
|
||||
rerank=rerank_retrieved_results,
|
||||
)
|
||||
if isinstance(result, tuple) and len(result or []) > 0:
|
||||
return {"answer": result[0], "documents": result[1]}
|
||||
else:
|
||||
return result
|
||||
|
||||
|
||||
@server.route("/recommend_wpms", methods=["POST"])
|
||||
def recommend_wpms():
|
||||
request_data = request.get_json()
|
||||
if not request_data:
|
||||
abort(400, description="Bad Request: Expecting JSON data")
|
||||
interests = request_data.get("interests")
|
||||
previous_courses = request_data.get("previous_courses")
|
||||
future_carrer = request_data.get("future_carrer")
|
||||
if not (interests and previous_courses and future_carrer):
|
||||
abort(
|
||||
400,
|
||||
description="Provide at least one of the parameters: 'interests', 'previous_courses' or 'future_carrer' ",
|
||||
)
|
||||
retrieval_method_or_model = request_data.get("retrieval_method_or_model", "mpnet")
|
||||
recommendation_method = request_data.get(
|
||||
"recommendation_method", "get_retrieved_results"
|
||||
)
|
||||
rerank_retrieved_results = request_data.get("rerank_retrieved_results", True)
|
||||
result = question_answering.recommend_wpm(
|
||||
interets=interests,
|
||||
previous_courses=previous_courses,
|
||||
future_carrer=future_carrer,
|
||||
recommendation_method=recommendation_method,
|
||||
rerank_retrieved_results=rerank_retrieved_results,
|
||||
retrieval_method_or_model=retrieval_method_or_model,
|
||||
)
|
||||
if isinstance(result, tuple) and len(result or []) > 0:
|
||||
return {"answer": result[0], "documents": result[1]}
|
||||
else:
|
||||
return result
|
||||
|
||||
|
||||
@server.route("/get_all_weaviate_data", methods=["GET"])
|
||||
def get_weaviate_data():
|
||||
index = request.args.get("index")
|
||||
return pipeline.get_all_weaviate_data(index=index)
|
||||
|
||||
|
||||
@server.route("/get_all_es_data", methods=["GET"])
|
||||
def get_elastic_data():
|
||||
index = request.args.get("index")
|
||||
return pipeline.get_all_elastic_data(index=index)
|
||||
|
||||
|
||||
@server.route("/get_document_by_id", methods=["POST"])
|
||||
def get_doc_by_id():
|
||||
request_data = request.get_json()
|
||||
id = request_data["id"]
|
||||
return pipeline.query_by_ids([id])
|
||||
|
||||
|
||||
@server.route("/conf1")
|
||||
def config1():
|
||||
return render_template("config1.html")
|
||||
|
||||
|
||||
# @server.route("/conf2")
|
||||
# def config2():
|
||||
# return render_template("config2.html")
|
||||
|
||||
|
||||
# @server.route("/conf3")
|
||||
# def config3():
|
||||
# return render_template("config3.html")
|
||||
|
||||
|
||||
# @server.route("/conf4")
|
||||
# def config4():
|
||||
# return render_template("config4.html")
|
||||
|
||||
|
||||
# @server.route("/conf5")
|
||||
# def config5():
|
||||
# return render_template("config5.html")
|
||||
|
||||
|
||||
# @server.route("/conf6")
|
||||
# def config6():
|
||||
# return render_template("config6.html")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if not es.indices.exists(index="feedback"):
|
||||
mapping = {
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"question": {"type": "text"},
|
||||
"answer": {"type": "text"},
|
||||
"feedback": {"type": "text"},
|
||||
"timestamp": {"type": "date"},
|
||||
}
|
||||
}
|
||||
}
|
||||
es.indices.create(index="feedback", body=mapping)
|
||||
|
||||
server.run(host="::", port=8080)
|
|
@ -0,0 +1,31 @@
|
|||
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
|
||||
import os
|
||||
import sys
|
||||
sys.path.append('/root/home/BA_QA_HSMA/backendd')
|
||||
|
||||
class ElasticSearchData:
|
||||
def __init__(self) -> None:
|
||||
self.doc_store = ElasticsearchDocumentStore(port=9210)
|
||||
pass
|
||||
|
||||
def write_data(self, data, index):
|
||||
return self.doc_store.write_documents(documents=data, index=index)
|
||||
|
||||
def get_data(self, index):
|
||||
return self.doc_store.get_all_documents(index=index)
|
||||
|
||||
def get_all_doc_count(self,index):
|
||||
return self.doc_store.get_document_count(index=index)
|
||||
|
||||
def delete_all_docs(self, index):
|
||||
return self.doc_store.delete_all_documents(index=index)
|
||||
|
||||
def remove_docs(self):
|
||||
pass
|
||||
def update_embeddings(self, retriever):
|
||||
return self.doc_store.update_embeddings(retriever=retriever, index="ib")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
es = ElasticSearchData()
|
||||
print(es.delete_all_docs(index="ib"))
|
|
@ -0,0 +1,33 @@
|
|||
from haystack.document_stores.faiss import FAISSDocumentStore
|
||||
import os
|
||||
import sys
|
||||
sys.path.append('/root/home/BA_QA_HSMA/backendd')
|
||||
|
||||
class FAISSHandler:
|
||||
def __init__(self) -> None:
|
||||
self.doc_store = FAISSDocumentStore(faiss_index_factory_str="Flat", embedding_dim=5120, index="ib")
|
||||
pass
|
||||
|
||||
def write_data(self, data, index):
|
||||
return self.doc_store.write_documents(documents=data, index=index)
|
||||
|
||||
def get_data(self, index):
|
||||
return self.doc_store.get_all_documents(index=index)
|
||||
|
||||
def query_emb(self, emb, index):
|
||||
return self.doc_store.query_by_embedding(query_emb=emb, index=index)
|
||||
|
||||
def get_all_doc_count(self,index):
|
||||
return self.doc_store.get_document_count(index=index)
|
||||
|
||||
def delete_all_docs(self, index):
|
||||
return self.doc_store.delete_all_documents(index=index)
|
||||
|
||||
def remove_docs(self):
|
||||
pass
|
||||
def update_embeddings(self, retriever):
|
||||
return self.doc_store.update_embeddings(retriever=retriever, index="ib")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pass
|
|
@ -0,0 +1,91 @@
|
|||
import time
|
||||
import json
|
||||
from pymilvus import Collection, utility, connections, CollectionSchema, FieldSchema, DataType
|
||||
import os
|
||||
import sys
|
||||
sys.path.append('/root/home/BA_QA_HSMA/backendd')
|
||||
|
||||
|
||||
class MilviusHandler:
|
||||
def __init__(self) -> None:
|
||||
# TODO: get alias from param
|
||||
connections.connect(
|
||||
alias="default",
|
||||
host='localhost',
|
||||
port='19530'
|
||||
)
|
||||
|
||||
# TODO: get collection_name from param
|
||||
if not utility.has_collection("ib"):
|
||||
id = FieldSchema(
|
||||
name="id",
|
||||
dtype=DataType.INT64,
|
||||
is_primary=True,
|
||||
auto_id=True
|
||||
)
|
||||
|
||||
es_id = FieldSchema(
|
||||
name="es_id",
|
||||
dtype=DataType.VARCHAR,
|
||||
max_length=200)
|
||||
|
||||
content = FieldSchema(
|
||||
name="module_content",
|
||||
dtype=DataType.FLOAT_VECTOR,
|
||||
dim=4096
|
||||
)
|
||||
schema = CollectionSchema(
|
||||
fields=[content, id, es_id],
|
||||
description="module search"
|
||||
)
|
||||
# TODO: get collection_name from param
|
||||
self.collection = Collection(
|
||||
name="ib",
|
||||
schema=schema,
|
||||
using='default'
|
||||
)
|
||||
else:
|
||||
# Get an existing collection.
|
||||
self.collection = Collection("ib")
|
||||
self.collection.load()
|
||||
|
||||
def write_data(self, data):
|
||||
self.collection.insert(data)
|
||||
# utility.do_bulk_insert(
|
||||
# collection_name="ib",
|
||||
# files=[data])
|
||||
index_params = {
|
||||
"metric_type": "L2",
|
||||
"index_type": "IVF_FLAT",
|
||||
"params": {"nlist": 1024}
|
||||
}
|
||||
self.collection.create_index(
|
||||
field_name="module_content",
|
||||
index_params=index_params
|
||||
)
|
||||
return
|
||||
|
||||
def search(self, query_emb):
|
||||
self.collection.load()
|
||||
search_params = {
|
||||
"metric_type": "L2",
|
||||
"params": {"nprobe": 100},
|
||||
}
|
||||
result = self.collection.search(query_emb, anns_field="module_content", param=search_params, limit=3, output_fields=["es_id"])
|
||||
return result
|
||||
def query(self):
|
||||
self.collection.load()
|
||||
res = self.collection.query(
|
||||
expr = "id > 0",
|
||||
offset = 0,
|
||||
limit = 10,
|
||||
output_fields = ["id", "es_id", "module_content"],
|
||||
consistency_level="Strong"
|
||||
)
|
||||
return res
|
||||
if __name__ == "__main__":
|
||||
with open('../embedded_docs.json', 'r') as f:
|
||||
data = json.load(f)
|
||||
emb= data[0]["module_content"]
|
||||
es = MilviusHandler()
|
||||
print(es.search([emb]))
|
File diff suppressed because it is too large
Binary file not shown.
|
@ -0,0 +1,18 @@
|
|||
from llama_cpp import Llama
|
||||
|
||||
|
||||
class Embedder:
|
||||
def __init__(self, llama_model_path:str) -> None:
|
||||
self.llama = Llama(
|
||||
model_path=llama_model_path,
|
||||
n_ctx=2048,
|
||||
n_parts=1,
|
||||
f16_kv=3,
|
||||
embedding=True,
|
||||
)
|
||||
|
||||
def embed_text_llama(self, doc: str):
|
||||
embeddings_query = self.llama.embed(doc)
|
||||
return embeddings_query
|
||||
|
||||
|
|
@ -0,0 +1,299 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"# from embeddings.llama import Embedder\n",
|
||||
"sys.path.append('/root/home/BA_QA_HSMA/backendd')\n",
|
||||
"from embeddings.llama import Embedder\n",
|
||||
"from transformers import LlamaForCausalLM, LlamaTokenizer\n",
|
||||
"import torch\n",
|
||||
"from database.es_handler import ElasticSearchData\n",
|
||||
"from tqdm import tqdm\n",
|
||||
"import pickle\n",
|
||||
"from transformer_llama import LlamaTransformerEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. \n",
|
||||
"The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. \n",
|
||||
"The class this function is called from is 'LlamaTokenizer'.\n",
|
||||
"Loading checkpoint shards: 0%| | 0/41 [00:00<?, ?it/s]/home/maydane/miniconda3/envs/backend/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
|
||||
" return self.fget.__get__(instance, owner)()\n",
|
||||
"Loading checkpoint shards: 100%|██████████| 41/41 [05:15<00:00, 7.70s/it]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model_path = \"../models/tmp/llama-13b-hf\"\n",
|
||||
"embeddings_model = LlamaTransformerEmbeddings(model_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "AttributeError",
|
||||
"evalue": "'LlamaTokenizer' object has no attribute 'vocab'",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[30], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[39mlist\u001b[39m(embeddings_model\u001b[39m.\u001b[39;49mtokenizer\u001b[39m.\u001b[39;49mvocab\u001b[39m.\u001b[39mkeys())[\u001b[39m5000\u001b[39m:\u001b[39m5020\u001b[39m]\n",
|
||||
"\u001b[0;31mAttributeError\u001b[0m: 'LlamaTokenizer' object has no attribute 'vocab'"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(embeddings_model.tokenizer.vocab.keys())[5000:5020]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tokenizer= embeddings_model.tokenizer\n",
|
||||
"text= \"After stealing money from the bank vault, the bank robber was seen \" \\\n",
|
||||
" \"fishing on the Mississippi river bank.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split the sentence into tokens.\n",
|
||||
"tokenized_text = tokenizer.tokenize(text)\n",
|
||||
"\n",
|
||||
"# Map the token strings to their vocabulary indices.\n",
|
||||
"indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]"
|
||||
]
|
||||
},
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"segments_ids = [1] * len(tokenized_text)\n",
|
||||
"segments_ids"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tokens_tensor = torch.tensor([indexed_tokens])\n",
|
||||
"segments_tensors = torch.tensor([segments_ids])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"0 ▁After\n",
|
||||
"1 ▁ste\n",
|
||||
"2 aling\n",
|
||||
"3 ▁money\n",
|
||||
"4 ▁from\n",
|
||||
"5 ▁the\n",
|
||||
"6 ▁bank\n",
|
||||
"7 ▁v\n",
|
||||
"8 ault\n",
|
||||
"9 ,\n",
|
||||
"10 ▁the\n",
|
||||
"11 ▁bank\n",
|
||||
"12 ▁rob\n",
|
||||
"13 ber\n",
|
||||
"14 ▁was\n",
|
||||
"15 ▁seen\n",
|
||||
"16 ▁fish\n",
|
||||
"17 ing\n",
|
||||
"18 ▁on\n",
|
||||
"19 ▁the\n",
|
||||
"20 ▁Mississippi\n",
|
||||
"21 ▁river\n",
|
||||
"22 ▁bank\n",
|
||||
"23 .\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tokenized_text = embeddings_model.tokenizer.tokenize(\"After stealing money from the bank vault, the bank robber was seen \" \\\n",
|
||||
" \"fishing on the Mississippi river bank.\")\n",
|
||||
"for i, token_str in enumerate(tokenized_text):\n",
|
||||
" print (i, token_str)\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from scipy.spatial.distance import cosine\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "NameError",
|
||||
"evalue": "name 'token_embeddings' is not defined",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[26], line 7\u001b[0m\n\u001b[1;32m 2\u001b[0m token_vecs_cat \u001b[39m=\u001b[39m []\n\u001b[1;32m 4\u001b[0m \u001b[39m# `token_embeddings` is a [22 x 12 x 768] tensor.\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \n\u001b[1;32m 6\u001b[0m \u001b[39m# For each token in the sentence...\u001b[39;00m\n\u001b[0;32m----> 7\u001b[0m \u001b[39mfor\u001b[39;00m token \u001b[39min\u001b[39;00m token_embeddings:\n\u001b[1;32m 8\u001b[0m \n\u001b[1;32m 9\u001b[0m \u001b[39m# `token` is a [12 x 768] tensor\u001b[39;00m\n\u001b[1;32m 10\u001b[0m \n\u001b[1;32m 11\u001b[0m \u001b[39m# Concatenate the vectors (that is, append them together) from the last \u001b[39;00m\n\u001b[1;32m 12\u001b[0m \u001b[39m# four layers.\u001b[39;00m\n\u001b[1;32m 13\u001b[0m \u001b[39m# Each layer vector is 768 values, so `cat_vec` is length 3,072.\u001b[39;00m\n\u001b[1;32m 14\u001b[0m cat_vec \u001b[39m=\u001b[39m torch\u001b[39m.\u001b[39mcat((token[\u001b[39m-\u001b[39m\u001b[39m1\u001b[39m], token[\u001b[39m-\u001b[39m\u001b[39m2\u001b[39m], token[\u001b[39m-\u001b[39m\u001b[39m3\u001b[39m], token[\u001b[39m-\u001b[39m\u001b[39m4\u001b[39m]), dim\u001b[39m=\u001b[39m\u001b[39m0\u001b[39m)\n\u001b[1;32m 16\u001b[0m \u001b[39m# Use `cat_vec` to represent `token`.\u001b[39;00m\n",
"\u001b[0;31mNameError\u001b[0m: name 'token_embeddings' is not defined"
]
}
],
"source": [
"# Stores the token vectors, with shape [22 x 3,072]\n",
"token_vecs_cat = []\n",
"\n",
"# `token_embeddings` is a [22 x 12 x 768] tensor.\n",
"\n",
"# For each token in the sentence...\n",
"for token in token_embeddings:\n",
"\n",
"    # `token` is a [12 x 768] tensor\n",
"\n",
"    # Concatenate the vectors (that is, append them together) from the last\n",
"    # four layers.\n",
"    # Each layer vector is 768 values, so `cat_vec` is length 3,072.\n",
"    cat_vec = torch.cat((token[-1], token[-2], token[-3], token[-4]), dim=0)\n",
"\n",
"    # Use `cat_vec` to represent `token`.\n",
"    token_vecs_cat.append(cat_vec)\n",
"\n",
"print('Shape is: %d x %d' % (len(token_vecs_cat), len(token_vecs_cat[0])))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Stores the token vectors, with shape [22 x 768]\n",
"token_vecs_sum = []\n",
"\n",
"# `token_embeddings` is a [22 x 12 x 768] tensor.\n",
"\n",
"# For each token in the sentence...\n",
"for token in token_embeddings:\n",
"\n",
"    # `token` is a [12 x 768] tensor\n",
"\n",
"    # Sum the vectors from the last four layers.\n",
"    sum_vec = torch.sum(token[-4:], dim=0)\n",
"\n",
"    # Use `sum_vec` to represent `token`.\n",
"    token_vecs_sum.append(sum_vec)\n",
"\n",
"print('Shape is: %d x %d' % (len(token_vecs_sum), len(token_vecs_sum[0])))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `cosine` is SciPy's cosine *distance* (may already be imported earlier in the notebook).\n",
"from scipy.spatial.distance import cosine\n",
"\n",
"# Calculate the cosine similarity between the word bank\n",
"# in \"bank robber\" vs \"river bank\" (different meanings).\n",
"diff_bank = 1 - cosine(token_vecs_sum[10], token_vecs_sum[19])\n",
"\n",
"# Calculate the cosine similarity between the word bank\n",
"# in \"bank robber\" vs \"bank vault\" (same meaning).\n",
"same_bank = 1 - cosine(token_vecs_sum[10], token_vecs_sum[6])\n",
"\n",
"print('Vector similarity for *similar* meanings: %.2f' % same_bank)\n",
"print('Vector similarity for *different* meanings: %.2f' % diff_bank)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.11 ('backend')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "ec98c019f1befdeef47e250107e8ecbbb590b18e092be4f687ed7315b206d36b"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@ -0,0 +1,102 @@
import os
import sys
from dotenv import load_dotenv

load_dotenv()
sys_path = os.environ.get('SYS_PATH')

# Make the project root importable before pulling in project modules.
sys.path.append(sys_path)
from embeddings.llama import Embedder
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch
from database.es_handler import ElasticSearchData
from tqdm import tqdm
import pickle


class LlamaTransformerEmbeddings:
    def __init__(self, model_path, ggml=False, output_hidden_states=True):
        if ggml:
            self.model_ggml = Embedder(model_path)
        else:
            self.tokenizer = LlamaTokenizer.from_pretrained(model_path)
            self.model = LlamaForCausalLM.from_pretrained(model_path, output_hidden_states=output_hidden_states)

    def get_embeddings_hf(self, text):
        inputs = self.tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Alternative pooling strategies over the last layer:
        # embeddings = outputs.hidden_states[-1]
        # avg_embeddings = torch.mean(embeddings, dim=1)
        # return avg_embeddings[0]
        # embeddings, _ = torch.max(outputs.hidden_states[-1], dim=1)
        hidden_states = outputs[2]
        # Mean over the token vectors of the second-to-last layer.
        token_vecs = hidden_states[-2][0]
        sentence_embedding = torch.mean(token_vecs, dim=0)
        return sentence_embedding

    def get_input_embeddings(self, text):
        inputs = self.tokenizer(text, return_tensors="pt")
        input_ids = inputs['input_ids']
        with torch.no_grad():
            input_embeddings = self.model.get_input_embeddings()
            embeddings = input_embeddings(input_ids)
            mean = torch.mean(embeddings[0], 0).cpu().numpy()
        return mean

    def get_embeddings_ggml(self, text):
        return self.model_ggml.embed_text_llama(doc=text)

    def generate_answer(self, prompt):
        inputs = self.tokenizer(prompt, return_tensors="pt")
        generate_ids = self.model.generate(inputs.input_ids, max_length=700)
        return self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]


if __name__ == "__main__":
    # tokenizer = LlamaTokenizer.from_pretrained("../models/tmp/llama-13b-hf")
    # model = LlamaForCausalLM.from_pretrained("../models/tmp/llama-13b-hf", output_hidden_states=True)

    # text = "Your text for the semantic search"
    # inputs = tokenizer(text, return_tensors="pt")

    # with torch.no_grad():
    #     outputs = model(**inputs)

    # embeddings = outputs.hidden_states
    # with open('hidden_states.pkl', 'wb') as f:
    #     pickle.dump(embeddings, f)
    # final_embeddings = embeddings[-1]

    with open('hidden_states.pkl', 'rb') as f:
        hidden_states = pickle.load(f)
    print(len(hidden_states))

    ########### WORD EMBEDDINGS ###########
    # Concatenate the tensors for all layers. We use `stack` here to
    # create a new dimension in the tensor.
    # token_embeddings = torch.stack(hidden_states, dim=0)
    # print(token_embeddings.size())
    # # Remove dimension 1, the "batches".
    # token_embeddings = torch.squeeze(token_embeddings, dim=1)
    # # Swap dimensions 0 and 1.
    # token_embeddings = token_embeddings.permute(1, 0, 2)
    # # Stores the token vectors, with shape [num_tokens x 20480].
    # token_vecs_cat = []
    # for token in token_embeddings:
    #     # `token` is a [40 x 5120] tensor; concatenate the vectors from the
    #     # last four layers. Each layer vector is 5120 values, so `cat_vec`
    #     # is length 20480.
    #     cat_vec = torch.cat((token[-1], token[-2], token[-3], token[-4]), dim=0)
    #     # Use `cat_vec` to represent `token`.
    #     token_vecs_cat.append(cat_vec)
    # print('Shape is: %d x %d' % (len(token_vecs_cat), len(token_vecs_cat[0])))

    ########### SENTENCE EMBEDDINGS ###########
    # Mean over the token vectors of the second-to-last layer.
    token_vecs = hidden_states[-2][0]
    sentence_embedding = torch.mean(token_vecs, dim=0)
    # print("Our final sentence embedding vector of shape:", sentence_embedding.size())
@ -0,0 +1,47 @@
version: "3"
services:
  backend:
    image: deepset/haystack-annotation:latest
    environment:
      NODE_ENV: "production"
      DB_HOSTNAME: "db"
      DB_NAME: "databasename"
      DB_USERNAME: "somesafeuser"
      DB_PASSWORD: "somesafepassword"
      # IMPORTANT: please configure credentials with secure strings.
      # DEFAULT_ADMIN_EMAIL: "example@example.com"
      # DEFAULT_ADMIN_PASSWORD: "DEMO_PASSWORD"
      # COOKIE_KEYS: "somesafecookiekeys"
      # JWT_SECRET: "somesafesecret"
      # DOMAIN_WHITELIST: "*"
    ports:
      - "7001:7001"
    links:
      - "db:database"
    depends_on:
      - db
    networks:
      - app-network
    restart: unless-stopped

  db:
    image: "postgres:12"
    environment:
      POSTGRES_USER: "somesafeuser"
      POSTGRES_PASSWORD: "somesafepassword"
      POSTGRES_DB: "databasename"
    ports:
      - "5432:5432"
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
    networks:
      - app-network
    healthcheck:
      test: "pg_isready --username=somesafeuser --dbname=databasename && psql --username=somesafeuser --list"
      timeout: 3s
      retries: 5
    restart: unless-stopped

networks:
  app-network:
    driver: bridge
@ -0,0 +1,508 @@
# pylint: disable=ungrouped-imports
from typing import List, Dict, Union, Optional, Any, Literal, Callable

import logging
from pathlib import Path
from copy import deepcopy
from requests.exceptions import HTTPError

import numpy as np
from tqdm import tqdm

import pandas as pd
from huggingface_hub import hf_hub_download

from haystack.errors import HaystackError
from haystack.schema import Document, FilterType
from haystack.document_stores import BaseDocumentStore
from haystack.telemetry import send_event
from haystack.lazy_imports import LazyImport
from haystack.nodes.retriever import DenseRetriever

logger = logging.getLogger(__name__)


with LazyImport(message="Run 'pip install farm-haystack[inference]'") as torch_and_transformers_import:
    import torch
    from haystack.modeling.utils import initialize_device_settings  # pylint: disable=ungrouped-imports
    from transformers import AutoConfig

import sys

sys.path.append("../..")
from api.embeddingsServiceCaller import EmbeddingServiceCaller

_EMBEDDING_ENCODERS: Dict[str, Callable] = {"llama": {}}


class LlamaRetriever(DenseRetriever):
    def __init__(
        self,
        model_format="llama",
        document_store: Optional[BaseDocumentStore] = None,
        model_version: Optional[str] = None,
        use_gpu: bool = True,
        batch_size: int = 32,
        max_seq_len: int = 512,
        pooling_strategy: str = "reduce_mean",
        emb_extraction_layer: int = -1,
        top_k: int = 10,
        progress_bar: bool = True,
        devices: Optional[List[Union[str, "torch.device"]]] = None,
        use_auth_token: Optional[Union[str, bool]] = None,
        scale_score: bool = True,
        embed_meta_fields: Optional[List[str]] = None,
        api_key: Optional[str] = None,
        azure_api_version: str = "2022-12-01",
        azure_base_url: Optional[str] = None,
        azure_deployment_name: Optional[str] = None,
        api_base: str = "https://api.openai.com/v1",
        openai_organization: Optional[str] = None,
    ):
        """
        :param document_store: An instance of DocumentStore from which to retrieve documents.
        :param model_version: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
        :param use_gpu: Whether to use all available GPUs or the CPU. Falls back on CPU if no GPU is available.
        :param batch_size: Number of documents to encode at once.
        :param max_seq_len: Longest length of each document sequence. Maximum number of tokens for the document text. Longer ones will be cut down.
        :param model_format: Name of framework that was used for saving the model or model type. If no model_format is
                             provided, it will be inferred automatically from the model configuration files.
                             Options:

                             - ``'farm'`` (will use `_DefaultEmbeddingEncoder` as embedding encoder)
                             - ``'transformers'`` (will use `_DefaultEmbeddingEncoder` as embedding encoder)
                             - ``'sentence_transformers'`` (will use `_SentenceTransformersEmbeddingEncoder` as embedding encoder)
                             - ``'retribert'`` (will use `_RetribertEmbeddingEncoder` as embedding encoder)
                             - ``'openai'``: (will use `_OpenAIEmbeddingEncoder` as embedding encoder)
                             - ``'cohere'``: (will use `_CohereEmbeddingEncoder` as embedding encoder)
        :param pooling_strategy: Strategy for combining the embeddings from the model (for farm / transformers models only).
                                 Options:

                                 - ``'cls_token'`` (sentence vector)
                                 - ``'reduce_mean'`` (sentence vector)
                                 - ``'reduce_max'`` (sentence vector)
                                 - ``'per_token'`` (individual token vectors)
        :param emb_extraction_layer: Number of layer from which the embeddings shall be extracted (for farm / transformers models only).
                                     Default: -1 (very last layer).
        :param top_k: How many documents to return per query.
        :param progress_bar: If true displays progress bar during embedding.
        :param devices: List of torch devices (e.g. cuda, cpu, mps) to limit inference to specific devices.
                        A list containing torch device objects and/or strings is supported (for example
                        [torch.device('cuda:0'), "mps", "cuda:1"]). When specifying `use_gpu=False` the devices
                        parameter is not used and a single cpu device is used for inference.
                        Note: As multi-GPU training is currently not implemented for EmbeddingRetriever,
                        training will only use the first device provided in this list.
        :param use_auth_token: The API token used to download private models from Huggingface.
                               If this parameter is set to `True`, then the token generated when running
                               `transformers-cli login` (stored in ~/.huggingface) will be used.
                               Additional information can be found here
                               https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
        :param scale_score: Whether to scale the similarity score to the unit interval (range of [0,1]).
                            If true (default) similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant.
                            Otherwise raw similarity scores (e.g. cosine or dot_product) will be used.
        :param embed_meta_fields: Concatenate the provided meta fields and text passage / table to a text pair that is
                                  then used to create the embedding.
                                  This approach is also used in the TableTextRetriever paper and is likely to improve
                                  performance if your titles contain meaningful information for retrieval
                                  (topic, entities etc.).
                                  If no value is provided, a default empty list will be created.
        :param api_key: The OpenAI API key or the Cohere API key. Required if one wants to use OpenAI/Cohere embeddings.
                        For more details see https://beta.openai.com/account/api-keys and https://dashboard.cohere.ai/api-keys
        :param azure_api_version: The version of the Azure OpenAI API to use. The default is the `2022-12-01` version.
        :param azure_base_url: The base URL for the Azure OpenAI API. If not supplied, Azure OpenAI API will not be used.
                               This parameter is an OpenAI Azure endpoint, usually in the form `https://<your-endpoint>.openai.azure.com`
        :param azure_deployment_name: The name of the Azure OpenAI API deployment. If not supplied, Azure OpenAI API
                                      will not be used.
        :param api_base: The OpenAI API base URL, defaults to `"https://api.openai.com/v1"`.
        :param openai_organization: The OpenAI-Organization ID, defaults to `None`. For more details, see OpenAI
                                    [documentation](https://platform.openai.com/docs/api-reference/requesting-organization).
        """
        torch_and_transformers_import.check()

        if embed_meta_fields is None:
            embed_meta_fields = []
        super().__init__()

        self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=True)

        if batch_size < len(self.devices):
            logger.warning("Batch size is less than the number of devices. Not all GPUs will be utilized.")

        self.document_store = document_store
        self.model_version = model_version
        self.use_gpu = use_gpu
        self.batch_size = batch_size
        self.max_seq_len = max_seq_len
        self.pooling_strategy = pooling_strategy
        self.emb_extraction_layer = emb_extraction_layer
        self.top_k = top_k
        self.progress_bar = progress_bar
        self.use_auth_token = use_auth_token
        self.scale_score = scale_score
        self.api_key = api_key
        self.api_base = api_base
        self.api_version = azure_api_version
        self.azure_base_url = azure_base_url
        self.azure_deployment_name = azure_deployment_name
        self.openai_organization = openai_organization
        self.model_format = model_format
        self.emb_caller = EmbeddingServiceCaller()
        self.embed_meta_fields = embed_meta_fields
    def retrieve(
        self,
        query: str,
        filters: Optional[FilterType] = None,
        top_k: Optional[int] = None,
        index: Optional[str] = None,
        headers: Optional[Dict[str, str]] = None,
        scale_score: Optional[bool] = None,
        document_store: Optional[BaseDocumentStore] = None,
    ) -> List[Document]:
        """
        Scan through the documents in a DocumentStore and return a small number of documents
        that are most relevant to the query.

        :param query: The query
        :param filters: Optional filters to narrow down the search space to documents whose metadata fulfill certain
                        conditions.
                        Filters are defined as nested dictionaries. The keys of the dictionaries can be a logical
                        operator (`"$and"`, `"$or"`, `"$not"`), a comparison operator (`"$eq"`, `"$in"`, `"$gt"`,
                        `"$gte"`, `"$lt"`, `"$lte"`) or a metadata field name.
                        Logical operator keys take a dictionary of metadata field names and/or logical operators as
                        value. Metadata field names take a dictionary of comparison operators as value. Comparison
                        operator keys take a single value or (in case of `"$in"`) a list of values as value.
                        If no logical operator is provided, `"$and"` is used as default operation. If no comparison
                        operator is provided, `"$eq"` (or `"$in"` if the comparison value is a list) is used as default
                        operation.

                        __Example__:

                        ```python
                        filters = {
                            "$and": {
                                "type": {"$eq": "article"},
                                "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
                                "rating": {"$gte": 3},
                                "$or": {
                                    "genre": {"$in": ["economy", "politics"]},
                                    "publisher": {"$eq": "nytimes"}
                                }
                            }
                        }
                        # or simpler using default operators
                        filters = {
                            "type": "article",
                            "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
                            "rating": {"$gte": 3},
                            "$or": {
                                "genre": ["economy", "politics"],
                                "publisher": "nytimes"
                            }
                        }
                        ```

                        To use the same logical operator multiple times on the same level, logical operators take
                        optionally a list of dictionaries as value.

                        __Example__:

                        ```python
                        filters = {
                            "$or": [
                                {
                                    "$and": {
                                        "Type": "News Paper",
                                        "Date": {
                                            "$lt": "2019-01-01"
                                        }
                                    }
                                },
                                {
                                    "$and": {
                                        "Type": "Blog Post",
                                        "Date": {
                                            "$gte": "2019-01-01"
                                        }
                                    }
                                }
                            ]
                        }
                        ```
        :param top_k: How many documents to return per query.
        :param index: The name of the index in the DocumentStore from which to retrieve documents
        :param headers: Custom HTTP headers to pass to document store client if supported (e.g. {'Authorization': 'Basic API_KEY'} for basic authentication)
        :param scale_score: Whether to scale the similarity score to the unit interval (range of [0,1]).
                            If true similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant.
                            Otherwise raw similarity scores (e.g. cosine or dot_product) will be used.
        :param document_store: the docstore to use for retrieval. If `None`, the one given in the `__init__` is used instead.
        """
        document_store = document_store or self.document_store
        if document_store is None:
            raise ValueError(
                "This Retriever was not initialized with a Document Store. Provide one to the retrieve() method."
            )
        if top_k is None:
            top_k = self.top_k
        if index is None:
            index = document_store.index
        if scale_score is None:
            scale_score = self.scale_score
        query_emb = self.embed_queries(queries=[query])
        documents = document_store.query_by_embedding(
            query_emb=query_emb, filters=filters, top_k=top_k, index=index, headers=headers, scale_score=scale_score
        )
        return documents

    def retrieve_batch(
        self,
        queries: List[str],
        filters: Optional[Union[FilterType, List[Optional[FilterType]]]] = None,
        top_k: Optional[int] = None,
        index: Optional[str] = None,
        headers: Optional[Dict[str, str]] = None,
        batch_size: Optional[int] = None,
        scale_score: Optional[bool] = None,
        document_store: Optional[BaseDocumentStore] = None,
    ) -> List[List[Document]]:
        """
        Scan through the documents in a DocumentStore and return a small number of documents
        that are most relevant to the supplied queries.

        Returns a list of lists of Documents (one per query).

        :param queries: List of query strings.
        :param filters: Optional filters to narrow down the search space to documents whose metadata fulfill certain
                        conditions. Can be a single filter that will be applied to each query or a list of filters
                        (one filter per query).

                        Filters are defined as nested dictionaries. The keys of the dictionaries can be a logical
                        operator (`"$and"`, `"$or"`, `"$not"`), a comparison operator (`"$eq"`, `"$in"`, `"$gt"`,
                        `"$gte"`, `"$lt"`, `"$lte"`) or a metadata field name.
                        Logical operator keys take a dictionary of metadata field names and/or logical operators as
                        value. Metadata field names take a dictionary of comparison operators as value. Comparison
                        operator keys take a single value or (in case of `"$in"`) a list of values as value.
                        If no logical operator is provided, `"$and"` is used as default operation. If no comparison
                        operator is provided, `"$eq"` (or `"$in"` if the comparison value is a list) is used as default
                        operation.

                        __Example__:

                        ```python
                        filters = {
                            "$and": {
                                "type": {"$eq": "article"},
                                "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
                                "rating": {"$gte": 3},
                                "$or": {
                                    "genre": {"$in": ["economy", "politics"]},
                                    "publisher": {"$eq": "nytimes"}
                                }
                            }
                        }
                        # or simpler using default operators
                        filters = {
                            "type": "article",
                            "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
                            "rating": {"$gte": 3},
                            "$or": {
                                "genre": ["economy", "politics"],
                                "publisher": "nytimes"
                            }
                        }
                        ```

                        To use the same logical operator multiple times on the same level, logical operators take
                        optionally a list of dictionaries as value.

                        __Example__:

                        ```python
                        filters = {
                            "$or": [
                                {
                                    "$and": {
                                        "Type": "News Paper",
                                        "Date": {
                                            "$lt": "2019-01-01"
                                        }
                                    }
                                },
                                {
                                    "$and": {
                                        "Type": "Blog Post",
                                        "Date": {
                                            "$gte": "2019-01-01"
                                        }
                                    }
                                }
                            ]
                        }
                        ```
        :param top_k: How many documents to return per query.
        :param index: The name of the index in the DocumentStore from which to retrieve documents
        :param headers: Custom HTTP headers to pass to document store client if supported (e.g. {'Authorization': 'Basic API_KEY'} for basic authentication)
        :param batch_size: Number of queries to embed at a time.
        :param scale_score: Whether to scale the similarity score to the unit interval (range of [0,1]).
                            If true similarity scores (e.g. cosine or dot_product) which naturally have a different
                            value range will be scaled to a range of [0,1], where 1 means extremely relevant.
                            Otherwise raw similarity scores (e.g. cosine or dot_product) will be used.
        :param document_store: the docstore to use for retrieval. If `None`, the one given in the `__init__` is used instead.
        """
        document_store = document_store or self.document_store
        if document_store is None:
            raise ValueError(
                "This Retriever was not initialized with a Document Store. Provide one to the retrieve_batch() method."
            )
        if top_k is None:
            top_k = self.top_k

        if batch_size is None:
            batch_size = self.batch_size

        if index is None:
            index = document_store.index
        if scale_score is None:
            scale_score = self.scale_score

        # embed_queries is already batched within by batch_size, so no need to batch the input here
        query_embs: np.ndarray = self.embed_queries(queries=queries)
        batched_query_embs: List[np.ndarray] = []
        for i in range(0, len(query_embs), batch_size):
            batched_query_embs.extend(query_embs[i : i + batch_size])
        documents = document_store.query_by_embedding_batch(
            query_embs=batched_query_embs,
            top_k=top_k,
            filters=filters,
            index=index,
            headers=headers,
            scale_score=scale_score,
        )

        return documents
    def embed_queries(self, queries: List[str]) -> np.ndarray:
        if isinstance(queries, str):
            queries = [queries]
        assert isinstance(queries, list), "Expecting a list of texts, i.e. create_embeddings(texts=['text1',...])"
        # NOTE: only the first query is embedded; retrieve() always passes a single-element list.
        return np.array(self.emb_caller.get_embeddings(queries[0], embedding_type="last_layer"))

    def embed_documents(self, documents: List[Document]) -> np.ndarray:
        documents = self._preprocess_documents(documents)
        embeddings = []
        for doc in documents:
            embeddings.append(self.emb_caller.get_embeddings(doc.content, embedding_type="last_layer"))
        return np.array(embeddings)

    def _preprocess_documents(self, docs: List[Document]) -> List[Document]:
        """
        Turns table documents into text documents by representing the table in csv format.
        This allows us to use text embedding models for table retrieval.
        It also concatenates specified meta data fields with the text representations.

        :param docs: List of documents to linearize. If the document is not a table, it is returned as is.
        :return: List of documents with meta data + linearized tables or original documents if they are not tables.
        """
        linearized_docs = []
        for doc in docs:
            doc = deepcopy(doc)
            if doc.content_type == "table":
                if isinstance(doc.content, pd.DataFrame):
                    doc.content = doc.content.to_csv(index=False)
                else:
                    raise HaystackError("Documents of type 'table' need to have a pd.DataFrame as content field")
            # Gather all relevant metadata fields
            meta_data_fields = []
            for key in self.embed_meta_fields:
                if key in doc.meta and doc.meta[key]:
                    if isinstance(doc.meta[key], list):
                        meta_data_fields.extend(doc.meta[key])
                    else:
                        meta_data_fields.append(doc.meta[key])
            # Convert to type string (e.g. for ints or floats)
            meta_data_fields = [str(field) for field in meta_data_fields]
            doc.content = "\n".join(meta_data_fields + [doc.content])
            linearized_docs.append(doc)
        return linearized_docs

    @staticmethod
    def _infer_model_format(model_name_or_path: str, use_auth_token: Optional[Union[str, bool]]) -> str:
        valid_openai_model_name = model_name_or_path in ["ada", "babbage", "davinci", "curie"] or any(
            m in model_name_or_path for m in ["-ada-", "-babbage-", "-davinci-", "-curie-"]
        )
        if valid_openai_model_name:
            return "openai"
        if model_name_or_path in ["small", "medium", "large", "multilingual-22-12", "finance-sentiment"]:
            return "cohere"
        # Check if model name is a local directory with sentence transformers config file in it
        if Path(model_name_or_path).exists():
            if Path(f"{model_name_or_path}/config_sentence_transformers.json").exists():
                return "sentence_transformers"
        # Check if sentence transformers config file in model hub
        else:
            try:
                hf_hub_download(  # type: ignore [call-arg]
                    repo_id=model_name_or_path,
                    filename="config_sentence_transformers.json",
                    use_auth_token=use_auth_token,
                )
                return "sentence_transformers"
            except HTTPError:
                pass

        # Check if retribert model
        config = AutoConfig.from_pretrained(model_name_or_path, use_auth_token=use_auth_token)
        if config.model_type == "retribert":
            return "retribert"

        # Model is neither sentence-transformers nor retribert model -> use _DefaultEmbeddingEncoder
        return "farm"

    def train(
        self,
        training_data: List[Dict[str, Any]],
        learning_rate: float = 2e-5,
        n_epochs: int = 1,
        num_warmup_steps: Optional[int] = None,
        batch_size: int = 16,
        train_loss: Literal["mnrl", "margin_mse"] = "mnrl",
        num_workers: int = 0,
        use_amp: bool = False,
        **kwargs,
    ) -> None:
        """
        Trains/adapts the underlying embedding model. We only support the training of sentence-transformer embedding models.

        Each training data example is a dictionary with the following keys:

        * question: the question string
        * pos_doc: the positive document string
        * neg_doc: the negative document string
        * score: the score margin

        :param training_data: The training data in a dictionary format.
        :param learning_rate: The learning rate.
        :param n_epochs: The number of epochs you want to train for.
        :param num_warmup_steps: Behavior depends on the scheduler. For WarmupLinear (default), the learning rate is
            increased from 0 up to the maximal learning rate. After these many training steps, the learning rate is
            decreased linearly back to zero.
        :param batch_size: The batch size to use for the training. The default value is 16.
        :param train_loss: The loss to use for training.
            If you're using a sentence-transformer embedding_model (which is the only model that training is supported for),
            possible values are 'mnrl' (Multiple Negatives Ranking Loss) or 'margin_mse' (MarginMSE).
        :param num_workers: The number of subprocesses to use for the Pytorch DataLoader.
        :param use_amp: Use Automatic Mixed Precision (AMP).
        :param kwargs: Additional training key word arguments to pass to the `SentenceTransformer.fit` function. Please
            reference the Sentence-Transformers [documentation](https://www.sbert.net/docs/training/overview.html#sentence_transformers.SentenceTransformer.fit)
            for a full list of keyword arguments.
        """
        send_event(event_name="Training", event_properties={"class": self.__class__.__name__, "function_name": "train"})
@ -0,0 +1,13 @@
import json

squad_format = {"data": []}
with open('gold_standard_retriever_stupo.json', 'r') as f:
    data_input = json.load(f)

for idx, item in enumerate(data_input):
    paragraphs = []
    for context in item["context"]:
        qas = [
            {
                "question": q,
                "id": f"{idx}_{qid}",
                "answers": [{"text": context, "answer_start": 0}],
                "is_impossible": False,
            }
            for qid, q in enumerate(item["questions"])
        ]
        paragraphs.append({"context": context, "qas": qas})
    squad_format["data"].append({"title": f"doc_{idx}", "paragraphs": paragraphs})

with open('squad_format.json', 'w', encoding='utf-8') as f:
    json.dump(squad_format, f, ensure_ascii=False, indent=4)
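The conversion logic can be sanity-checked on a tiny in-memory item; the strings below are placeholders, and the snippet applies the same per-item mapping as the script above:

```python
# One gold-standard item with two questions and one context (placeholder text).
item = {"questions": ["Q1?", "Q2?"], "context": ["Some paragraph."]}

paragraphs = []
for context in item["context"]:
    qas = [
        {
            "question": q,
            "id": f"0_{qid}",
            "answers": [{"text": context, "answer_start": 0}],
            "is_impossible": False,
        }
        for qid, q in enumerate(item["questions"])
    ]
    paragraphs.append({"context": context, "qas": qas})

squad = {"data": [{"title": "doc_0", "paragraphs": paragraphs}]}
print(len(squad["data"][0]["paragraphs"][0]["qas"]))  # 2: one qa per question
```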
@ -0,0 +1,317 @@
from typing import Dict, Optional, List
from haystack.document_stores.base import BaseDocumentStore
from haystack.schema import Document, MultiLabel
from haystack.nodes.retriever import BaseRetriever
import logging
from time import perf_counter
from tqdm import tqdm
import sys
import json
from haystack.nodes import SentenceTransformersRanker

sys.path.append("../..")
from reranker import ReRanker

logger = logging.getLogger(__name__)


def eval(
    document_store: BaseDocumentStore,
    retriever: BaseRetriever,
    reRankerGPT: ReRanker = None,
    rerankerPipeline: SentenceTransformersRanker = None,
    label_index: str = "label",
    doc_index: str = "eval_document",
    label_origin: str = "gold-label",
    top_k: int = 10,
    open_domain: bool = False,
    return_preds: bool = False,
    headers: Optional[Dict[str, str]] = None,
) -> dict:
    # Extract all questions for evaluation
    filters: Dict = {"origin": [label_origin]}
    debug = []
    time_taken = 0
    if document_store is None:
        raise ValueError(
            "This Retriever was not initialized with a Document Store. Provide one to the eval() method."
        )
    labels: List[MultiLabel] = document_store.get_all_labels_aggregated(
        index=label_index,
        filters=filters,
        open_domain=open_domain,
        drop_negative_labels=True,
        drop_no_answers=False,
        headers=headers,
    )

    correct_retrievals = 0
    summed_avg_precision = 0.0
    summed_reciprocal_rank = 0.0

    # Collect questions and corresponding answers/document_ids in a dict
    question_label_dict = {}
    for label in labels:
        # document_ids are empty if no_answer == True
        if not label.no_answer:
            id_question_tuple = (label.document_ids[0], label.query)
            if open_domain:
                # no_answer '' entries are not included if there are other actual answers
                question_label_dict[id_question_tuple] = label.answers
            else:
                deduplicated_doc_ids = list({str(x) for x in label.document_ids})
                question_label_dict[id_question_tuple] = deduplicated_doc_ids

    predictions = []

    # Option 1: Open-domain evaluation by checking if the answer string is in the retrieved docs
    logger.info("Performing eval queries...")
    if open_domain:
        for (_, question), gold_answers in tqdm(question_label_dict.items()):
            tic = perf_counter()
            retrieved_docs = retriever.retrieve(query=question, headers=headers, index=doc_index, top_k=top_k)
            item = {"retrieved_ids": [doc.id for doc in retrieved_docs]}
            if reRankerGPT:
                reranked_docs = reRankerGPT.rerank_documents_with_gpt35(query=question, documents=retrieved_docs)
                logger.debug("Reranked docs: %s", reranked_docs)
                item["reranked_ids"] = [doc.id for doc in reranked_docs]
                item["isEqual"] = item["reranked_ids"] == item["retrieved_ids"]
                retrieved_docs = reRankerGPT.get_final_references(reranked_documents=reranked_docs, retrieved_documents=retrieved_docs)
                item["final_reordered_ids"] = [doc.id for doc in retrieved_docs]
            if rerankerPipeline:
                retrieved_docs = rerankerPipeline.predict(query=question, documents=retrieved_docs)
            debug.append({question: item})
            toc = perf_counter()
            time_taken += toc - tic
            if return_preds:
                predictions.append({"question": question, "retrieved_docs": retrieved_docs})
            # check if a correct doc is among the retrieved docs
            found_relevant_doc = False
            relevant_docs_found = 0
            current_avg_precision = 0.0
            logger.debug("Gold answers: %s", gold_answers)
            for doc_idx, doc in enumerate(retrieved_docs):
                for gold_answer in gold_answers:
                    if gold_answer in doc.content:
                        relevant_docs_found += 1
                        if not found_relevant_doc:
                            correct_retrievals += 1
                            summed_reciprocal_rank += 1 / (doc_idx + 1)
                        current_avg_precision += relevant_docs_found / (doc_idx + 1)
                        found_relevant_doc = True
                        break
            if found_relevant_doc:
                summed_avg_precision += current_avg_precision / relevant_docs_found
    # Option 2: Strict evaluation by document ids that are listed in the labels
    else:
        for (_, question), gold_ids in tqdm(question_label_dict.items()):
            tic = perf_counter()
            retrieved_docs = retriever.retrieve(query=question, headers=headers, index=doc_index, top_k=top_k)
            item = {"retrieved_ids": [doc.id for doc in retrieved_docs]}
            if reRankerGPT:
                reranked_docs = reRankerGPT.rerank_documents_with_gpt35(query=question, documents=retrieved_docs)
                logger.debug("Reranked docs: %s", reranked_docs)
                item["reranked_ids"] = [doc.id for doc in reranked_docs]
                item["isEqual"] = item["reranked_ids"] == item["retrieved_ids"]
                retrieved_docs = reRankerGPT.get_final_references(reranked_documents=reranked_docs, retrieved_documents=retrieved_docs)
                item["final_reordered_ids"] = [doc.id for doc in retrieved_docs]
            debug.append({question: item})
            toc = perf_counter()
            time_taken += toc - tic
            if return_preds:
                predictions.append({"question": question, "retrieved_docs": retrieved_docs})
            # check if a correct doc is among the retrieved docs
            found_relevant_doc = False
            relevant_docs_found = 0
            current_avg_precision = 0.0
            for doc_idx, doc in enumerate(retrieved_docs):
                for gold_id in gold_ids:
                    if str(doc.id) == gold_id:
                        relevant_docs_found += 1
                        if not found_relevant_doc:
                            correct_retrievals += 1
                            summed_reciprocal_rank += 1 / (doc_idx + 1)
                        current_avg_precision += relevant_docs_found / (doc_idx + 1)
                        found_relevant_doc = True
                        break
            if found_relevant_doc:
                all_relevant_docs = len(set(gold_ids))
                summed_avg_precision += current_avg_precision / all_relevant_docs
    # Metrics
    number_of_questions = len(question_label_dict)
    recall = correct_retrievals / number_of_questions
    mean_reciprocal_rank = summed_reciprocal_rank / number_of_questions
    mean_avg_precision = summed_avg_precision / number_of_questions

    logger.info(
        "For {} out of {} questions ({:.2%}), the answer was in the top-{} candidate passages selected by the retriever.".format(
            correct_retrievals, number_of_questions, recall, top_k
        )
    )

    metrics = {
        "recall": recall,
        "map": mean_avg_precision,
        "mrr": mean_reciprocal_rank,
        "retrieve_time": time_taken,
        "n_questions": number_of_questions,
        "top_k": top_k,
    }
    with open("debug.json", "w") as fp:
        json.dump(debug, fp, ensure_ascii=False)
    if return_preds:
        return {"metrics": metrics, "predictions": predictions}
    else:
        return metrics

def eval_llama(
    document_store: BaseDocumentStore,
    vector_store: BaseDocumentStore,
    retriever: BaseRetriever,
    reRanker: ReRanker = None,
    label_index: str = "label",
    doc_index: str = "eval_document",
    label_origin: str = "gold-label",
    top_k: int = 10,
    open_domain: bool = False,
    return_preds: bool = False,
    headers: Optional[Dict[str, str]] = None,
) -> dict:
    # Extract all questions for evaluation
    filters: Dict = {"origin": [label_origin]}
    debug = []
    time_taken = 0
    if document_store is None:
        raise ValueError(
            "This Retriever was not initialized with a Document Store. Provide one to the eval() method."
        )
    labels: List[MultiLabel] = document_store.get_all_labels_aggregated(
        index=label_index,
        filters=filters,
        open_domain=open_domain,
        drop_negative_labels=True,
        drop_no_answers=False,
        headers=headers,
    )

    correct_retrievals = 0
    summed_avg_precision = 0.0
    summed_reciprocal_rank = 0.0

    # Collect questions and corresponding answers/document_ids in a dict
    question_label_dict = {}
    for label in labels:
        # document_ids are empty if no_answer == True
        if not label.no_answer:
            id_question_tuple = (label.document_ids[0], label.query)
            if open_domain:
                # no_answer '' entries are not included if there are other actual answers
                question_label_dict[id_question_tuple] = label.answers
            else:
                deduplicated_doc_ids = list({str(x) for x in label.document_ids})
                question_label_dict[id_question_tuple] = deduplicated_doc_ids

    predictions = []

    # Option 1: Open-domain evaluation by checking if the answer string is in the retrieved docs
    logger.info("Performing eval queries...")
    if open_domain:
        for (_, question), gold_answers in tqdm(question_label_dict.items()):
            tic = perf_counter()
            retrieved_docs = retriever.retrieve(query=question, headers=headers, index=doc_index, top_k=top_k)
            logger.debug("retrieved_docs: %s", retrieved_docs)
            item = {"retrieved_ids": [doc.id for doc in retrieved_docs]}
            if reRanker:
                reranked_docs = reRanker.rerank_documents_with_gpt35(query=question, documents=retrieved_docs)
                logger.debug("Reranked docs: %s", reranked_docs)
                item["reranked_ids"] = [doc.id for doc in reranked_docs]
                item["isEqual"] = item["reranked_ids"] == item["retrieved_ids"]
                retrieved_docs = reRanker.get_final_references(reranked_documents=reranked_docs, retrieved_documents=retrieved_docs)
                item["final_reordered_ids"] = [doc.id for doc in retrieved_docs]
            debug.append({question: item})
            toc = perf_counter()
            time_taken += toc - tic
            if return_preds:
                predictions.append({"question": question, "retrieved_docs": retrieved_docs})
            # check if a correct doc is among the retrieved docs
            found_relevant_doc = False
            relevant_docs_found = 0
            current_avg_precision = 0.0
            logger.debug("Gold answers: %s", gold_answers)
            for doc_idx, doc in enumerate(retrieved_docs):
                for gold_answer in gold_answers:
                    if gold_answer in doc.content:
                        relevant_docs_found += 1
                        if not found_relevant_doc:
                            correct_retrievals += 1
                            summed_reciprocal_rank += 1 / (doc_idx + 1)
                        current_avg_precision += relevant_docs_found / (doc_idx + 1)
                        found_relevant_doc = True
                        break
            if found_relevant_doc:
                summed_avg_precision += current_avg_precision / relevant_docs_found
    # Option 2: Strict evaluation by document ids that are listed in the labels
    else:
        for (_, question), gold_ids in tqdm(question_label_dict.items()):
            tic = perf_counter()
            retrieved_docs = retriever.retrieve(query=question, headers=headers, index=doc_index, top_k=top_k)
            item = {"retrieved_ids": [doc.id for doc in retrieved_docs]}
            if reRanker:
                reranked_docs = reRanker.rerank_documents_with_gpt35(query=question, documents=retrieved_docs)
                logger.debug("Reranked docs: %s", reranked_docs)
                item["reranked_ids"] = [doc.id for doc in reranked_docs]
                item["isEqual"] = item["reranked_ids"] == item["retrieved_ids"]
                retrieved_docs = reRanker.get_final_references(reranked_documents=reranked_docs, retrieved_documents=retrieved_docs)
                item["final_reordered_ids"] = [doc.id for doc in retrieved_docs]
            debug.append({question: item})
            toc = perf_counter()
            time_taken += toc - tic
            if return_preds:
                predictions.append({"question": question, "retrieved_docs": retrieved_docs})
            # check if a correct doc is among the retrieved docs
            found_relevant_doc = False
            relevant_docs_found = 0
            current_avg_precision = 0.0
            for doc_idx, doc in enumerate(retrieved_docs):
                for gold_id in gold_ids:
                    if str(doc.id) == gold_id:
                        relevant_docs_found += 1
                        if not found_relevant_doc:
                            correct_retrievals += 1
                            summed_reciprocal_rank += 1 / (doc_idx + 1)
                        current_avg_precision += relevant_docs_found / (doc_idx + 1)
                        found_relevant_doc = True
                        break
            if found_relevant_doc:
                all_relevant_docs = len(set(gold_ids))
                summed_avg_precision += current_avg_precision / all_relevant_docs
    # Metrics
    number_of_questions = len(question_label_dict)
    recall = correct_retrievals / number_of_questions
    mean_reciprocal_rank = summed_reciprocal_rank / number_of_questions
    mean_avg_precision = summed_avg_precision / number_of_questions

    logger.info(
        "For {} out of {} questions ({:.2%}), the answer was in the top-{} candidate passages selected by the retriever.".format(
            correct_retrievals, number_of_questions, recall, top_k
        )
    )

    metrics = {
        "recall": recall,
        "map": mean_avg_precision,
        "mrr": mean_reciprocal_rank,
        "retrieve_time": time_taken,
        "n_questions": number_of_questions,
        "top_k": top_k,
    }
    with open("debug.json", "w") as fp:
        json.dump(debug, fp, ensure_ascii=False)
    if return_preds:
        return {"metrics": metrics, "predictions": predictions}
    else:
        return metrics
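As a sanity check on the per-question metric bookkeeping in the loops above, the reciprocal rank and average precision for a single toy ranking can be computed directly (the relevance flags below are made up for illustration):

```python
# Toy ranking: relevance flags for 5 retrieved docs (True = relevant).
# The first relevant hit is at rank 2, so the reciprocal rank is 1/2.
relevant = [False, True, False, True, False]

reciprocal_rank = 0.0
hits = 0
avg_precision = 0.0
for idx, is_rel in enumerate(relevant):
    if is_rel:
        hits += 1
        if hits == 1:
            reciprocal_rank = 1 / (idx + 1)
        # precision at this rank, accumulated over relevant hits
        avg_precision += hits / (idx + 1)
avg_precision /= hits  # normalize by the number of relevant docs found

print(reciprocal_rank)  # 0.5
print(avg_precision)    # (1/2 + 2/4) / 2 = 0.5
```

Averaging these per-question values over all questions yields the `mrr` and `map` entries of the returned metrics dictionary.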
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -0,0 +1,37 @@
from custom_evaluation import eval
import os
import sys

sys.path.append("../..")
from reranker import ReRanker
from retriever.retriever_pipeline import CustomPipeline
from haystack.nodes import PreProcessor

doc_index = "stupo_eval_docs"
label_index = "stupo_eval_labels"
# Read the OpenAI API key from the environment; never commit a key to the repository.
pipeline = CustomPipeline(doc_index=doc_index, label_index=label_index, api_key=os.environ["OPENAI_API_KEY"])

reranker = ReRanker()
open_domain = True
if open_domain:
    preprocessor = PreProcessor(
        split_by="word",
        split_length=100,
        split_overlap=0,
        split_respect_sentence_boundary=False,
        clean_empty_lines=False,
        clean_whitespace=False,
    )
    pipeline.doc_store_ada.delete_documents(index=doc_index)
    pipeline.doc_store_ada.delete_documents(index=label_index)

    # add_eval_data() converts the given JSON dataset into Haystack document and label objects
    # and indexes them in their respective document and label index in the document store.
    # It can be used with any dataset in SQuAD format.
    pipeline.doc_store_ada.add_eval_data(
        filename="squad_format.json",
        doc_index=doc_index,
        label_index=label_index,
        preprocessor=preprocessor,
    )
    pipeline.doc_store_ada.update_embeddings(pipeline.emb_retriever_ada, index=doc_index)

index = "stupo" if open_domain else doc_index
retriever_eval_results = eval(
    label_index=label_index,
    doc_index=index,
    top_k=10,
    document_store=pipeline.doc_store_ada,
    retriever=pipeline.emb_retriever_ada,
    reRankerGPT=reranker,
    rerankerPipeline=None,
    open_domain=open_domain,
)

print(retriever_eval_results)
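The printed result is the flat metrics dictionary built by `eval()`. A minimal shape check, with placeholder values standing in for a real run:

```python
# Placeholder result illustrating the shape returned by eval();
# the numbers are invented, only the keys match the implementation.
retriever_eval_results = {
    "recall": 0.9,
    "map": 0.75,
    "mrr": 0.8,
    "retrieve_time": 12.3,
    "n_questions": 20,
    "top_k": 10,
}
expected_keys = {"recall", "map", "mrr", "retrieve_time", "n_questions", "top_k"}
print(expected_keys == set(retriever_eval_results))  # True
```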
@ -0,0 +1,37 @@
from custom_evaluation import eval
import sys

sys.path.append("../..")
from retriever.retriever_pipeline import CustomPipeline
from reranker import ReRanker
from haystack.nodes import PreProcessor

doc_index = "stupo_eval_docs_distilbert"
label_index = "stupo_eval_labels_distilbert"

pipeline = CustomPipeline(doc_index=doc_index, label_index=label_index)
reranker = ReRanker()
open_domain = True

if not open_domain:
    preprocessor = PreProcessor(
        split_by="word",
        split_length=100,
        split_overlap=0,
        split_respect_sentence_boundary=False,
        clean_empty_lines=False,
        clean_whitespace=False,
    )
    pipeline.doc_store_distilbert.delete_documents(index=doc_index)
    pipeline.doc_store_distilbert.delete_documents(index=label_index)

    # add_eval_data() converts the given JSON dataset into Haystack document and label objects
    # and indexes them in their respective document and label index in the document store.
    # It can be used with any dataset in SQuAD format.
    pipeline.doc_store_distilbert.add_eval_data(
        filename="squad_format.json",
        doc_index=doc_index,
        label_index=label_index,
        preprocessor=preprocessor,
    )
    pipeline.doc_store_distilbert.update_embeddings(pipeline.retriever_distilbert, index=doc_index)

index = "stupo" if open_domain else doc_index
retriever_eval_results = eval(
    label_index=label_index,
    doc_index=index,
    top_k=20,
    document_store=pipeline.doc_store_distilbert,
    retriever=pipeline.retriever_distilbert,
    reRankerGPT=None,
    rerankerPipeline=pipeline.ranker,
    open_domain=open_domain,
)
print(retriever_eval_results)
@ -0,0 +1,37 @@
import numpy as np
import sys
from LlamaRetriever import LlamaRetriever

sys.path.append("../..")
from retriever.retriever_pipeline import CustomPipeline
from api.embeddingsServiceCaller import EmbeddingServiceCaller
from haystack.nodes import PreProcessor
from custom_evaluation import eval, eval_llama

caller = EmbeddingServiceCaller()
doc_index = "stupo_eval_docs_llama"
label_index = "stupo_eval_labels_llama"
pipeline = CustomPipeline(label_index=label_index, doc_index=doc_index)

retriever = LlamaRetriever(document_store=pipeline.vector_doc_store_llama)
open_domain = True
if not open_domain:
    preprocessor = PreProcessor(
        split_by="word",
        split_length=100,
        split_overlap=0,
        split_respect_sentence_boundary=False,
        clean_empty_lines=False,
        clean_whitespace=False,
    )
    # emb_query = np.array(caller.get_embeddings(query))
    # results = pipeline.query_by_emb(index=index, emb=emb_query)
    pipeline.doc_store_mpnet.add_eval_data(
        filename="squad_format.json",
        doc_index=doc_index,
        label_index=label_index,
        preprocessor=preprocessor,
    )
    # pipeline.vector_doc_store.update_embeddings(retriever, index=doc_index)

index = "stupo" if open_domain else doc_index
retriever_eval_results = eval_llama(
    label_index=label_index,
    doc_index=index,
    top_k=30,
    document_store=pipeline.doc_store_mpnet,
    vector_store=pipeline.vector_doc_store_llama,
    retriever=retriever,
    reRanker=None,
    open_domain=open_domain,
)
print(retriever_eval_results)
@ -0,0 +1,37 @@
from custom_evaluation import eval
import sys

sys.path.append("../..")
from retriever.retriever_pipeline import CustomPipeline
from reranker import ReRanker
from haystack.nodes import PreProcessor

doc_index = "stupo_eval_docs"
label_index = "stupo_eval_labels"

pipeline = CustomPipeline(doc_index=doc_index, label_index=label_index)
reranker = ReRanker()
open_domain = True

if not open_domain:
    preprocessor = PreProcessor(
        split_by="word",
        split_length=100,
        split_overlap=0,
        split_respect_sentence_boundary=False,
        clean_empty_lines=False,
        clean_whitespace=False,
    )
    pipeline.doc_store_mpnet.delete_documents(index=doc_index)
    pipeline.doc_store_mpnet.delete_documents(index=label_index)

    # add_eval_data() converts the given JSON dataset into Haystack document and label objects
    # and indexes them in their respective document and label index in the document store.
    # It can be used with any dataset in SQuAD format.
    pipeline.doc_store_mpnet.add_eval_data(
        filename="squad_format.json",
        doc_index=doc_index,
        label_index=label_index,
        preprocessor=preprocessor,
    )
    pipeline.doc_store_mpnet.update_embeddings(pipeline.emb_retriever_mpnet, index=doc_index)

index = "stupo" if open_domain else doc_index
retriever_eval_results = eval(
    label_index=label_index,
    doc_index=index,
    top_k=20,
    document_store=pipeline.doc_store_mpnet,
    retriever=pipeline.emb_retriever_mpnet,
    reRankerGPT=None,
    rerankerPipeline=pipeline.ranker,
    open_domain=open_domain,
)
print(retriever_eval_results)
@ -0,0 +1,260 @@
[
    {
        "questions": [
            "Was sind die Zulassungsvoraussetzungen um an der Hochschule zu Studieren?",
            "Was ist erforderlich, um an der Hochschule Mannheim studieren zu können?"
        ],
        "context": [
            "Studiengang Unternehmens- und Wirtschaftsinformatik (UIB)\n20. Studiengang Verfahrenstechnik (VB),\n21. Studiengang Wirtschaftsingenieurwesen (WB)\n22. Studiengang Wirtschaftsingenieurwesen International (WBI)\n(2) Alle Amts-, Status-, Funktions- und Berufsbezeichnungen, die in dieser Ordnung\nin männlicher Form erscheinen, betreffen alle Geschlechtsidentitäten. Dies gilt auch\nfür die Führung von Hochschulgraden, akademischen Bezeichnungen und Titeln.\n\n\f\nTeil A: Allgemeiner Teil\nI. Allgemeines\n§ 2 Allgemeine Zulassungsvoraussetzungen\nZum Studium an der Hochschule Mannheim kann zugelassen werden, wer eine Hoch-\nschulzugangsberechtigung nach § 58 Abs. 2 LHG hat. Näheres regelt die Zulassungs-\nund Immatrikulationsordnung der Hochschule.\n"
        ]
    },
    {
        "questions": ["Wie lange beträgt die Regelstudienzeit?"],
        "context": [
            "§ 3 Dauer und Gliederung des Studiums\n(1) Die Regelstudienzeit beträgt in den Studiengängen nach § 1 Abs. 1 Nr. 1 – 21 als\nVollzeitstudium sieben Semester. Im Studiengang Wirtschaftsingenieurwesen Interna-\ntional beträgt die Regelstudienzeit als Vollzeitstudium acht Semester. Wird das Stu-\ndium in Teilzeit absolviert, verlängert es sich entsprechend. Näheres zum Teilzeitstu-\ndium regelt die Satzung der Hochschule für ein Studium in Teilzeit.\n(2) Das Studium umfasst die theoretischen Studiensemester, ein integriertes prakti-\nsches Studiensemester (außer für den Studiengang Soziale Arbeit Plus) und die Prü-\nfungen einschließlich der Bachelorarbeit.\n(3) Das Studium in den Studiengängen nach § 1 Abs."
        ]
    },
    {
        "questions": [
            "Wieviele Credits brauche ich für den erfolgreichen Abschluss eines Bachelorstudiums?",
            "Wieviele Credits sind notwendig, für den Abschluss des Bachelorstudiums?",
            "Wieviel Stunden Arbeitsaufwand enspricht 1 CR?",
            "Wieviel Stunden Arbeitsaufwand enspricht 1 Credit?"
        ],
        "context": [
            "1 Nrn. 1-9, 11-15 und 18-22\ngliedert sich in das Grundstudium, das nach zwei Semestern mit der Bachelorvorprü-\nfung abschließt, und das Hauptstudium, das mit der Bachelorprüfung abschließt.\n(4) Der Gesamtumfang der für den erfolgreichen Abschluss des Studiums erforderli-\nchen Lehrveranstaltungen im Pflicht- und Wahlpflichtbereich in Semesterwochenstun-\nden mit den zugeordneten Anrechnungspunkten (Credits) ist im Besonderen Teil fest-\ngelegt.\n(5) Für den erfolgreichen Abschluss eines Bachelorstudiums ist der Nachweis von\nmindestens 210 Credits erforderlich. 1 Credit (CR) entspricht einem studentischen Ar-\nbeitsaufwand („workload“) von etwa 30 Stunden.\n"
        ]
    },
    {
        "questions": ["Welche Regelungen gelten während Mutterschutz?"],
        "context": [
            "(6) Durch Beschluss des für den Studiengang zuständigen Fakultätsrats können die\nim Besonderen Teil festgelegte Reihenfolge und Art der Lehrveranstaltungen und Prü-\nfungen aus triftigen Gründen im Einzelfall abgeändert werden.\n(7) Die Hochschule berücksichtigt die besonderen Bedürfnisse von Studierenden mit\nKindern oder pflegebedürftigen Angehörigen sowie von Studierenden mit Behinderung\noder chronischer Erkrankung.\n(8) In den Schutzfristen des Gesetzes zum Schutz von Müttern bei der Arbeit, in der\nAusbildung und im Studium (Mutterschutzgesetz – MuSchG) vom 23. Mai 2017 (BGBl.\nI S. 1228) in der jeweils geltenden Fassung sind Studierende nicht verpflichtet, Prü-\nfungsleistungen zu erbringen. ",
            "Das Recht, aus sonstigen während und nach einer\nSchwangerschaft eintretenden Umständen von einer Prüfungsleistung zurückzutreten,\nbleibt unberührt. Die Schutzfristen unterbrechen jede Frist nach dieser Prüfungsord-\nnung.\n(9) Liegen in der Person eines Studierenden Beeinträchtigungen aufgrund einer Be-\nhinderung oder chronischen Krankheit vor, die das Erbringen der Studienleistungen\ninnerhalb der Fristen gem. § 6 Absatz 2 Satz 1 in besonderer Weise erschweren, kann\nder zuständige Prüfungsausschuss im Benehmen mit dem Beauftragten für die Be-\nlange von Studierenden mit Behinderung auf Antrag eine individuelle Verlängerung\nder Frist für die Erbringung der Prüfungsleistungen genehmigen. "
        ]
    },
    {
        "questions": [
            "Werden Personen mit chronischen Krankheiten berücksichtigt?",
            "Welche Regelungen gelten bei chronischen Krankheiten?"
        ],
        "context": [
            "Das Recht, aus sonstigen während und nach einer\nSchwangerschaft eintretenden Umständen von einer Prüfungsleistung zurückzutreten,\nbleibt unberührt. Die Schutzfristen unterbrechen jede Frist nach dieser Prüfungsord-\nnung.\n(9) Liegen in der Person eines Studierenden Beeinträchtigungen aufgrund einer Be-\nhinderung oder chronischen Krankheit vor, die das Erbringen der Studienleistungen\ninnerhalb der Fristen gem. § 6 Absatz 2 Satz 1 in besonderer Weise erschweren, kann\nder zuständige Prüfungsausschuss im Benehmen mit dem Beauftragten für die Be-\nlange von Studierenden mit Behinderung auf Antrag eine individuelle Verlängerung\nder Frist für die Erbringung der Prüfungsleistungen genehmigen. ",
            "(3) Vor der Entscheidung des zuständige Prüfungsausschusses nach Absatz 2 ist in\nstrittigen Fällen mit Einverständnis der Studierenden der Beauftragte für Studierende\nmit Behinderung oder chronischer Erkrankung beziehungsweise eine andere sachver-\nständige Person anzuhören.\n(4) Anträge auf Nachteilsausgleich sind spätestens acht Wochen vor der jeweiligen\nModulprüfung zu stellen. Die Beeinträchtigung ist von den Studierenden darzulegen\nund durch ein ärztliches Attest, das die für die Beurteilung erforderlichen Befundtatsa-\nchen enthalten muss, nachzuweisen. Diese Frist kann im Ausnahmefall verkürzt wer-\nden, wenn die Nichteinhaltung der Frist nicht selbst zu vertreten ist.\n",
            "(2) Bei prüfungsunabhängigen nicht nur vorübergehenden oder chronischen gesund-\nheitlichen Beeinträchtigungen von Studierenden, welche die Erbringung von Prüfungs-\nleistungen oder Studienleistungen erschweren, kann der zuständige Prüfungsaus-\nschuss auf Antrag in Textform angemessene Maßnahmen zum Ausgleich der Beein-\nträchtigungen treffen; auf den Nachweis von Fähigkeiten, die zum Leistungsbild der\nabgenommenen Prüfung gehören, darf nicht verzichtet werden. Als Ausgleichsmaß-\nnahmen können bei schriftlichen Prüfungen insbesondere die Bearbeitungszeit ange-\nmessen verlängert, nicht auf die Bearbeitungszeit anzurechnende Ruhepausen ge-\nwährt oder persönliche oder sächliche Hilfsmittel zugelassen werden.\n"
        ]
    },
    {
        "questions": [
            "Wie lange ist die Frist für die Bachelorvorprüfung?",
            "Bis wann muss die Bachlorvorprüfung bestanden werden?"
        ],
        "context": [
            "Die in der Prüfungsvorleistung erbrachte Leistung kann\nin einem Umfang von bis zu einem Drittel auf die Modulprüfung angerechnet werden;\ndies ist den Studierenden zu Beginn der Lehrveranstaltung anzukündigen.\n§ 6 Verlust des Prüfungsanspruchs / Fristen\n(1) Die Modulprüfungen zur Bachelorvorprüfung sollen bis zum Ende des zweiten Se-\nmesters, die Modulprüfungen zur Bachelorprüfung bis zum Ende des siebten Semes-\nters abgelegt sein.\n(2) Der Prüfungsanspruch erlischt, wenn die Prüfungsleistungen für die Bachelorvor-\nprüfung nicht spätestens zwei Studiensemester nach dem in Abs. 1 festgelegten Zeit-\npunkt erbracht sind, es sei denn, die Fristenüberschreitung ist nicht zu vertreten. "
        ]
    },
    {
        "questions": [
            "Wie kann man Elternzeit beantragen?",
            "Was muss ich beachten, wenn ich Elternzeit beantragen will?"
        ],
        "context": [
            "Im\nHauptstudium sowie in Studiengängen ohne Bachelorvorprüfung, gilt keine Höchstfrist\nzur Erbringung von Prüfungsleistungen mehr.\n(3) Auf Antrag sind die Fristen der Elternzeit entsprechend § 15 Absatz 1 bis 3 des\nGesetzes zum Elterngeld und zur Elternzeit (Bundeselterngeld- und Elternzeitgesetz\n– BEEG) in der Fassung der Bekanntmachung vom 27. Januar 2015 (BGBl. I S. 33) in\nder jeweils geltenden Fassung zu berücksichtigen. Studierende müssen spätestens\nvier Wochen vor dem Zeitpunkt, von dem ab sie die Elternzeit antreten wollen, dem\nzuständigen Prüfungsausschuss unter Beifügung der erforderlichen Nachweise in\nTextform mitteilen, für welchen Zeitraum sie Elternzeit nehmen wollen. ",
            "Der zuständige\nPrüfungsausschuss prüft, ob die gesetzlichen Voraussetzungen für die Inanspruch-\nnahme von Elternzeit vorliegen, und teilt das Ergebnis sowie gegebenenfalls die neu\nfestgesetzten Prüfungsfristen den Studierenden mit. Die Bearbeitungszeit der Ba-\nchelorarbeit kann nicht durch die Elternzeit unterbrochen werden. Das den Studieren-\nden gestellte Thema gilt als nicht vergeben. Nach Ablauf der Elternzeit wird den Stu-\ndierenden ein neues Thema für die Bachelorarbeit gestellt.\n(4) Absatz 3 gilt entsprechend für die Inanspruchnahme der Schutzbestimmungen für\ndie Pflege von nahen Angehörigen im Sinne von § 7 Absatz 3 des Gesetzes über die\nPflegezeit (Pflegezeitgesetz – PflegeZG) vom 28. "
        ]
    },
    {
        "questions": [
            "Wieviele Prüfungsleistungen darf ich während dem praktischen Studiensemester absolvieren?",
            "Wieviele Prüfungen darf ich während dem Pflichtpraktikum schreiben?",
            "Bis wann kann ich mich spätestens von Prüfungen wieder abmelden?"
        ],
        "context": [
            "Studierende können\nsich bei Erstversuchen bis zum Tag vor der Prüfung wieder abmelden; für Wiederho-\nlungsprüfungen gilt § 14 Abs. 3.\n\n\f\n§ 8 Prüfungsleistungen\n(1) Die studienbegleitenden Modulprüfungen werden in der Regel während der Prü-\nfungswochen außerhalb der Vorlesungszeit des Studiensemesters erbracht. Zwischen\nden Prüfungen eines Fachsemesters soll jeweils mindestens ein Tag prüfungsfrei sein\nund Prüfungen unmittelbar aufeinander folgender Fachsemester sollen nicht am glei-\nchen Tag stattfinden. In einem praktischen Studiensemester können über die nach §\n4 zu erbringenden Leistungen hinaus höchstens zwei Studien- oder Prüfungsleistun-\ngen erbracht werden.\n"
        ]
    },
    {
        "questions": [
            "Wieviele Präsenztage muss ich im Pflichtpraktikum ableisten?",
            "Wieviele Arbeitstage muss ich im Pflichtpraktikum ableisten?"
        ],
        "context": [
            "Der Umfang der Stu-\ndienzeitverlängerung ist auf Grundlage eines fachärztlichen Gutachtens zu bemessen.\f\nDer Antrag kann im ersten Studiensemester, bei nachträglichem Eintreten der Beein-\nträchtigung innerhalb von sechs Monaten, gestellt werden.\n§ 4 Praktisches Studiensemester\n(1) In die Studiengänge nach § 1 Abs. 1 Nr. 1 – 16 und 18 – 22 ist ein praktisches\nStudiensemester integriert; das praktische Studiensemester liegt in der Regel im fünf-\nten Fachsemester.\n(2) Im praktischen Studiensemester sind in einem Betrieb oder in einer anderen Ein-\nrichtung der Berufspraxis 100 Arbeitstage abzuleisten. Die Studierenden werden von\neinem Professor betreut. "
        ]
    },
    {
        "questions": [
            "Was für Tätigkeiten müssen im praktischen Studiensemester durchgeführt werden?",
            "Welche Tätigkeiten müssen im praktischen Studiensemester durchgeführt werden?",
            "Muss ich Berichte im praktischen Studiensemester schreiben?",
            "Muss ich Tätigkeitsberichte im praktischen Studiensemester schreiben?",
            "Wie wird entschieden, ob das praktische Studiensemester bestanden wurde?",
            "Auf welcher Grundlage wird entschieden, ob das praktische Studiensemester bestanden wurde?"
        ],
        "context": [
            "Zum praktischen Studiensemester gehören begleitende\nLehrveranstaltungen, die an der Hochschule in der Regel in Form von Blockveranstal-\ntungen abgehalten werden.\n(3) Über die Tätigkeit während des praktischen Studiensemesters haben die Studie-\nrenden schriftliche Berichte zu erstellen und diese von der Praxisstelle bestätigen zu\nlassen. Am Ende des praktischen Studiensemesters stellt die Praxisstelle einen Tätig-\nkeitsnachweis aus, der Art und Inhalt der Tätigkeit, Beginn und Ende der Praxiszeit\nsowie Fehlzeiten ausweist. Auf der Grundlage der Praxisberichte, des Tätigkeitsnach-\nweises, sowie des Nachweises über den erfolgreichen Abschluss der begleitenden\nLehrveranstaltungen wird entschieden, ob die Studierenden das praktische Studien-\nsemester erfolgreich abgeleistet haben.\n"
        ]
    },
    {
        "questions": [
            "Was sind die Voraussetzungen für das Praktische Studiensemester?",
            "Wann kann das praktische Studiensemester begonnen werden?"
        ],
        "context": [
            "(4) In den Fakultäten wird ein Praktikantenamt eingerichtet. Die Leitung übernimmt\nein Professor, der von Amts wegen Mitglied des Prüfungsausschusses ist (§ 16 Absatz\n2). Das Praktikantenamt entscheidet über die Anerkennung des praktischen Studien-\nsemester als erfolgreich erbrachte Studienleistung.\n(5) Die Beschaffung eines Platzes für das praktische Studiensemester obliegt den\nStudierenden. Die Praxisstellen sind von den Studierenden vorzuschlagen und vom\nPraktikantenamt zu genehmigen; in Zweifelsfällen entscheidet der zuständige Prü-\nfungsausschuss.\n(6) Ein praktisches Studiensemester soll nur begonnen werden, wenn die Studien-\nund Prüfungsleistungen der vorangegangenen Studiensemester erfolgreich erbracht\nwurden. "
|
||||
]
|
||||
},
|
||||
{
"questions": [
"Was ist die Bachelorvorprüfung?",
"Was ist der Zweck der Bachelorvorprüfung?"
],
"context": [
"Bachelorvorprüfung\n§ 19 Zweck und Durchführung der Bachelorvorprüfung\n(1) Durch die Bachelorvorprüfung soll nachgewiesen werden, dass das Studium mit\nAussicht auf Erfolg fortgesetzt werden kann und dass die inhaltlichen Grundlagen des\nStudienfaches, ein methodisches Instrumentarium und eine systematische Orientie-\nrung erworben wurden.\n(2) Die Prüfungsleistungen der Bachelorvorprüfung werden in der Regel studienbe-\ngleitend im Anschluss an die jeweiligen Lehrveranstaltungen des Grundstudiums\ndurchgeführt. Die Bachelorvorprüfung ist so auszugestalten, dass sie vor Beginn der\nVorlesungszeit des auf das Grundstudium folgenden Semesters abgeschlossen wer-\nden kann.\n"
]
},
{
"questions": [
"Bekommt man ein Zeugnis für die Bachelorvorprüfung?",
"Was ist die Bachelorprüfung?",
"Was ist der Zweck der Bachelorprüfung?",
"Was sind Zweck und Durchführung der Bachelorprüfung?"
],
"context": [
"2) wird auf Antrag unverzüg-\nlich, möglichst innerhalb von vier Wochen, ein Zeugnis ausgestellt, das die Modulnoten\nund die Gesamtnote enthält; die Noten sind mit dem nach § 11 Abs. 5 ermittelten De-\nzimalwert als Klammerzusatz zu versehen.\n\n\f\nIII. Bachelorprüfung\n§ 23 Zweck und Durchführung der Bachelorprüfung\n(1) Die Bachelorprüfung bildet den berufsqualifizierenden Abschluss des Bachelorstu-\ndiengangs. Durch die Bachelorprüfung wird festgestellt, ob die Zusammenhänge des\nFachs überblickt werden, die Fähigkeit vorhanden ist, dessen Methoden und Erkennt-\nnisse anzuwenden, und die für den Übergang in die Berufspraxis notwendigen Fach-\nkompetenzen erworben wurden.\n"
]
},
{
"questions": [
"Was sind die fachlichen Voraussetzungen für die Bachelorprüfung?"
],
"context": [
"(2) Die Modulprüfungen der Bachelorprüfung werden in der Regel studienbegleitend\nim Anschluss an die jeweiligen Lehrveranstaltungen des Hauptstudiums durchgeführt.\n§ 24 Fachliche Voraussetzungen\n(1) Die Modulprüfungen der Bachelorprüfung kann nur ablegen, wer in dem Studien-\ngang, in dem die Bachelorprüfung abgelegt werden soll, die Bachelorvorprüfung be-\nstanden hat. Es ist zulässig, dass Studierende Prüfungsleistungen der Bachelorprü-\nfung ablegen, wenn zur vollständigen Bachelorvorprüfung maximal drei Prüfungsleis-\ntungen fehlen. Weitere Ausnahmen bedürfen der Zustimmung des Prüfungsausschus-\nses.\n(2) Die erfolgreiche Teilnahme an dem praktischen Studiensemester ist spätestens\nbei der Ausgabe der Bachelorarbeit nachzuweisen.\n"
]
},
{
"questions": ["Was ist die Bachelorarbeit?"],
"context": [
"§ 25 Art und Umfang der Bachelorprüfung\n(1) Im Besonderen Teil wird für die Bachelorprüfung festgelegt, welche Modulprüfun-\ngen in den Pflicht- und Wahlpflichtbereichen abzulegen sind.\n(2) Gegenstand der Modulprüfungen sind die Inhalte der nach Maßgabe des Beson-\nderen Teils zugeordneten Lehrveranstaltungen.\n§ 26 Ausgabe und Bearbeitungszeit der Bachelorarbeit\n(1) Die Bachelorarbeit ist eine Prüfungsleistung. Sie soll zeigen, dass innerhalb einer\nvorgegebenen Frist ein Problem aus dem gewählten Fachgebiet selbstständig bear-\nbeitet werden kann. Voraussetzung für die Zulassung zur Bachelorarbeit ist der Nach-\nweis der Erbringung aller für die ersten fünf Studiensemester vorgesehenen Studien-\nund Prüfungsleistungen. "
]
},
{
"questions": [
"Was sind die Voraussetzungen, um die Bachelorarbeit schreiben zu dürfen?",
"Wann darf die Bachelorarbeit geschrieben werden?"
],
"context": [
"§ 25 Art und Umfang der Bachelorprüfung\n(1) Im Besonderen Teil wird für die Bachelorprüfung festgelegt, welche Modulprüfun-\ngen in den Pflicht- und Wahlpflichtbereichen abzulegen sind.\n(2) Gegenstand der Modulprüfungen sind die Inhalte der nach Maßgabe des Beson-\nderen Teils zugeordneten Lehrveranstaltungen.\n§ 26 Ausgabe und Bearbeitungszeit der Bachelorarbeit\n(1) Die Bachelorarbeit ist eine Prüfungsleistung. Sie soll zeigen, dass innerhalb einer\nvorgegebenen Frist ein Problem aus dem gewählten Fachgebiet selbstständig bear-\nbeitet werden kann. Voraussetzung für die Zulassung zur Bachelorarbeit ist der Nach-\nweis der Erbringung aller für die ersten fünf Studiensemester vorgesehenen Studien-\nund Prüfungsleistungen. ",
"Wurden bereits 150 Credits erworben, kann der Prüfungs-\nausschuss auf Antrag eine Zulassung zur Bachelorarbeit aussprechen.\n(2) Mit Zustimmung der Prüfer kann die Bachelorarbeit auch in einer anderen Sprache\nals Deutsch angefertigt werden. Die Bachelorarbeit kann auch in Form einer Gruppen-\narbeit erbracht werden, wenn der als Prüfungsleistung zu bewertendem Beitrag der\nEinzelnen aufgrund von objektiven Kriterien, die eine eindeutige Abgrenzung ermögli-\nchen, deutlich unterscheidbar und bewertbar ist und die Anforderungen nach Absatz 1\nerfüllt.\n(3) Die Bachelorarbeit wird von einem Professor betreut."
]
},
{
"questions": [
"In welcher Sprache darf die Bachelorarbeit geschrieben werden?",
"Darf ich die Bachelorarbeit auf Englisch schreiben?",
"Darf man die Bachelorarbeit als Gruppe bearbeiten?",
"Ist Gruppenarbeit bei der Bachelorarbeit erlaubt?"
],
"context": [
"Wurden bereits 150 Credits erworben, kann der Prüfungs-\nausschuss auf Antrag eine Zulassung zur Bachelorarbeit aussprechen.\n(2) Mit Zustimmung der Prüfer kann die Bachelorarbeit auch in einer anderen Sprache\nals Deutsch angefertigt werden. Die Bachelorarbeit kann auch in Form einer Gruppen-\narbeit erbracht werden, wenn der als Prüfungsleistung zu bewertendem Beitrag der\nEinzelnen aufgrund von objektiven Kriterien, die eine eindeutige Abgrenzung ermögli-\nchen, deutlich unterscheidbar und bewertbar ist und die Anforderungen nach Absatz 1\nerfüllt.\n(3) Die Bachelorarbeit wird von einem Professor betreut."
]
},
{
"questions": [
"Darf man die Bachelorarbeit in einem Unternehmen durchführen?",
"Wie lange beträgt die Bearbeitungszeit für die Bachelorarbeit?",
"Darf ich die Frist der Bachelorthesis verlängern?",
"Ist es möglich, die Frist der Bachelorarbeit zu verlängern?"
],
"context": [
"Soll die Bachelorarbeit in einer Ein-\nrichtung außerhalb der Hochschule durchgeführt werden, ist dies zusammen mit der\nAnmeldung dem Prüfungsamt anzuzeigen.\n(5) Die Bearbeitungszeit für die Bachelorarbeit beträgt drei Monate. Soweit dies zur\nGewährleistung gleicher Prüfungsbedingungen oder aus Gründen, die von der zu prü-\nfenden Person nicht zu vertreten sind, erforderlich ist, kann die Bearbeitungszeit um\nhöchstens zwei weitere Monate verlängert werden; die Entscheidung darüber trifft der\nPrüfungsausschuss auf der Grundlage einer Stellungnahme des Betreuers der Ba-\nchelorarbeit."
]
},
{
"questions": ["Wie findet die Abgabe der Bachelorarbeit statt?"],
"context": [
"Thema, Aufgabenstellung und Umfang der Bachelorarbeit sind vom Be-\ntreuer so zu begrenzen, dass die Frist zur Bearbeitung der Bachelorarbeit eingehalten\nwerden kann.\n§ 27 Abgabe und Bewertung der Bachelorarbeit\n(1) Die Bachelorarbeit ist fristgemäß beim Sekretariat der Fakultät, der der Studien-\ngang zugeordnet ist, einzureichen. Der Einreichungszeitpunkt ist aktenkundig zu ma-\nchen. Bei der Einreichung versichert der Kandidat in Textform: „Hiermit erkläre ich,\ndass ich die vorliegende Arbeit selbständig verfasst und keine anderen als die ange-\ngebenen Quellen und Hilfsmittel benutzt habe.“ Bei einer Gruppenarbeit ist der ent-\nsprechend gekennzeichnete Teil der Arbeit mit dieser Erklärung zu versehen.\n"
]
},
{
"questions": [
"Findet nach der Abgabe der Bachelorarbeit eine mündliche Prüfung statt?",
"Muss man nach der Abgabe der Bachelorarbeit ein Kolloquium durchführen?",
"Was ist das Kolloquium und was wird da geprüft?",
"Wie läuft das Kolloquium ab?"
],
"context": [
"(2) Die Bachelorarbeit ist von zwei Prüfern zu bewerten. Einer der Prüfer soll der Be-\ntreuer der Arbeit sein. Über den Inhalt der Bachelorarbeit findet eine mündliche Prü-\nfung (Kolloquium) statt, die in der Regel hochschulöffentlich ist. Das Bewertungsver-\nfahren soll vier Wochen nicht überschreiten.\n§ 28 Bildung der Gesamtnote und Zeugnis\n(1) Die Gesamtnote errechnet sich gemäß § 11 Abs. 2, 4 und 5 aus den Modulnoten\nund der Note der Bachelorarbeit. Im Besonderen Teil kann für einzelne Modulnoten\neine besondere, an den Anrechnungspunkten orientierte Gewichtung vorgesehen wer-\nden. "
]
},
{
"questions": [
"Muss ich bei einer Online-Prüfung meine Identität nachweisen?"
],
"context": [
"(3) Vor Beginn einer Online-Prüfung müssen an der Prüfung teilnehmende ihre Iden-\ntität auf Aufforderung in geeigneter Weise nachweisen, z.B. durch das Zeigen eines\namtlichen Lichtbildausweises oder eines Studierendenausweises mit Lichtbild.\n(4) An der Prüfung Teilnehmenden soll rechtzeitig vor der Prüfung die Möglichkeit ein-\ngeräumt werden, die Rahmenbedingungen der Online-Prüfung in Bezug auf Technik,\nAusstattung und räumliche Umgebung zu erproben.\n(5) Soweit in dieser und in den nachfolgenden Vorschriften über Online-Prüfungen\nnichts anderes bestimmt ist, sind die übrigen Vorschriften der Studien- und Prüfungs-\nordnung für die Online-Prüfungen anwendbar.\n"
]
},
{
"questions": [
"Was passiert, wenn ich eine Prüfung versäume?",
"Was passiert, wenn ich nicht zur Prüfung antrete?"
],
"context": [
"(7) Einwendungen gegen die inhaltliche Bewertung von Prüfungsleistungen sind spä-\ntestens binnen sechs Monaten nach Bekanntgabe der Prüfungsentscheidung zu erhe-\nben und im Einzelnen und nachvollziehbar in Textform zu begründen. Mit Ablauf der\nEinwendungsfrist sind alle Einwendungen gegen die Bewertung ausgeschlossen.\n§ 12 Versäumnis, Rücktritt, Täuschung, Ordnungsverstoß\n(1) Eine Prüfungsleistung gilt als mit „nicht ausreichend“ (5,0) bewertet, wenn ein Prü-\nfungstermin einer angemeldeten Prüfung ohne triftigen Grund versäumt wird, oder\nwenn jemand entgegen den Bestimmungen nach § 7 Abs. 2 ohne triftigen Grund zu-\nrücktritt oder die Prüfung abbricht. ",
"Wird die Frist für die Durchführung der Wiederholungsprüfung versäumt,\nerlischt der Prüfungsanspruch, es sei denn, das Versäumnis ist von der zu prüfenden\nPerson nicht zu vertreten.\n(4) Voraussetzung für eine zweite Wiederholung (Drittversuch) von Prüfungsleistun-\ngen ist die Inanspruchnahme einer Studienfachberatung durch die Fakultät. Ein ent-\nsprechender Nachweis ist dem Prüfungsamt durch den Studierenden spätestens\nsechs Wochen nach Beginn der Vorlesungszeit des folgenden Semesters vorzulegen.\nWird der Nachweis nicht fristgerecht vorgelegt, erlischt der Prüfungsanspruch, es sei\ndenn, das Versäumnis ist von der zu prüfenden Person nicht zu vertreten.\n(5) Eine dritte Wiederholung einer Prüfungsleistung (Viertversuch) ist nicht möglich.\n",
"Dasselbe gilt, wenn eine schriftliche Prüfungsleis-\ntung nicht innerhalb der vorgegebenen Bearbeitungszeit erbracht wird.\n(2) Der für das Versäumnis, den Rücktritt oder den Abbruch geltend gemachte Grund\nmuss unverzüglich schriftlich angezeigt und glaubhaft gemacht werden. Bei Krankheit\nist ein ärztliches Attest vorzulegen. Aus dem Attest muss hervorgehen, woraus sich\ndie Prüfungsunfähigkeit ergeben hat. In Zweifelsfällen ist ein Attest eines von der\nHochschule benannten Arztes vorzulegen. Der Krankheit der Studierenden steht die\nKrankheit eines von ihnen zu versorgenden Kindes oder zu pflegenden Angehörigen\ngleich.\n"
]
},
{
"questions": [
"Welche Konsequenzen hat eine Täuschung bei einer Prüfungsleistung?",
"Was kann passieren, wenn ich bei einer Prüfungsleistung täusche?"
],
"context": [
"(3) Versucht jemand, das Ergebnis einer Prüfungsleistung durch Täuschung oder Be-\nnutzung nicht zugelassener Hilfsmittel zu beeinflussen, wird die betreffende Prüfungs-\nleistung mit „nicht ausreichend“ (5,0) bewertet. Wer den ordnungsgemäßen Ablauf des\nPrüfungstermins stört, kann von dem jeweiligen Prüfer oder Aufsichtführenden von der\nFortsetzung der Prüfungsleistung ausgeschlossen werden; in diesem Fall wird die Prü-\nfungsleistung mit „nicht ausreichend“ (5,0) bewertet. In schwerwiegenden oder wieder-\nholten Fällen kann der Prüfungsausschuss die zu prüfende Person von der Erbringung\nweiterer Prüfungsleistungen in dem Studiengang ausschließen. In diesem Fall erlischt\nder Prüfungsanspruch in dem Studiengang. "
]
},
{
"questions": [
"Was passiert, wenn ich die Bachelorarbeit nicht bestanden habe?"
],
"context": [
"(2) Die Bachelorvorprüfung ist bestanden, wenn sämtliche Modulprüfungen der Ba-\nchelorvorprüfung bestanden sind. Die Bachelorprüfung ist bestanden, wenn das prak-\ntische Studiensemester erfolgreich abgeschlossen ist, sämtliche Modulprüfungen der\nBachelorprüfung bestanden und die Bachelorarbeit mindestens mit der Note „ausrei-\nchend“ (4,0) bewertet wurde.\n(3) Wurde die Bachelorarbeit nicht bestanden, wird der geprüften Person Auskunft\ndarüber erteilt, ob und gegebenenfalls in welchem Umfang und in welcher Frist die\nBachelorarbeit wiederholt werden kann.\f\n"
]
},
{
"questions": [
"Wie oft darf ich die Bachelorarbeit wiederholen?",
"Was passiert, wenn man die Bachelorvorprüfung nicht besteht?",
"Was passiert, wenn man die Bachelorprüfung nicht besteht?",
"Wie oft dürfen Prüfungsleistungen wiederholt werden?"
],
"context": [
"(4) Wurde die Bachelorvorprüfung oder die Bachelorprüfung nicht bestanden, wird auf\nAntrag und gegen Vorlage der entsprechenden Nachweise sowie der Exmatrikulati-\nonsbescheinigung eine Bescheinigung ausgestellt, die die erbrachten Prüfungsleistun-\ngen und deren Noten sowie die noch fehlenden Prüfungsleistungen enthält und erken-\nnen lässt, dass die Bachelorvorprüfung oder die Bachelorprüfung nicht bestanden ist.\n§ 14 Wiederholung der Prüfungsleistungen\n(1) Wiederholungsprüfungen sollen innerhalb eines halben Jahres bzw. spätestens im\nPrüfungszeitraum des Folgesemesters angeboten werden.\n(2) Nicht bestandene Prüfungsleistungen können – mit Ausnahme der Bachelorarbeit\nund des praktischen Studiensemesters – zweimal wiederholt werden. Bachelorarbeit\nund praktisches Studiensemester dürfen nur einmal wiederholt werden. "
]
},
{
"questions": [
"Unter welchen Umständen darf ein Drittversuch durchgeführt werden?"
],
"context": [
"Wird die Frist für die Durchführung der Wiederholungsprüfung versäumt,\nerlischt der Prüfungsanspruch, es sei denn, das Versäumnis ist von der zu prüfenden\nPerson nicht zu vertreten.\n(4) Voraussetzung für eine zweite Wiederholung (Drittversuch) von Prüfungsleistun-\ngen ist die Inanspruchnahme einer Studienfachberatung durch die Fakultät. Ein ent-\nsprechender Nachweis ist dem Prüfungsamt durch den Studierenden spätestens\nsechs Wochen nach Beginn der Vorlesungszeit des folgenden Semesters vorzulegen.\nWird der Nachweis nicht fristgerecht vorgelegt, erlischt der Prüfungsanspruch, es sei\ndenn, das Versäumnis ist von der zu prüfenden Person nicht zu vertreten.\n(5) Eine dritte Wiederholung einer Prüfungsleistung (Viertversuch) ist nicht möglich.\n"
]
},
{
"questions": [
"Wie bildet sich die Gesamtnote des Endzeugnisses?",
"Wie bildet sich die Gesamtnote des Zeugnisses?"
],
"context": [
"(2) Die Bachelorarbeit ist von zwei Prüfern zu bewerten. Einer der Prüfer soll der Be-\ntreuer der Arbeit sein. Über den Inhalt der Bachelorarbeit findet eine mündliche Prü-\nfung (Kolloquium) statt, die in der Regel hochschulöffentlich ist. Das Bewertungsver-\nfahren soll vier Wochen nicht überschreiten.\n§ 28 Bildung der Gesamtnote und Zeugnis\n(1) Die Gesamtnote errechnet sich gemäß § 11 Abs. 2, 4 und 5 aus den Modulnoten\nund der Note der Bachelorarbeit. Im Besonderen Teil kann für einzelne Modulnoten\neine besondere, an den Anrechnungspunkten orientierte Gewichtung vorgesehen wer-\nden. ",
"In Studiengängen mit Bachelorvorprüfung werden die Modulnoten des Grundstu-\ndiums nicht berücksichtigt.\n(2) Bei überragenden Leistungen (Gesamtnote 1,2 oder besser) wird das Gesamtur-\nteil „mit Auszeichnung bestanden“ erteilt.\n(3) Über die bestandene Bachelorprüfung wird unverzüglich, möglichst innerhalb von\nvier Wochen, auf Antrag des Studierenden ein Zeugnis ausgestellt. In das Zeugnis\nsind die Modulnoten, das Thema der Bachelorarbeit und deren Note sowie die Ge-\nsamtnote aufzunehmen; die Noten sind mit dem nach § 11 Abs. 5 ermittelten Dezimal-\nwert als Klammerzusatz zu versehen."
]
},
{
"questions": ["Was ist das Diploma Supplement?"],
"context": [
"Gegebenenfalls sind ferner die Studienrichtung,\ndie Studienschwerpunkte und die bis zum Abschluss der Bachelorprüfung benötigte\nFachstudiendauer in das Zeugnis aufzunehmen.\n(4) Das Zeugnis trägt das Datum des Tages, an dem die letzte Prüfungsleistung er-\nbracht worden ist.\n§ 29 Diploma Supplement\nÜber die wesentlichen Studieninhalte wird ein Diploma Supplement in deutscher oder\nin englischer Sprache ausgestellt, das Teil des Bachelorzeugnisses ist. Das Diploma\nSupplement enthält Informationen über Studieninhalte und -ergebnisse, erworbene\nQualifikationen und weitere Berechtigungen.\f\n(2) Das Diploma Supplement wird über ein elektronisches Selbstbedienungsverfahren\nausgestellt.\n"
]
}
]
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -0,0 +1,43 @@
import json

import requests
from tqdm import tqdm

# Load the evaluation questions.
with open('questions.json') as f:
    questions = json.load(f)

headers = {
    'Authorization': 'Basic Og==',
    'Content-Type': 'application/json'
}

url = "http://127.0.0.1:5001/get_relevant_documents"
url_es_get_doc_by_id = "http://127.0.0.1:5001/get_document_by_id"

for question_item in tqdm(questions):
    question = question_item["question"]
    payload = json.dumps({
        "query": question,
        "index": "ib"
    })
    response = requests.post(url, headers=headers, data=payload)
    results = response.json()
    # Keep the Elasticsearch ids of the five highest-ranked results.
    ids = [result["meta"]["es_id"] for result in results[:5]]
    top_five_results = []

    for doc_id in ids:
        payload = json.dumps({"id": doc_id})
        response = requests.post(url_es_get_doc_by_id, headers=headers, data=payload)

        es_docs = response.json()
        for es_doc in es_docs:
            # Look up the retrieval score that belongs to this document id.
            score = [result["score"] for result in results if result["meta"]["es_id"] == es_doc["id"]]
            top_five_results.append({"title": es_doc["meta"]["name_de"], "score": score, "description": es_doc["content"]})
    question_item["top_results"] = top_five_results

with open('answered_questions.json', 'w', encoding="utf-8") as outfile:
    json.dump(questions, outfile, ensure_ascii=False)
@ -0,0 +1,158 @@
[
{
"question": "Ich interessiere mich für Techniken und Methoden im Bereich Softwareentwicklung.",
"label": "Ausgewählte Probleme des Software Engineerings (APS)"
},
{
"question": "Ich möchte meine wissenschaftlichen Arbeitsmethoden verbessern und eine Abschlussarbeit verfassen.",
"label": "Bachelorarbeit (BA)"
},
{
"question": "Ich möchte mehr über relationale Datenbanken und Datenmodellierung lernen.",
"label": "Datenmanagement (DM)"
},
{
"question": "Ich habe ein Interesse an der Geschichte der Informatik und den Grundlagen von Betriebssystemen.",
"label": "Einführung in die Informatik (EI)"
},
{
"question": "Ich möchte mein Wissen über maschinelles Lernen und Künstliche Intelligenz vertiefen.",
"label": "Künstliche Intelligenz (KI)"
},
{
"question": "Ich möchte mehr über Netzwerke und Datenübertragung erfahren. Welches Modul sollte ich wählen?",
"label": "Kommunikation und Netze (KN)"
},
{
"question": "Ich bin an Methoden zur Lösung mathematischer Probleme mit Hilfe von Computern interessiert.",
"label": "Numerische Verfahren (NV)"
},
{
"question": "Ich möchte mehr über die Entwicklung von Anwendungen für mobile Geräte lernen.",
"label": "Mobile Anwendungen (MA)"
},
{
"question": "Ich interessiere mich für IT-Sicherheit und Verschlüsselungstechnologien.",
"label": "IT-Sicherheit (ITS)"
},
{
"question": "Ich möchte mein Wissen über Projektmanagement und Agile Methoden erweitern.",
"label": "Software-Projektmanagement (SPM)"
},
{
"question": "In welchem Modul lerne ich, wie man in kleinen Teams an der Realisierung eines Produktes arbeitet und alle Phasen von der Produktidee bis zur Einführung beim Kunden durchführt?",
"label": "Software-Entwicklungsprojekt (SEP)"
},
{
"question": "In welchem Modul werde ich in der Lage sein, ein Software-Entwicklungsprojekt von ersten Anforderungen bis zur Produkteinführung aus der Sicht von Projektleiter und Entwickler zu beschreiben?",
"label": "Softwareprojekt (SP)"
},
{
"question": "Welches Modul befasst sich mit schaltungstechnischen Grundlagen und der Struktur der CPU?",
"label": "Technische Informatik 1 (TEI1)"
},
{
"question": "In welchem Modul kann ich mehr über hardwarenahe C/C++ Konstrukte und Prozessortypen lernen?",
"label": "Technische Informatik 2 (TEI2)"
},
{
"question": "Welches Modul bietet Einführungs- und Supervisionsworkshops, um aktuelle Themen nach den Wünschen/Erfordernissen der Teams zu besprechen?",
"label": "Teamentwicklungs-Workshop (TEW)"
},
{
"question": "Welches Modul behandelt die Grundlagen der Logik, formale Sprachen und die Automatentheorie?",
"label": "Theoretische Informatik (THI)"
},
{
"question": "In welchem Modul unterstütze ich Übungen und Projekte durch das Vorstellen von Themen und aktive Betreuung der Studierenden?",
"label": "Tutorium (TUT)"
},
{
"question": "Welches Modul ermöglicht den Studierenden, sich außerhalb ihrer Lehrveranstaltungen mit sozialen oder anderen nicht informatischen Themen zu beschäftigen, die jedoch für den Informatik-Beruf relevant sind?",
"label": "Überfachliche Kompetenzen (UK)"
},
{
"question": "In welchem Modul lerne ich die Grundlagen der Virtualisierung und den Umgang mit Serverbetriebssystemen?",
"label": "Virtualisierung (VIR)"
},
{
"question": "Welches Modul bietet eine Einführung in verteilte Architekturen und entfernte Methodenaufrufe?",
"label": "Verteilte Systeme (VS)"
},
{
"question": "In welchem Modul kann ich mehr über das Client/Server-Modell und Web Frameworks lernen?",
"label": "Webbasierte Systeme (WEB)"
},
{
"question": "Welches Modul bietet eine Einführung in die Methoden des wissenschaftlichen Arbeitens und den Umgang mit Quellen und Literatur?",
"label": "Wissenschaftliches Arbeiten (WIA)"
},
{
"question": "In welchem Modul lerne ich die Grundlagen der 3D-Modellierung und der Spieleentwicklung?",
"label": "3D-Modellierung und Spieleentwicklung (3MS)"
},
{
"question": "In welchem Modul lerne ich die Grundlagen von Automaten und formalen Sprachen?",
"label": "Automatentheorie und formale Sprachen (AFS)"
},
{
"question": "Welches Modul beschäftigt sich mit agilen Methoden und Techniken der Softwareentwicklung?",
"label": "Agile Softwareentwicklung (AGI)"
},
{
"question": "In welchem Modul kann ich mehr über die Virtualisierung von Anwendungen und die Nutzung von Docker lernen?",
"label": "Anwendungscontainer und Docker (ACD)"
},
{
"question": "Welches Modul bietet Informationen zu Komponenten, Services und Micro Services?",
"label": "Anwendungscontainer und Docker (ACD)"
},
{
"question": "In welchem Modul lerne ich über die Struktur und Funktion einer CPU und RISC-Prozessoren?",
"label": "Technische Informatik 1 (TEI1)"
},
{
"question": "In welchem Modul lerne ich über Algorithmen und Datenstrukturen?",
"label": "Algorithmen und Datenstrukturen (ALD)"
},
{
"question": "Welches Modul bietet Informationen über verteilte Systeme und Netzwerke?",
"label": "Netzwerke und verteilte Systeme (NVS)"
},
{
"question": "In welchem Modul kann ich mehr über Softwarequalität und Testing lernen?",
"label": "Softwarequalität und Testing (SQT)"
},
{
"question": "Welches Modul bietet Informationen zu Künstlicher Intelligenz und Machine Learning?",
"label": "Künstliche Intelligenz und Machine Learning (KIML)"
},
{
"question": "In welchem Modul lerne ich mehr über Datenbanken und SQL?",
"label": "Datenbanken (DB)"
},
{
"question": "Ich interessiere mich für moderne Rechnerarchitekturen und GPU-Programmierung.",
"label": "Algorithmen für moderne Rechnerarchitekturen (ALR)"
},
{
"question": "Ich möchte mehr über fortgeschrittene Webarchitekturen und Node.js lernen.",
"label": "Angular und Node.js (ANO)"
},
{
"question": "Ich interessiere mich für agile Methoden der Softwareentwicklung und Visual Analytics.",
"label": "Angewandte Projektarbeit: Visualisierung (APV)"
},
{
"question": "Ich möchte mich mit Big Data und Big-Data-Architekturen auseinandersetzen.",
"label": "Big Data Engineering and Analysis (BDEA)"
},
{
"question": "Ich bin fasziniert von Molekularbiologie und der Anwendung der Genomik in der personalisierten Medizin.",
"label": "Bioinformatik (BIM)"
},
{
"question": "Ich möchte mehr über digitale Bildverarbeitung und Anwendungen von Deep Learning in diesem Bereich erfahren.",
"label": "Bildverarbeitung (BIV)"
}
]
File diff suppressed because it is too large
@ -0,0 +1,197 @@
[
{
"question": "Ich interessiere mich f\u00fcr Techniken und Methoden im Bereich Softwareentwicklung. Welches Modul sollte ich w\u00e4hlen?",
"label": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)",
"title": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)\n"
},
{
"question": "Ich m\u00f6chte meine wissenschaftlichen Arbeitsmethoden verbessern und eine Abschlussarbeit verfassen. Welches Modul ist am besten f\u00fcr mich geeignet?",
"label": "Bachelorarbeit (BA)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich m\u00f6chte mehr \u00fcber relationale Datenbanken und Datenmodellierung lernen. Welches Modul passt zu meinen Interessen?",
"label": "Datenmanagement (DM)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich habe ein Interesse an der Geschichte der Informatik und den Grundlagen von Betriebssystemen. Welches Modul sollte ich w\u00e4hlen?",
"label": "Einf\u00fchrung in die Informatik (EI)",
"title": "\u00dcberfachliche Kompetenzen (UK)\n"
},
{
"question": "Ich m\u00f6chte mein Wissen \u00fcber maschinelles Lernen und K\u00fcnstliche Intelligenz vertiefen. Welches Modul passt am besten zu mir?",
"label": "K\u00fcnstliche Intelligenz (KI)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich m\u00f6chte mehr \u00fcber Netzwerke und Daten\u00fcbertragung erfahren. Welches Modul sollte ich w\u00e4hlen?",
"label": "Kommunikation und Netze (KN)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich bin an Methoden zur L\u00f6sung mathematischer Probleme mit Hilfe von Computern interessiert. Welches Modul w\u00e4re das richtige f\u00fcr mich?",
"label": "Numerische Verfahren (NV)",
"title": "Mathematische Biologie (MBI)\n"
},
{
"question": "Ich m\u00f6chte mehr \u00fcber die Entwicklung von Anwendungen f\u00fcr mobile Ger\u00e4te lernen. Welches Modul sollte ich w\u00e4hlen?",
"label": "Mobile Anwendungen (MA)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich interessiere mich f\u00fcr IT-Sicherheit und Verschl\u00fcsselungstechnologien. Welches Modul w\u00fcrde mir helfen, mehr dar\u00fcber zu erfahren?",
"label": "IT-Sicherheit (ITS)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "Ich m\u00f6chte mein Wissen \u00fcber Projektmanagement und Agile Methoden erweitern. Welches Modul w\u00e4re am besten geeignet?",
"label": "Software-Projektmanagement (SPM)",
"title": "Teamentwicklungs-Workshop (TEW)\n"
},
{
"question": "In welchem Modul lerne ich, wie man in kleinen Teams an der Realisierung eines Produktes arbeitet und alle Phasen von der Produktidee bis zur Einf\u00fchrung beim Kunden durchf\u00fchrt?",
"label": "Software-Entwicklungsprojekt (SEP)",
"title": "Software-Entwicklungsprojekt (SEP)\n"
},
{
"question": "In welchem Modul werde ich in der Lage sein, ein Software-Entwicklungsprojekt von ersten Anforderungen bis zur Produkteinf\u00fchrung aus der Sicht von Projektleiter und Entwickler zu beschreiben?",
"label": "Softwareprojekt (SP)",
"title": "Softwareprojekt (SP)\n"
},
{
"question": "Welches Modul befasst sich mit Schaltungstechnischen Grundlagen und der Struktur der CPU?",
"label": "Technische Informatik 1 (TEI1)",
"title": "Robotik (ROB)\n"
},
{
"question": "In welchem Modul kann ich mehr \u00fcber hardwarenahe C/C++ Konstrukte und Prozessortypen lernen?",
"label": "Technische Informatik 2 (TEI2)",
"title": "Technische Informatik 2 (TEI2)\n"
},
{
"question": "Welches Modul bietet einen Einf\u00fchrungs- und Supervisionsworkshops, um aktuelle Themen nach den W\u00fcnschen/Erfordernissen der Teams zu besprechen?",
"label": "Teamentwicklungs-Workshop (TEW)",
"title": "Tutorium (TUT)\n"
},
{
"question": "Welches Modul behandelt die Grundlagen der Logik, formale Sprachen und die Automatentheorie?",
"label": "Theoretische Informatik (THI)",
"title": "Mathematische Biologie (MBI)\n"
},
{
"question": "In welchem Modul unterst\u00fctze ich \u00dcbungen und Projekte durch das Vorstellen von Themen und aktive Betreuung der Studierenden?",
|
||||
"label": "Tutorium (TUT)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul erm\u00f6glicht den Studierenden, sich au\u00dferhalb ihrer Lehrveranstaltungen mit sozialen oder anderen nicht informatischen Themen zu besch\u00e4ftigen, die jedoch f\u00fcr den Informatik-Beruf relevant sind?",
|
||||
"label": "\u00dcberfachliche Kompetenzen (UK)",
|
||||
"title": "Interaction Design (IAD)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich die Grundlagen der Virtualisierung und den Umgang mit Serverbetriebssystemen?",
|
||||
"label": "Virtualisierung (VIR)",
|
||||
"title": "Campusmanagement als Anwendungskontext f\u00fcr Webanwendungen (CAW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul bietet eine Einf\u00fchrung in verteilte Architekturen und die Entfernte Methodenaufrufe?",
|
||||
"label": "Verteilte Systeme (VS)",
|
||||
"title": "Campusmanagement als Anwendungskontext f\u00fcr Webanwendungen (CAW)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul kann ich mehr \u00fcber das Client/Server-Modell und Web Frameworks lernen?",
|
||||
"label": "Webbasierte Systeme (WEB)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul bietet eine Einf\u00fchrung in die Methoden des wissenschaftlichen Arbeitens und den Umgang mit Quellen und Literatur?",
|
||||
"label": "Wissenschaftliches Arbeiten (WIA)",
|
||||
"title": "Bachelorarbeit (BA)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich die Grundlagen der 3D-Modellierung und der Spieleentwicklung?",
|
||||
"label": "3D-Modellierung und Spieleentwicklung (3MS)",
|
||||
"title": "Game Engineering (GAE)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich die Grundlagen von Automaten und formalen Sprachen?",
|
||||
"label": "Automatentheorie und formale Sprachen (AFS)",
|
||||
"title": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul besch\u00e4ftigt sich mit agilen Methoden und Techniken der Softwareentwicklung?",
|
||||
"label": "Agile Softwareentwicklung (AGI)",
|
||||
"title": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul kann ich mehr \u00fcber die Virtualisierung von Anwendungen und die Nutzung von Docker lernen?",
|
||||
"label": "Anwendungscontainer und Docker (ACD)",
|
||||
"title": "Campusmanagement als Anwendungskontext f\u00fcr Webanwendungen (CAW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul bietet Informationen zu Komponenten, Services und Micro Services?",
|
||||
"label": "Anwendungscontainer und Docker (ACD)",
|
||||
"title": "Software Engineering 1 (SE1)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich \u00fcber die Struktur und Funktion einer CPU und RISC-Prozessoren?",
|
||||
"label": "Technische Informatik 1 (TEI1)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich \u00fcber Algorithmen und Datenstrukturen?",
|
||||
"label": "Algorithmen und Datenstrukturen (ALD)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul bietet Informationen \u00fcber verteilte Systeme und Netzwerke?",
|
||||
"label": "Netzwerke und verteilte Systeme (NVS)",
|
||||
"title": "Software Engineering 1 (SE1)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul kann ich mehr \u00fcber Softwarequalit\u00e4t und Testing lernen?",
|
||||
"label": "Softwarequalit\u00e4t und Testing (SQT)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Welches Modul bietet Informationen zu K\u00fcnstlicher Intelligenz und Machine Learning?",
|
||||
"label": "K\u00fcnstliche Intelligenz und Machine Learning (KIML)",
|
||||
"title": "Tutorium (TUT)\n"
|
||||
},
|
||||
{
|
||||
"question": "In welchem Modul lerne ich mehr \u00fcber Datenbanken und SQL?",
|
||||
"label": "Datenbanken (DB)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich interessiere mich f\u00fcr moderne Rechnerarchitekturen und GPU-Programmierung. Welches Modul w\u00e4re passend f\u00fcr mich?",
|
||||
"label": "Algorithmen f\u00fcr moderne Rechnerarchitekturen (ALR)",
|
||||
"title": "Projekte in der Informatik (PI-IB)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich m\u00f6chte mehr \u00fcber fortgeschrittene Webarchitekturen und Node.js lernen. Gibt es ein Modul dazu?",
|
||||
"label": "Angular und Node.js (ANO)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich interessiere mich f\u00fcr agile Methoden der Softwarentwicklung und Visual Analytics. Welches Modul passt dazu?",
|
||||
"label": "Angewandte Projektarbeit: Visualisierung (APV)",
|
||||
"title": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich m\u00f6chte mich mit Big Data und Big-Data-Architekturen auseinandersetzen. Welches Modul sollte ich w\u00e4hlen?",
|
||||
"label": "Big Data Engineering and Analysis (BDEA)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich bin fasziniert von Molekularbiologie und der Anwendung der Genomik in der personalisierten Medizin. Welches Modul w\u00e4re geeignet f\u00fcr mich?",
|
||||
"label": "Bioinformatik (BIM)",
|
||||
"title": "Mathematische Biologie (MBI)\n"
|
||||
},
|
||||
{
|
||||
"question": "Ich m\u00f6chte mehr \u00fcber digitale Bildverarbeitung und Anwendungen von Deep Learning in diesem Bereich erfahren. Gibt es ein passendes Modul?",
|
||||
"label": "Bildverarbeitung (BIV)",
|
||||
"title": "Teamentwicklungs-Workshop (TEW)\n"
|
||||
}
|
||||
]
|
|
@@ -0,0 +1,197 @@
[
  {"question": "Ich interessiere mich f\u00fcr Techniken und Methoden im Bereich Softwareentwicklung. Welches Modul sollte ich w\u00e4hlen?", "label": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich m\u00f6chte meine wissenschaftlichen Arbeitsmethoden verbessern und eine Abschlussarbeit verfassen. Welches Modul ist am besten f\u00fcr mich geeignet?", "label": "Bachelorarbeit (BA)", "title": "Cybersicherheit in der Prozessindustrie (CPI)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber relationale Datenbanken und Datenmodellierung lernen. Welches Modul passt zu meinen Interessen?", "label": "Datenmanagement (DM)", "title": "Cybersicherheit in der Prozessindustrie (CPI)\n"},
  {"question": "Ich habe ein Interesse an der Geschichte der Informatik und den Grundlagen von Betriebssystemen. Welches Modul sollte ich w\u00e4hlen?", "label": "Einf\u00fchrung in die Informatik (EI)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich m\u00f6chte mein Wissen \u00fcber maschinelles Lernen und K\u00fcnstliche Intelligenz vertiefen. Welches Modul passt am besten zu mir?", "label": "K\u00fcnstliche Intelligenz (KI)", "title": "Automatentheorie und formale Sprachen (AFS)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber Netzwerke und Daten\u00fcbertragung erfahren. Welches Modul sollte ich w\u00e4hlen?", "label": "Kommunikation und Netze (KN)", "title": "Game Engineering (GAE)\n"},
  {"question": "Ich bin an Methoden zur L\u00f6sung mathematischer Probleme mit Hilfe von Computern interessiert. Welches Modul w\u00e4re das richtige f\u00fcr mich?", "label": "Numerische Verfahren (NV)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber die Entwicklung von Anwendungen f\u00fcr mobile Ger\u00e4te lernen. Welches Modul sollte ich w\u00e4hlen?", "label": "Mobile Anwendungen (MA)", "title": "Automatentheorie und formale Sprachen (AFS)\n"},
  {"question": "Ich interessiere mich f\u00fcr IT-Sicherheit und Verschl\u00fcsselungstechnologien. Welches Modul w\u00fcrde mir helfen, mehr dar\u00fcber zu erfahren?", "label": "IT-Sicherheit (ITS)", "title": "Game Engineering (GAE)\n"},
  {"question": "Ich m\u00f6chte mein Wissen \u00fcber Projektmanagement und Agile Methoden erweitern. Welches Modul w\u00e4re am besten geeignet?", "label": "Software-Projektmanagement (SPM)", "title": "Automatentheorie und formale Sprachen (AFS)\n"},
  {"question": "In welchem Modul lerne ich, wie man in kleinen Teams an der Realisierung eines Produktes arbeitet und alle Phasen von der Produktidee bis zur Einf\u00fchrung beim Kunden durchf\u00fchrt?", "label": "Software-Entwicklungsprojekt (SEP)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul werde ich in der Lage sein, ein Software-Entwicklungsprojekt von ersten Anforderungen bis zur Produkteinf\u00fchrung aus der Sicht von Projektleiter und Entwickler zu beschreiben?", "label": "Softwareprojekt (SP)", "title": "Softwareprojekt (SP)\n"},
  {"question": "Welches Modul befasst sich mit Schaltungstechnischen Grundlagen und der Struktur der CPU?", "label": "Technische Informatik 1 (TEI1)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber hardwarenahe C/C++ Konstrukte und Prozessortypen lernen?", "label": "Technische Informatik 2 (TEI2)", "title": "Game Engineering (GAE)\n"},
  {"question": "Welches Modul bietet einen Einf\u00fchrungs- und Supervisionsworkshops, um aktuelle Themen nach den W\u00fcnschen/Erfordernissen der Teams zu besprechen?", "label": "Teamentwicklungs-Workshop (TEW)", "title": "Teamentwicklungs-Workshop (TEW)\n"},
  {"question": "Welches Modul behandelt die Grundlagen der Logik, formale Sprachen und die Automatentheorie?", "label": "Theoretische Informatik (THI)", "title": "Automatentheorie und formale Sprachen (AFS)\n"},
  {"question": "In welchem Modul unterst\u00fctze ich \u00dcbungen und Projekte durch das Vorstellen von Themen und aktive Betreuung der Studierenden?", "label": "Tutorium (TUT)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Welches Modul erm\u00f6glicht den Studierenden, sich au\u00dferhalb ihrer Lehrveranstaltungen mit sozialen oder anderen nicht informatischen Themen zu besch\u00e4ftigen, die jedoch f\u00fcr den Informatik-Beruf relevant sind?", "label": "\u00dcberfachliche Kompetenzen (UK)", "title": "\u00dcberfachliche Kompetenzen (UK)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen der Virtualisierung und den Umgang mit Serverbetriebssystemen?", "label": "Virtualisierung (VIR)", "title": "Cybersicherheit in der Prozessindustrie (CPI)\n"},
  {"question": "Welches Modul bietet eine Einf\u00fchrung in verteilte Architekturen und die Entfernte Methodenaufrufe?", "label": "Verteilte Systeme (VS)", "title": "Cybersicherheit in der Prozessindustrie (CPI)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber das Client/Server-Modell und Web Frameworks lernen?", "label": "Webbasierte Systeme (WEB)", "title": "Teamentwicklungs-Workshop (TEW)\n"},
  {"question": "Welches Modul bietet eine Einf\u00fchrung in die Methoden des wissenschaftlichen Arbeitens und den Umgang mit Quellen und Literatur?", "label": "Wissenschaftliches Arbeiten (WIA)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen der 3D-Modellierung und der Spieleentwicklung?", "label": "3D-Modellierung und Spieleentwicklung (3MS)", "title": "Teamentwicklungs-Workshop (TEW)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen von Automaten und formalen Sprachen?", "label": "Automatentheorie und formale Sprachen (AFS)", "title": "Automatentheorie und formale Sprachen (AFS)\n"},
  {"question": "Welches Modul besch\u00e4ftigt sich mit agilen Methoden und Techniken der Softwareentwicklung?", "label": "Agile Softwareentwicklung (AGI)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber die Virtualisierung von Anwendungen und die Nutzung von Docker lernen?", "label": "Anwendungscontainer und Docker (ACD)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Welches Modul bietet Informationen zu Komponenten, Services und Micro Services?", "label": "Anwendungscontainer und Docker (ACD)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul lerne ich \u00fcber die Struktur und Funktion einer CPU und RISC-Prozessoren?", "label": "Technische Informatik 1 (TEI1)", "title": "Campusmanagement als Anwendungskontext f\u00fcr Webanwendungen (CAW)\n"},
  {"question": "In welchem Modul lerne ich \u00fcber Algorithmen und Datenstrukturen?", "label": "Algorithmen und Datenstrukturen (ALD)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Welches Modul bietet Informationen \u00fcber verteilte Systeme und Netzwerke?", "label": "Netzwerke und verteilte Systeme (NVS)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber Softwarequalit\u00e4t und Testing lernen?", "label": "Softwarequalit\u00e4t und Testing (SQT)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Welches Modul bietet Informationen zu K\u00fcnstlicher Intelligenz und Machine Learning?", "label": "K\u00fcnstliche Intelligenz und Machine Learning (KIML)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "In welchem Modul lerne ich mehr \u00fcber Datenbanken und SQL?", "label": "Datenbanken (DB)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich interessiere mich f\u00fcr moderne Rechnerarchitekturen und GPU-Programmierung. Welches Modul w\u00e4re passend f\u00fcr mich?", "label": "Algorithmen f\u00fcr moderne Rechnerarchitekturen (ALR)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber fortgeschrittene Webarchitekturen und Node.js lernen. Gibt es ein Modul dazu?", "label": "Angular und Node.js (ANO)", "title": "Game Engineering (GAE)\n"},
  {"question": "Ich interessiere mich f\u00fcr agile Methoden der Softwarentwicklung und Visual Analytics. Welches Modul passt dazu?", "label": "Angewandte Projektarbeit: Visualisierung (APV)", "title": "Projekte in der Informatik (PI-IB)\n"},
  {"question": "Ich m\u00f6chte mich mit Big Data und Big-Data-Architekturen auseinandersetzen. Welches Modul sollte ich w\u00e4hlen?", "label": "Big Data Engineering and Analysis (BDEA)", "title": "Teamentwicklungs-Workshop (TEW)\n"},
  {"question": "Ich bin fasziniert von Molekularbiologie und der Anwendung der Genomik in der personalisierten Medizin. Welches Modul w\u00e4re geeignet f\u00fcr mich?", "label": "Bioinformatik (BIM)", "title": "Cybersicherheit in der Prozessindustrie (CPI)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber digitale Bildverarbeitung und Anwendungen von Deep Learning in diesem Bereich erfahren. Gibt es ein passendes Modul?", "label": "Bildverarbeitung (BIV)", "title": "Automatentheorie und formale Sprachen (AFS)\n"}
]
@@ -0,0 +1,197 @@
[
  {"question": "Ich interessiere mich f\u00fcr Techniken und Methoden im Bereich Softwareentwicklung. Welches Modul sollte ich w\u00e4hlen?", "label": "Ausgew\u00e4hlte Probleme des Software Engineerings (APS)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Ich m\u00f6chte meine wissenschaftlichen Arbeitsmethoden verbessern und eine Abschlussarbeit verfassen. Welches Modul ist am besten f\u00fcr mich geeignet?", "label": "Bachelorarbeit (BA)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber relationale Datenbanken und Datenmodellierung lernen. Welches Modul passt zu meinen Interessen?", "label": "Datenmanagement (DM)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Ich habe ein Interesse an der Geschichte der Informatik und den Grundlagen von Betriebssystemen. Welches Modul sollte ich w\u00e4hlen?", "label": "Einf\u00fchrung in die Informatik (EI)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"},
  {"question": "Ich m\u00f6chte mein Wissen \u00fcber maschinelles Lernen und K\u00fcnstliche Intelligenz vertiefen. Welches Modul passt am besten zu mir?", "label": "K\u00fcnstliche Intelligenz (KI)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber Netzwerke und Daten\u00fcbertragung erfahren. Welches Modul sollte ich w\u00e4hlen?", "label": "Kommunikation und Netze (KN)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "Ich bin an Methoden zur L\u00f6sung mathematischer Probleme mit Hilfe von Computern interessiert. Welches Modul w\u00e4re das richtige f\u00fcr mich?", "label": "Numerische Verfahren (NV)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber die Entwicklung von Anwendungen f\u00fcr mobile Ger\u00e4te lernen. Welches Modul sollte ich w\u00e4hlen?", "label": "Mobile Anwendungen (MA)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"},
  {"question": "Ich interessiere mich f\u00fcr IT-Sicherheit und Verschl\u00fcsselungstechnologien. Welches Modul w\u00fcrde mir helfen, mehr dar\u00fcber zu erfahren?", "label": "IT-Sicherheit (ITS)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "Ich m\u00f6chte mein Wissen \u00fcber Projektmanagement und Agile Methoden erweitern. Welches Modul w\u00e4re am besten geeignet?", "label": "Software-Projektmanagement (SPM)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "In welchem Modul lerne ich, wie man in kleinen Teams an der Realisierung eines Produktes arbeitet und alle Phasen von der Produktidee bis zur Einf\u00fchrung beim Kunden durchf\u00fchrt?", "label": "Software-Entwicklungsprojekt (SEP)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "In welchem Modul werde ich in der Lage sein, ein Software-Entwicklungsprojekt von ersten Anforderungen bis zur Produkteinf\u00fchrung aus der Sicht von Projektleiter und Entwickler zu beschreiben?", "label": "Softwareprojekt (SP)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul befasst sich mit Schaltungstechnischen Grundlagen und der Struktur der CPU?", "label": "Technische Informatik 1 (TEI1)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber hardwarenahe C/C++ Konstrukte und Prozessortypen lernen?", "label": "Technische Informatik 2 (TEI2)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul bietet einen Einf\u00fchrungs- und Supervisionsworkshops, um aktuelle Themen nach den W\u00fcnschen/Erfordernissen der Teams zu besprechen?", "label": "Teamentwicklungs-Workshop (TEW)", "title": "Tutorium (TUT)\n"},
  {"question": "Welches Modul behandelt die Grundlagen der Logik, formale Sprachen und die Automatentheorie?", "label": "Theoretische Informatik (THI)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "In welchem Modul unterst\u00fctze ich \u00dcbungen und Projekte durch das Vorstellen von Themen und aktive Betreuung der Studierenden?", "label": "Tutorium (TUT)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "Welches Modul erm\u00f6glicht den Studierenden, sich au\u00dferhalb ihrer Lehrveranstaltungen mit sozialen oder anderen nicht informatischen Themen zu besch\u00e4ftigen, die jedoch f\u00fcr den Informatik-Beruf relevant sind?", "label": "\u00dcberfachliche Kompetenzen (UK)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen der Virtualisierung und den Umgang mit Serverbetriebssystemen?", "label": "Virtualisierung (VIR)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul bietet eine Einf\u00fchrung in verteilte Architekturen und die Entfernte Methodenaufrufe?", "label": "Verteilte Systeme (VS)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber das Client/Server-Modell und Web Frameworks lernen?", "label": "Webbasierte Systeme (WEB)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul bietet eine Einf\u00fchrung in die Methoden des wissenschaftlichen Arbeitens und den Umgang mit Quellen und Literatur?", "label": "Wissenschaftliches Arbeiten (WIA)", "title": "Praktisches Studiensemester (PSS)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen der 3D-Modellierung und der Spieleentwicklung?", "label": "3D-Modellierung und Spieleentwicklung (3MS)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "In welchem Modul lerne ich die Grundlagen von Automaten und formalen Sprachen?", "label": "Automatentheorie und formale Sprachen (AFS)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul besch\u00e4ftigt sich mit agilen Methoden und Techniken der Softwareentwicklung?", "label": "Agile Softwareentwicklung (AGI)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber die Virtualisierung von Anwendungen und die Nutzung von Docker lernen?", "label": "Anwendungscontainer und Docker (ACD)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul bietet Informationen zu Komponenten, Services und Micro Services?", "label": "Anwendungscontainer und Docker (ACD)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "In welchem Modul lerne ich \u00fcber die Struktur und Funktion einer CPU und RISC-Prozessoren?", "label": "Technische Informatik 1 (TEI1)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "In welchem Modul lerne ich \u00fcber Algorithmen und Datenstrukturen?", "label": "Algorithmen und Datenstrukturen (ALD)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "Welches Modul bietet Informationen \u00fcber verteilte Systeme und Netzwerke?", "label": "Netzwerke und verteilte Systeme (NVS)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "In welchem Modul kann ich mehr \u00fcber Softwarequalit\u00e4t und Testing lernen?", "label": "Softwarequalit\u00e4t und Testing (SQT)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Welches Modul bietet Informationen zu K\u00fcnstlicher Intelligenz und Machine Learning?", "label": "K\u00fcnstliche Intelligenz und Machine Learning (KIML)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "In welchem Modul lerne ich mehr \u00fcber Datenbanken und SQL?", "label": "Datenbanken (DB)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Ich interessiere mich f\u00fcr moderne Rechnerarchitekturen und GPU-Programmierung. Welches Modul w\u00e4re passend f\u00fcr mich?", "label": "Algorithmen f\u00fcr moderne Rechnerarchitekturen (ALR)", "title": "Grundlagen Neuronaler Netze (GNN)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber fortgeschrittene Webarchitekturen und Node.js lernen. Gibt es ein Modul dazu?", "label": "Angular und Node.js (ANO)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Ich interessiere mich f\u00fcr agile Methoden der Softwarentwicklung und Visual Analytics. Welches Modul passt dazu?", "label": "Angewandte Projektarbeit: Visualisierung (APV)", "title": "Software Engineering 1 (SE1)\n"},
  {"question": "Ich m\u00f6chte mich mit Big Data und Big-Data-Architekturen auseinandersetzen. Welches Modul sollte ich w\u00e4hlen?", "label": "Big Data Engineering and Analysis (BDEA)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"},
  {"question": "Ich bin fasziniert von Molekularbiologie und der Anwendung der Genomik in der personalisierten Medizin. Welches Modul w\u00e4re geeignet f\u00fcr mich?", "label": "Bioinformatik (BIM)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"},
  {"question": "Ich m\u00f6chte mehr \u00fcber digitale Bildverarbeitung und Anwendungen von Deep Learning in diesem Bereich erfahren. Gibt es ein passendes Modul?", "label": "Bildverarbeitung (BIV)", "title": "Kolloquium zum Praktischen Studiensemester (KPS)\n"}
]
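The three JSON files above pair the same questions with a gold module ("label") and a retrieved module ("title"), which suggests they are evaluation outputs from different retrieval settings. Under that assumption, a minimal sketch of a top-1 accuracy check over records in this shape (the function name and the inline sample are illustrative, not from the repository):

```python
import json

def top1_accuracy(records):
    """Fraction of records whose retrieved module ("title") equals the gold module ("label")."""
    if not records:
        return 0.0
    # Titles in the files carry a trailing newline, so compare after strip().
    hits = sum(r["title"].strip() == r["label"].strip() for r in records)
    return hits / len(records)

# Hypothetical mini-sample in the same shape as the files above:
sample = json.loads("""[
  {"question": "...", "label": "Softwareprojekt (SP)", "title": "Softwareprojekt (SP)\\n"},
  {"question": "...", "label": "Datenmanagement (DM)", "title": "Software Engineering 1 (SE1)\\n"}
]""")
print(top1_accuracy(sample))  # one hit out of two records -> 0.5
```

The same function could be run over each of the three files to compare the retrieval settings against each other.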
(3 file diffs suppressed because they are too large)
@@ -0,0 +1,116 @@
import json

from reranker import ReRanker
from reader import Reader
from retriever.retriever import Retriever
from retriever.retriever_pipeline import CustomPipeline
from helper.openai import openai_expert_search
from haystack.nodes import FARMReader

# Maps the short author names used in the document metadata to full names.
AUTHOR_MAPPING = {
    "Wolf": "Prof. Dr. Ivo Wolf",
    "Hummel": "Prof. Dr. Oliver Hummel",
    "Fimmel": "Prof. Dr. Elena Fimmel",
    "Eckert": "Prof. Dr. rer. nat. Kai Eckert",
    "Fischer": "Prof. Dr. Jörn Fischer",
    "Gröschel": "Prof. Dr. Michael Gröschel",
    "Gumbel": "Prof. Dr. Markus Gumbel",
    "Nagel": "Prof. Dr. Till Nagel",
    "Specht": "Prof. Dr. Thomas Specht",
    "Steinberger": "Prof. Dr. Jessica Steinberger",
    "Dietrich": "Prof. Dr. Gabriele Roth-Dietrich",
    "Dopatka": "Prof. Dr. rer. nat. Frank Dopatka",
    "Kraus": "Prof. Dr. Stefan Kraus",
    "Leuchter": "Prof. Dr.-Ing. Sandro Leuchter",
    "Paulus": "Prof. Dr. Sachar Paulus",
}


class ExpertSearch:
    def __init__(
        self,
        pipeline: CustomPipeline,
        retriever: Retriever,
        reader: Reader,
        reRanker: ReRanker,
        farm_reader: FARMReader,
    ) -> None:
        """
        Initializes the ExpertSearch class with components for searching, retrieving, reranking, and reading documents.

        Args:
            pipeline (CustomPipeline): A pipeline for document retrieval and processing.
            retriever (Retriever): A component for retrieving documents based on queries.
            reader (Reader): A component for interpreting and processing documents.
            reRanker (ReRanker): A component for reranking documents based on relevance.
            farm_reader (FARMReader): A FARM-based reader for additional document processing.
        """
        self.pipeline = pipeline
        self.retriever = retriever
        self.reader = reader
        self.reRanker = reRanker
        self.farm_reader = farm_reader

    def search_experts(
        self,
        query: str,
        search_method="classic_retriever_reader",
        retrieval_method="mpnet",
        rerank_documents=True,
        generate_answer=False,
    ):
        """
        Performs an expert search based on a given query and the specified method.

        Args:
            query (str): The search query.
            search_method (str, optional): The search method (e.g., 'classic_retriever_reader', 'sort_llm'). Defaults to "classic_retriever_reader".
            retrieval_method (str, optional): The retrieval method to be used. Defaults to "mpnet".
            rerank_documents (bool, optional): Whether to rerank documents after retrieval. Defaults to True.
            generate_answer (bool, optional): Whether to generate an answer with a reader. Defaults to False.

        Returns:
            Varies: The output type depends on the chosen search method.
        """
        if search_method == "sort_llm":
            # Ask an LLM to sort all indexed papers by relevance to the query.
            result = self.pipeline.doc_store.get_all_documents(index="paper")
            prompt = f"""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Your task is to sort the list of papers:\n{json.dumps([{"title": doc.meta["title"], "id": doc.id, "author": doc.meta["author"]} for doc in result])} \n\n according to their relevance to the request: '{query}'. Your answer should only contain a python list of the top five paper ids and nothing more. [/INST] \n\n Top five:"""
            # Request payload prepared for the LLM backend (not dispatched here).
            payload = json.dumps({"question": query, "model": "HF", "prompt": prompt})

            return [
                {"title": doc.meta["title"], "id": doc.id, "author": doc.meta["author"]}
                for doc in result
            ]

        # Retrieve candidate passages, optionally rerank them with GPT-3.5.
        top_k_documents = self.retriever.get_top_k_passages(
            query=query, index="paper", method=retrieval_method
        )["documents"]
        final_references = top_k_documents
        if rerank_documents:
            reranked_top_k = self.reRanker.rerank_documents_with_gpt35(
                documents=top_k_documents, query=query
            )
            final_references = self.reRanker.get_final_references(
                reranked_documents=reranked_top_k,
                retrieved_documents=top_k_documents,
            )
        if search_method == "classic_retriever_reader":
            if generate_answer:
                return self.reader.get_gpt_expert_search_answer(
                    prompt=openai_expert_search,
                    top_k_passages=final_references,
                    query=query,
|
||||
)
|
||||
else:
|
||||
return final_references
|
||||
for doc in final_references:
|
||||
current_author = doc.meta.get("author")
|
||||
new_author = AUTHOR_MAPPING.get(current_author, "Unbekannt")
|
||||
doc.meta["author"] = new_author or current_author
|
||||
return final_references
|
|
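The `sort_llm` prompt above asks the model for a bare Python list of paper ids. Parsing such a reply safely is what `ast.literal_eval` is for (the same guard `ReRanker` applies to GPT-3.5 replies); a minimal sketch, with `parse_id_list` as a hypothetical helper name:

```python
import ast


def parse_id_list(reply: str) -> list:
    """Parse a model reply that should be a bare Python list; return [] on anything else."""
    try:
        value = ast.literal_eval(reply.strip())
        return value if isinstance(value, list) else []
    except (SyntaxError, ValueError):
        return []


print(parse_id_list("['p1', 'p2', 'p3']"))  # ['p1', 'p2', 'p3']
print(parse_id_list("Sorry, I cannot do that."))  # []
```

Unlike `eval`, `literal_eval` only accepts Python literals, so a malicious or malformed reply cannot execute code.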
@ -0,0 +1,26 @@
import tiktoken

MAX_GPT4_TOKENS = 7500
MAX_GPT35_TURBO_TOKENS = 16000
GPT4_COMPLETION_TOKENS = 1000

RERANKING_TOKENS = 400
encoding = tiktoken.get_encoding("cl100k_base")
encoding_gpt4 = tiktoken.encoding_for_model("gpt-4")
encoding_gpt35 = tiktoken.encoding_for_model("gpt-3.5-turbo")

openai_doc_reference_prompt_v2 = 'Use the articles provided, which are delimited by triple quotes, to answer questions. If the answer cannot be found in the articles, do not provide false information; instead, write "Ich konnte keine Antwort finden.". If a question is incoherent or not factually accurate, explain the issue rather than providing an incorrect response. When you find a suitable answer, reply to the question as if you already knew the answer and refrain from referencing the provided articles. Please respond in German.'

openai_doc_reference_prompt_v1 = 'Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer.". If you find a suitable answer, then reply to the question as if you already knew the answer and do not mention the provided articles in your response. Answer in German.'

openai_doc_citation_prompt_v1 = "You will be provided with a list of documents and a question. Your task is to reorder all documents in descending order of relevance to the given question. The ordered list should contain the id(s) of all documents, whether or not they are directly relevant to the question. Ensure you return only the array, without any additional information."
openai_doc_citation_prompt_v2 = "You will be provided with a list of documents and a question. Your task is to reorder all documents in descending order of relevance to the given question. The ordered list should only contain the top five id(s) of the documents. Ensure you return only the python array, without any additional information."


openai_expert_search = 'You will be provided with a list of paper titles, abstracts and author information. Your task is to identify the top 3 experts (authors) who fit a given question. If an author is already in your top 3, do not repeat the author. Please respond in German.'
openai_wpm_recommendation = "You will be provided with a list of courses delimited by triple quotes. Your task is to recommend the top 5 courses that best fit the user's interests. For each course, provide a short description, the course title, and the name of the professor offering the course. If a course is already in your top 5, don't repeat it. Please respond in German."


def count_prompt_tokens_gpt4(text):
    return len(encoding_gpt4.encode(text))


def count_prompt_tokens_gpt35(text):
    return len(encoding_gpt35.encode(text))

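The token counters above feed the budget pattern used throughout the readers: documents are appended to the prompt only while the model's context budget (minus the reserved completion tokens) is not exceeded. A self-contained sketch of that pattern, with a whitespace split standing in for the tiktoken encoder and deliberately small assumed budgets:

```python
MAX_TOKENS = 20        # stand-in for MAX_GPT4_TOKENS
COMPLETION_TOKENS = 5  # stand-in for GPT4_COMPLETION_TOKENS


def count_tokens(text: str) -> int:
    # Whitespace tokenizer as a stand-in for tiktoken's encoder.
    return len(text.split())


def pick_references(prompt: str, documents: list) -> list:
    """Return the documents that fit the remaining token budget, in order."""
    used = count_tokens(prompt)
    picked = []
    for doc in documents:
        used += count_tokens(doc)
        if used < MAX_TOKENS - COMPLETION_TOKENS:
            picked.append(doc)
    return picked


docs = ["a b c", "d e f g", "h i j k l m n o p q"]
print(pick_references("system prompt", docs))  # ['a b c', 'd e f g']
```

As in the original loop, counting continues past the first overflow rather than breaking, so a later, shorter document could still be skipped once the budget is spent.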
@ -0,0 +1,113 @@
from typing import List
from haystack.schema import Document
from reranker import ReRanker
from reader import Reader
from retriever.retriever import Retriever
from haystack.nodes import FARMReader


class WPMRecommendation:
    def __init__(
        self,
        retriever: Retriever,
        reader: Reader,
        reRanker: ReRanker,
        farm_reader: FARMReader,
    ) -> None:
        """
        Initializes the WPMRecommendation class with the required components for retrieving, reranking, and reading documents.

        Args:
            retriever (Retriever): An instance of Retriever for fetching relevant documents.
            reader (Reader): An instance of Reader for interpreting and processing documents.
            reRanker (ReRanker): An instance of ReRanker for reranking documents based on relevance.
            farm_reader (FARMReader): An instance of FARMReader for additional reading capabilities.
        """
        self.retriever = retriever
        self.reader = reader
        self.reranker = reRanker
        self.farm_reader = farm_reader

    def _filter_wpms(self, documents: List[Document]):
        """
        Filters documents to include only those marked as Wahlpflichtmodule (WPM).

        Args:
            documents (List[Document]): A list of documents to be filtered.

        Returns:
            List[Document]: Filtered documents marked as WPM.
        """
        return [doc for doc in documents if doc.meta.get("is_wpm") is True]

    def _build_query_for_prompt(
        self, interets: str, future_carrer: str, previous_courses: str
    ):
        """
        Constructs a query based on the user's interests, future career plans, and previously taken courses.

        Args:
            interets (str): The user's interests.
            future_carrer (str): The user's future career aspirations.
            previous_courses (str): Courses previously taken by the user.

        Returns:
            str: A query constructed from the provided information.
        """
        query = ""
        if interets:
            query += f"Ich habe folgende Interessen: \n{interets}.\n"
        if future_carrer:
            query += f"Zudem möchte ich zukünftig im folgenden Bereich arbeiten:\n{future_carrer}.\n"
        if previous_courses:
            query += f"Ich habe bereits folgende Wahlpflichtmodule belegt:\n{previous_courses}.\n"
        return query

    def recommend_wpms(
        self,
        interets: str,
        future_carrer: str,
        previous_courses: str,
        retrieval_model_or_method="mpnet",
        recommendation_method: str = "get_retrieved_results",
        rerank_retrieved_results=True,
    ):
        """
        Recommends Wahlpflichtmodule (WPM) based on the user's interests, future career plans, and previous courses.

        Args:
            interets (str): The user's interests.
            future_carrer (str): The user's future career aspirations.
            previous_courses (str): Courses previously taken by the user.
            retrieval_model_or_method (str, optional): The retrieval model or method to use. Defaults to "mpnet".
            recommendation_method (str, optional): The method for generating recommendations. Defaults to "get_retrieved_results".
            rerank_retrieved_results (bool, optional): Flag to determine if reranking should be done on retrieved results. Defaults to True.

        Returns:
            Varies: Returns different types of outputs based on the recommendation method chosen.
        """
        top_k_docs = self.retriever.get_top_k_passages(
            query=interets, index="ib", method=retrieval_model_or_method
        )["documents"]
        retrieved_wpms = self._filter_wpms(top_k_docs)
        final_references = retrieved_wpms
        query = self._build_query_for_prompt(
            interets=interets,
            future_carrer=future_carrer,
            previous_courses=previous_courses,
        )
        if rerank_retrieved_results:
            reranked_top_k = self.reranker.rerank_documents_with_gpt35(
                documents=retrieved_wpms, query=query
            )
            final_references = self.reranker.get_final_references(
                reranked_documents=reranked_top_k, retrieved_documents=retrieved_wpms
            )
        if recommendation_method == "generate_llm_answer":
            return self.reader.get_gpt_wpm_recommendation(
                query=query, top_k_wpms=final_references
            )
        if recommendation_method == "generate_farm_reader_answer":
            # TODO: Answer generation with the FARM reader is not implemented yet.
            pass

        return final_references

@ -0,0 +1,243 @@
from typing import Dict, List
from api.embeddingsServiceCaller import EmbeddingServiceCaller
from retriever.retriever import Retriever
from reader import Reader
from embeddings.transformer_llama import LlamaTransformerEmbeddings
from retriever.retriever_pipeline import CustomPipeline
from embeddings.llama import Embedder
from haystack import Document
import json
import ast
import numpy as np
from scipy.special import softmax
from helper.openai import (
    openai_doc_reference_prompt_v1,
    openai_doc_citation_prompt_v2,
    MAX_GPT4_TOKENS,
    GPT4_COMPLETION_TOKENS,
    MAX_GPT35_TURBO_TOKENS,
    RERANKING_TOKENS,
    count_prompt_tokens_gpt4,
    count_prompt_tokens_gpt35,
)
from reranker import ReRanker
from expert_search import ExpertSearch
from module_recommendation import WPMRecommendation
from haystack.nodes import FARMReader

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


class QuestionAnswering:
    """
    The QuestionAnswering class serves as a comprehensive manager for handling various aspects of question answering, including expert search and module recommendations. It integrates multiple components such as retrievers, rerankers, and readers to facilitate efficient information retrieval and processing.

    Attributes:
        qa_pipeline (CustomPipeline): A pipeline for document retrieval and processing.
        caller (LlamaTransformerEmbeddings | EmbeddingServiceCaller): The MODEL SERVICE caller.
        reranker (ReRanker): A component for reranking documents based on relevance.
        retriever (Retriever): A component for retrieving documents.
        reader (Reader): A component for reading and interpreting documents.
        bert_reader (FARMReader): An additional FARM-based reader.
        expert_search (ExpertSearch): A component for conducting expert searches.
        wpm_recommendation (WPMRecommendation): A component for recommending Wahlpflichtmodule (elective modules).
    """

    THRESHOLD = 0.5

    def __init__(
        self,
        pipeline: CustomPipeline,
        embedder: LlamaTransformerEmbeddings | EmbeddingServiceCaller,
    ):
        """
        Initializes the QuestionAnswering class with the required components.

        Args:
            pipeline (CustomPipeline): A pipeline for document retrieval and processing.
            embedder (LlamaTransformerEmbeddings | EmbeddingServiceCaller): The MODEL SERVICE caller.
        """
        self.qa_pipeline = pipeline
        self.caller = embedder
        self.reranker = ReRanker()
        self.retriever = Retriever(pipeline=self.qa_pipeline, caller=self.caller)
        self.reader = Reader(caller=self.caller)
        # NOTE: The BERT Reader lives here and not in reader.py
        # TODO: Shift this to reader.py
        self.bert_reader = FARMReader(
            model_name_or_path="deepset/gelectra-base-germanquad-distilled",
            use_gpu=True,
            use_confidence_scores=False,
        )
        self.expert_search = ExpertSearch(
            pipeline=self.qa_pipeline,
            retriever=self.retriever,
            reader=self.reader,
            reRanker=self.reranker,
            farm_reader=self.bert_reader,
        )
        self.wpm_recommendation = WPMRecommendation(
            reader=self.reader,
            retriever=self.retriever,
            reRanker=self.reranker,
            farm_reader=self.bert_reader,
        )

    def search_experts(
        self,
        query: str,
        search_method: str,
        retriever_model: str,
        generate_answer: bool,
        rerank: bool,
    ):
        """
        Conducts an expert search based on the specified parameters.

        Args:
            query (str): The search query.
            search_method (str): The method of search.
            retriever_model (str): The retrieval model to be used.
            generate_answer (bool): Whether to generate an answer using a reader.
            rerank (bool): Whether to rerank the retrieved documents.

        Returns:
            Varies: The result of the expert search.
        """
        return self.expert_search.search_experts(
            query=query,
            rerank_documents=rerank,
            retrieval_method=retriever_model,
            generate_anwser=generate_answer,
            search_method=search_method,
        )

    def recommend_wpm(
        self,
        interets: str,
        future_carrer: str,
        previous_courses: str,
        retrieval_method_or_model: str,
        recommendation_method: str,
        rerank_retrieved_results: bool,
    ):
        """
        Provides recommendations for elective modules (Wahlpflichtmodule, WPM) based on user input.

        Args:
            interets (str): The user's interests.
            future_carrer (str): The user's future career aspirations.
            previous_courses (str): Previously taken courses.
            retrieval_method_or_model (str): The retrieval model/method.
            recommendation_method (str): The recommendation method.
            rerank_retrieved_results (bool): Whether to rerank retrieved results.

        Returns:
            Varies: Recommendations for elective modules.
        """
        return self.wpm_recommendation.recommend_wpms(
            interets=interets,
            future_carrer=future_carrer,
            previous_courses=previous_courses,
            recommendation_method=recommendation_method,
            rerank_retrieved_results=rerank_retrieved_results,
            retrieval_model_or_method=retrieval_method_or_model,
        )

    def get_top_k(self, query, index, meta, retrieval_method_or_model):
        """
        Retrieves the top k documents based on the query and retrieval method.

        Args:
            query (str): The search query.
            index (str): The index to search in.
            meta (Dict): Additional metadata for the query.
            retrieval_method_or_model (str): The retrieval method or model.

        Returns:
            List[Document]: A list of retrieved documents.
        """
        return self.retriever.get_top_k_passages(
            index=index, query=query, meta=meta, method=retrieval_method_or_model
        )

    # Answers for STUPO and Crawled Data
    def get_answers(
        self,
        query: str,
        index: str = "",
        meta: Dict = {},
        retrieval_method_or_model: str = "mpnet",
        reader_model: str = "",
        rerank_documents=True,
    ):
        """
        Retrieves answers for a given query using various models and methods.
        NOTE: This only provides answers for StuPO or crawled-data questions. Expert search and WPMs have their own functions.

        Args:
            query (str): The query to answer.
            index (str, optional): The index to search in.
            meta (Dict, optional): Additional metadata.
            retrieval_method_or_model (str, optional): Retrieval method/model.
            reader_model (str, optional): Reader model for generating answers.
            rerank_documents (bool, optional): Whether to rerank documents.

        Returns:
            Varies: The generated answers.
        """
        top_k_passages = self.retriever.get_top_k_passages(
            query=query, index=index, meta=meta, method=retrieval_method_or_model
        )["documents"]
        reranked_passages = None
        if rerank_documents:
            reranked_passages = self.reranker.rerank_documents_with_gpt35(
                documents=top_k_passages, query=query
            )
        final_passages = self.reranker.get_final_references(
            reranked_documents=reranked_passages or [],
            retrieved_documents=top_k_passages,
        )
        if index in ["stupo", "crawled_hsma"]:
            if reader_model == "GPT":
                return self.reader.get_gpt_answer(
                    top_k_passages=final_passages, query=query
                )
            elif reader_model == "Bert":
                return (
                    self.bert_reader.predict(
                        query=query,
                        documents=final_passages,
                        top_k=10,
                    ),
                    final_passages,
                )
            elif reader_model == "Llama":
                return {
                    "answers": [
                        {
                            "answer": self.reader.generate_llama_answer(
                                top_k_passages=final_passages, query=query
                            )
                        }
                    ]
                }, final_passages
        else:
            return {"choices": [{"text": "Ich weiß die Antwort nicht"}]}

    def get_module_credits(self, module: str, index: str = "ib"):
        """Looks up the credits for the given module title."""
        return self.retriever.get_module_credits(
            query="", index=index, params={"title": [module]}
        )

    def apply_softmax(self, documents: Dict):
        """Applies softmax to the scores of the answers.

        Args:
            documents (Dict): Responses from a pipeline in Haystack format

        Returns:
            np.ndarray: The softmax-normalized scores.
        """
        scores = softmax(np.array([answer.score for answer in documents["documents"]]))
        for answer, score in zip(documents["documents"], scores):
            answer.score = score
        # The scores are already normalized above; do not apply softmax a second time.
        return scores

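`apply_softmax` above turns raw retrieval scores into a probability distribution over the answers. The same transform, reduced to the standard library to make the normalization explicit:

```python
import math


def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)  # three probabilities summing to 1, increasing with the score
```

Note that softmax is not idempotent: applying it a second time flattens the distribution, which is why the scores should only be normalized once.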
@ -0,0 +1,200 @@
from typing import Dict, List
from haystack.schema import Document
from api.embeddingsServiceCaller import EmbeddingServiceCaller
from helper.openai import (
    openai_doc_reference_prompt_v1,
    openai_doc_citation_prompt_v2,
    openai_wpm_recommendation,
    MAX_GPT4_TOKENS,
    GPT4_COMPLETION_TOKENS,
    MAX_GPT35_TURBO_TOKENS,
    RERANKING_TOKENS,
    count_prompt_tokens_gpt4,
    count_prompt_tokens_gpt35,
)
import json

AUTHOR_MAPPING = {
    "Wolf": "Prof. Dr. Ivo Wolf",
    "Hummel": "Prof. Dr. Oliver Hummel",
    "Fimmel": "Prof. Dr. Elena Fimmel",
    "Eckert": "Prof. Dr. rer. nat. Kai Eckert",
    "Fischer": "Prof. Dr. Jörn Fischer",
    "Gröschel": "Prof. Dr. Michael Gröschel",
    "Gumbel": "Prof. Dr. Markus Gumbel",
    "Nagel": "Prof. Dr. Till Nagel",
    "Specht": "Prof. Dr. Thomas Specht",
    "Steinberger": "Prof. Dr. Jessica Steinberger",
    "Dietrich": "Prof. Dr. Gabriele Roth-Dietrich",
    "Dopatka": "Prof. Dr. rer. nat. Frank Dopatka",
    "Kraus": "Prof. Dr. Stefan Kraus",
    "Leuchter": "Prof. Dr.-Ing. Sandro Leuchter",
    "Paulus": "Prof. Dr. Sachar Paulus",
}


class Reader:
    THRESHOLD = 0.5

    def __init__(self, caller: EmbeddingServiceCaller) -> None:
        """
        NOTE: The BERT Reader is in question_answering.py
        Initializes the Reader class for generating answers.
        Args:
            caller (EmbeddingServiceCaller): Caller for the MODEL SERVICE.
        """
        self.caller = caller

    def get_gpt_wpm_recommendation(self, top_k_wpms: List[Document], query: str):
        """
        Generates GPT-based recommendations for WPMs using the provided top K documents. The prompt is built and tokens are counted here as well.
        """
        current_token_count = count_prompt_tokens_gpt4(openai_wpm_recommendation)
        reference = ""
        picked_references = []
        for doc in top_k_wpms:
            current_token_count += count_prompt_tokens_gpt4(doc.content)
            if current_token_count < MAX_GPT4_TOKENS - GPT4_COMPLETION_TOKENS:
                meta = doc.meta
                title = meta.get("name_de")
                description = meta.get("inhalte_de")
                profs = meta.get("dozenten")
                reference += f'"""\nCourse Title:\n{title}\nCourse Description:\n{description}\nProfessoren:\n{profs}\n"""\n\n'
                picked_references.append(doc)

        payload = json.dumps(
            {
                "reference": reference,
                "question": query,
                "model": "GPT",
                "prompt": openai_wpm_recommendation,
            }
        )
        return self.caller.get_answer(payload=payload), picked_references

    def get_gpt_expert_search_answer(
        self,
        top_k_passages: List[Document],
        query: str,
        prompt: str = openai_doc_reference_prompt_v1,
    ):
        """
        Generates an answer using GPT for expert search based on the provided top K passages. The prompt is built and tokens are counted here as well.
        Args:
            top_k_passages (List[Document]): Top K documents retrieved from the search.
            query (str): User query string.
            prompt (str, optional): System prompt with general instructions.
        """
        current_token_count = count_prompt_tokens_gpt4(prompt)
        reference = ""
        picked_references = []
        for doc in top_k_passages:
            current_token_count += count_prompt_tokens_gpt4(doc.content)
            if current_token_count < MAX_GPT4_TOKENS - GPT4_COMPLETION_TOKENS:
                title = doc.meta.get("title", "")
                abstract = doc.meta.get("abstract", "")
                author = AUTHOR_MAPPING.get(doc.meta.get("author", ""), "unknown")
                reference += f'"""\nTitle:\n{title}\nAuthor:\n{author}\nAbstract:\n{abstract}\n"""\n\n'
                picked_references.append(doc)

        payload = json.dumps(
            {
                "reference": reference,
                "question": query,
                "model": "GPT",
                "prompt": prompt,
            }
        )
        return self.caller.get_answer(payload=payload), picked_references

    def get_gpt_answer(
        self,
        top_k_passages: List[Document],
        query: str,
        prompt: str = openai_doc_reference_prompt_v1,
    ):
        """
        Generates a generic GPT-based answer using the provided top K passages, e.g. for questions about the StuPO or crawled web data.

        Args:
            top_k_passages (List[Document]): Top K documents retrieved from the search.
            query (str): User query string.
            prompt (str, optional): System prompt with general instructions. Defaults to openai_doc_reference_prompt_v1.

        Returns:
            Tuple: The answer from the GPT model and the documents used for generating it.
        """
        current_token_count = count_prompt_tokens_gpt4(prompt)
        reference = ""
        picked_references = []
        for doc in top_k_passages:
            current_token_count += count_prompt_tokens_gpt4(doc.content)
            if current_token_count < MAX_GPT4_TOKENS - GPT4_COMPLETION_TOKENS:
                reference += f'"""\n{doc.content}\n"""\n\n'
                picked_references.append(doc)

        payload = json.dumps(
            {
                "reference": reference,
                "question": query,
                "model": "GPT",
                "prompt": prompt,
            }
        )
        return self.caller.get_answer(payload=payload), picked_references

    def generate_llama_answer(self, top_k_passages: List[Document], query: str):
        """
        Generates an answer using the Llama model based on the provided top K passages.
        """
        picked_references = []
        reference = ""
        if top_k_passages:
            for doc in top_k_passages[:2]:
                if doc.score >= self.THRESHOLD:
                    picked_references.append(doc)
                    reference += f'"""\n{doc.content}\n"""\n\n'
        if reference:
            prompt = f"""
            Your task is to use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer.". If you find a suitable answer, then reply to the question as if you already knew the answer and do not mention the provided articles in your response. Answer in German.
            [INST] User: {query}\n\nArticles:{reference}[/INST]\n\nAssistant:"""
            payload = json.dumps(
                {
                    "reference": reference,
                    "question": query,
                    "model": "HF",
                    "prompt": prompt,
                }
            )
            return self.caller.get_answer(payload=payload, llama=True), picked_references
        # No passage met the score threshold, so there is nothing to answer from.
        return {"choices": [{"text": "Ich weiß die Antwort nicht"}]}, picked_references

    def get_answers(
        self,
        top_k_passages: List[Document],
        query: str,
        index: str,
        model: str = "",
    ):
        """
        Retrieves answers based on the specified model (GPT or Llama) and the top K passages.

        Args:
            top_k_passages (List[Document]): Top K documents retrieved from the search.
            query (str): User query string.
            index (str): The index the documents were retrieved from. This clarifies whether we have an expert search, a WPM recommendation, or anything else.
            model (str, optional): The model to use for generating answers (GPT or Llama). Defaults to an empty string.

        Returns:
            Dict: The response containing the answer.
        """
        if index in ["stupo", "crawled_hsma"]:
            if model == "GPT":
                return self.get_gpt_answer(top_k_passages=top_k_passages, query=query)
            elif model == "Llama":
                return self.generate_llama_answer(
                    top_k_passages=top_k_passages, query=query
                )
        else:
            return {"choices": [{"text": "Ich weiß die Antwort nicht"}]}

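`generate_llama_answer` above inlines the Llama-2 chat template. The same formatting, factored into a small helper for clarity (a sketch only; `build_llama_prompt` is a hypothetical name, and the marker constants mirror `B_INST`/`B_SYS` defined in question_answering.py):

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama_prompt(system: str, user: str) -> str:
    """Wrap a system instruction and user turn in the Llama-2 chat markers."""
    return f"<s>{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST}"


p = build_llama_prompt("Answer in German.", "Was ist ein Wahlpflichtmodul?")
print(p)
```

Keeping the template in one place avoids the easy mistakes the inline version invites, such as stray characters after the `<<SYS>>` marker.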
@ -0,0 +1,15 @@
farm-haystack[all]
Flask
# flask-cors
llama-cpp-python
numpy
pdfminer.six
# pymilvus==2.2.8
# ray[serve]>=1.13.0
beautifulsoup4
pdfplumber
# camelot-py
opencv-python
Scrapy
python-dotenv
gunicorn

@ -0,0 +1,100 @@
from typing import Dict, List
from haystack.schema import Document
from api.embeddingsServiceCaller import EmbeddingServiceCaller
from helper.openai import (
    openai_doc_reference_prompt_v1,
    openai_doc_citation_prompt_v2,
    MAX_GPT4_TOKENS,
    GPT4_COMPLETION_TOKENS,
    MAX_GPT35_TURBO_TOKENS,
    RERANKING_TOKENS,
    count_prompt_tokens_gpt4,
    count_prompt_tokens_gpt35,
)
import json
import ast


class ReRanker:
    def __init__(self) -> None:
        """
        Initializes the ReRanker class with a caller for the MODEL SERVICE.
        """
        self.caller = EmbeddingServiceCaller()

    def rerank_documents_with_gpt35(self, documents: List[Document], query: str):
        """
        Reranks a list of documents using GPT-3.5 based on a given query.

        Args:
            documents (List[Document]): A list of Document objects to be reranked.
            query (str): The query string used for reranking.

        Returns:
            List[Document]: A list of reranked Document objects.
        """
        formatted_documents = []
        reranked_documents_token_count = count_prompt_tokens_gpt35(
            openai_doc_citation_prompt_v2
        )
        for doc in documents:
            reranked_documents_token_count += count_prompt_tokens_gpt35(doc.content)
            if (
                reranked_documents_token_count
                < MAX_GPT35_TURBO_TOKENS - RERANKING_TOKENS
            ):
                formatted_documents.append({"content": doc.content, "id": doc.id})

        payload = json.dumps(
            {
                "system_prompt": openai_doc_citation_prompt_v2,
                "documents": formatted_documents,
                "query": query,
            }
        )
        sorted_document_ids = self.caller.rerank_documents_gpt(payload=payload)
        print(sorted_document_ids, "sorted_document_ids")
        message_content = sorted_document_ids["choices"][0]["message"]["content"]

        # Check whether the message content is a string representation of a list.
        # If not, return an empty list. If it is, parse it and keep only documents
        # whose ids the model actually returned, preserving the model's ranking order.
        try:
            content_list = ast.literal_eval(message_content)
            if isinstance(content_list, list):
                return [
                    doc
                    for doc_id in content_list
                    for doc in documents
                    if doc.id == doc_id
                ]
            else:
                return []
        except (SyntaxError, ValueError):
            return []

    def get_final_references(
        self, reranked_documents: List[Document], retrieved_documents: List[Document]
    ) -> List[Document]:
        """
        Combines reranked and retrieved documents, ensuring no duplicates and maintaining order.

        Args:
            reranked_documents (List[Document]): The documents after reranking.
            retrieved_documents (List[Document]): The original set of retrieved documents.

        Returns:
            List[Document]: A combined list of reranked and retrieved documents.
        """
        final_references = list(reranked_documents)
        if not reranked_documents:
            return retrieved_documents
        # If the model in the reranking process did not return all document ids,
        # build a new sorted list: first the documents that survived reranking,
        # followed by the ones from the retriever that are missing.
        elif len(reranked_documents) < len(retrieved_documents):
            reranked_ids = set(doc.id for doc in reranked_documents)
            missing_documents = [
                doc for doc in retrieved_documents if doc.id not in reranked_ids
            ]
            final_references.extend(missing_documents)
            return final_references
        elif len(reranked_documents) == len(retrieved_documents):
            return final_references
        else:
            return retrieved_documents

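The merge rule in `get_final_references` above, reduced to plain ids as a minimal sketch (it omits the edge case where the reranker returns more documents than were retrieved): reranked ids come first, and any ids the reranker dropped follow in their original retrieval order.

```python
def merge_references(reranked, retrieved):
    """Reranked items first, then retrieved items the reranker dropped."""
    if not reranked:
        return list(retrieved)
    seen = set(reranked)
    return list(reranked) + [d for d in retrieved if d not in seen]


print(merge_references(["b", "c"], ["a", "b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```

This guarantees the reader always sees the full retrieved set even when the reranking model returns a partial id list.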
@ -0,0 +1,252 @@
# pylint: disable=ungrouped-imports
"""
---------------------------------------------------------------------------
NOTE:
Custom implementation of a retriever based on the LLaMA model, compatible
with the Haystack retriever pipeline. Under the hood it calls the MODEL SERVICE.

NOTE: See the functions embed_queries and embed_documents for the pooling
strategy and layer extraction.
---------------------------------------------------------------------------
"""
import logging
import sys
from copy import deepcopy
from pathlib import Path
from typing import Any, Callable, Dict, List, Literal, Optional, Union

import numpy as np
import pandas as pd
from huggingface_hub import hf_hub_download
from requests.exceptions import HTTPError
from tqdm import tqdm

from haystack.errors import HaystackError
from haystack.schema import Document, FilterType
from haystack.document_stores import BaseDocumentStore
from haystack.telemetry import send_event
from haystack.lazy_imports import LazyImport
from haystack.nodes.retriever import DenseRetriever

logger = logging.getLogger(__name__)


with LazyImport(message="Run 'pip install farm-haystack[inference]'") as torch_and_transformers_import:
    import torch
    from haystack.modeling.utils import initialize_device_settings  # pylint: disable=ungrouped-imports
    from transformers import AutoConfig

sys.path.append("../..")
from api.embeddingsServiceCaller import EmbeddingServiceCaller

_EMBEDDING_ENCODERS: Dict[str, Callable] = {"llama": {}}


class LlamaRetriever(DenseRetriever):
    def __init__(
        self,
        model_format: str = "llama",
        document_store: Optional[BaseDocumentStore] = None,
        model_version: Optional[str] = None,
        use_gpu: bool = True,
        batch_size: int = 32,
        max_seq_len: int = 512,
        pooling_strategy: str = "reduce_mean",
        emb_extraction_layer: int = -1,
        top_k: int = 10,
        progress_bar: bool = True,
        devices: Optional[List[Union[str, "torch.device"]]] = None,
        use_auth_token: Optional[Union[str, bool]] = None,
        scale_score: bool = True,
        embed_meta_fields: Optional[List[str]] = None,
        api_key: Optional[str] = None,
        azure_api_version: str = "2022-12-01",
        azure_base_url: Optional[str] = None,
        azure_deployment_name: Optional[str] = None,
        api_base: str = "https://api.openai.com/v1",
        openai_organization: Optional[str] = None,
    ):
        torch_and_transformers_import.check()

        if embed_meta_fields is None:
            embed_meta_fields = []
        super().__init__()

        self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=True)

        if batch_size < len(self.devices):
            logger.warning("Batch size is less than the number of devices. Not all GPUs will be utilized.")

        self.document_store = document_store
        self.model_version = model_version
        self.use_gpu = use_gpu
        self.batch_size = batch_size
        self.max_seq_len = max_seq_len
        self.pooling_strategy = pooling_strategy
        self.emb_extraction_layer = emb_extraction_layer
        self.top_k = top_k
        self.progress_bar = progress_bar
        self.use_auth_token = use_auth_token
        self.scale_score = scale_score
        self.api_key = api_key
        self.api_base = api_base
        self.api_version = azure_api_version
        self.azure_base_url = azure_base_url
        self.azure_deployment_name = azure_deployment_name
        self.openai_organization = openai_organization
        self.model_format = model_format
        self.emb_caller = EmbeddingServiceCaller()
        self.embed_meta_fields = embed_meta_fields

    def retrieve(
        self,
        query: str,
        filters: Optional[FilterType] = None,
        top_k: Optional[int] = None,
        index: Optional[str] = None,
        headers: Optional[Dict[str, str]] = None,
        scale_score: Optional[bool] = None,
        document_store: Optional[BaseDocumentStore] = None,
    ) -> List[Document]:
        document_store = document_store or self.document_store
        if document_store is None:
            raise ValueError(
                "This Retriever was not initialized with a Document Store. Provide one to the retrieve() method."
            )
        if top_k is None:
            top_k = self.top_k
        if index is None:
            index = document_store.index
        if scale_score is None:
            scale_score = self.scale_score
        query_emb = self.embed_queries(queries=[query])
        documents = document_store.query_by_embedding(
            query_emb=query_emb, filters=filters, top_k=top_k, index=index, headers=headers, scale_score=scale_score
        )
        return documents

    def retrieve_batch(
        self,
        queries: List[str],
        filters: Optional[Union[FilterType, List[Optional[FilterType]]]] = None,
        top_k: Optional[int] = None,
        index: Optional[str] = None,
        headers: Optional[Dict[str, str]] = None,
        batch_size: Optional[int] = None,
        scale_score: Optional[bool] = None,
        document_store: Optional[BaseDocumentStore] = None,
    ) -> List[List[Document]]:
        document_store = document_store or self.document_store
        if document_store is None:
            raise ValueError(
                "This Retriever was not initialized with a Document Store. Provide one to the retrieve_batch() method."
            )
        if top_k is None:
            top_k = self.top_k

        if batch_size is None:
            batch_size = self.batch_size

        if index is None:
            index = document_store.index
        if scale_score is None:
            scale_score = self.scale_score

        query_embs: np.ndarray = self.embed_queries(queries=queries)
        batched_query_embs: List[np.ndarray] = []
        for i in range(0, len(query_embs), batch_size):
            batched_query_embs.extend(query_embs[i : i + batch_size])
        documents = document_store.query_by_embedding_batch(
            query_embs=batched_query_embs,
            top_k=top_k,
            filters=filters,
            index=index,
            headers=headers,
            scale_score=scale_score,
        )

        return documents

    def embed_queries(self, queries: List[str]) -> np.ndarray:
        if isinstance(queries, str):
            queries = [queries]
        assert isinstance(queries, list), "Expecting a list of texts, i.e. create_embeddings(texts=['text1',...])"
        # NOTE: only the first query is sent to the model service; callers that
        # need one embedding per query must account for this.
        return np.array(self.emb_caller.get_embeddings(queries[0]))

    def embed_documents(self, documents: List[Document]) -> np.ndarray:
        documents = self._preprocess_documents(documents)
        embeddings = []
        for doc in documents:
            embeddings.append(self.emb_caller.get_embeddings(doc.content))
        return np.array(embeddings)

    def _preprocess_documents(self, docs: List[Document]) -> List[Document]:
        linearized_docs = []
        for doc in docs:
            doc = deepcopy(doc)
            if doc.content_type == "table":
                if isinstance(doc.content, pd.DataFrame):
                    doc.content = doc.content.to_csv(index=False)
                else:
                    raise HaystackError("Documents of type 'table' need to have a pd.DataFrame as content field")
            meta_data_fields = []
            for key in self.embed_meta_fields:
                if key in doc.meta and doc.meta[key]:
                    if isinstance(doc.meta[key], list):
                        meta_data_fields.extend(doc.meta[key])
                    else:
                        meta_data_fields.append(doc.meta[key])
            meta_data_fields = [str(field) for field in meta_data_fields]
            doc.content = "\n".join(meta_data_fields + [doc.content])
            linearized_docs.append(doc)
        return linearized_docs

    @staticmethod
    def _infer_model_format(model_name_or_path: str, use_auth_token: Optional[Union[str, bool]]) -> str:
        valid_openai_model_name = model_name_or_path in ["ada", "babbage", "davinci", "curie"] or any(
            m in model_name_or_path for m in ["-ada-", "-babbage-", "-davinci-", "-curie-"]
        )
        if valid_openai_model_name:
            return "openai"
        if model_name_or_path in ["small", "medium", "large", "multilingual-22-12", "finance-sentiment"]:
            return "cohere"
        if Path(model_name_or_path).exists():
            if Path(f"{model_name_or_path}/config_sentence_transformers.json").exists():
                return "sentence_transformers"
        else:
            try:
                hf_hub_download(
                    repo_id=model_name_or_path,
                    filename="config_sentence_transformers.json",
                    use_auth_token=use_auth_token,
                )
                return "sentence_transformers"
            except HTTPError:
                pass

        config = AutoConfig.from_pretrained(model_name_or_path, use_auth_token=use_auth_token)
        if config.model_type == "retribert":
            return "retribert"

        return "farm"

    def train(
        self,
        training_data: List[Dict[str, Any]],
        learning_rate: float = 2e-5,
        n_epochs: int = 1,
        num_warmup_steps: Optional[int] = None,
        batch_size: int = 16,
        train_loss: Literal["mnrl", "margin_mse"] = "mnrl",
        num_workers: int = 0,
        use_amp: bool = False,
        **kwargs,
    ) -> None:
        """Training is not supported for this retriever; embeddings come from the external model service."""
        pass
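The retriever delegates embedding to the model service; the `reduce_mean` pooling named above can be sketched in plain Python (toy shapes; the layers × tokens × dims layout and the sample numbers are assumptions for illustration):

```python
def reduce_mean(hidden_states, layer=-1):
    """Mean-pool the token embeddings of one layer into a single vector.

    hidden_states: list of layers; each layer is a list of token vectors.
    layer: which layer to extract (-1 = last), mirroring emb_extraction_layer.
    """
    tokens = hidden_states[layer]
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


# Two layers, three tokens, 2-dimensional vectors (toy numbers)
states = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],  # layer 0
    [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]],  # layer -1 (last)
]
print(reduce_mean(states))  # → [3.0, 5.0]
```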
@ -0,0 +1,15 @@
from typing import List, Optional

from haystack.nodes.base import BaseComponent


class JoinDocConverter(BaseComponent):
    """Reshapes the output of JoinDocuments so that the DensePassageRetriever can process the joined results.

    See the Haystack documentation for more information about custom nodes: https://haystack.deepset.ai/pipeline_nodes/custom-nodes
    """

    outgoing_edges = 1

    def run(self, documents):
        output = {"documents": documents, "root_node": "Query"}
        return output, "output_1"

    def run_batch(self, documents, queries: List[str], my_arg: Optional[int] = 10):
        output = {"documents": documents, "root_node": "Query"}
        return output, "output_1"
@ -0,0 +1,46 @@
from typing import List, Optional

from haystack.nodes.base import BaseComponent


class MethodRetrieverClassifier(BaseComponent):
    """
    Routing component that decides which retriever a query is sent to, based on the requested method.
    It supports the retrieval models "mpnet", "distilbert", "ada", and "llama" and forwards each query
    along the outgoing edge of the corresponding retriever. Inherits from the Haystack BaseComponent
    for pipeline compatibility.
    """

    outgoing_edges = 7

    def run(self, method: str, index: str, query: str, intent=None, top_k=5):
        params = {"top_k": top_k, "index": index}
        if method == "mpnet":
            return params, "output_1"
        elif method == "distilbert":
            return params, "output_2"
        elif method == "ada":
            return params, "output_3"
        elif method == "llama":
            return params, "output_4"
        else:
            # Fall back to the mpnet retriever for unknown methods
            return params, "output_1"

    def run_batch(
        self,
        method: str,
        index: str,
        queries: List[str],
        top_k=5,
        my_arg: Optional[int] = 10,
    ):
        params = {"top_k": top_k, "index": index}
        if method == "mpnet":
            return params, "output_1"
        elif method == "distilbert":
            return params, "output_2"
        elif method == "ada":
            return params, "output_3"
        elif method == "llama":
            return params, "output_4"
        else:
            # Fall back to the mpnet retriever for unknown methods
            return params, "output_1"
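The if/elif chain above maps a method name to an outgoing edge; the same routing can be expressed as a lookup table (standalone sketch, edge names taken from the class):

```python
# Method name → outgoing edge, as wired in MethodRetrieverClassifier
ROUTES = {
    "mpnet": "output_1",
    "distilbert": "output_2",
    "ada": "output_3",
    "llama": "output_4",
}


def route(method: str) -> str:
    # Unknown methods fall back to the mpnet edge, as in the original chain
    return ROUTES.get(method, "output_1")


print(route("ada"), route("unknown"))  # → output_3 output_1
```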
@ -0,0 +1,45 @@
from typing import Dict, Optional

import numpy as np

from api.embeddingsServiceCaller import EmbeddingServiceCaller
from retriever.retriever_pipeline import CustomPipeline


class Retriever:
    def __init__(self, pipeline: CustomPipeline, caller: EmbeddingServiceCaller) -> None:
        """
        Initializes the Retriever with a CustomPipeline and an EmbeddingServiceCaller.

        Args:
            pipeline (CustomPipeline): Pipeline instance that handles query retrieval.
            caller (EmbeddingServiceCaller): Client that fetches embeddings from the **MODEL SERVICE**.
        """
        self.pipeline = pipeline
        self.caller = caller

    def get_top_k_passages(
        self, query: str, index: str = "", meta: Optional[Dict] = None, method: str = "mpnet"
    ):
        """
        Retrieves the top k passages for a given query using the specified retrieval method.

        Args:
            query (str): The search query.
            index (str, optional): The index to search in. Defaults to "".
            meta (Dict, optional): Additional metadata for the query. Defaults to None.
            method (str, optional): The retrieval method (e.g. 'mpnet', 'llama'). Defaults to "mpnet".

        Returns:
            The retrieved results.
        """
        emb_query = None
        results = None
        if method == "llama":
            emb_query = np.array(self.caller.get_embeddings(query))
            results = self.pipeline.query_by_emb(index=index, emb=emb_query)
        else:
            results = self.pipeline.run(query=query, index=index, retrieval_method=method)
        # self.apply_softmax(results)
        return results

    def get_module_credits(self, module: str, index: str = ""):
        return self.pipeline.filter_query(
            query="", index="ib", params={"title": [module]}
        )
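The commented-out `apply_softmax` call suggests normalizing raw retrieval scores into a probability distribution; a stdlib sketch (the function name and flat score list are assumptions, not the project's implementation):

```python
import math


def softmax(scores):
    """Normalize raw retrieval scores so they sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # → 1.0
```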
@ -0,0 +1,279 @@
import os
from typing import Dict, List, Optional

from dotenv import load_dotenv

from haystack import Pipeline
from haystack.document_stores import WeaviateDocumentStore
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack.nodes import (
    EmbeddingRetriever,
    BM25Retriever,
    SentenceTransformersRanker,
    FilterRetriever,
)

from .LlamaRetriever import LlamaRetriever
from .custom_components.retrieval_model_classifier import MethodRetrieverClassifier

load_dotenv()
sys_path = os.environ.get("SYS_PATH")
es_host = os.environ.get("ELASTIC_HOST", "localhost")
PORT = 9210 if es_host == "localhost" else 9200


# Custom Elasticsearch mapping with one embedding field per retrieval model
custom_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "content_type": {"type": "text"},
            "ada_embedding": {"type": "dense_vector", "dims": 1536},
            "mpnet_embedding": {"type": "dense_vector", "dims": 768},
            "distilbert_embedding": {"type": "dense_vector", "dims": 512},
            "name": {"type": "keyword"},
        },
        "dynamic_templates": [
            {
                "strings": {
                    "path_match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
    },
    "settings": {"analysis": {"analyzer": {"default": {"type": "german"}}}},
}


class CustomPipeline:
    """
    The CustomPipeline class orchestrates a variety of retrievers and document stores, using a
    MethodRetrieverClassifier to route queries based on the requested parameters. It integrates
    multiple embedding-based retrieval methods as well as reranking.
    """

    def __init__(self, doc_index="document", label_index="label", api_key="") -> None:
        """Initializes the question-answering pipeline with retrievers, document stores for the DB connections, and reranking components.

        Args:
            doc_index (str, optional): Default Elasticsearch / Weaviate index. Defaults to "document".
            label_index (str, optional): Label index for evaluation purposes. Defaults to "label".
            api_key (str, optional): API key for external provider services. Defaults to "".
        """
        # Cache for document stores created via __init_doc_store
        self.doc_stores: Dict[str, ElasticsearchDocumentStore] = {}
        self.doc_store_ada = ElasticsearchDocumentStore(
            host=es_host,
            port=PORT,
            analyzer="german",
            index=doc_index,
            label_index=label_index,
            embedding_dim=1536,
            similarity="dot_product",
            embedding_field="ada_embedding",
            custom_mapping=custom_mapping,
        )
        self.doc_store_mpnet = ElasticsearchDocumentStore(
            host=es_host,
            port=PORT,
            analyzer="german",
            index=doc_index,
            label_index=label_index,
            embedding_dim=768,
            similarity="dot_product",
            embedding_field="mpnet_embedding",
            custom_mapping=custom_mapping,
        )
        self.doc_store_distilbert = ElasticsearchDocumentStore(
            host=es_host,
            port=PORT,
            analyzer="german",
            index=doc_index,
            label_index=label_index,
            embedding_dim=512,
            similarity="dot_product",
            embedding_field="distilbert_embedding",
            custom_mapping=custom_mapping,
        )

        # self.vector_doc_store_llama = WeaviateDocumentStore(
        #     host="http://localhost", port=3434, embedding_dim=4096
        # )

        self.emb_retriever_ada = EmbeddingRetriever(
            document_store=self.doc_store_ada,
            batch_size=8,
            embedding_model="text-embedding-ada-002",
            api_key=api_key,
            max_seq_len=1536,
        )

        self.emb_retriever_mpnet = EmbeddingRetriever(
            document_store=self.doc_store_mpnet,
            embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
            model_format="sentence_transformers",
        )
        self.retriever_distilbert = EmbeddingRetriever(
            document_store=self.doc_store_distilbert,
            embedding_model="sentence-transformers/distiluse-base-multilingual-cased-v2",
            model_format="sentence_transformers",
        )
        # self.llama_retriever = LlamaRetriever(
        #     document_store=self.vector_doc_store_llama
        # )
        self.bm25_retriever = BM25Retriever(document_store=self.doc_store_mpnet)
        self.ranker = SentenceTransformersRanker(
            model_name_or_path="svalabs/cross-electra-ms-marco-german-uncased",
            use_gpu=True,
        )
        self.init_qa_pipeline()
        self.filter_retriever = FilterRetriever(
            document_store=self.doc_store_mpnet, all_terms_must_match=True
        )

    def __init_doc_store(
        self,
        host: str = os.getenv("ES_HOSTNAME"),
        port: int = 9200,
        analyzer: str = "german",
        index: str = "",
        embedding_dim: int = 768,
        similarity: str = "dot_product",
        custom_mapping: Optional[dict] = None,
    ):
        """Helper function to initialize a document store with the provided configuration.

        Args:
            host (str, optional): Hostname where the DB is running, e.g. es01 or localhost. Defaults to os.getenv("ES_HOSTNAME").
            port (int, optional): Port where the DB is running. Defaults to 9200.
            analyzer (str, optional): Elasticsearch analyzer. Defaults to "german".
            index (str, optional): Index which the document store refers to. Defaults to "".
            embedding_dim (int, optional): Dimensions of the embedding model. Defaults to 768.
            similarity (str, optional): Similarity function for retrieval. Defaults to "dot_product".
            custom_mapping (Optional[dict], optional): Custom DB mapping. Defaults to None.

        Returns:
            ElasticsearchDocumentStore: The initialized document store.
        """
        doc_store = ElasticsearchDocumentStore(
            host=host,
            port=port,
            analyzer=analyzer,
            index=index,
            embedding_dim=embedding_dim,
            similarity=similarity,
            custom_mapping=custom_mapping,
        )
        self.doc_stores[index] = doc_store
        return doc_store

    def init_qa_pipeline(self):
        """
        Initializes the question-answering pipeline by adding the required retriever nodes, reranking nodes, and custom components for retriever routing.

        Returns:
            Pipeline: The initialized QA pipeline.
        """
        pipe = Pipeline()
        pipe.add_node(
            component=MethodRetrieverClassifier(),
            name="RetrieverClassifier",
            inputs=["Query"],
        )
        pipe.add_node(
            component=self.emb_retriever_mpnet,
            name="EMBRetrieverMPNET",
            inputs=["RetrieverClassifier.output_1"],
        )
        pipe.add_node(
            component=self.retriever_distilbert,
            name="EMBRetrieverDISTILBERT",
            inputs=["RetrieverClassifier.output_2"],
        )
        pipe.add_node(
            component=self.emb_retriever_ada,
            name="EMBRetrieverADA",
            inputs=["RetrieverClassifier.output_3"],
        )
        # pipe.add_node(
        #     component=self.llama_retriever,
        #     name="EMBRetrieverLLAMA",
        #     inputs=["RetrieverClassifier.output_4"],
        # )
        pipe.add_node(
            component=self.ranker,
            name="Ranker",
            inputs=["EMBRetrieverADA", "EMBRetrieverDISTILBERT", "EMBRetrieverMPNET"],
        )
        self.qa_pipeline = pipe
        return self.qa_pipeline

    def filter_query(self, query, index, params):
        """
        Filters a query based on the specified parameters.

        Args:
            query (str): The query string.
            index (str): The index to search in.
            params (dict): Additional parameters for filtering.

        Returns:
            list: The filtered query results.
        """
        return self.filter_retriever.retrieve(query=query, index=index, filters=params)

    def query_by_ids(self, ids):
        return self.doc_store_mpnet.get_documents_by_id(ids)

    def query_by_emb(self, index, emb):
        # NOTE: requires the Weaviate document store, which is currently commented out in __init__.
        return self.vector_doc_store_llama.query_by_embedding(
            query_emb=emb, index=index
        )

    def get_qa_pipeline(self):
        """
        Gets the question-answering pipeline.

        Returns:
            Pipeline: The QA pipeline.
        """
        return self.qa_pipeline

    def get_all_weaviate_data(self, index):
        """
        Retrieves all documents from the Weaviate document store.

        Args:
            index (str): The index to retrieve documents from.

        Returns:
            list: All documents from the specified index in Weaviate.
        """
        return self.vector_doc_store_llama.get_all_documents(index=index)

    def get_all_elastic_data(self, index):
        """
        Retrieves all documents from an Elasticsearch document store.

        Args:
            index (str): The Elasticsearch index to retrieve documents from.

        Returns:
            list: All documents from the specified index in Elasticsearch.
        """
        return self.doc_store_mpnet.get_all_documents(index=index)

    def run(self, query, index, retrieval_method):
        """
        Runs the QA pipeline with the given query, index, and retrieval method.

        Args:
            query (str): The query string.
            index (str): The index to search in.
            retrieval_method (str): The retrieval method to use.

        Returns:
            dict: The results from running the QA pipeline.
        """
        return self.qa_pipeline.run(
            query=query,
            params={
                "RetrieverClassifier": {
                    "method": retrieval_method,
                    "index": index,
                    "top_k": 10,
                },
            },
        )
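Each retrieval model writes to its own `dense_vector` field, so the `dims` in `custom_mapping` must match the `embedding_dim` of the corresponding document store; a quick consistency sketch over the values used above:

```python
# Embedding field → dimension, as declared in custom_mapping
mapping_dims = {
    "ada_embedding": 1536,
    "mpnet_embedding": 768,
    "distilbert_embedding": 512,
}

# embedding_field → embedding_dim passed to each ElasticsearchDocumentStore
store_dims = {
    "ada_embedding": 1536,
    "mpnet_embedding": 768,
    "distilbert_embedding": 512,
}

# Any entry here would mean queries are scored against vectors of the wrong size
mismatches = {f: (mapping_dims[f], d) for f, d in store_dims.items() if mapping_dims[f] != d}
print(mismatches)  # → {}
```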
@ -0,0 +1,344 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Chatbot HSMA</title>
    <style>
      body {
        margin: 0;
        padding: 0;
        font-family: Arial, sans-serif;
      }

      .main-container {
        display: flex;
        width: 100%;
        padding: 20px;
      }

      .sample-questions {
        padding: 20px;
        border-radius: 8px;
        background-color: #e6f2ff; /* Background color */
        height: 30%;
      }
      .aufgabenstellung {
        padding: 20px;
        border-radius: 8px;
        background-color: #e6f2ff; /* Background color */
      }

      .sample-questions h3,
      .sample-questions h4 {
        font-weight: bold;
        margin-top: 15px;
        color: #2c3e50; /* Dark blue for headings */
      }
      .aufgabe-title {
        font-weight: bold;
        margin-top: 15px;
        font-size: 20px;
        color: #2c3e50; /* Dark blue for headings */
      }
      .sample-questions ul,
      .aufgabenstellung ul {
        list-style-type: none;
        padding-left: 0;
      }

      .sample-questions li,
      .aufgabenstellung li {
        margin-bottom: 10px;
      }

      .container {
        display: flex;
        align-items: flex-start;
        width: 100%;
        max-width: 100%;
      }
      iframe {
        position: relative;
        left: -20px;
        margin-right: -100px;
      }
      .aufgabenstellung {
        flex: 1;
        margin-right: 20px;
      }

      #resetChat {
        display: inline-block;
        padding: 10px 15px;
        margin: 10px 0;
        background-color: #007bff;
        color: white;
        border: none;
        cursor: pointer;
        border-radius: 4px;
      }

      .content {
        width: 100%;
        flex: 1;
        margin-right: 20px;
      }
    </style>
  </head>
  <body>
    <div class="main-container">
      <div class="container">
        <!-- Content area -->
        <div class="content">
          <div class="aufgabenstellung">
            <h2>Aufgabenstellung</h2>
            <p>
              Du hast die Gelegenheit, mit einem spezialisierten Chatbot zu
              interagieren, der entwickelt wurde, um Informationen und
              Empfehlungen bezogen auf deine Hochschule zu liefern. Der Chatbot
              verfügt über eine Vielzahl von Daten und kann bei vielen
              verschiedenen Anfragen assistieren. Im Folgenden findest du eine
              Liste von Aufgaben, die dem Chatbot gestellt werden können, um
              seine Fähigkeiten zu evaluieren. Bitte beachte die vorgegebenen
              Formulierungen und Szenarien, um einheitliche Ergebnisse zu
              gewährleisten.
            </p>
            <p>
              Wenn du bei der Durchführung der Aufgaben auf Probleme stößt, etwa
              wenn der Chatbot nicht reagiert, kannst du die Konversation mit
              dem folgenden Button neu starten. Beachte jedoch, dass der
              Chatbot bei intensiven KI-Suchen für Empfehlungen und Ähnliches
              bis zu 30 Sekunden benötigen kann, um passende Ergebnisse zu
              liefern.
            </p>
            <button id="resetChat">
              Neustarten (Chat-Session zurücksetzen)
            </button>
            <br />
            <br />

            <div class="aufgabe-title">
              Aufgabe 1: Empfehlungen Wahlpflichtmodule
            </div>
            <p>
              Der Chatbot kann Wahlpflichtmodule aus dem Informatik
              Bachelor-Studiengang empfehlen.
            </p>
            <p><b>Aufgabe 1.1</b></p>
            <p>
              Schreibe dem Chatbot die Nachricht:
              <i>"Empfehlungen für Wahlpflichtmodule"</i> oder klicke auf den
              Button.
            </p>
            <p>
              Antworte auf die Frage
              <i>"Welche Themen interessieren dich besonders?"</i> <br />
              mit: <br />
              <i>"Ich habe Interesse an Natural Language Processing und
              Künstliche Intelligenz im Medizinischen Bereich."</i>
            </p>
            <p>
              Antworte auf die Frage
              <i>"Welche Kurse hast du in der Vergangenheit belegt?"</i>
              <br />mit:<br />
              <i>"Ich habe bereits die Kurse Grundlagen der Neuronalen Netze
              (GNN) und Machine Learning belegt"</i>
            </p>
            <p>
              Antworte auf die Frage <br />
              <i>"Welche Art von Karriere möchtest du nach dem Studium
              anstreben?"</i>
              <br />mit:<br />
              <i>"Ich möchte als Softwareentwickler arbeiten. Dabei möchte ich
              insbesondere AI Systeme entwickeln."</i>
            </p>
            Nun solltest du nach kurzer Zeit (ca. 30 Sekunden) eine Liste an
            Modulen erhalten.
            <br />
            <br />
            <p><b>Aufgabe 1.2</b></p>

            Wiederhole den Prozess. Dabei kannst du nun frei wählen, nach
            welchen Modulen du suchen möchtest.
            <br />
            <br />
            <div class="aufgabe-title">Aufgabe 2: Studienprüfungsordnung</div>
            <p><b>Aufgabe 2.1</b></p>
            <p>
              Frage den Chatbot: <i>"Wie kann man Elternzeit beantragen?"</i>
            </p>
            <p><b>Aufgabe 2.2</b></p>
            <p>
              Wiederhole den Prozess. Dabei kannst du Fragen zu inhaltlichen
              Themen der Studienprüfungsordnung frei formulieren.
            </p>
            <br />
            <div class="aufgabe-title">
              Aufgabe 3: Experten für ein Fachgebiet finden
            </div>
            <p><b>Aufgabe 3.1</b></p>
            <p>
              Der Chatbot kann Experten aus der Fakultät für Informatik für
              eine gegebene Fachrichtung identifizieren und vorschlagen.
            </p>
            <p>
              Schreibe dem Chatbot die Nachricht "Expertensuche durchführen"
              oder klicke auf den Button.
            </p>
            <p>
              Antworte auf die Frage <br />
              <br />
              <i>"In welchem Bereich suchst du einen Experten?"</i> <br />
              <br />
              mit:
            </p>
            <p>
              <i>"Ich suche jemanden, der sich im Bereich Natural Language
              Processing und Machine Learning auskennt. Insbesondere für den
              Fachbereich Medizin."</i>
            </p>
            <p>
              Der Chatbot sucht nun mittels KI-Techniken nach passenden
              Experten. Dieser Prozess kann bis zu 30 Sekunden dauern.
            </p>
            <p><b>Aufgabe 3.2</b></p>
            <p>
              Wiederhole den Prozess. Dabei kannst du nun frei wählen, in
              welchem Fachbereich du Experten suchen möchtest.
            </p>
            <br />
            <br />
            <div class="aufgabe-title">
              Aufgabe 4: Allgemeine Hochschulfragen
            </div>
            <p><b>Aufgabe 4.1</b></p>
            <p>
              Bitte den Chatbot, dir etwas über Mars oder Inno.space zu
              erzählen:
              <i>"Erzähle mir etwas über Mars"</i> <br />
              <i>"Wo finde ich den Mars-Raum?"</i>
              <br />
              Frage anschließend nach, wo du einen 3D-Drucker findest.
            </p>
            <p><b>Aufgabe 4.2</b></p>

            Wiederhole den Prozess. Dabei kannst du nun frei wählen, wonach du
            fragen möchtest.
            <br />
            <br />
            <div class="aufgabe-title">
              Vielen Dank für die Teilnahme! <br />
              Bitte fülle nun die folgende Umfrage aus:
            </div>
          </div>
        </div>
      </div>
      <div class="sample-questions">
        <h3>Beispielfragen:</h3>

        <h4>Allgemeine Fragen über die Hochschule:</h4>
        <ul>
          <li>Erzähle mir etwas über Mars</li>
          <li>Erzähle mir etwas über Inno Space</li>
          <li>Wo befindet sich der Mars-Raum?</li>
          <li>Wo finde ich einen 3D-Drucker?</li>
          <li>Wo finde ich den Lehrveranstaltungsplan?</li>
          <li>Welche Studienangebote hat die Hochschule?</li>
          <li>Wie setze ich mein Passwort für das Hochschulportal zurück?</li>
          <li>
            An wen kann ich mich wenden, wenn ich mein Passwort und meine
            zentrale Kennung vergessen habe?
          </li>
          <li>
            An wen kann ich mich wenden, wenn ich die Matrikelnummer vergessen
            habe?
          </li>
          <li>Wie beantrage ich Elterngeld?</li>
          <li>Wo befindet sich der Lasercutter?</li>
        </ul>

        <h4>Studienprüfungsordnung Fragen:</h4>
        <ul>
          <li>Darf ich die Frist der Bachelorthesis verlängern?</li>
          <li>Wie viele Präsenztage muss ich im Pflichtpraktikum ableisten?</li>
          <li>
            Was sind die Voraussetzungen für das Praktische Studiensemester?
          </li>
          <li>Welche Vorteile haben Studierende mit Kindern?</li>
          <li>Wie kann man Elternzeit beantragen?</li>
          <li>Muss ich bei einer Online-Prüfung meine Identität nachweisen?</li>
          <li>
            In welchem Absatz finde ich Informationen zu "Muss ich bei einer
            Online Prüfung meine Identität nachweisen?"
          </li>
          <li>Wie oft darf ich eine Prüfung wiederholen?</li>
          <li>
            Wann wird mein Zeugnis, nach Bestehen der Bachelorarbeit,
            ausgestellt?
          </li>
          <li>Wie lange dauert die Ausstellung meines Zeugnisses?</li>
          <li>Was weißt du über Mutterschutz im Studium?</li>
          <li>
            Wie lange werden meine Daten und Prüfungsleistungen von der
            Hochschule gespeichert?
          </li>
          <li>Welche Zuständigkeiten hat der Prüfungsausschuss?</li>
        </ul>
      </div>
    </div>
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        localStorage.removeItem("chat_session");
      });
      document
        .getElementById("resetChat")
        .addEventListener("click", function () {
          window.location.reload();
        });
      !(function () {
        let e = document.createElement("script"),
          t = document.head || document.getElementsByTagName("head")[0];
        (e.src =
          "https://cdn.jsdelivr.net/npm/rasa-webchat@1.x.x/lib/index.js"),
          // Replace 1.x.x with the version that you want
          (e.async = !0),
          (e.onload = () => {
            window.WebChat.default(
              {
                customData: { language: "en" },
                hideWhenNotConnected: false,

                initPayload:
                  '/get_started{"reader_model":"GPT","retrieval_method_or_model":"distilbert", "rerank":false}',
                title: "Chatbot HSMA",
                socketUrl: "http://localhost:5005",
              },
              null
            );
            setTimeout(() => {
              const launcher = document.querySelector(".rw-launcher");
              const localStorageChatSession =
                localStorage.getItem("chat_session");
              const data = JSON.parse(localStorageChatSession);
              if (launcher && !data?.params?.isChatOpen) {
                launcher.click();
              }
            }, 4000);
          }),
          t.insertBefore(e, t.firstChild);
      })();
    </script>
  </body>
</html>
@ -0,0 +1,14 @@
|
|||
# Extend the official Rasa SDK image
|
||||
FROM rasa/rasa-sdk:3.6.2
|
||||
|
||||
# Use subdirectory as working directory
|
||||
WORKDIR /app
|
||||
COPY actions/requirements-actions.txt ./
|
||||
# Change back to root user to install dependencies
|
||||
USER root
|
||||
RUN pip install -r requirements-actions.txt
|
||||
# Copy actions folder to working directory
|
||||
COPY ./actions /app/actions
|
||||
|
||||
# Following best practices, don't run the code as the root user
|
||||
USER 1001
|
|
@ -0,0 +1,720 @@
|
|||
"""
|
||||
This file contains a series of custom actions for a Rasa-based chatbot, designed to interact with a question-answering component.
|
||||
The actions cater to various scenarios including expert searches, module recommendations (WPM Recommendations),
|
||||
and providing answers related to the Studienprüfungsordnung (examination regulations) and crawled web data from Hochschule Mannheim.
|
||||
The actions are structured to check slot values for different parameters like reader model, retrieval model, and reranking, allowing for dynamic configuration during runtime.
|
||||
These actions enable the chatbot to adapt its response strategy based on user input and context.
|
||||
Additionally, the chatbot includes a feedback mechanism and a function to provide top-k document references, enhancing user interaction and information retrieval effectiveness.
|
||||
Some actions are equipped with fallback mechanisms to search in alternative indexes if the primary search doesn't yield an answer,
|
||||
ensuring the bot can still provide relevant information in various situations.
|
||||
"""
|
||||
import datetime
|
||||
from typing import Any, Text, Dict, List
|
||||
from rasa_sdk import Action, Tracker
|
||||
from rasa_sdk.executor import CollectingDispatcher
|
||||
from rasa_sdk.events import FollowupAction, SlotSet, ReminderScheduled
|
||||
import requests
|
||||
import json
|
||||
from .helper.credit_mappings import CREDITS
|
||||
import os
|
||||
|
||||
BACKEND_HOST = os.environ.get("FLASK_HOST", "localhost")
|
||||
|
||||
FEEDBACK_BUTTONS = [
|
||||
{"title": "👍", "payload": "/liked_answer"},
|
||||
{"title": "👎", "payload": "/disliked_answer"},
|
||||
]
|
||||
COULD_NOT_FIND_ANSWER = "Ich konnte keine Antwort finden"
|
||||
NO_ANSWER = "Keine Antwort"
|
||||
NO_INFO = "keine Information"
|
||||
|
||||
|
||||
def get_intent_before_last(tracker: Tracker) -> str:
|
||||
user_events = [event for event in tracker.events if event.get("event") == "user"]
|
||||
if len(user_events) < 2:
|
||||
return None
|
||||
intent_before_last = (
|
||||
user_events[-2].get("parse_data", {}).get("intent", {}).get("name")
|
||||
)
|
||||
return intent_before_last
|
||||
|
||||
|
||||
def get_last_executed_action(tracker: Tracker, domain) -> str:
|
||||
# Default to None instead of raising StopIteration when no action has run yet
|
||||
lastAction = next(
|
||||
(
|
||||
e["name"]
|
||||
for e in reversed(tracker.events)
|
||||
if e["event"] == "action"
|
||||
and "name" in e
|
||||
and (e["name"] in domain["actions"] or e["name"] in domain["forms"])
|
||||
and e["name"] != "action_set_reminder"
|
||||
),
|
||||
None,
|
||||
)
|
||||
return lastAction
|
||||
|
||||
|
||||
def extract_answer_from_response(response, reader_model):
|
||||
if reader_model == "GPT":
|
||||
answer = response.json()["answer"]["choices"][0]["message"]["content"]
|
||||
else:
|
||||
answer = response.json()["answer"]["answers"][0]["answer"]
|
||||
return answer
|
||||
|
||||
|
||||
class ActionGreet(Action):
|
||||
"""Sends a greeting message to the user and follows up with 'action_listen'."""
|
||||
def name(self) -> Text:
|
||||
return "action_greet"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
dispatcher.utter_message(response="utter_greet")
|
||||
return [FollowupAction("action_listen")]
|
||||
|
||||
|
||||
class ActionGetCreditsForModule(Action):
|
||||
"""Fetches and responds with credit information for a specific module based on user query."""
|
||||
def name(self) -> Text:
|
||||
return "action_get_credits"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
entities = tracker.latest_message["entities"]
|
||||
print(entities, flush=True)
|
||||
module = next(
|
||||
(
|
||||
entity["value"]
|
||||
for entity in entities
|
||||
if "entity" in entity and "module" == entity["entity"]
|
||||
),
|
||||
None,
|
||||
)
|
||||
print(module, flush=True)
|
||||
print("entity", flush=True)
|
||||
if module:
|
||||
payload = json.dumps({"module": module + "\n"})
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(
|
||||
f"http://{BACKEND_HOST}:8000/get_module_credits",
|
||||
headers=headers,
|
||||
data=payload,
|
||||
)
|
||||
result = resp.json()
|
||||
found_credits = None
|
||||
if result:
|
||||
print(result, flush=True)
|
||||
found_module = result[0]
|
||||
found_title = found_module["meta"]["title"].strip()
|
||||
if "credits" in found_module["meta"]:
|
||||
found_credits = found_module["meta"]["credits"]
|
||||
if not found_credits and module in CREDITS:
|
||||
found_credits = CREDITS[module] + " credits"
|
||||
if found_credits:
|
||||
dispatcher.utter_message(
|
||||
f"Das Modul **{found_title}** gibt {found_credits}"
|
||||
)
|
||||
return []
|
||||
|
||||
dispatcher.utter_message(
|
||||
f"Konnte zu dem Modul {module} keine Informationen finden..."
|
||||
)
|
||||
|
||||
return []
|
||||
|
||||
|
||||
class ActionRecommendModule(Action):
|
||||
"""Provides module recommendations based on user's interests, career goals, and previous courses."""
|
||||
def name(self) -> Text:
|
||||
return "action_recommend_module"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
events = []
|
||||
semester = tracker.get_slot("semester")
|
||||
interests = tracker.get_slot("interests")
|
||||
previous_courses = tracker.get_slot("previous_courses")
|
||||
future_carrer = tracker.get_slot("future_carrer")
|
||||
retrieval_method_or_model = (
|
||||
tracker.get_slot("retrieval_method_or_model") or "ada"
|
||||
)
|
||||
rerank = (
|
||||
tracker.get_slot("rerank")
|
||||
if tracker.get_slot("rerank") is not None
|
||||
else False
|
||||
)
|
||||
reader_model = tracker.get_slot("reader_model") or "GPT"
|
||||
payload = json.dumps(
|
||||
{
|
||||
"interests": interests,
|
||||
"future_carrer": future_carrer,
|
||||
"previous_courses": previous_courses,
|
||||
"index": "ib",
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"recommendation_method": "generate_llm_answer" if reader_model == "GPT" else "generate_farm_reader_answer",
|
||||
"rerank_retrieved_results": rerank,
|
||||
}
|
||||
)
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(
|
||||
f"http://{BACKEND_HOST}:8080/recommend_wpms",
|
||||
headers=headers,
|
||||
data=payload,
|
||||
)
|
||||
if reader_model == "Bert":
|
||||
courses = resp.json()
|
||||
print(courses,'courses')
|
||||
if courses:
|
||||
dispatcher.utter_message(
|
||||
"Basierend auf deinen Interessen kann ich dir folgende Wahlpflichtmodule empfehlen:"
|
||||
)
|
||||
title = ""
|
||||
profs = ""
|
||||
credits = ""
|
||||
semester = None
|
||||
uttered_modules = 0
|
||||
for course in courses[:3]:
|
||||
meta = course.get("meta", {})
|
||||
is_wpm = meta.get("is_wpm")
|
||||
if not is_wpm:
|
||||
continue
|
||||
title = meta.get("name_de")
|
||||
description = meta.get("inhalte_de")
|
||||
credits = meta.get("credits")
|
||||
profs = meta.get("dozenten")
|
||||
semester = meta.get("semester")
|
||||
if uttered_modules > 3:
|
||||
break
|
||||
if (
|
||||
semester
|
||||
and semester in ["6", "7", "6/7"]
|
||||
and title.strip()
|
||||
not in ["Bachelorarbeit (BA)", "Wissenschaftliches Arbeiten (WIA)"]
|
||||
):
|
||||
dispatcher.utter_message(
|
||||
text=f"**Titel** : {title}\n **ECTS**: {credits}\n **Beschreibung**:\n{description}\n\n **Dozenten**:{profs}"
|
||||
)
|
||||
uttered_modules += 1
|
||||
if uttered_modules == 0:
|
||||
dispatcher.utter_message(
|
||||
"Basierend auf deinen Anfragen konnte ich leider keine Wahlpflichtmodule finden"
|
||||
)
|
||||
else:
|
||||
dispatcher.utter_message(text="Haben dir die Empfehlungen gefallen?", buttons=FEEDBACK_BUTTONS)
|
||||
|
||||
else:
|
||||
answer = resp.json()["answer"]["choices"][0]["message"]["content"]
|
||||
events.append(SlotSet("wpm_recommendation_answer", answer))
|
||||
dispatcher.utter_message(text=answer, buttons=FEEDBACK_BUTTONS)
|
||||
return events
|
||||
|
||||
|
||||
class ActionAnswerStupo(Action):
|
||||
"""Finds and provides answers related to the Studienprüfungsordnung based on the user's query."""
|
||||
def name(self) -> Text:
|
||||
return "action_answer_stupo"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
url = f"http://{BACKEND_HOST}:8080/get_answer"
|
||||
retrieval_method_or_model = (
|
||||
tracker.get_slot("retrieval_method_or_model") or "ada"
|
||||
)
|
||||
reader_model = tracker.get_slot("reader_model") or "GPT"
|
||||
rerank = (
|
||||
tracker.get_slot("rerank")
|
||||
if tracker.get_slot("rerank") is not None
|
||||
else False
|
||||
)
|
||||
|
||||
events = []
|
||||
buttons = []
|
||||
latest_message = tracker.latest_message
|
||||
index = "stupo"
|
||||
user_text = None
|
||||
if latest_message:
|
||||
user_text = latest_message["text"]
|
||||
payload = {
|
||||
"query": user_text,
|
||||
"index": index,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": reader_model,
|
||||
"rerank_documents": rerank,
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
"Authorization": "Basic Og==",
|
||||
}
|
||||
response = requests.request(
|
||||
"POST", url, headers=headers, data=json.dumps(payload)
|
||||
)
|
||||
buttons.extend(FEEDBACK_BUTTONS)
|
||||
buttons.append(
|
||||
{"title": "Liste die Referenzen auf.", "payload": "/ask_for_references"}
|
||||
)
|
||||
answer = extract_answer_from_response(
|
||||
reader_model=reader_model, response=response
|
||||
)
|
||||
if answer is not None and (
|
||||
COULD_NOT_FIND_ANSWER in answer
|
||||
or NO_ANSWER in answer
|
||||
or NO_INFO in answer
|
||||
):
|
||||
dispatcher.utter_message(
|
||||
"Ich konnte keine Antwort in der Studienprüfungsordnung finden..."
|
||||
)
|
||||
dispatcher.utter_message(
|
||||
"Ich suche nun nach Informationen auf den Hochschulseiten..."
|
||||
)
|
||||
index = "crawled_hsma"
|
||||
payload["index"] = index
|
||||
response = requests.request(
|
||||
"POST", url, headers=headers, data=json.dumps(payload)
|
||||
)
|
||||
answer = extract_answer_from_response(
|
||||
reader_model=reader_model, response=response
|
||||
)
|
||||
|
||||
dispatcher.utter_message(
|
||||
text=answer,
|
||||
buttons=buttons,
|
||||
)
|
||||
events.append(SlotSet("query_stupo", user_text))
|
||||
events.append(SlotSet("answer_stupo", answer))
|
||||
events.append(
|
||||
SlotSet("retrieval_method_or_model", retrieval_method_or_model)
|
||||
)
|
||||
events.append(SlotSet("reader_model", reader_model))
|
||||
events.append(SlotSet("last_searched_index", index))
|
||||
references = response.json().get("documents")
|
||||
if references:
|
||||
events.append(SlotSet("references", references))
|
||||
return events
|
||||
|
||||
|
||||
class ActionAskAboutCrawledHSMAData(Action):
|
||||
"""Addresses general inquiries by searching through crawled data from the Hochschule Mannheim website."""
|
||||
|
||||
def name(self) -> Text:
|
||||
return "action_ask_about_crawled_hsma_data"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
url = f"http://{BACKEND_HOST}:8080/get_answer"
|
||||
latest_message = tracker.latest_message
|
||||
retrieval_method_or_model = (
|
||||
tracker.get_slot("retrieval_method_or_model") or "ada"
|
||||
)
|
||||
reader_model = tracker.get_slot("reader_model") or "GPT"
|
||||
rerank = (
|
||||
tracker.get_slot("rerank")
|
||||
if tracker.get_slot("rerank") is not None
|
||||
else False
|
||||
)
|
||||
user_text = None
|
||||
index = "crawled_hsma"
|
||||
events = []
|
||||
buttons = []
|
||||
if latest_message:
|
||||
user_text = latest_message["text"]
|
||||
payload = {
|
||||
"query": user_text,
|
||||
"index": index,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": reader_model,
|
||||
"rerank_documents": rerank,
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": "Basic Og==",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
response = requests.request(
|
||||
"POST", url, headers=headers, data=json.dumps(payload)
|
||||
)
|
||||
references = response.json().get("documents")
|
||||
if references:
|
||||
events.append(SlotSet("references", references))
|
||||
buttons.extend(FEEDBACK_BUTTONS)
|
||||
buttons.append(
|
||||
{"title": "Liste die Referenzen auf", "payload": "/ask_for_references"}
|
||||
)
|
||||
answer = extract_answer_from_response(
|
||||
reader_model=reader_model, response=response
|
||||
)
|
||||
if answer is not None and (
|
||||
COULD_NOT_FIND_ANSWER in answer
|
||||
or NO_ANSWER in answer
|
||||
or NO_INFO in answer
|
||||
):
|
||||
dispatcher.utter_message(
|
||||
"Ich konnte keine Antwort auf den Hochschulseiten finden..."
|
||||
)
|
||||
dispatcher.utter_message(
|
||||
"Ich suche nun nach Informationen in der Studienprüfungsordnung..."
|
||||
)
|
||||
index = "stupo"
|
||||
payload["index"] = index
|
||||
response = requests.request(
|
||||
"POST", url, headers=headers, data=json.dumps(payload)
|
||||
)
|
||||
answer = extract_answer_from_response(
|
||||
reader_model=reader_model, response=response
|
||||
)
|
||||
|
||||
dispatcher.utter_message(
|
||||
text=answer,
|
||||
buttons=buttons,
|
||||
)
|
||||
events.append(SlotSet("query_crawled_data", user_text))
|
||||
events.append(SlotSet("answer_crawled_data", answer))
|
||||
events.append(
|
||||
SlotSet("retrieval_method_or_model", retrieval_method_or_model)
|
||||
)
|
||||
events.append(SlotSet("reader_model", reader_model))
|
||||
return events
|
||||
|
||||
|
||||
class ActionProvideReferences(Action):
|
||||
"""Lists references related to the user's last query from either StuPO or crawled data."""
|
||||
|
||||
def name(self) -> Text:
|
||||
return "action_provide_references"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
references = tracker.get_slot("references")
|
||||
intent_before_last = get_intent_before_last(tracker=tracker)
|
||||
header_text = "Hier sind die Referenzen: \n"
|
||||
if intent_before_last == "stupo_question":
|
||||
header_text = "Hier sind die Referenzen aus der [StuPO](https://www.hs-mannheim.de/fileadmin/user_upload/hauptseite/pdf/SCS/Satzungen/SPO/Bachelor/230824_SPO_Bachelor.pdf): \n"
|
||||
|
||||
# Max length for each reference content
|
||||
MAX_LEN = 150 # Adjust based on your requirements
|
||||
|
||||
markdown_references = []
|
||||
for ref in references or []:
|
||||
content = ref.get("content", "")
|
||||
meta = ref.get("meta") or {}
|
||||
truncated_content = (
|
||||
(content[:MAX_LEN] + "...") if len(content) > MAX_LEN else content
|
||||
)
|
||||
|
||||
# Check if a URL is provided
|
||||
url = meta.get("url", None)
|
||||
if url:
|
||||
markdown_references.append(f"- {truncated_content} [Link]({url})")
|
||||
else:
|
||||
markdown_references.append(f"- {truncated_content}")
|
||||
|
||||
# Join the references and dispatch them in markdown format
|
||||
markdown_message = header_text + "\n".join(markdown_references)
|
||||
dispatcher.utter_message(markdown_message)
|
||||
|
||||
return []
|
||||
|
||||
|
||||
class ActionExampleStupoQuestions(Action):
|
||||
"""Offers example questions related to Studienprüfungsordnung for user guidance."""
|
||||
def name(self) -> Text:
|
||||
return "action_provide_stupo_example_questions"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
events = []
|
||||
buttons = [
|
||||
{"title": question, "payload": question}
|
||||
for question in [
|
||||
"Wie kann man Elternzeit beantragen?",
|
||||
"Darf ich die Frist der Bachelorthesis verlängern?",
|
||||
"Welche Vorteile haben Studierende mit Kindern?",
|
||||
"Wie viele Präsenztage muss ich im Praktischen Studiensemester ableisten?",
|
||||
"Wie lange werden meine Daten und Prüfungsleistungen von der Hochschule gespeichert?",
|
||||
]
|
||||
]
|
||||
text = "Ich kann versuchen, dir bei inhaltlichen Fragen über die StuPO zu helfen.\n Einige Beispielfragen:"
|
||||
dispatcher.utter_message(
|
||||
text=text,
|
||||
buttons=buttons,
|
||||
)
|
||||
return events
|
||||
|
||||
|
||||
class ActionExampleGeneralQuestions(Action):
|
||||
"""Presents sample general questions about Hochschule Mannheim to assist the user."""
|
||||
def name(self) -> Text:
|
||||
return "action_provide_general_example_questions"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
buttons = [
|
||||
{"title": question, "payload": question}
|
||||
for question in [
|
||||
"Wo finde ich einen 3D-Drucker?",
|
||||
"Wo befindet sich der Lasercutter?",
|
||||
"Wie setze ich mein Passwort für das Hochschulportal zurück?",
|
||||
"An wen kann ich mich wenden, wenn ich mein Passwort und meine zentrale Kennung vergessen habe?",
|
||||
"Wo befindet sich der Mars-Raum?",
|
||||
]
|
||||
]
|
||||
text = "Ich kann versuchen, dir bei allgemeinen Fragen über die Hochschule zu helfen.\n Einige Beispielfragen:"
|
||||
dispatcher.utter_message(
|
||||
text=text,
|
||||
buttons=buttons,
|
||||
)
|
||||
return []
|
||||
|
||||
|
||||
class ActionExpertSearch(Action):
|
||||
"""Conducts an expert search based on the user's query and provides relevant results."""
|
||||
def name(self) -> Text:
|
||||
return "action_expert_search"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
url = f"http://{BACKEND_HOST}:8080/search_experts"
|
||||
buttons = [*FEEDBACK_BUTTONS]
|
||||
events = []
|
||||
query = tracker.get_slot("expert_search_query")
|
||||
events.append(SlotSet("expert_search_query", query))
|
||||
events.append(SlotSet("retrieval_method_or_model", "mpnet"))
|
||||
events.append(SlotSet("reader_model", "GPT"))
|
||||
events.append(FollowupAction("action_listen"))
|
||||
|
||||
retrieval_method_or_model = (
|
||||
tracker.get_slot("retrieval_method_or_model") or "ada"
|
||||
)
|
||||
reader_model = tracker.get_slot("reader_model") or "GPT"
|
||||
|
||||
print(reader_model,'reader_model')
|
||||
rerank = (
|
||||
tracker.get_slot("rerank")
|
||||
if tracker.get_slot("rerank") is not None
|
||||
else False
|
||||
)
|
||||
payload = json.dumps(
|
||||
{
|
||||
"query": query,
|
||||
"index": "stupo",
|
||||
"retriever_model": retrieval_method_or_model,
|
||||
"generate_answer": True,
|
||||
"rerank_retrieved_results": rerank,
|
||||
"search_method": "classic_retriever_reader" if reader_model == "GPT" else "retriever_farm_reader"
|
||||
}
|
||||
)
|
||||
headers = {"Authorization": "Basic Og==", "Content-Type": "application/json"}
|
||||
response = requests.request("POST", url, headers=headers, data=payload)
|
||||
if response is not None:
|
||||
response_json: Dict = response.json()
|
||||
if reader_model == "Bert":
|
||||
if response_json:
|
||||
dispatcher.utter_message(
|
||||
"Basierend auf deinen Interessen kann ich dir folgende Experten sortiert nach Relevanz empfehlen:"
|
||||
)
|
||||
expert = ""
|
||||
title_work = None
|
||||
uttered_experts = []
|
||||
for doc in response_json:
|
||||
meta = doc.get("meta", {})
|
||||
expert = meta.get("author")
|
||||
title = meta.get("title")
|
||||
description = meta.get("abstract")
|
||||
if len(uttered_experts) > 3:
|
||||
break
|
||||
if (
|
||||
expert and title and expert not in uttered_experts
|
||||
):
|
||||
dispatcher.utter_message(
|
||||
text=f"**Experte**: {expert}\n **Relevantes Paper**: {title}\n"
|
||||
)
|
||||
uttered_experts.append(expert)
|
||||
if len(uttered_experts) == 0:
|
||||
dispatcher.utter_message(
|
||||
"Basierend auf deinen Anfragen konnte ich leider keine Experten finden"
|
||||
)
|
||||
else:
|
||||
dispatcher.utter_message(text="Hat dir die Expertensuche gefallen?", buttons=FEEDBACK_BUTTONS)
|
||||
return events
|
||||
answer = extract_answer_from_response(response=response, reader_model=reader_model)
|
||||
documents = response_json.get("documents")
|
||||
if answer:
|
||||
events.append(
|
||||
SlotSet("expert_search_answer", answer)
|
||||
)
|
||||
|
||||
dispatcher.utter_message(
|
||||
text=answer,
|
||||
buttons=buttons,
|
||||
)
|
||||
return events
|
||||
|
||||
|
||||
class ActionHandleFeedback(Action):
|
||||
"""Collects user feedback on the bot's responses and forwards it for processing."""
|
||||
|
||||
def name(self) -> Text:
|
||||
return "action_handle_feedback"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
last_intent = tracker.get_intent_of_latest_message()
|
||||
last_action = get_last_executed_action(tracker=tracker, domain=domain)
|
||||
query_stupo = tracker.get_slot("query_stupo")
|
||||
answer_stupo = tracker.get_slot("answer_stupo")
|
||||
query_crawled_data = tracker.get_slot("query_crawled_data")
|
||||
answer_crawled_data = tracker.get_slot("answer_crawled_data")
|
||||
expert_search_query = tracker.get_slot("expert_search_query")
|
||||
expert_search_answer = tracker.get_slot("expert_search_answer")
|
||||
retrieval_method_or_model = tracker.get_slot("retrieval_method_or_model")
|
||||
future_carrer = tracker.get_slot("future_carrer")
|
||||
interests = tracker.get_slot("interests")
|
||||
previous_courses = tracker.get_slot("previous_courses")
|
||||
reader_model = tracker.get_slot("reader_model")
|
||||
wpm_recommendation_answer = tracker.get_slot("wpm_recommendation_answer")
|
||||
last_searched_index = tracker.get_slot("last_searched_index")
|
||||
payload = {}
|
||||
if last_action == "action_ask_about_crawled_hsma_data":
|
||||
payload = json.dumps(
|
||||
{
|
||||
"type": "crawled_data",
|
||||
"user_queston": query_crawled_data,
|
||||
"provided_answer": answer_crawled_data,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": reader_model,
|
||||
"feedback": last_intent,
|
||||
"last_searched_index": last_searched_index,
|
||||
}
|
||||
)
|
||||
if last_action == "action_answer_stupo":
|
||||
payload = json.dumps(
|
||||
{
|
||||
"type": "stupo",
|
||||
"user_queston": query_stupo,
|
||||
"provided_answer": answer_stupo,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": reader_model,
|
||||
"feedback": last_intent,
|
||||
"last_searched_index": last_searched_index,
|
||||
}
|
||||
)
|
||||
if last_action == "action_expert_search":
|
||||
payload = json.dumps(
|
||||
{
|
||||
"type": "expert_search",
|
||||
"user_queston": expert_search_query,
|
||||
"provided_answer": expert_search_answer,
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"reader_model": "GPT",
|
||||
"feedback": last_intent,
|
||||
}
|
||||
)
|
||||
if last_action == "action_recommend_module":
|
||||
payload = json.dumps(
|
||||
{
|
||||
"type": "wpm_recommendation",
|
||||
"user_queston": f"future_carrer:{future_carrer}\n interests: {interests}\n previous_courses:{previous_courses}",
|
||||
"provided_answer": wpm_recommendation_answer,
|
||||
"reader_model": "GPT",
|
||||
"retrieval_method_or_model": retrieval_method_or_model,
|
||||
"feedback": last_intent,
|
||||
}
|
||||
)
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(
|
||||
f"http://{BACKEND_HOST}:8080/feedback",
|
||||
headers=headers,
|
||||
data=payload,
|
||||
)
|
||||
dispatcher.utter_message(text="Vielen Dank für das Feedback!")
|
||||
dispatcher.utter_message(response="utter_how_can_i_help")
|
||||
return [FollowupAction("action_listen")]
|
||||
|
||||
|
||||
class ActionSetReminder(Action):
|
||||
"""Sets a reminder for the user based on their request with a default one-minute delay."""
|
||||
|
||||
def name(self) -> Text:
|
||||
return "action_set_reminder"
|
||||
|
||||
async def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
date = datetime.datetime.now() + datetime.timedelta(minutes=1)
|
||||
entities = tracker.latest_message.get("entities")
|
||||
|
||||
reminder = ReminderScheduled(
|
||||
"EXTERNAL_reminder",
|
||||
trigger_date_time=date,
|
||||
entities=entities,
|
||||
name="my_reminder",
|
||||
kill_on_user_message=True,
|
||||
)
|
||||
|
||||
return [reminder]
|
||||
|
||||
|
||||
class ActionResetSlots(Action):
|
||||
"""Resets conversation slots to clear stored values from previous interactions."""
|
||||
def name(self) -> Text:
|
||||
return "action_reset_slots"
|
||||
|
||||
def run(
|
||||
self,
|
||||
dispatcher: CollectingDispatcher,
|
||||
tracker: Tracker,
|
||||
domain: Dict[Text, Any],
|
||||
) -> List[Dict[Text, Any]]:
|
||||
events = []
|
||||
events.append(SlotSet("expert_search_query", None))
|
||||
events.append(SlotSet("interests", None))
|
||||
events.append(SlotSet("future_carrer", None))
|
||||
events.append(SlotSet("previous_courses", None))
|
||||
events.append(SlotSet("references", None))
|
||||
return events
|
|
@ -0,0 +1,3 @@
|
|||
CREDITS = {
|
||||
|
||||
}
|
|
@ -0,0 +1 @@
|
|||
requests
|
|
@ -0,0 +1,47 @@
|
|||
# The config recipe.
|
||||
# https://rasa.com/docs/rasa/model-configuration/
|
||||
recipe: default.v1
|
||||
|
||||
# Configuration for Rasa NLU.
|
||||
# https://rasa.com/docs/rasa/nlu/components/
|
||||
language: en
|
||||
|
||||
pipeline:
|
||||
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
|
||||
# # If you'd like to customize it, uncomment and adjust the pipeline.
|
||||
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
|
||||
- name: WhitespaceTokenizer
|
||||
- name: RegexFeaturizer
|
||||
- name: LexicalSyntacticFeaturizer
|
||||
- name: CountVectorsFeaturizer
|
||||
- name: CountVectorsFeaturizer
|
||||
analyzer: char_wb
|
||||
min_ngram: 1
|
||||
max_ngram: 4
|
||||
- name: DIETClassifier
|
||||
epochs: 100
|
||||
constrain_similarities: true
|
||||
- name: EntitySynonymMapper
|
||||
- name: ResponseSelector
|
||||
epochs: 100
|
||||
constrain_similarities: true
|
||||
- name: FallbackClassifier
|
||||
threshold: 0.3
|
||||
ambiguity_threshold: 0.1
|
||||
|
||||
# Configuration for Rasa Core.
|
||||
# https://rasa.com/docs/rasa/core/policies/
|
||||
policies:
|
||||
# # No configuration for policies was provided. The following default policies were used to train your model.
|
||||
# # If you'd like to customize them, uncomment and adjust the policies.
|
||||
# # See https://rasa.com/docs/rasa/policies for more information.
|
||||
- name: MemoizationPolicy
|
||||
- name: RulePolicy
|
||||
- name: UnexpecTEDIntentPolicy
|
||||
max_history: 5
|
||||
epochs: 100
|
||||
- name: TEDPolicy
|
||||
max_history: 5
|
||||
epochs: 100
|
||||
constrain_similarities: true
|
||||
assistant_id: 20230509-125855-cheerful-area
|
|
@ -0,0 +1,33 @@
|
|||
# This file contains the credentials for the voice & chat platforms
|
||||
# which your bot is using.
|
||||
# https://rasa.com/docs/rasa/messaging-and-voice-channels
|
||||
|
||||
rest:
|
||||
# # you don't need to provide anything here - this channel doesn't
|
||||
# # require any credentials
|
||||
|
||||
|
||||
#facebook:
|
||||
# verify: "<verify>"
|
||||
# secret: "<your secret>"
|
||||
# page-access-token: "<your page access token>"
|
||||
|
||||
#slack:
|
||||
# slack_token: "<your slack token>"
|
||||
# slack_channel: "<the slack channel>"
|
||||
# slack_signing_secret: "<your slack signing secret>"
|
||||
|
||||
socketio:
|
||||
user_message_evt: user_uttered
|
||||
bot_message_evt: bot_uttered
|
||||
session_persistence: true
|
||||
|
||||
#mattermost:
|
||||
# url: "https://<mattermost instance>/api/v4"
|
||||
# token: "<bot token>"
|
||||
# webhook_url: "<callback URL>"
|
||||
|
||||
# This entry is needed if you are using Rasa Enterprise. The entry represents credentials
|
||||
# for the Rasa Enterprise "channel", i.e. Talk to your bot and Share with guest testers.
|
||||
rasa:
|
||||
url: "http://localhost:5002/api"
|
|
@ -0,0 +1,325 @@
version: "3.1"

nlu:
- intent: greet
  examples: |
    - Hallo
    - Servus
    - moin
    - was geht
    - alles fit?
    - hi
    - hey
    - hiho

- intent: wpm_recommendation
  examples: |
    - Kannst du mir Wahlpflichtmodule empfehlen?
    - Ich brauche Empfehlungen für Wahlpflichtmodule.
    - Suche nach Wahlpflichtmodulen.
    - Empfehle mir Wahlpflichtmodule.
    - Ich möchte Vorschläge für Wahlpflichtmodule.

- intent: search_expert
  examples: |
    - Ich suche einen Experten
    - Expertensuche
    - Führe eine Expertensuche durch
    - Ich möchte eine Expertensuche
    # - Ich brauche jemanden der sich mit KI auskennt.
    # - Wer ist der Professor für künstliche Intelligenz?
    # - Kannst du mir einen Datenbankexperten empfehlen?
    # - Ich suche einen Matheprofessor.
    # - Gibt es einen Fachmann für Webentwicklung?
    # - Wer kann mir bei Mobile App Entwicklung helfen?
    # - Ich brauche einen Experten für Data Science.
    # - Wer ist der Spezialist für maschinelles Lernen hier?
    # - Ich suche jemanden der sich mit Netzwerksicherheit auskennt.
    # - Wer ist der beste Professor für Physik?
    # - Gibt es hier einen Experten für Robotik?
    # - Ich suche einen Spezialisten für Ethik in der Technologie.
    # - Wer kann mir bei Quantencomputing helfen?
    # - Gibt es einen Professor für Bioinformatik?
    # - Ich suche einen Experten für Mensch-Computer-Interaktion.
    # - Wer kennt sich mit eingebetteten Systemen aus?
    # - Ich brauche jemanden der sich mit Statistik auskennt.
    # - Ich suche einen Spezialisten für Kryptographie.
    # - Wer kann mir in Optimierungsalgorithmen helfen?
    # - Gibt es hier einen Experten für Parallel Computing?
    # - Ich brauche einen Experten für Software Engineering.
    # - Ich suche einen Experten für Computergrafik.
    # - Wer ist der Professor für Virtual Reality?
    # - Kannst du mir einen Experten für Blockchain empfehlen?
    # - Gibt es einen Fachmann für Computer Vision?
    # - Ich suche einen Experten für Computational Biology.
    # - Wer kann mir bei Game Development helfen?
    # - Ich brauche einen Experten für Betriebssysteme.
    # - Ich suche einen Professor für Linguistik.
    # - Wer ist der Spezialist für Bildverarbeitung?
    # - Ich suche jemanden, der sich mit Cybersecurity auskennt.
    # - Gibt es einen Professor für Rechnernetze?
    # - Ich brauche einen Experten für Compilerbau.
    # - Wer kann mir bei der Entwicklung von Chatbots helfen?
    # - Ich suche einen Experten für Natural Language Processing.
    # - Wer kennt sich mit Augmented Reality aus?
    # - Ich brauche jemanden der sich mit Geoinformatik auskennt.
    # - Wer ist der Spezialist für Automatisierung?
    # - Ich suche jemanden, der sich mit 3D-Modellierung auskennt.
    # - Wer kann mir in Computational Neuroscience helfen?
    # - Gibt es einen Experten für Digital Humanities?
    # - Ich suche einen Experten für Big Data.
    # - Wer ist der Professor für E-Learning?
    # - Kannst du mir einen Experten für Internet der Dinge empfehlen?
    # - Gibt es einen Fachmann für Sozialinformatik?
    # - Ich suche einen Spezialisten für Assistive Technologien.
    # - Wer kennt sich mit E-Commerce aus?
    # - Ich brauche jemanden, der sich mit Finanzmathematik auskennt.
    # - Ich suche einen Experten für Computerethik.

- intent: how_much_credits
  examples: |
    - Wie viele ECTS-Punkte gibt das Wahlpflichtmodul [Angular und Nodejs](module)?
    - Wie viele ECTS-Punkte gibt das Wahlpflichtmodul [KRV](module)?
    - Wie viele ECTS-Punkte gibt das modul [GNN](module)?
    - Wie viele ECTS gibt das modul [machine learning](module)?
    - Wie viele ECTS gibt der Kurs [nlp](module)?
    - Wie viele ECTS gibt das [Praktisches Studiensemester (PS)](module)?
    - Wie viele ECTS für das [Praxissemester](module)?
    - Wie viele ECTS für das [Pflichtpraktikum](module)?
    - Wie viele ECTS kriege ich für das [praxissemester](module)?
    - Wie viele ECTS gibt die [Bachelorarbeit (BA)](module)?
    - Wie viele ECTS gibt die [Bachelorarbeit](module)?
    - Wie viele ECTS gibt die [Thesis](module)?
    - Wie viele credits gibt der Kurs [Natural Language Processing](module)?
    - Wie viele Credits gibt der Kurs [Internet of things](module)?
    - Wie viele credits gibt der Kurs [IOT](module)?
    - Kannst du mir sagen, wie viele Leistungspunkte ich für das Wahlpflichtmodul [Angular und Nodejs](module) erhalten werde?

- intent: stupo_question
  examples: |
    - Wie viele Präsenztage muss ich im Praktischen Studiensemester absolvieren?
    - Was sind die allgemeinen Zulassungsvoraussetzungen für das Studium an der Hochschule Mannheim?
    - Welche Voraussetzungen muss ich erfüllen, um in Mannheim zu studieren?
    - Wie lange ist die Regelstudienzeit für ein Vollzeitstudium?
    - Welche Vorteile haben Studierende mit Kindern?
    - Wie viele Semester dauert das Vollzeitstudium?
    - Gibt es eine Möglichkeit, das Studium in Teilzeit zu absolvieren?
    - Wie ist das Studium aufgebaut?
    - Was beinhaltet das praktische Studiensemester?
    - Wie viele Credits sind für einen Bachelorabschluss erforderlich?
    - Kann die Reihenfolge der Lehrveranstaltungen geändert werden?
    - Wie werden die Bedürfnisse von Studierenden mit Kindern berücksichtigt?
    - Was passiert, wenn ich während des Semesters schwanger werde?
    - Wie kann ich eine Fristverlängerung wegen einer Behinderung beantragen?
    - Wann findet das praktische Studiensemester statt?
    - Wer betreut mich während des praktischen Studiensemesters?
    - Muss ich einen Bericht für das praktische Studiensemester schreiben?
    - Wer entscheidet über die Anerkennung des praktischen Studiensemesters?
    - Muss ich selbst einen Platz für das praktische Studiensemester finden?
    - Welche Prüfungsleistungen muss ich erbringen, um das praktische Studiensemester zu beginnen?
    - Was beinhaltet die Bachelorvorprüfung?
    - Kann ich Elternzeit während des Studiums nehmen?
    - Was sind die allgemeinen Zulassungsvoraussetzungen für die Bachelorvorprüfung und die Bachelorprüfung an der Hochschule Mannheim?
    - Wie kann ich mich für Studien- und Prüfungsleistungen anmelden und abmelden?
    - Wann finden studienbegleitende Modulprüfungen statt?
    - Welche Maßnahmen können bei gesundheitlichen Beeinträchtigungen getroffen werden?
    - Wie lange vor einer Modulprüfung muss ein Antrag auf Nachteilsausgleich gestellt werden?
    - Was muss ich für mündliche Prüfungsleistungen wissen?
    - Wie lange dauern mündliche Prüfungen?
    - Welche schriftlichen Arbeiten sind in dem Studiengang vorgesehen und wie lange dauern sie?
    - Was sind die Rahmenbedingungen für Online-Prüfungen?
    - Welche Systeme sind für Online-Prüfungen zulässig?
    - Wie wird meine Identität bei einer Online-Prüfung überprüft?
    - Gibt es unterschiedliche Formen von Online-Prüfungen?
    - Was passiert, wenn ich während der Prüfungswochen krank werde?
    - Was sind die Kriterien für die Dauer von Klausuren?
    - Wie viele Prüfungen kann ich während eines praktischen Studiensemesters ablegen?
    - Wer ist für die Festsetzung der Noten für die Prüfungsleistungen verantwortlich?
    - Welche Notenskala wird zur Bewertung der Prüfungsleistungen verwendet?
    - Wie werden Zwischenwerte bei der Bewertung der Prüfungsleistungen behandelt?
    - Was passiert, wenn mehrere Prüfer eine schriftliche Prüfungsleistung bewerten und die Noten stark voneinander abweichen?
    - Wie wird die Modulnote berechnet?
    - Was sind die Kategorien für die Modulnote basierend auf dem Durchschnitt?
    - Was sollte man tun, wenn man Einwendungen gegen die Bewertung einer Prüfungsleistung hat?
    - Wie lange hat man Zeit, um Einwendungen gegen die Bewertung einer Prüfungsleistung zu erheben?
    - Was passiert, wenn ich einen Prüfungstermin versäume?
    - Welche Konsequenzen gibt es für Täuschung oder den Gebrauch nicht zugelassener Hilfsmittel?
    - Was sollte man tun, wenn man aus einem triftigen Grund nicht an der Prüfung teilnehmen kann?
    - Wie oft kann eine nicht bestandene Prüfungsleistung wiederholt werden?
    - Wie meldet man sich für eine Wiederholungsprüfung an?
    - Was ist die Voraussetzung für eine zweite Wiederholung einer Prüfungsleistung?
    - Ist eine dritte Wiederholung einer Prüfungsleistung möglich?
    - Was besagt § 15 in der stupo über die Anrechnung von Studienzeiten?
    - Wie werden Fehlversuche von anderen Hochschulen anerkannt?
    - Bis wann muss ein Antrag für Anrechnung gestellt werden?
    - Was ist die Aufgabe des Prüfungsausschusses nach § 16 in der stupo?
    - Wer sind die Mitglieder des Prüfungsausschusses?
    - Wie wird der Prüfungsausschuss gebildet?
    - Was sind die Aufgaben des zentralen Prüfungsausschusses?
    - Wer ist berechtigt, Prüfungen abzunehmen laut § 17 des stupos?
    - Kann ich den Prüfer für meine Bachelorarbeit selbst wählen?
    - Was sind die Qualifikationen für einen Beisitzer?
    - Wer ist zuständig für die Entscheidung über Verstöße gegen Prüfungsvorschriften?
    - Wer stellt das Bachelorzeugnis aus?
    - Was passiert, wenn ich eine Frist überschreite?
    - Was ist der Zweck der Bachelorvorprüfung?
    - Wie wird die Bachelorvorprüfung durchgeführt?
    - Wann muss die Bachelorvorprüfung abgeschlossen sein?
    - Was sind die fachlichen Voraussetzungen für die Bachelorvorprüfung?
    - Welche Prüfungsvorleistungen muss ich für die Bachelorvorprüfung erbringen?
    - Welche Modulprüfungen muss ich für die Bachelorvorprüfung absolvieren?
    - Wie wird die Gesamtnote für die Bachelorvorprüfung gebildet?
    - Wann bekomme ich mein Zeugnis für die Bachelorvorprüfung?
    - Was beinhaltet das Zeugnis der Bachelorvorprüfung?
    - Was ist der Zweck der finalen Bachelorprüfung?
    - Wie werden die Modulprüfungen der Bachelorprüfung durchgeführt?
    - Welche Voraussetzungen muss ich für die Bachelorprüfung erfüllen?
    - Darf ich die Bachelorprüfung ablegen, wenn ich noch nicht alle Prüfungsleistungen der Bachelorvorprüfung erbracht habe?
    - Was steht über die Art und den Umfang der Bachelorprüfung?
    - Welche Module muss ich in der Bachelorprüfung ablegen?
    - Wie viel Zeit habe ich für die Bearbeitung der Bachelorarbeit?
    - Ab wann kann ich das Thema für die Bachelorarbeit wählen?
    - Was muss ich bei der Abgabe der Bachelorarbeit beachten?
    - Wie wird die Bachelorarbeit bewertet?
    - Wie wird die Gesamtnote der Bachelorprüfung errechnet?
    - Wann erhalte ich das Zeugnis für die Bachelorprüfung?
    - Was bedeutet es, wenn die Bachelorprüfung "mit Auszeichnung bestanden" wurde?
    - Was ist ein Diploma Supplement?
    - Ist das Diploma Supplement Teil des Bachelorzeugnisses?
    - Ist es möglich, die Bachelorarbeit in einer anderen Sprache zu verfassen?
    - Muss ich ein praktisches Studiensemester absolvieren, um die Bachelorarbeit zu schreiben?
    - Wie wird das Diploma Supplement ausgestellt?
    - Welche akademischen Grade vergibt die Hochschule Mannheim?
    - Wann wird die Bachelorurkunde ausgehändigt?
    - Was passiert, wenn nachträglich Täuschung bei einer Prüfungsleistung festgestellt wird?
    - Was geschieht, wenn die Voraussetzungen für die Abnahme einer Prüfungsleistung nachträglich als nicht erfüllt erkannt werden?
    - Wie lange werden Prüfungsarbeiten aufbewahrt?
    - Wie kann ich Einsicht in meine Prüfungsarbeiten und Protokolle nehmen?
    - Was muss ich beachten, wenn im Regelstudienplan Wahlpflichtfächer vorgesehen sind?
    - Wie wähle ich das Thema für die Studienarbeit im Hauptstudium?
    - Gibt es Blockveranstaltungen zur Einführung in das praktische Studiensemester?
    - Wie wird die erfolgreiche Teilnahme an einer Blockveranstaltung dokumentiert?
    - Kann das Ergebnis von Prüfungsleistungen auch elektronisch bekannt gegeben werden?
    - An welche E-Mail-Adresse werden elektronische Mitteilungen gesendet?

- intent: ask_module_info
  examples: |
    - Was ist das [Software-Entwicklungsprojekt (SEP)](module)?
    - Ich möchte Infos über das [Softwareprojekt](module)
    - Gib mir Infos über den Kurs [Kryptographische Verfahren (KRV)](module)
    - Gib mir Infos über das Modul [KRV](module)
    - Gib mir Infos über das Modul [Mathematik für die Informatik 2 (MA2)](module)
    - Möchte Infos zum Kurs [MA2](module)

- intent: ask_about_crawled_data
  examples: |
    - Wann ist die Anmeldefrist für das nächste Semester?
    - Wie setze ich mein Passwort für das Hochschulportal zurück?
    - Welche Unterlagen brauche ich für die Immatrikulation?
    - Bis wann muss ich die Studiengebühren bezahlen?
    - Wann und wo finden die Einführungsveranstaltungen statt?
    - Wie beantrage ich einen Studienplatzwechsel?
    - Wie komme ich an meine Matrikelnummer?
    - Welche Voraussetzungen gibt es für das BAföG?
    - Wo finde ich den aktuellen Mensaplan?
    - Ich habe Probleme mit meinem WLAN-Zugang auf dem Campus. Wer kann helfen?
    - Wo finde ich einen 3D Drucker an der Hochschule?
    - Wo finde ich einen Lasercutter an der Hochschule?
    - Hat die Hochschule 3D Drucker?
    - Wie setze ich meine Zentrale Kennung zurück?
    - Hat die Hochschule lasercutter?
    - Wo gibt es Pool Räume an der Hochschule?
    - Wie sind die Öffnungszeiten der Bibliothek?
    - Gibt es ein Studierendenwerk oder eine Mensa an der Hochschule?
    - Welche Studiengänge werden an der Hochschule Mannheim angeboten?
    - Wie kann ich mich für Masterstudiengänge bewerben?
    - Wo finde ich den Studienplan für Informatik?
    - Wer sind die Ansprechpartner für internationale Studierende?
    - Gibt es Möglichkeiten für ein Auslandssemester?
    - Wo kann ich mehr über die Forschungsprojekte an der Hochschule erfahren?
    - Wann beginnt das nächste Semester?
    - Gibt es Stipendien oder Förderungen für Studierende?
    - Wie erreiche ich den Campus mit öffentlichen Verkehrsmitteln?
    - Wo kann ich meinen Studierendenausweis verlängern lassen?
    - Gibt es eine Einführungswoche für Erstsemester?
    - Wie melde ich mich zu Prüfungen an?
    - Wo kann ich eine Übersicht über die Labore und Einrichtungen finden?
    - Wer ist der Dekan des Fachbereichs Maschinenbau?
    - Welche Sportangebote gibt es an der Hochschule?
    - Wie kann ich mich für einen Sprachkurs anmelden?
    - Welche Dienste bietet das Rechenzentrum?
    - Wo finde ich Informationen über Wohnheime oder Unterkünfte in der Nähe?
    - Gibt es eine Kinderbetreuung oder Kindertagesstätte auf dem Campus?
    - Zeige mir den Campusplan der Hochschule.
    - Welche Services bietet das Studierendenwerk?
    - Ich möchte mehr über die Geschichte der Hochschule erfahren.
    - Kann ich die Hochschule während eines Tages der offenen Tür besuchen?
    - Erzähl mir von Alumni-Vereinigungen der Hochschule.
    - Ich brauche Informationen über Parkmöglichkeiten auf dem Campus.
    - Zeige mir die Forschungsbereiche des Fachbereichs Elektrotechnik.
    - Wie kann ich mich für die Hochschulsport-Kurse anmelden?
    - Ich möchte den Veranstaltungskalender der Hochschule sehen.
    - Erzähl mir mehr über Praktikums- und Jobmöglichkeiten für Studierende.
    - Was weißt du über Mars?
    - Erzähle mir etwas über Mars
    - Was weißt du über Inno Space?
    - Erzähle mir etwas über Inno Space

- intent: ask_for_references
  examples: |
    - Zeige mir die Dokumente, auf die du dich bezogen hast.
    - Zeige mir die Referenzen, auf die du dich bezogen hast.
    - Welche Referenzen hast du verwendet?
    - Welche Dokumente hast du verwendet?

- intent: ask_for_course_plan
  examples: |
    - Ich brauche einen Lehrveranstaltungsplan
    - Wo finde ich einen Lehrveranstaltungsplan?
    - Wo finde ich den Stundenplan?
    - Ich brauche einen Stundenplan
    - Lehrveranstaltungsplan
    - Stundenplan
    - Studienplan
    - Ich brauche einen Studienplan
    - wo finde ich den Studienplan?

- intent: ask_for_study_offers
  examples: |
    - Welche Studienangebote werden an der Hochschule angeboten?
    - Welche Studienangebote gibt es?
    - Kennst du dich mit den Studienangeboten aus?
    - Erzähle mir etwas über die Studienangebote
    - Ich brauche die Studienangebote

- synonym: Bachelorarbeit (BA)
  examples: |
    - Bachelorthesis
    - Bachelorarbeit
    - BA
    - bachelorarbeit
    - Thesis
    - thesis

- synonym: Lehrveranstaltungsplan
  examples: |
    - Studienplan
    - Studenplan
    - Lehrveranstaltungsplan

- synonym: Praktisches Studiensemester (PS)
  examples: |
    - Praktisches Studiensemester (PS)
    - Praktisches Studiensemester
    - Praxissemester
    - pflichtpraktikum

- synonym: Software-Entwicklungsprojekt (SEP)
  examples: |
    - SEP
    - Softwareprojekt
    - Projektsemester

- synonym: Mathematik für die Informatik 2 (MA2)
  examples: |
    - MA2
    - Mathe 2
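The `[text](entity)` markup in the training examples above (e.g. `[KRV](module)`) is Rasa's inline entity annotation. As a rough self-check of the annotations — a simplified sketch, not Rasa's actual training-data parser — a small regex can list the spans a training line labels:

```python
import re

# Simplified pattern for Rasa-style inline entity annotations: [text](entity)
ENTITY_RE = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<entity>\w+)\)")

def extract_entities(example: str) -> list[tuple[str, str]]:
    """Return the (text, entity) pairs annotated in one NLU training line."""
    return [(m.group("text"), m.group("entity")) for m in ENTITY_RE.finditer(example)]

print(extract_entities("Wie viele ECTS gibt die [Bachelorarbeit (BA)](module)?"))
# -> [('Bachelorarbeit (BA)', 'module')]
```

Running this over the `examples` blocks is a quick way to catch unbalanced brackets or misspelled entity names before `rasa train`.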
@ -0,0 +1,90 @@
version: "3.1"

rules:
- rule: activate_wpm_form
  steps:
  - intent: wpm_recommendation
  - action: action_reset_slots
  - action: wpm_form
  - active_loop: wpm_form
  - slot_was_set:
    - requested_slot: null

- rule: activate expert_search_form
  steps:
  - intent: search_expert
  - action: action_reset_slots
  - action: expert_search_form
  - active_loop: expert_search_form
  - slot_was_set:
    - requested_slot: null

- rule: submit expert_search_form
  condition:
  - active_loop: expert_search_form
  steps:
  - action: expert_search_form
  - active_loop: null
  - slot_was_set:
    - requested_slot: null
  - action: utter_searching_for_experts
  - action: action_expert_search
  - action: action_set_reminder

- rule: submit wpm form
  condition:
  - active_loop: wpm_form
  steps:
  - action: wpm_form
  - active_loop: null
  - slot_was_set:
    - requested_slot: null
  - action: utter_searching_for_wpms
  - action: action_recommend_module
  - action: action_set_reminder

- rule: get_started
  steps:
  - intent: get_started
  - action: action_greet

- rule: utter credits for module
  steps:
  - intent: how_much_credits
  - action: action_get_credits
  - action: action_set_reminder

- rule: utter infos about module
  steps:
  - intent: ask_module_info
  - action: action_infos_module
  - action: action_set_reminder

- rule: ask about crawled data
  steps:
  - intent: ask_about_crawled_data
  - action: utter_searching_in_crawleddata
  - action: action_ask_about_crawled_hsma_data
  - action: action_set_reminder

- rule: ask about stupo
  steps:
  - intent: stupo_question
  - action: utter_searching_in_stupo
  - action: action_answer_stupo
  - action: action_set_reminder

- rule: ask for references
  steps:
  - intent: ask_for_references
  - action: action_provide_references
  - action: action_set_reminder

- rule: Trigger `utter_how_can_i_help` for `EXTERNAL_reminder`
  steps:
  - intent: EXTERNAL_reminder
  - action: utter_how_can_i_help

- rule: handle feedback
  steps:
  - or:
    - intent: liked_answer
    - intent: disliked_answer
  - action: action_handle_feedback
@ -0,0 +1,25 @@
version: "3.1"

stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet

- story: example_stupo_questions
  steps:
  - intent: example_stupo_questions
  - action: action_provide_stupo_example_questions

- story: example_general_questions
  steps:
  - intent: example_general_questions
  - action: action_provide_general_example_questions

- story: user asks for course plan
  steps:
  - intent: ask_for_course_plan
  - action: utter_provide_course_plan

- story: user asks for study offers
  steps:
  - intent: ask_for_study_offers
  - action: utter_provide_study_offers
@ -0,0 +1,198 @@
version: "3.1"

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy
  - bot_challenge
  - wpm_recommendation
  - how_much_credits
  - get_started
  - ask_module_info
  - stupo_question
  - ask_about_crawled_data
  - ask_for_references
  - example_stupo_questions
  - example_general_questions
  - search_expert
  - ask_for_course_plan
  - ask_for_study_offers
  - EXTERNAL_reminder
  - liked_answer
  - disliked_answer

entities:
  - reader_model
  - retrieval_method_or_model
  - rerank

slots:
  query_stupo:
    type: text
    mappings:
    - type: custom
  answer_stupo:
    type: text
    mappings:
    - type: custom
  query_crawled_data:
    type: text
    mappings:
    - type: custom
  answer_crawled_data:
    type: text
    mappings:
    - type: custom
  reader_model:
    type: text
    mappings:
    - type: custom
    - type: from_entity
      entity: reader_model
  retrieval_method_or_model:
    type: text
    mappings:
    - type: custom
    - type: from_entity
      entity: retrieval_method_or_model
  rerank:
    type: bool
    mappings:
    - type: custom
    - type: from_entity
      entity: rerank
  references:
    type: any
    mappings:
    - type: custom
  interests:
    type: text
    mappings:
    - type: from_text
      conditions:
      - active_loop: wpm_form
        requested_slot: interests
  previous_courses:
    type: text
    mappings:
    - type: from_text
      conditions:
      - active_loop: wpm_form
        requested_slot: previous_courses
  future_carrer:
    type: text
    mappings:
    - type: from_text
      conditions:
      - active_loop: wpm_form
        requested_slot: future_carrer
  expert_search_query:
    type: text
    mappings:
    - type: from_text
      conditions:
      - active_loop: expert_search_form
        requested_slot: expert_search_query
  expert_search_answer:
    type: text
    mappings:
    - type: custom
  wpm_recommendation_answer:
    type: text
    mappings:
    - type: custom
  last_searched_index:
    type: text
    mappings:
    - type: custom

responses:
  utter_greet:
  - text: "Hallo! Ich bin der Chatbot der Hochschule 😊.\nIch biete dir Infos zur Hochschule, Hilfe zu Themen, die in der Studienprüfungsordnung (StuPo) festgelegt sind, wie Elternzeit, Prüfungsangelegenheiten und Studium mit Kindern, sowie Beratung zu Wahlpflichtmodulen, und ich durchsuche wissenschaftliche Arbeiten unserer Dozenten nach deinen Schlüsselwörtern.\nWie kann ich dir helfen?"
    buttons:
    - title: "Empfehlungen für Wahlpflichtmodule"
      payload: "/wpm_recommendation"
    - title: "Expertensuche"
      payload: "/search_expert"
    - title: "Allgemeine Informationen über die Hochschule"
      payload: "/example_general_questions"
    - title: "Fragen über die Studienprüfungsordnung"
      payload: "/example_stupo_questions"

  utter_how_can_i_help:
  - text: "Wie kann ich dir weiterhelfen?"
    buttons:
    - title: "Empfehlungen für Wahlpflichtmodule"
      payload: "/wpm_recommendation"
    - title: "Expertensuche"
      payload: "/search_expert"
    - title: "Allgemeine Informationen über die Hochschule"
      payload: "/example_general_questions"
    - title: "Fragen über die Studienprüfungsordnung"
      payload: "/example_stupo_questions"

  utter_example_stupo_questions:
  - text: "Ich kann versuchen, dir bei inhaltlichen Fragen zur StuPo zu helfen. Stell mir Fragen wie:\n- \"Wie kann man Elternzeit beantragen?\""

  utter_did_that_help:
  - text: "Hat dir das weitergeholfen?"

  utter_goodbye:
  - text: "Tschüss"

  utter_ask_field_of_study:
  - text: "In welchem Studiengang bist du eingeschrieben?"

  utter_ask_semester:
  - text: "In welchem Semester bist du?"

  utter_ask_interests:
  - text: "Welche Themen interessieren dich besonders?"

  utter_ask_previous_courses:
  - text: "Welche Kurse hast du in der Vergangenheit belegt?"

  utter_ask_future_carrer:
  - text: "Welche Art von Karriere möchtest du nach dem Studium anstreben?"

  utter_ask_expert_search_query:
  - text: "In welchem Bereich suchst du einen Experten?"

  utter_provide_course_plan:
  - text: "Hier ist der Link zum [Lehrveranstaltungsplan](https://services.informatik.hs-mannheim.de/stundenplan/)"

  utter_provide_study_offers:
  - text: "Hier ist der Link zu den [Studienangeboten](https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/bachelorstudiengaenge.html)"

  utter_searching_in_stupo:
  - text: "Ich suche nach relevanten Informationen in der Studienprüfungsordnung..."

  utter_searching_in_crawleddata:
  - text: "Ich suche nach passenden Informationen auf der Hochschulseite..."

  utter_searching_for_wpms:
  - text: "Ich suche nach passenden Wahlpflichtmodulen..."

  utter_searching_for_experts:
  - text: "Ich suche nach passenden Experten..."

actions:
  - action_recommend_module
  - action_greet
  - action_get_credits
  - action_infos_module
  - action_answer_stupo
  - action_ask_about_crawled_hsma_data
  - action_provide_references
  - action_provide_stupo_example_questions
  - action_provide_general_example_questions
  - action_expert_search
  - action_set_reminder
  - action_handle_feedback
  - action_reset_slots

forms:
  wpm_form:
    required_slots:
      - interests
      - previous_courses
      - future_carrer
  expert_search_form:
    required_slots:
      - expert_search_query

session_config:
  session_expiration_time: 60  # minutes
  carry_over_slots_to_new_session: true
@ -0,0 +1,41 @@
# This file contains the different endpoints your bot can use.

# Server where the models are pulled from.
# https://rasa.com/docs/rasa/model-storage#fetching-models-from-a-server

#models:
#  url: http://my-server.com/models/default_core@latest
#  wait_time_between_pulls: 10  # [optional](default: 100)

# Server which runs your custom actions.
# https://rasa.com/docs/rasa/custom-actions

action_endpoint:
  url: "http://app:5055/webhook"

# Tracker store which is used to store the conversations.
# By default the conversations are stored in memory.
# https://rasa.com/docs/rasa/tracker-stores

#tracker_store:
#  type: redis
#  url: <host of the redis instance, e.g. localhost>
#  port: <port of your redis instance, usually 6379>
#  db: <number of your database within redis, e.g. 0>
#  password: <password used for authentication>
#  use_ssl: <whether or not the communication is encrypted, default false>

#tracker_store:
#  type: mongod
#  url: <url to your mongo instance, e.g. mongodb://localhost:27017>
#  db: <name of the db within your mongo instance, e.g. rasa>
#  username: <username used for authentication>
#  password: <password used for authentication>

# Event broker which all conversation events should be streamed to.
# https://rasa.com/docs/rasa/event-brokers

#event_broker:
#  url: localhost
#  username: username
#  password: password
#  queue: queue
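The `action_endpoint` above is where the Rasa server reaches the custom action server; user messages themselves normally arrive over a channel such as Rasa's REST webhook. A minimal sketch of sending one message from Python — the host and port are assumptions (the Rasa server listens on 5005 by default; adjust to your compose setup):

```python
import json
from urllib import request

# Assumed default REST channel endpoint of a locally running Rasa server.
RASA_REST_URL = "http://localhost:5005/webhooks/rest/webhook"

def build_message(sender: str, text: str) -> bytes:
    """Encode a user message in the JSON shape the REST channel expects."""
    return json.dumps({"sender": sender, "message": text}).encode("utf-8")

def send_message(sender: str, text: str) -> list:
    """POST one message and return the bot's replies (needs a running server)."""
    req = request.Request(
        RASA_REST_URL,
        data=build_message(sender, text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage, with the compose stack running:
#   replies = send_message("test_user", "Hallo")
```

Each element of the returned list is one bot utterance, so button responses like `utter_greet` arrive alongside their `buttons` payloads.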
@ -0,0 +1,91 @@
#### This file contains tests to evaluate that your bot behaves as expected.
#### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/testing-your-assistant

stories:
- story: happy path 1
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy

- story: happy path 2
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: sad path 1
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      yes
    intent: affirm
  - action: utter_happy

- story: sad path 2
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      not really
    intent: deny
  - action: utter_goodbye

- story: sad path 3
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      very terrible
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      no
    intent: deny
  - action: utter_goodbye

- story: say goodbye
  steps:
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: bot challenge
  steps:
  - user: |
      are you a bot?
    intent: bot_challenge
  - action: utter_iamabot
@ -0,0 +1,180 @@
name: chatbot_env
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.08.22=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.11=h7f8727e_2
  - pip=23.2.1=py310h06a4308_0
  - python=3.10.13=h955ad1f_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.0.0=py310h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - tzdata=2023c=h04d1e81_0
  - wheel=0.38.4=py310h06a4308_0
  - xz=5.4.2=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - absl-py==1.4.0
      - aio-pika==8.2.3
      - aiofiles==23.2.1
      - aiogram==2.25.1
      - aiohttp==3.8.5
      - aiohttp-retry==2.8.3
      - aiormq==6.4.2
      - aiosignal==1.3.1
      - apscheduler==3.9.1.post1
      - astunparse==1.6.3
      - async-timeout==4.0.3
      - attrs==22.1.0
      - babel==2.9.1
      - bidict==0.22.1
      - boto3==1.28.53
      - botocore==1.31.53
      - cachecontrol==0.12.14
      - cachetools==5.3.1
      - certifi==2023.7.22
      - cffi==1.15.1
      - charset-normalizer==3.2.0
      - click==8.1.7
      - cloudpickle==2.2.1
      - colorclass==2.2.2
      - coloredlogs==15.0.1
      - colorhash==1.2.1
      - confluent-kafka==2.2.0
      - cryptography==41.0.4
      - cycler==0.11.0
      - dask==2022.10.2
      - dnspython==2.3.0
      - docopt==0.6.2
      - fbmessenger==6.0.0
      - fire==0.5.0
      - flatbuffers==23.5.26
      - fonttools==4.42.1
      - frozenlist==1.4.0
      - fsspec==2023.9.2
      - future==0.18.3
      - gast==0.4.0
      - google-auth==2.23.0
      - google-auth-oauthlib==1.0.0
      - google-pasta==0.2.0
      - greenlet==2.0.2
      - grpcio==1.58.0
      - h11==0.14.0
      - h5py==3.9.0
      - httptools==0.6.0
      - humanfriendly==10.0
      - idna==3.4
      - jax==0.4.16
      - jmespath==1.0.1
      - joblib==1.2.0
      - jsonpickle==3.0.2
      - jsonschema==4.17.3
      - keras==2.12.0
      - kiwisolver==1.4.5
      - libclang==16.0.6
      - locket==1.0.0
      - magic-filter==1.0.11
      - markdown==3.4.4
      - markupsafe==2.1.3
      - matplotlib==3.5.3
      - mattermostwrapper==2.2
      - ml-dtypes==0.3.1
      - msgpack==1.0.6
      - multidict==5.2.0
      - networkx==2.6.3
      - numpy==1.23.5
      - oauthlib==3.2.2
      - opt-einsum==3.3.0
      - packaging==20.9
      - pamqp==3.2.1
      - partd==1.4.0
      - pillow==10.0.1
      - pluggy==1.3.0
      - portalocker==2.8.2
      - prompt-toolkit==3.0.28
      - protobuf==4.23.3
      - psycopg2-binary==2.9.7
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pycparser==2.21
      - pydantic==1.10.9
      - pydot==1.4.2
      - pyjwt==2.8.0
      - pykwalify==1.8.0
      - pymongo==4.3.3
      - pyparsing==3.1.1
      - pyrsistent==0.19.3
      - python-crfsuite==0.9.9
      - python-dateutil==2.8.2
      - python-engineio==4.7.1
      - python-socketio==5.9.0
      - pytz==2022.7.1
      - pyyaml==6.0.1
      - questionary==1.10.0
      - randomname==0.1.5
      - rasa==3.6.9
      - rasa-sdk==3.6.2
      - redis==4.6.0
      - regex==2022.10.31
      - requests==2.31.0
      - requests-oauthlib==1.3.1
      - requests-toolbelt==1.0.0
      - rocketchat-api==1.30.0
      - rsa==4.9
      - ruamel-yaml==0.17.21
      - ruamel-yaml-clib==0.2.7
      - s3transfer==0.6.2
      - sanic==21.12.2
      - sanic-cors==2.0.1
      - sanic-jwt==1.8.0
      - sanic-routing==0.7.2
      - scikit-learn==1.1.3
      - scipy==1.11.2
      - sentry-sdk==1.14.0
      - simple-websocket==0.10.1
      - six==1.16.0
      - sklearn-crfsuite==0.3.6
      - slack-sdk==3.22.0
      - sqlalchemy==1.4.49
      - structlog==23.1.0
      - structlog-sentry==2.0.3
      - tabulate==0.9.0
      - tarsafe==0.0.4
      - tensorboard==2.12.3
      - tensorboard-data-server==0.7.1
      - tensorflow==2.12.0
      - tensorflow-estimator==2.12.0
      - tensorflow-hub==0.13.0
|
||||
- tensorflow-io-gcs-filesystem==0.32.0
|
||||
- tensorflow-text==2.12.0
|
||||
- termcolor==2.3.0
|
||||
- terminaltables==3.1.10
|
||||
- threadpoolctl==3.2.0
|
||||
- toolz==0.12.0
|
||||
- tqdm==4.66.1
|
||||
- twilio==8.2.2
|
||||
- typing-extensions==4.8.0
|
||||
- typing-utils==0.1.0
|
||||
- tzlocal==5.0.1
|
||||
- ujson==5.8.0
|
||||
- urllib3==1.26.16
|
||||
- uvloop==0.17.0
|
||||
- wcwidth==0.2.6
|
||||
- webexteamssdk==1.6.1
|
||||
- websockets==10.4
|
||||
- werkzeug==2.3.7
|
||||
- wrapt==1.14.1
|
||||
- wsproto==1.2.0
|
||||
- yarl==1.9.2
|
||||
prefix: /home/alibabaoglu/miniconda3/envs/chatbot
|
|
@ -0,0 +1,291 @@
|
|||
name: data_service
|
||||
channels:
|
||||
- defaults
|
||||
dependencies:
|
||||
- _libgcc_mutex=0.1=main
|
||||
- _openmp_mutex=5.1=1_gnu
|
||||
- ca-certificates=2023.08.22=h06a4308_0
|
||||
- ld_impl_linux-64=2.38=h1181459_1
|
||||
- libffi=3.4.4=h6a678d5_0
|
||||
- libgcc-ng=11.2.0=h1234567_1
|
||||
- libgomp=11.2.0=h1234567_1
|
||||
- libstdcxx-ng=11.2.0=h1234567_1
|
||||
- ncurses=6.4=h6a678d5_0
|
||||
- openssl=3.0.11=h7f8727e_2
|
||||
- pip=23.2.1=py39h06a4308_0
|
||||
- python=3.9.18=h955ad1f_0
|
||||
- readline=8.2=h5eee18b_0
|
||||
- setuptools=68.0.0=py39h06a4308_0
|
||||
- sqlite=3.41.2=h5eee18b_0
|
||||
- tk=8.6.12=h1ccaba5_0
|
||||
- wheel=0.41.2=py39h06a4308_0
|
||||
- xz=5.4.2=h5eee18b_0
|
||||
- zlib=1.2.13=h5eee18b_0
|
||||
- pip:
|
||||
- absl-py==2.0.0
|
||||
- accelerate==0.23.0
|
||||
- aiohttp==3.8.6
|
||||
- aiohttp-cors==0.7.0
|
||||
- aiorwlock==1.3.0
|
||||
- aiosignal==1.3.1
|
||||
- alembic==1.12.0
|
||||
- anyio==3.7.1
|
||||
- appdirs==1.4.4
|
||||
- asgiref==3.7.2
|
||||
- astunparse==1.6.3
|
||||
- async-timeout==4.0.3
|
||||
- attrs==23.1.0
|
||||
- authlib==1.2.1
|
||||
- automat==22.10.0
|
||||
- azure-ai-formrecognizer==3.3.1
|
||||
- azure-common==1.1.28
|
||||
- azure-core==1.29.4
|
||||
- backoff==2.2.1
|
||||
- beautifulsoup4==4.12.2
|
||||
- beir==0.2.3
|
||||
- blessed==1.20.0
|
||||
- boilerpy3==1.0.6
|
||||
- boto3==1.28.63
|
||||
- botocore==1.31.63
|
||||
- cachetools==5.3.1
|
||||
- canals==0.8.0
|
||||
- cattrs==23.1.2
|
||||
- certifi==2023.7.22
|
||||
- cffi==1.16.0
|
||||
- charset-normalizer==3.3.0
|
||||
- click==8.0.4
|
||||
- cloudpickle==2.2.1
|
||||
- cmake==3.27.6
|
||||
- coloredlogs==15.0.1
|
||||
- colorful==0.5.5
|
||||
- constantly==15.1.0
|
||||
- contourpy==1.1.1
|
||||
- cryptography==41.0.4
|
||||
- cssselect==1.2.0
|
||||
- cycler==0.12.1
|
||||
- databricks-cli==0.18.0
|
||||
- diskcache==5.6.3
|
||||
- distlib==0.3.7
|
||||
- dnspython==2.4.2
|
||||
- docker==6.1.3
|
||||
- docopt==0.6.2
|
||||
- elastic-transport==7.16.0
|
||||
- elasticsearch==7.17.9
|
||||
- entrypoints==0.4
|
||||
- events==0.5
|
||||
- exceptiongroup==1.1.3
|
||||
- faiss-cpu==1.7.2
|
||||
- farm-haystack==1.21.2
|
||||
- fastapi==0.103.2
|
||||
- ffmpeg-python==0.2.0
|
||||
- filelock==3.12.4
|
||||
- flask==2.2.5
|
||||
- flatbuffers==23.5.26
|
||||
- fonttools==4.43.1
|
||||
- frozenlist==1.4.0
|
||||
- fsspec==2023.9.2
|
||||
- future==0.18.3
|
||||
- gast==0.4.0
|
||||
- gitdb==4.0.10
|
||||
- gitpython==3.1.37
|
||||
- google-api-core==2.12.0
|
||||
- google-auth==2.23.3
|
||||
- google-auth-oauthlib==0.4.6
|
||||
- google-pasta==0.2.0
|
||||
- googleapis-common-protos==1.61.0
|
||||
- gpustat==1.1.1
|
||||
- greenlet==3.0.0
|
||||
- grpcio==1.43.0
|
||||
- gunicorn==21.2.0
|
||||
- h11==0.14.0
|
||||
- h5py==3.10.0
|
||||
- httpcore==0.18.0
|
||||
- httpx==0.25.0
|
||||
- huggingface-hub==0.18.0
|
||||
- humanfriendly==10.0
|
||||
- hyperlink==21.0.0
|
||||
- idna==3.4
|
||||
- importlib-metadata==6.8.0
|
||||
- importlib-resources==6.1.0
|
||||
- incremental==22.10.0
|
||||
- inflect==7.0.0
|
||||
- isodate==0.6.1
|
||||
- itemadapter==0.8.0
|
||||
- itemloaders==1.1.0
|
||||
- itsdangerous==2.1.2
|
||||
- jarowinkler==1.2.3
|
||||
- jinja2==3.1.2
|
||||
- jmespath==1.0.1
|
||||
- joblib==1.3.2
|
||||
- jsonschema==4.19.1
|
||||
- jsonschema-specifications==2023.7.1
|
||||
- keras==2.11.0
|
||||
- kiwisolver==1.4.5
|
||||
- langdetect==1.0.9
|
||||
- lazy-imports==0.3.1
|
||||
- libclang==16.0.6
|
||||
- lit==17.0.2
|
||||
- llama-cpp-python==0.2.11
|
||||
- llvmlite==0.41.0
|
||||
- loguru==0.7.2
|
||||
- lxml==4.9.3
|
||||
- mako==1.2.4
|
||||
- markdown==3.5
|
||||
- markupsafe==2.1.3
|
||||
- matplotlib==3.8.0
|
||||
- mlflow==2.7.1
|
||||
- monotonic==1.6
|
||||
- more-itertools==10.1.0
|
||||
- mpmath==1.3.0
|
||||
- msgpack==1.0.7
|
||||
- msrest==0.7.1
|
||||
- multidict==6.0.4
|
||||
- networkx==3.1
|
||||
- nltk==3.8.1
|
||||
- num2words==0.5.12
|
||||
- numba==0.58.0
|
||||
- numpy==1.25.2
|
||||
- nvidia-cublas-cu11==11.10.3.66
|
||||
- nvidia-cuda-cupti-cu11==11.7.101
|
||||
- nvidia-cuda-nvrtc-cu11==11.7.99
|
||||
- nvidia-cuda-runtime-cu11==11.7.99
|
||||
- nvidia-cudnn-cu11==8.5.0.96
|
||||
- nvidia-cufft-cu11==10.9.0.58
|
||||
- nvidia-curand-cu11==10.2.10.91
|
||||
- nvidia-cusolver-cu11==11.4.0.1
|
||||
- nvidia-cusparse-cu11==11.7.4.91
|
||||
- nvidia-ml-py==12.535.108
|
||||
- nvidia-nccl-cu11==2.14.3
|
||||
- nvidia-nvtx-cu11==11.7.91
|
||||
- oauthlib==3.2.2
|
||||
- onnx==1.12.0
|
||||
- onnxruntime==1.16.1
|
||||
- onnxruntime-tools==1.7.0
|
||||
- openai==0.28.1
|
||||
- openai-whisper==20230308
|
||||
- opencensus==0.11.3
|
||||
- opencensus-context==0.1.3
|
||||
- opencv-python==4.8.1.78
|
||||
- opensearch-py==2.3.2
|
||||
- opt-einsum==3.3.0
|
||||
- outcome==1.2.0
|
||||
- packaging==23.2
|
||||
- pandas==2.1.1
|
||||
- parsel==1.8.1
|
||||
- pdf2image==1.16.3
|
||||
- pdfminer-six==20221105
|
||||
- pdfplumber==0.10.2
|
||||
- pillow==10.0.1
|
||||
- pinecone-client==2.2.4
|
||||
- platformdirs==3.11.0
|
||||
- posthog==3.0.2
|
||||
- prometheus-client==0.13.1
|
||||
- prompthub-py==4.0.0
|
||||
- protego==0.3.0
|
||||
- protobuf==3.19.6
|
||||
- psutil==5.9.5
|
||||
- psycopg2-binary==2.9.9
|
||||
- py-cpuinfo==9.0.0
|
||||
- py-spy==0.3.14
|
||||
- py3nvml==0.2.7
|
||||
- pyarrow==13.0.0
|
||||
- pyasn1==0.5.0
|
||||
- pyasn1-modules==0.3.0
|
||||
- pycparser==2.21
|
||||
- pydantic==1.10.13
|
||||
- pydispatcher==2.0.7
|
||||
- pyjwt==2.8.0
|
||||
- pymupdf==1.23.5
|
||||
- pymupdfb==1.23.5
|
||||
- pyopenssl==23.2.0
|
||||
- pyparsing==3.1.1
|
||||
- pypdfium2==4.21.0
|
||||
- pysocks==1.7.1
|
||||
- pytesseract==0.3.10
|
||||
- python-dateutil==2.8.2
|
||||
- python-docx==1.0.1
|
||||
- python-dotenv==1.0.0
|
||||
- python-frontmatter==1.0.0
|
||||
- python-magic==0.4.27
|
||||
- pytrec-eval==0.5
|
||||
- pytz==2023.3.post1
|
||||
- pyyaml==6.0.1
|
||||
- quantulum3==0.9.0
|
||||
- querystring-parser==1.2.4
|
||||
- queuelib==1.6.2
|
||||
- rank-bm25==0.2.2
|
||||
- rapidfuzz==2.7.0
|
||||
- ray==1.13.0
|
||||
- referencing==0.30.2
|
||||
- regex==2023.10.3
|
||||
- requests==2.31.0
|
||||
- requests-cache==0.9.8
|
||||
- requests-file==1.5.1
|
||||
- requests-oauthlib==1.3.1
|
||||
- rpds-py==0.10.6
|
||||
- rsa==4.9
|
||||
- s3transfer==0.7.0
|
||||
- safetensors==0.4.0
|
||||
- scikit-learn==1.3.1
|
||||
- scipy==1.11.3
|
||||
- scrapy==2.11.0
|
||||
- selenium==4.14.0
|
||||
- sentence-transformers==2.2.2
|
||||
- sentencepiece==0.1.99
|
||||
- seqeval==1.2.2
|
||||
- service-identity==23.1.0
|
||||
- six==1.16.0
|
||||
- smart-open==6.4.0
|
||||
- smmap==5.0.1
|
||||
- sniffio==1.3.0
|
||||
- sortedcontainers==2.4.0
|
||||
- soupsieve==2.5
|
||||
- sqlalchemy==1.4.49
|
||||
- sqlalchemy-utils==0.41.1
|
||||
- sqlparse==0.4.4
|
||||
- sseclient-py==1.8.0
|
||||
- starlette==0.27.0
|
||||
- sympy==1.12
|
||||
- tabulate==0.9.0
|
||||
- tenacity==8.2.3
|
||||
- tensorboard==2.11.2
|
||||
- tensorboard-data-server==0.6.1
|
||||
- tensorboard-plugin-wit==1.8.1
|
||||
- tensorflow==2.11.1
|
||||
- tensorflow-estimator==2.11.0
|
||||
- tensorflow-hub==0.15.0
|
||||
- tensorflow-io-gcs-filesystem==0.34.0
|
||||
- tensorflow-text==2.11.0
|
||||
- termcolor==2.3.0
|
||||
- threadpoolctl==3.2.0
|
||||
- tika==2.6.0
|
||||
- tiktoken==0.5.1
|
||||
- tldextract==5.0.0
|
||||
- tokenizers==0.13.3
|
||||
- torch==2.0.1
|
||||
- torchvision==0.15.2
|
||||
- tqdm==4.66.1
|
||||
- transformers==4.32.1
|
||||
- trio==0.22.2
|
||||
- trio-websocket==0.11.1
|
||||
- triton==2.0.0
|
||||
- twisted==22.10.0
|
||||
- typing-extensions==4.8.0
|
||||
- tzdata==2023.3
|
||||
- url-normalize==1.4.3
|
||||
- urllib3==1.26.17
|
||||
- uvicorn==0.16.0
|
||||
- validators==0.22.0
|
||||
- virtualenv==20.24.5
|
||||
- w3lib==2.1.2
|
||||
- wcwidth==0.2.8
|
||||
- weaviate-client==3.24.2
|
||||
- websocket-client==1.6.4
|
||||
- werkzeug==3.0.0
|
||||
- wrapt==1.15.0
|
||||
- wsproto==1.2.0
|
||||
- xmltodict==0.13.0
|
||||
- yarl==1.9.2
|
||||
- zipp==3.17.0
|
||||
- zope-interface==6.1
|
||||
prefix: /home/alibabaoglu/miniconda3/envs/data_service
|
|
@ -0,0 +1,43 @@
import requests
import json
import os

host = os.environ.get("MODEL_SERVICE_HOST", "127.0.0.1")
BASE_URL = f"http://{host}:5000/"
ANSWER_URL = f"{BASE_URL}generate_answer"
EMBEDDINGS_URL = f"{BASE_URL}generate_embeddings"


class EmbeddingServiceCaller:
    """Thin HTTP client for the model service's embedding and answer endpoints."""

    def get_embeddings(self, text, embedding_type="input_embeddings", operation="mean", embedding_model="llama", layer=-1):
        headers = {
            'Content-Type': 'application/json'
        }
        payload = json.dumps({
            "query": text,
            "embedding_type": embedding_type,
            "operation": operation,
            "embedding_model": embedding_model,
            "layer": layer
        })
        return requests.post(EMBEDDINGS_URL, headers=headers, data=payload).json()

    def get_answer(self, payload="", prompt=""):
        headers = {
            'Content-Type': 'application/json'
        }
        # Fall back to wrapping the bare prompt when no ready-made payload is given
        if not payload:
            payload = json.dumps({"prompt": prompt})
        response = requests.post(ANSWER_URL, headers=headers, data=payload)
        return response.json()

    def _call(self, url, method):
        response = requests.request(method, url)
        return response.json()


if __name__ == "__main__":
    caller = EmbeddingServiceCaller()
    print(caller.get_embeddings("Hallsdfasdf Hallsdfasdf Hallsdfasdf"))
|
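The `operation="mean"` parameter above presumably tells the model service how to pool per-token embeddings into a single vector; the server-side code is not part of this file. As a hedged illustration of what mean pooling typically does (a sketch, not the service's actual implementation):

```python
def mean_pool(token_embeddings):
    """Average a list of per-token vectors into one fixed-size vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    # Component-wise mean across all token vectors
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]
```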
File diff suppressed because one or more lines are too long
|
@ -0,0 +1,83 @@
"""
The PDFConverter class is a utility for converting PDF files into various formats for subsequent processing.
It employs libraries such as pdfminer and pdfplumber to perform these conversions.
The class can convert PDFs to plain text (via pdfminer or Haystack), convert them to HTML, and extract tables with pdfplumber.
While it currently relies on these specific libraries, it is structured to integrate others such as Camelot for table extraction, as the commented-out methods indicate.
This converter acts as a pre-processing component in data processing workflows, preparing PDF content for more detailed analysis or content management systems.
"""
from pathlib import Path
from typing import List
from io import StringIO
from pdfminer.high_level import extract_text_to_fp, extract_text
from pdfminer.layout import LAParams
import pdfplumber
import pandas as pd
# import camelot


class PDFConverter:
    def __init__(self, init_haystack=True) -> None:
        # Default to None so the attribute exists even without Haystack
        self.haystack_converter = None
        if init_haystack:
            from haystack.nodes import PDFToTextConverter
            self.haystack_converter = PDFToTextConverter(
                remove_numeric_tables=True,
                valid_languages=["de", "en"]
            )

    def convert_pdf_to_text_haystack(self, path: Path) -> List:
        if self.haystack_converter:
            docs = self.haystack_converter.convert(file_path=path, meta=None)
            return docs

    def convert_pdf_to_text_pdfminer(self, path: Path):
        text = extract_text(path)
        return text

    def convert_pdf_to_html(self, path: Path):
        output_string = StringIO()
        with open(path, 'rb') as fin:
            extract_text_to_fp(fin, output_string,
                               laparams=LAParams(), output_type='html', codec=None)
        return output_string.getvalue()

    # Convert extracted tables to HTML
    def tables_to_html(self, tables):
        html_tables = []
        for table in tables:
            df = pd.DataFrame(table[1:], columns=table[0])
            html_table = df.to_html(index=False, border=1, table_id="table_data")
            html_tables.append(html_table)
        return html_tables

    # Extract tables from a PDF using pdfplumber
    def extract_tables_from_pdf(self, pdf_path):
        tables = []
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                page_tables = page.extract_tables()
                for table in page_tables:
                    tables.append(table)
        return tables

    def convert_pdf_tables_pdfplumber(self, path: Path):
        tables = self.extract_tables_from_pdf(path)
        html_tables = self.tables_to_html(tables)
        return html_tables

    # Extract tables from a PDF using Camelot
    # def extract_tables_from_pdf_camelot(self, pdf_path):
    #     tables = camelot.read_pdf(pdf_path, flavor='stream', pages='all', split_text=True, strip_text='\n')
    #     return tables

    # Convert extracted Camelot tables to HTML
    # def tables_to_html_camelot(self, tables):
    #     html_tables = []
    #     for table in tables:
    #         df = table.df
    #         html_table = df.to_html(index=False, border=1, table_id="table_data")
    #         html_tables.append(html_table)
    #     return html_tables

    # def convert_pdf_tables_camelot(self, path: Path):
    #     tables = self.extract_tables_from_pdf_camelot(path)
    #     html_tables = self.tables_to_html_camelot(tables)
    #     return html_tables
|
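The `tables_to_html` method above leans on pandas' `DataFrame.to_html`. For illustration, the same row-list-to-HTML conversion can be sketched with the standard library alone (a simplified stand-in whose markup does not match pandas' exact output):

```python
def table_to_html(table):
    """Render a pdfplumber-style table (first row = header) as an HTML string."""
    header, *rows = table
    head = "".join(f"<th>{cell}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```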
|
@ -0,0 +1,54 @@
"""
This script includes a SQL-to-JSON converter specifically designed to convert data from a university's module database into JSON format.
It connects to a MySQL database, retrieves module data using a SQL query, and processes this data into a structured JSON format.
NOTE: A MySQL database must be running on localhost at the default port.
The script removes unnecessary timestamp columns, combines relevant fields to create a content field for each module, and identifies elective modules (Wahlpflichtmodule) based on specific criteria.
The processed data is saved in a JSON file.
"""
import pymysql
import pandas as pd
import json

# Database connection setup
db_connection = pymysql.connect(
    host='localhost',
    user='maydane',
    password='1234',
    db='mydatabase')

# SQL query execution
query = 'SELECT * FROM modul'
df = pd.read_sql(query, con=db_connection)

timestamp_cols = ['changed']

# Remove timestamps, since they are irrelevant
df = df.drop(columns=timestamp_cols)
# Convert the DataFrame to JSON
json_data = df.to_dict(orient='records')

# Write JSON to file
with open('data.json', 'w', encoding="utf-8") as f:
    json.dump(json_data, f, ensure_ascii=False)
# Close the connection
db_connection.close()

# ---------------------------------------------------------
# This file originated in a notebook, so the part below is another script
# that parses the JSON into a suitable format.
# TODO: This can be refactored into the script above
import json

# Load the data from the JSON file for conversion into the target format
with open('data.json', 'r') as f:
    data = json.load(f)

# Iterate over the records, combining the fields
for dic in data:
    combined_str = dic.get("name_de", "") + " " + dic.get("inhalte_de", "") + " " + dic.get("kompetenzen_de", "")
    dic["content"] = combined_str
    dic["is_wpm"] = dic.get("semester") == "6/7" and dic.get("pflichtmodul") == 0

# Save the updated data back to a JSON file
with open('converted_data.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
|
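The post-processing loop above can be expressed as a small pure function, which would make the TODO refactor into the upper script easier to test. The field names come from the script itself; `enrich_module` is a hypothetical helper name:

```python
def enrich_module(module: dict) -> dict:
    """Combine the German text fields into 'content' and flag elective modules."""
    module["content"] = " ".join(
        module.get(key, "") for key in ("name_de", "inhalte_de", "kompetenzen_de")
    )
    # Elective (Wahlpflicht) modules sit in semester 6/7 and are not mandatory
    module["is_wpm"] = module.get("semester") == "6/7" and module.get("pflichtmodul") == 0
    return module
```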
|
@ -0,0 +1,44 @@
import scrapy
import json


# This spider crawls the content of every URL in a provided list.
# NOTE: This means we first have to execute hsma_url_crawler.py with the command scrapy runspider hsma_crawler.py
# NOTE: Then execute this file with scrapy runspider hsma_content_crawler.py
# NOTE: Afterwards, move the generated "url_texts.json" into the /data directory and rename it to "crawled_hsma_web.json"
# TODO: Automate moving the file to the /data dir
class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ["hs-mannheim.de"]

    custom_settings = {
        'LOG_LEVEL': 'INFO',
        'ROBOTSTXT_OBEY': True,
        'DEPTH_LIMIT': 1,
        'FEED_FORMAT': 'json',
        'FEED_URI': 'url_texts.json',
        'FEED_EXPORT_ENCODING': 'utf-8'
    }

    def __init__(self):
        # Read the file and load the JSON
        with open('urls.json', 'r') as f:
            self.start_urls = json.load(f)

    def parse(self, response):
        # Remove script, style, and footer elements
        for script in response.xpath('//script | //style | //footer'):
            script.extract()
        # Extract text from the remaining HTML elements, ignoring headers, navigation,
        # footers, scripts, and cookie/top-link widgets via the XPath below
        text = response.xpath('//body//text()[not(ancestor::header or ancestor::nav or ancestor::footer or ancestor::script or ancestor::*[contains(@class, "cc-container")] or ancestor::*[contains(@class, "c-top-link")])]').getall()
        # Remove leading and trailing whitespace from each piece of text
        text = [t.strip() for t in text]
        # Remove empty strings
        text = [t for t in text if t != '']
        # Join the pieces of text
        text = ' '.join(text)
        # Yield the scraped content
        yield {'url': response.url, 'content': text}
|
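The TODO above asks for automating the move of `url_texts.json` into the /data directory. A minimal sketch with the standard library could look like this (`move_crawl_output` is a hypothetical helper, and the default paths are the ones the comments mention):

```python
import shutil
from pathlib import Path


def move_crawl_output(src="url_texts.json", data_dir="data",
                      target_name="crawled_hsma_web.json"):
    """Move the crawler's feed output into the data directory under its expected name."""
    dest = Path(data_dir) / target_name
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the data dir if missing
    shutil.move(str(src), str(dest))
    return dest
```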
|
@ -0,0 +1,28 @@
import scrapy
import json

# This spider collects all available URLs for crawling from the hs-mannheim domain
# RUN WITH "scrapy runspider hsma_crawler.py"


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ["hs-mannheim.de"]
    start_urls = ["https://www.hs-mannheim.de/", "https://www.startup.hs-mannheim.de/"]
    custom_settings = {
        'LOG_LEVEL': 'INFO',
        'ROBOTSTXT_OBEY': True,
        'DEPTH_LIMIT': 1,
    }

    # Collected URLs (a set, to avoid duplicates)
    urls = set()

    def parse(self, response):
        # Follow all links on the page
        for href in response.css('a::attr(href)').getall():
            url = response.urljoin(href)
            self.urls.add(url)

    def closed(self, reason):
        # When the spider closes, write the URLs to a file
        with open('urls.json', 'w') as f:
            json.dump(list(self.urls), f)
|
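Since `self.urls` collects every absolute URL found on the crawled pages, including off-domain links, a small filter could restrict the output to the allowed domains before `urls.json` is written. This is a hypothetical stdlib helper, not part of the spider:

```python
from urllib.parse import urlparse


def filter_allowed(urls, allowed_domains=("hs-mannheim.de",)):
    """Keep only URLs whose host is an allowed domain or one of its subdomains."""
    kept = set()
    for url in urls:
        host = urlparse(url).netloc
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            kept.add(url)
    return sorted(kept)
```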
File diff suppressed because one or more lines are too long
|
@ -0,0 +1,260 @@
|
|||
[
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/leitbild.html",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/masterstudiengaenge.html",
|
||||
"http://services.informatik.hs-mannheim.de/stundenplan/plan_prof.php",
|
||||
"http://www.sw.hs-mannheim.de",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen/messen.html",
|
||||
"https://www.hs-mannheim.de/studierende/studienstart/von-studis-fuer-studis.html",
|
||||
"https://www.hs-mannheim.de/bewerbung/orientierungstest.html",
|
||||
"https://www.mars.hs-mannheim.de/",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/beauftragte/datenschutzbeauftragter.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/sprachtandem.html",
|
||||
"https://www.hs-mannheim.de/datenschutzerklaerung.html",
|
||||
"https://www.hs-mannheim.de/gleichstellung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/fachschaften.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/aktuelle-sprachkurse.html",
|
||||
"http://services.informatik.hs-mannheim.de/stundenplan/frei/",
|
||||
"https://www.inno-space.hs-mannheim.de/",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/gasthoerer.html",
|
||||
"https://www.startup.hs-mannheim.de/service-angebot/startup-journey.html",
|
||||
"https://www.hs-mannheim.de/presse.html",
|
||||
"https://www.startup.hs-mannheim.de/erfolgsgeschichten.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/rechtsvorschriften.html",
|
||||
"https://www.hs-mannheim.de/bewerbung/auswahlsatzungen.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim.html",
|
||||
"https://www.startup.hs-mannheim.de/startups-vor-ort/aucteq-biosystems-1-1.html",
|
||||
"http://malumni.de/",
|
||||
"https://www.hs-mannheim.de/bewerbung/nc-werte.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/forschungsfoerderung/forschungsfoerderung-der-karl-voelker-stiftung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/studienfoerderung-stipendien.html",
|
||||
"https://www.hs-mannheim.de/studierende/termine-news-service/semestertermine.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/bedrohungs-und-konfliktmanagement-bekom.html",
|
||||
"https://www.hs-mannheim.de/bewerbung/zulassungsvoraussetzungen.html",
|
||||
"https://www.startup.hs-mannheim.de/ueber-uns/startups-1-1.html",
|
||||
"http://www.design.hs-mannheim.de",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/nachwuchsfoerderung.html",
|
||||
"https://www.hs-mannheim.de/studierende/intranet.html",
|
||||
"https://www.hs-mannheim.de/studierende/studienorganisation/pruefungsplaene.html",
|
||||
"https://www.hrk.de/weltoffene-hochschulen",
|
||||
"https://www.hs-mannheim.de/einzelansicht/offene-schnuppervorlesungen-an-der-hochschule-mannheim-in-den-herbstferien.html",
|
||||
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung.html",
|
||||
"http://www.informatik.hs-mannheim.de",
|
||||
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office/an-der-hochschule-mannheim-studieren.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zentrale-angebote.html",
|
||||
"https://www.startup.hs-mannheim.de/start/netzwerk.html",
|
||||
"https://www.startup.hs-mannheim.de/startup-support.html",
|
||||
"https://www.startdurch.hs-mannheim.de",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen.html#c125528",
|
||||
"https://www.startup.hs-mannheim.de/start/startupambassadors.html",
|
||||
"https://www.hs-mannheim.de/einzelansicht-termine/maschinenbaukolloquium-im-onlineformat-ueber-webex-1.html",
|
||||
"https://www.hs-mannheim.de/stellenangebote.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/forschungsprofil.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/service-center-hochschuldidaktik-und-qualitaetsmanagement/beratung-zur-entwicklung-von-lehrkonzepten.html",
|
||||
"https://www.hs-mannheim.de/bewerbung/vorbereitungskurse.html",
|
||||
"https://www.hs-mannheim.de/studierende/studienorganisation/fragen-rund-ums-studium.html",
|
||||
"https://www.exist.de/DE/Programm/Exist-Gruenderstipendium/inhalt.html",
|
||||
"https://www.hs-mannheim.de/sitemap.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/studieren-mit-kind.html",
|
||||
"https://www.hs-mannheim.de/beschaeftigte/termine-news-service.html",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/kurzvideos-zu-unseren-studiengaengen.html",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/bachelorstudiengaenge.html",
|
||||
"https://www.cit.hs-mannheim.de/service/e-mail-dienste-kalender/webmail/anmelden.html",
|
||||
"http://www.et.hs-mannheim.de",
|
||||
"https://www.startup.hs-mannheim.de/#top",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/hochschule-in-zahlen.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/nachwuchsfoerderung/promotionsstipendien-der-albert-und-anneliese-konanz-stiftung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/integration-ins-studium.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/verwaltung.html",
|
||||
"https://www.hs-mannheim.de/bewerbung/auslaendische-bewerber.html",
|
||||
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung/campus-lead-fuehrungskraefteentwicklung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/digitalisierung/informationssicherheit.html",
|
||||
"http://www.wing.hs-mannheim.de",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/haus-brandschutzordnung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/organigramm.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/chancengleichheit.html",
|
||||
"https://www.hs-mannheim.de/einzelansicht/lehrpreis-der-hochschule-mannheim-fuer-prof-nagel.html",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen/girls-day-und-boys-day.html",
|
||||
"https://www.startup.hs-mannheim.de/startups-vor-ort/pro-aspectx-1-1.html",
|
||||
"http://www.tbl.hs-mannheim.de",
|
||||
"https://www.startup.hs-mannheim.de/",
|
||||
"https://www.hs-mannheim.de/anfahrt-und-campusplan.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/beauftragte/strahlenschutzbeauftragter.html",
|
||||
"https://www.hs-mannheim.de/studierende/studienorganisation/satzungen-ordnungen.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/startup.html",
|
||||
"https://www.startup.hs-mannheim.de/ueber-uns/startups-1/connou-1.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/campusplan.html",
|
||||
"https://support.hs-mannheim.de/otrs/customer.pl",
|
||||
"https://www.hs-mannheim.de/studierende/studienorganisation/studium-pruefung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/trainee-programm.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/forschungsprojekte.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/international-annerkante-sprachzertifikate.html",
|
||||
"https://www.hs-mannheim.de/bewerbung.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/sprachlehrkraefte.html",
|
||||
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/weiterbildungsstudiengaenge.html",
|
||||
"https://www.hs-mannheim.de/studierende/hochschulsport/sportscard.html",
|
||||
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/bedrohungs-und-konfliktmanagement-bekom/bedrohungsmanagement/sexuelle-belaestigung.html",
|
||||
"https://inno-space.de/makerspace.html",
|
||||
"https://www.startup.hs-mannheim.de/startups-vor-ort/mentalport-1-1.html",
"http://www.bib.hs-mannheim.de/",
"http://www.youtube.com/user/HochschuleMannheim",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/senat.html",
"https://www.hs-mannheim.de/studierende/hochschulsport.html",
"https://services.informatik.hs-mannheim.de/stundenplan/tagestermine.php",
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office/auslandsaufenthalte-outgoing.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/studienfoerderung-stipendien/mittelstandsstipendien.html",
"https://www.hs-mannheim.de/unternehmen/mami-mannheimer-mittelstandsmesse.html",
"https://www.hs-mannheim.de/einzelansicht/veranstaltungsrueckblick-top-thema-kuenstliche-intelligenz-des-m2aind-symposium-stoesst-auf-reges-interesse.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/studienangebot/binationale-studiengaenge.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/forschungsprofil/kompetenzzentren.html",
"https://www.hs-mannheim.de/beschaeftigte/intranet.html",
"https://www.hs-mannheim.de/beschaeftigte/telefonliste.html",
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung/das-mitarbeitergespraech.html",
"https://www.startup.hs-mannheim.de/newsletter.html",
"https://www.cit.hs-mannheim.de/",
"https://www.hs-mannheim.de/die-hochschule/internationales/partnerhochschulen.html",
"https://www.startup.hs-mannheim.de/foerdermittel.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/konanz-fuehrungsakademie.html",
"https://www.startup.hs-mannheim.de/anfahrt.html",
"https://www.hs-mannheim.de/studierende/service-center-studierende.html",
"https://www.zll.hs-mannheim.de/",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/sozialberatung.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/pflegende-angehoerige.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/transforma.html",
"https://www.hs-mannheim.de/",
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge.html",
"http://www.career.hs-mannheim.de",
"https://www.hs-mannheim.de/beschaeftigte/hochschulsport.html",
"https://www.hs-mannheim.de/studierende/termine-news-service.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/interner-bereich.html",
"http://www.mb.hs-mannheim.de",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre.html",
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen/schnuppervorlesungen.html",
"https://www.hs-mannheim.de/bewerbung/zugang-fuer-berufstaetige.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/personalrat.html",
"https://www.startup.hs-mannheim.de/kreativraum.html",
"http://www.inftech.hs-mannheim.de",
"https://www.hs-mannheim.de/unternehmen/termine-news-service.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/transfer/erfindungen-und-patente.html",
"https://www.hs-mannheim.de/die-hochschule/internationales/sprachenzentrum.html",
"https://www.hs-mannheim.de/#top",
"https://www.hs-mannheim.de/die-hochschule/veranstaltungen/veranstaltungskalender.html",
"http://www.cit.hs-mannheim.de/",
"http://services.informatik.hs-mannheim.de/stundenplan",
"https://www.hs-mannheim.de/studierende/studienorganisation/zulassung-immatrikulation.html",
"https://www.hs-mannheim.de/professorenliste/professoren.php",
"https://www.hs-mannheim.de/bewerbung/bewerbungsunterlagen.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/transfer.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/digitalisierung.html",
"https://www.hs-mannheim.de/presse/zentraler-newsbereich.html",
"https://www.hs-mannheim.de/die-hochschule/fakultaeten-und-institute.html",
"https://www.hs-mannheim.de/impressum.html",
"https://www.startup.hs-mannheim.de/impressum.html",
"https://www.hs-mannheim.de/beschaeftigte/personalrat.html",
"https://www.hs-mannheim.de/studierende/studienstart.html",
"https://malumni.de/jobteaser-karrierenetzwerk/",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/service-center-hochschuldidaktik-und-qualitaetsmanagement.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/bedrohungs-und-konfliktmanagement-bekom/antidiskriminierung.html",
"https://www.instagram.com/mars.hsmannheim/",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/hochschulwahlen.html",
"https://www.startup.hs-mannheim.de/unsere-angebote/future-skills.html",
"http://www.hfsw.de/",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus/studieren-mit-behinderung.html",
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung/erasmus-personalmobilitaet.html",
"https://www.hs-mannheim.de/einzelansicht-termine/wissen-um-eins-zeitmanagement-im-studium-die-dinge-geregelt-kriegen.html",
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office/info-hochschulmitglieder.html",
"https://www.hs-mannheim.de/unternehmen/mittelstandsstipendien.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/studieren-mit-kind/praxissemester-mit-kind-1.html",
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office/ansprechpartner.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/studienangebot.html",
"https://www.hs-mannheim.de/studierende/lehrveranstaltungen/zeitzuordnungen-1.html",
"https://www.startup.hs-mannheim.de/mars-1/stellenangebote.html",
"https://www.stifterverband.org/entrepreneurial-skills-charta",
"http://www.wrm.hs-mannheim.de",
"https://www.startup.hs-mannheim.de/startup-tools/vortragsmaterialien.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/zusammen-am-campus.html",
"https://www.hs-mannheim.de/bewerbung/bewerbung-fuer-ein-zweitstudium.html",
"https://www.hs-mannheim.de/studierende/studienorganisation/lehrveranstaltungen.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/verein-der-freunde.html",
"https://gruendungsradar.de/sites/gradar/files/gruendungsradar_2022.pdf",
"https://www.startup.hs-mannheim.de/?no_cache=1",
"https://www.hs-mannheim.de/?no_cache=1",
"http://www.dhik.org/",
"https://www.hs-mannheim.de/bewerbung/immatrikulation.html",
"http://www.hs-mannheim.de/professorenliste/professoren.php",
"https://www.startup.hs-mannheim.de/sitemap.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/hochschulrat.html",
"https://www.hs-mannheim.de/einzelansicht-termine/default-d37c9cfcc4a101a006bba98f062caef9.html",
"http://www.facebook.de/HochschuleMannheim",
"http://www.stw-ma.de/Essen+_+Trinken/Men%C3%BCpl%C3%A4ne/Hochschule+Mannheim.html",
"http://www.vct.hs-mannheim.de",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/nachwuchsfoerderung/konanz-graduiertenakademie.html",
"https://www.hs-mannheim.de/studierende/studienstart/erste-schritte.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/charta-familie.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/studienfoerderung-stipendien/deutschlandstipendium.html",
"http://moodle.hs-mannheim.de/",
"https://www.hs-mannheim.de/einzelansicht-termine/default-cfb9720dea08d704fa4a676a79c44c44.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/kontakt-und-beratung.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/sprachenzentrum/sprachangebot.html",
"https://www.vs.hs-mannheim.de/",
"https://www.hs-mannheim.de/einzelansicht/landesweiter-studieninfotag-an-der-hochschule-mannheim-offen-fuer-die-zukunft.html",
"https://www.startup.hs-mannheim.de/startups-vor-ort/symovo-1-1.html",
"https://www.hs-mannheim.de/einzelansicht/prorektoren-der-hochschule-wiedergewaehlt.html",
"https://www.hs-mannheim.de/beschaeftigte/hochschulsport/sportscard.html",
"https://www.hs-mannheim.de/die-hochschule/fakultaeten-und-institute/institute.html",
"https://www.zll.hs-mannheim.de",
"https://www.hs-mannheim.de/studierende/sprachenzentrum.html",
"https://www.linkedin.com/company/marshsmannheim",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/trainee-programm/partnerunternehmen.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/service-center-hochschuldidaktik-und-qualitaetsmanagement/cantina-didactica.html",
"https://www.startup.hs-mannheim.de/unsere-angebote/future-skills.html#c178449",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/rektorat.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/studieren-mit-kind/praxissemester-mit-kind.html",
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen.html",
"https://www.hs-mannheim.de/studieninteressierte/die-hsma-kennenlernen/studientag.html",
"https://www.hs-mannheim.de/die-hochschule/organisation-und-gremien/beauftragte.html",
"https://www.hs-mannheim.de/studierende/studienstart/unser-campus.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule.html",
"http://www.kompass.hs-mannheim.de/",
"https://www.instagram.com/hochschulemannheim/",
"https://www.hs-mannheim.de/studierende/studienstart/rund-ums-studium.html",
"https://www.startup.hs-mannheim.de/ueber-uns/koepfe.html",
"https://www.hs-mannheim.de/studierende/studienorganisation.html",
"https://www.hs-mannheim.de/einzelansicht/erfolgreiche-teilnahme-des-verbundprojekts-transforma-an-der-tagung-des-bw-staatsministeriums.html",
"https://www.hs-mannheim.de/studierende/studienorganisation/formulardownload.html",
"https://www.bmwi.de/Navigation/DE/Home/home.html",
"https://www.hs-mannheim.de/careerstation.html",
"https://www.hs-mannheim.de/studieninteressierte.html",
"https://www.hs-mannheim.de/die-hochschule/internationales/international-office/datenschutz.html",
"https://www.bmbf.de/bmbf/de/home/home_node.html",
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung/angebote-fuer-neue-mitarbeiterinnen-und-professorinnen.html",
"https://www.hs-mannheim.de/einzelansicht-termine/default-d3e26b9355ddb9d1437b6def70101a70.html",
"https://www.modal.hs-mannheim.de/",
"https://www.career.hs-mannheim.de/",
"https://www.startup.hs-mannheim.de/datenschutzerklaerung.html",
"https://www.career.hs-mannheim.de",
"https://www.total-e-quality.de/",
"https://www.startup.hs-mannheim.de/default-38103d67d0.html",
"https://www.hs-mannheim.de/die-hochschule/forschung-und-transfer/forschungsfoerderung.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/amtliche-informationen.html",
"https://www.startup.hs-mannheim.de/35912-1-2.html",
"https://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule/beschaeftigte-mit-kind.html",
"http://www.biotech.hs-mannheim.de",
"https://hs-mannheim.webex.com/meet/m.reisner",
"https://www.startup.hs-mannheim.de/coaching.html",
"https://www.hs-mannheim.de/studieninteressierte/unsere-studiengaenge/binationale-studiengaenge.html",
"https://noten.hs-mannheim.de/",
"https://www.hs-mannheim.de/einzelansicht-termine/literaturrecherche-sozialwesen-wirtschaft.html",
"http://www.hrk.de/audit",
"https://hmt-mrn.de/",
"https://unglaublich-wichtig.de",
"https://www.hs-mannheim.de/die-hochschule/internationales/deutsch-franzoesisches-zentrum.html",
"https://www.hs-mannheim.de/bewerbung/bewerbung-fuer-hoehere-semester.html",
"https://www.startup.hs-mannheim.de/ueber-uns/ueber-das-gruendungszentrum.html",
"http://www.english.hs-mannheim.de/",
"http://www.hs-mannheim.de/die-hochschule/hochschule-mannheim/familienfreundliche-hochschule.html",
"https://www.hs-mannheim.de/die-hochschule/studium-und-lehre/service-center-hochschuldidaktik-und-qualitaetsmanagement/i-tube-mathe.html",
"https://www.hs-mannheim.de/beschaeftigte/personalentwicklung/telearbeit-und-gleitzeit.html"
]