From 63882d77c05fa0ac480126b4e7af9849e88a0a3c Mon Sep 17 00:00:00 2001
From: s8613
Date: Wed, 4 Jun 2025 20:25:54 +0200
Subject: [PATCH] Merge conflict resolved

---
 project/README.md                           | 70 +++++++++++++++++++++
 project/backend/exxetaGPT-service/Readme.md | 41 ------------
 project/backend/spacy-service/Readme.md     | 20 ------
 3 files changed, 70 insertions(+), 61 deletions(-)
 create mode 100644 project/README.md
 delete mode 100644 project/backend/exxetaGPT-service/Readme.md
 delete mode 100644 project/backend/spacy-service/Readme.md

diff --git a/project/README.md b/project/README.md
new file mode 100644
index 0000000..264d3fb
--- /dev/null
+++ b/project/README.md
@@ -0,0 +1,70 @@
+# PSE2 - Pitchbook Extraction Webapplication
+
+A microservices platform for processing pitchbook PDFs using OCR and entity extraction services. Combines SpaCy NLP and GPT-based (ExxetaGPT) extraction of kpi in Pitchbooks.
+
+```
+## Quick Start
+
+### 1. Environment Setup
+Create a `.env` file in the project root:
+
+# Database
+DATABASE_URL=url
+POSTGRES_USER=admin
+POSTGRES_PASSWORD=password
+
+# API Key (required for ExxetaGPT service)
+API_KEY=your_exxeta_jwt_token_here
+```
+
+### 2. Start Application
+```bash
+# Build and start all services
+docker-compose up --build
+
+# Run in background
+docker-compose up --build -d
+
+# Stop services
+docker-compose down
+```
+
+### 3. Access Application
+- **Frontend:** http://localhost:8080
+- **API:** http://localhost:5050
+
+## Services Overview
+
+| Service | Port | Purpose |
+|---------|------|---------|
+| **Frontend** | 8080 | React UI for file upload and results display |
+| **Coordinator** | 5050 | Main API, file storage, database management |
+| **OCR** | 5051 | PDF text extraction using OCRmyPDF |
+| **ExxetaGPT** | 5053 | AI entity extraction using GPT-4o-mini |
+| **SpaCy** | 5052 | NLP entity extraction using custom model |
+| **Validate** | 5054 | Merges and validates results from both extractors |
+| **Database** | 5432 | PostgreSQL for data persistence |
+
+## Usage Flow
+
+1. Upload PDF via web interface
+2. OCR service extracts text from PDF
+3. Both ExxetaGPT and SpaCy services extract kpi's entities
+4. Validate service merges and validates results
+5. View extracted kpi's and original PDF side-by-side
+
+## Troubleshooting
+
+**Services won't start:**
+```bash
+# Check logs
+docker-compose logs
+```
+
+**ExxetaGPT errors:**
+- Ensure `API_KEY` is set in `.env` file
+- Check API key validity and network access
+
+**Database connection issues:**
+- Wait for database health check to pass
+- Verify `DATABASE_URL` format in `.env`
diff --git a/project/backend/exxetaGPT-service/Readme.md b/project/backend/exxetaGPT-service/Readme.md
deleted file mode 100644
index 1123fc8..0000000
--- a/project/backend/exxetaGPT-service/Readme.md
+++ /dev/null
@@ -1,41 +0,0 @@
-# ExxetaGPT Microservice
-
-## Lokaler Start (ohne Container)
-
-### 1. Voraussetzungen
-
-- Python 3.11+
-- Virtuelle Umgebung (empfohlen)
-
-```bash
-python -m venv venv
-source venv/bin/activate
-pip install -r requirements.txt
-```
-
-### 2. .env Datei erstellen
-Leg eine .env Datei im Projektverzeichnis mit der Exxeta API-Key an
-
-(Der API Key ist ein JWT von Exxeta – nicht veröffentlichen!)
-
-### 3. Starten
-python app.py
-
-## Verwendung als Docker-Container
-
-### 1. Build
-```bash
-docker build -t exxeta-gpt .
-```
-
-### 2. Starten
-```bash
-docker run -p 5050:5050 --env-file .env exxeta-gpt
-```
-
-## Beispielaufruf:
-```bash
-curl -X POST http://localhost:5050/extract \
-  -H "Content-Type: application/json" \
-  -d @text-per-page.json
-```
\ No newline at end of file
diff --git a/project/backend/spacy-service/Readme.md b/project/backend/spacy-service/Readme.md
deleted file mode 100644
index 80eb551..0000000
--- a/project/backend/spacy-service/Readme.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# SpaCy Microservice
-
-## Den Service mit in einem Docker-Container starten
-
-### 1. Build
-```bash
-docker build -t spacy-service .
-```
-
-### 2. Starten
-```bash
-docker run -p 5050:5050 spacy-service
-```
-
-## Beispielaufruf:
-```bash
-curl -X POST http://localhost:5050/extraction \
-  -H "Content-Type: application/json" \
-  -d @text-per-page.json
-```
\ No newline at end of file
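
Review note: the Services Overview table in the new `project/README.md` gives each service a fixed port, and its Troubleshooting section only suggests reading `docker-compose logs`. As a complement, here is a minimal sketch for checking which of those ports accept TCP connections. It assumes the stack runs on `localhost` via `docker-compose`. The port numbers are copied from the table. The names `SERVICE_PORTS`, `is_port_open`, and `check_services` are hypothetical and not part of the project. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports copied from the "Services Overview" table in the new README.
# The dictionary name and the helpers below are illustrative only.
SERVICE_PORTS = {
    "Frontend": 8080,
    "Coordinator": 5050,
    "OCR": 5051,
    "SpaCy": 5052,
    "ExxetaGPT": 5053,
    "Validate": 5054,
    "Database": 5432,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host: str = "localhost", timeout: float = 1.0) -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {
        name: is_port_open(host, port, timeout)
        for name, port in SERVICE_PORTS.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        status = "up" if ok else "DOWN"
        print(f"{name:<12} {SERVICE_PORTS[name]:>5}  {status}")
```

Running the script after `docker-compose up --build -d` prints one line per service. Any service reported as `DOWN` is a candidate for a closer look with `docker-compose logs`.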