
# PSE2 - Pitchbook Extraction Web Application

A microservices platform for processing pitchbook PDFs with OCR and entity extraction services. It combines SpaCy NLP and GPT-based (ExxetaGPT) extraction of KPIs from pitchbooks.

## Quick Start

### 1. Environment Setup
Create a `.env` file in the project root:

```
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password

# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
```
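
Before starting the containers, it can help to confirm that the `.env` file defines every variable listed above. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes plain `KEY=VALUE` lines without quoting or variable expansion.

```python
# Variable names taken from the .env example above.
REQUIRED_KEYS = ("DATABASE_URL", "POSTGRES_USER", "POSTGRES_PASSWORD", "API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


sample = """
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password
API_KEY=
"""
print(missing_keys(sample))  # → ['API_KEY']
```

To check a real file, pass its contents instead of `sample`, for example `missing_keys(open(".env").read())`.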

### 2. Start Application

```bash
# Build and start all services
docker-compose up --build

# Run in background
docker-compose up --build -d

# Stop services
docker-compose down
```

### 3. Access Application

## Services Overview

| Service | Port | Purpose |
|---------|------|---------|
| Frontend | 8080 | React UI for file upload and results display |
| Coordinator | 5050 | Main API, file storage, database management |
| OCR | 5051 | PDF text extraction using OCRmyPDF |
| ExxetaGPT | 5053 | AI entity extraction using GPT-4o-mini |
| SpaCy | 5052 | NLP entity extraction using custom model |
| Validate | 5054 | Merges and validates results from both extractors |
| Database | 5432 | PostgreSQL for data persistence |
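
The ports listed above can be probed with a short script to see which services are reachable. This is only a sketch under assumptions: it assumes every port is published on `localhost` by the compose setup, and it checks only that a TCP connection succeeds, not that the service behind it is healthy. The names `SERVICE_PORTS`, `is_port_open`, and `report` are illustrative.

```python
import socket

# Ports copied from the services overview; publishing them on localhost is an assumption.
SERVICE_PORTS = {
    "Frontend": 8080,
    "Coordinator": 5050,
    "OCR": 5051,
    "SpaCy": 5052,
    "ExxetaGPT": 5053,
    "Validate": 5054,
    "Database": 5432,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in report().items():
        state = "open" if is_up else "closed"
        print(f"{name:<12} {SERVICE_PORTS[name]:<5} {state}")
```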

## Usage Flow

  1. Upload a PDF via the web interface
  2. The OCR service extracts text from the PDF
  3. Both the ExxetaGPT and SpaCy services extract KPI entities
  4. The Validate service merges and validates the results
  5. View the extracted KPIs and the original PDF side by side
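
Steps 2 to 4 of the flow above can be sketched as a small orchestration function. This is an illustration under assumptions, not the coordinator's actual code: the function names are hypothetical, running the two extractors concurrently is an assumption, and the merge rule (a KPI counts as validated only when both extractors return the same value) is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Extractor = Callable[[str], Dict[str, str]]


def merge_results(gpt_result: Dict[str, str], nlp_result: Dict[str, str]) -> Dict[str, dict]:
    """Hypothetical merge rule: a KPI is 'validated' when both extractors agree."""
    merged = {}
    for key in sorted(set(gpt_result) | set(nlp_result)):
        gpt_value = gpt_result.get(key)
        nlp_value = nlp_result.get(key)
        merged[key] = {
            "value": gpt_value if gpt_value is not None else nlp_value,
            "validated": gpt_value is not None and gpt_value == nlp_value,
        }
    return merged


def run_pipeline(
    pdf_bytes: bytes,
    ocr: Callable[[bytes], str],
    gpt_extract: Extractor,
    nlp_extract: Extractor,
) -> Dict[str, dict]:
    """OCR the PDF, run both extractors on the text, then merge their output."""
    text = ocr(pdf_bytes)  # step 2: text extraction
    with ThreadPoolExecutor(max_workers=2) as pool:  # step 3: both extractors
        gpt_future = pool.submit(gpt_extract, text)
        nlp_future = pool.submit(nlp_extract, text)
        gpt_result, nlp_result = gpt_future.result(), nlp_future.result()
    return merge_results(gpt_result, nlp_result)  # step 4: merge and validate


# Stub services stand in for the real HTTP calls; the KPI names are made up.
result = run_pipeline(
    b"%PDF-",
    ocr=lambda _pdf: "Fund size: 500M",
    gpt_extract=lambda _text: {"fund_size": "500M"},
    nlp_extract=lambda _text: {"fund_size": "500M", "vintage": "2020"},
)
print(result)
```

In the real system each callable would wrap a request to the corresponding service; passing them in as parameters keeps the flow testable without running any containers.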

## Troubleshooting

**Services won't start:**

```bash
# Check logs
docker-compose logs
```

**ExxetaGPT errors:**

- Ensure `API_KEY` is set in the `.env` file
- Check API key validity and network access
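
Since the `.env` example describes the API key as a JWT, one quick validity check is to look at its expiry. The sketch below assumes the token has the usual three dot-separated segments and carries a standard `exp` claim, which this README does not confirm. It decodes the payload without verifying the signature, so it only tells you whether the token looks expired, not whether the service will accept it.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    segment = parts[1]
    # base64url data in JWTs omits padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the `exp` claim lies in the past; False if there is no `exp` claim."""
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return False
    current = time.time() if now is None else now
    return current >= exp
```

For example, `is_expired(os.environ["API_KEY"])` (with `import os`) reports whether the configured key has already expired.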

**Database connection issues:**

- Wait for the database health check to pass
- Verify the `DATABASE_URL` format in `.env`
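
A rough format check for `DATABASE_URL` can be done with the standard library. This sketch assumes the common PostgreSQL URL shape `postgresql://user:password@host:port/dbname`; the README does not state the exact format the backend expects, and the host `db` and database name `pitchbooks` in the example are made up.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list:
    """Return a list of problems found in a PostgreSQL-style connection URL."""
    problems = []
    parts = urlsplit(url)
    # Allow driver suffixes such as "postgresql+psycopg2".
    if parts.scheme.split("+")[0] not in ("postgresql", "postgres"):
        problems.append("scheme is not postgresql://")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    try:
        parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        problems.append("port is not a valid number")
    return problems


print(check_database_url("postgresql://admin:password@db:5432/pitchbooks"))  # → []
```

The placeholder value `url` from the environment example fails this check, which is a reminder to replace it with a real connection string.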