Merge conflict resolved
parent
96ad5fd15c
commit
63882d77c0
@ -0,0 +1,70 @@
# PSE2 - Pitchbook Extraction Web Application

A microservices platform for processing pitchbook PDFs using OCR and entity extraction services. It combines spaCy NLP and GPT-based (ExxetaGPT) extraction of KPIs from pitchbooks.

## Quick Start

### 1. Environment Setup
Create a `.env` file in the project root:

```
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password

# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
```
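
As a rough illustration of how the `.env` file above could be checked before starting the stack, the following sketch parses simple `KEY=VALUE` lines and reports missing entries. The helper names `parse_env` and `missing_keys` are hypothetical and not part of the project, and treating all four variables as required is an assumption (the README only marks `API_KEY` as required, for the ExxetaGPT service).

```python
from typing import Dict, List

# Variable names copied from the .env example; treating all four as
# required is an assumption made for this sketch.
REQUIRED_KEYS = ("DATABASE_URL", "POSTGRES_USER", "POSTGRES_PASSWORD", "API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

A possible use is `missing_keys(parse_env(Path(".env").read_text()))`, run from the project root before `docker-compose up`.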

### 2. Start Application
```bash
# Build and start all services
docker-compose up --build

# Run in background
docker-compose up --build -d

# Stop services
docker-compose down
```

### 3. Access Application
- **Frontend:** http://localhost:8080
- **API:** http://localhost:5050

## Services Overview

| Service | Port | Purpose |
|---------|------|---------|
| **Frontend** | 8080 | React UI for file upload and results display |
| **Coordinator** | 5050 | Main API, file storage, database management |
| **OCR** | 5051 | PDF text extraction using OCRmyPDF |
| **ExxetaGPT** | 5053 | AI entity extraction using GPT-4o-mini |
| **SpaCy** | 5052 | NLP entity extraction using custom model |
| **Validate** | 5054 | Merges and validates results from both extractors |
| **Database** | 5432 | PostgreSQL for data persistence |
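
To see which of the services in the table are reachable after startup, a small TCP check along the following lines may help. This is only a sketch: the ports are copied from the table, while the host `localhost` and the helper names `is_port_open` and `closed_services` are assumptions. It only tests that a port accepts connections, not that the service behind it is healthy.

```python
import socket
from typing import Dict, List

# Ports copied from the Services Overview table.
SERVICE_PORTS: Dict[str, int] = {
    "Frontend": 8080,
    "Coordinator": 5050,
    "OCR": 5051,
    "SpaCy": 5052,
    "ExxetaGPT": 5053,
    "Validate": 5054,
    "Database": 5432,
}


def is_port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_services(host: str = "localhost") -> List[str]:
    """List the services whose port does not accept connections."""
    return [name for name, port in SERVICE_PORTS.items()
            if not is_port_open(port, host)]
```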

## Usage Flow

1. Upload a PDF via the web interface
2. The OCR service extracts text from the PDF
3. Both the ExxetaGPT and SpaCy services extract KPI entities
4. The Validate service merges and validates the results
5. View the extracted KPIs and the original PDF side by side
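
Steps 3 and 4 of the flow above can be pictured with a minimal sketch. The entity shape (`label`, `value`), the merge rule (a union that records which extractor found each entity) and the function names are all assumptions made for illustration; the actual Validate service may merge and validate differently.

```python
from typing import Any, Callable, Dict, List, Tuple

Entity = Dict[str, Any]  # hypothetical shape: {"label": ..., "value": ...}
Extractor = Callable[[str], List[Entity]]


def merge_entities(gpt: List[Entity], spacy: List[Entity]) -> List[Entity]:
    """Union both result lists, recording which extractor(s) found each entity."""
    merged: Dict[Tuple[str, str], Entity] = {}
    for source, entities in (("exxetagpt", gpt), ("spacy", spacy)):
        for entity in entities:
            key = (entity["label"], entity["value"])
            record = merged.setdefault(
                key, {"label": key[0], "value": key[1], "sources": []}
            )
            if source not in record["sources"]:
                record["sources"].append(source)
    return list(merged.values())


def run_pipeline(ocr_text: str, gpt_extract: Extractor,
                 spacy_extract: Extractor) -> List[Entity]:
    """Run both extractors on the OCR text (step 3), then merge (step 4)."""
    return merge_entities(gpt_extract(ocr_text), spacy_extract(ocr_text))
```

Entities reported by both extractors end up with two entries in `sources`, which a validation step could treat as higher confidence.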

## Troubleshooting

**Services won't start:**
```bash
# Check logs
docker-compose logs
```

**ExxetaGPT errors:**
- Ensure `API_KEY` is set in the `.env` file
- Check API key validity and network access

**Database connection issues:**
- Wait for the database health check to pass
- Verify the `DATABASE_URL` format in `.env`
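
The README does not spell out the expected `DATABASE_URL` format. Assuming the usual PostgreSQL form `postgresql://user:password@host:port/dbname`, a format check could look like the sketch below; the function name `database_url_problems` is hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def database_url_problems(url: str) -> List[str]:
    """Return a list of format problems; an empty list means the URL looks plausible.

    Assumes a PostgreSQL URL such as postgresql://user:password@host:5432/dbname.
    """
    parts = urlsplit(url)
    problems: List[str] = []
    # Accept driver suffixes such as "postgresql+psycopg2".
    if parts.scheme.split("+")[0] not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems
```

This only checks the shape of the URL. It does not open a connection, so a passing check still leaves credentials and reachability to be verified.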
@ -1,41 +0,0 @@
# ExxetaGPT Microservice

## Local Start (without a container)

### 1. Prerequisites

- Python 3.11+
- Virtual environment (recommended)

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 2. Create the .env file
Create a `.env` file containing the Exxeta API key in the project directory.

(The API key is a JWT from Exxeta. Do not publish it!)

### 3. Start
`python app.py`

## Usage as a Docker Container

### 1. Build
```bash
docker build -t exxeta-gpt .
```

### 2. Start
```bash
docker run -p 5050:5050 --env-file .env exxeta-gpt
```

## Example Request
```bash
curl -X POST http://localhost:5050/extract \
  -H "Content-Type: application/json" \
  -d @text-per-page.json
```
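
For callers that prefer Python over `curl`, the same request can be sketched with the standard library. The URL and header mirror the `curl` call above; the payload shape and the assumption that the service answers with JSON are not documented here, so `pages` is treated as any JSON-serializable object (for example the parsed contents of `text-per-page.json`). The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any


def build_extract_request(pages: Any,
                          base_url: str = "http://localhost:5050") -> urllib.request.Request:
    """Build the POST /extract request without sending it."""
    body = json.dumps(pages).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the request before any network call is made.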
@ -1,20 +0,0 @@
# SpaCy Microservice

## Starting the Service in a Docker Container

### 1. Build
```bash
docker build -t spacy-service .
```

### 2. Start
```bash
docker run -p 5050:5050 spacy-service
```

## Example Request
```bash
curl -X POST http://localhost:5050/extraction \
  -H "Content-Type: application/json" \
  -d @text-per-page.json
```