Merge conflict resolved
parent 96ad5fd15c
commit 63882d77c0

@ -0,0 +1,70 @@

# PSE2 - Pitchbook Extraction Web Application

A microservices platform for processing pitchbook PDFs using OCR and entity extraction services. It combines SpaCy NLP and GPT-based (ExxetaGPT) extraction of KPIs from pitchbooks.

## Quick Start

### 1. Environment Setup

Create a `.env` file in the project root:

```bash
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password

# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
```
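
If you prefer to script this step, here is a minimal shell sketch. It writes the template above only when no `.env` exists yet, then checks that every variable listed in this README has a non-empty value. The `DATABASE_URL` value is an assumption: it uses the standard PostgreSQL URI form, and the host name `db` and database name `pse2` are hypothetical placeholders. Replace them with the service and database names from your `docker-compose` setup.

```bash
#!/bin/sh
# Create a starter .env only if none exists, so an existing file is never overwritten.
# All values are placeholders; DATABASE_URL host "db" and database "pse2" are hypothetical.
if [ ! -f .env ]; then
  cat > .env <<'EOF'
# Database
DATABASE_URL=postgresql://admin:password@db:5432/pse2
POSTGRES_USER=admin
POSTGRES_PASSWORD=password

# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
EOF
fi

# Warn about every required variable that is missing or has an empty value.
missing=0
for key in DATABASE_URL POSTGRES_USER POSTGRES_PASSWORD API_KEY; do
  if ! grep -Eq "^${key}=.+" .env; then
    echo "missing or empty in .env: ${key}" >&2
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo ".env contains all required variables"
fi
```

The check only confirms that each key is present. It cannot tell whether the placeholder `API_KEY` has been replaced with a real token.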

### 2. Start Application

```bash
# Build and start all services
docker-compose up --build

# Run in the background (detached mode)
docker-compose up --build -d

# Stop services
docker-compose down
```
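
On the first run, building the images and starting every container can take a while, so the frontend and API may not respond right away. The helper below is a minimal polling sketch. `wait_for_url` is a hypothetical name, and the helper assumes `curl` is installed on the host. It only checks that something answers on the port, and any HTTP status counts, because this README does not document a dedicated health endpoint.

```bash
#!/bin/sh
# wait_for_url URL [ATTEMPTS]: poll URL roughly once per second until it answers.
# Without --fail, curl exits 0 for any HTTP response (including 404), so this
# only verifies that a server is listening; it returns 1 if attempts run out.
wait_for_url() {
  url=$1
  attempts=${2:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up waiting for: $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8080 && wait_for_url http://localhost:5050` blocks until both the frontend and the API respond on the ports listed under Access Application.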

### 3. Access Application

- **Frontend:** http://localhost:8080
- **API:** http://localhost:5050

## Services Overview

| Service | Port | Purpose |
|---------|------|---------|
| **Frontend** | 8080 | React UI for file upload and results display |
| **Coordinator** | 5050 | Main API, file storage, database management |
| **OCR** | 5051 | PDF text extraction using OCRmyPDF |
| **ExxetaGPT** | 5053 | AI entity extraction using GPT-4o-mini |
| **SpaCy** | 5052 | NLP entity extraction using a custom model |
| **Validate** | 5054 | Merges and validates results from both extractors |
| **Database** | 5432 | PostgreSQL for data persistence |

## Usage Flow

1. Upload a PDF via the web interface
2. The OCR service extracts text from the PDF
3. The ExxetaGPT and SpaCy services both extract KPI entities
4. The Validate service merges and validates the results
5. View the extracted KPIs and the original PDF side by side

## Troubleshooting

**Services won't start:**
```bash
# Check the logs of all services
docker-compose logs
```

**ExxetaGPT errors:**
- Ensure `API_KEY` is set in the `.env` file
- Check that the API key is valid and that network access to the ExxetaGPT API is available

**Database connection issues:**
- Wait for the database health check to pass
- Verify `DATABASE_URL` format in `.env`
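
One quick way to catch formatting mistakes is to test the value against the usual PostgreSQL connection URI shape. The sketch below assumes the services expect `postgresql://user:password@host[:port]/dbname`, optionally with a driver suffix such as `postgresql+psycopg2` and with query parameters. This README does not state the exact format, so treat the pattern as an assumption. `check_database_url` is a hypothetical helper name. It reads the value with `grep` rather than sourcing `.env`, so the file's contents are never executed.

```bash
#!/bin/sh
# check_database_url URL: exit 0 if URL looks like
#   postgresql[+driver]://user:password@host[:port]/dbname[?params]
# (assumed format; credentials are required because the stack uses POSTGRES_USER/PASSWORD).
check_database_url() {
  printf '%s\n' "$1" |
    grep -Eq '^postgres(ql)?(\+[a-z0-9]+)?://[^:@/]+:[^@/]+@[^:/@?]+(:[0-9]+)?/[^/?#]+(\?.*)?$'
}

# Read DATABASE_URL from .env without sourcing the file.
if [ -f .env ]; then
  db_url=$(grep -E '^DATABASE_URL=' .env | head -n 1 | cut -d= -f2-)
  if check_database_url "$db_url"; then
    echo "DATABASE_URL format looks OK"
  else
    echo "DATABASE_URL does not look like postgresql://user:password@host:port/dbname" >&2
  fi
fi
```

The pattern rejects the bare placeholder `url` from the template above and any URI without credentials. A value that passes can still point to the wrong host, so check the container logs as well.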

@ -1,41 +0,0 @@

# ExxetaGPT Microservice

## Running Locally (without a Container)

### 1. Prerequisites

- Python 3.11+
- Virtual environment (recommended)

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 2. Create a `.env` File

Create a `.env` file in the project directory that contains the Exxeta API key.

(The API key is a JWT issued by Exxeta. Do not publish it!)
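
A minimal sketch for creating this file follows. It assumes the variable name `API_KEY`, which is the name the project-level README uses for the ExxetaGPT key. It reads the token from a hypothetical shell variable `EXXETA_TOKEN` and falls back to a placeholder if that variable is unset. It overwrites any existing `.env` in the current directory, restricts the file to its owner, and adds `.env` to `.gitignore` so the JWT is not committed by accident.

```bash
#!/bin/sh
# EXXETA_TOKEN is a hypothetical variable; export it beforehand or edit .env afterwards.
token=${EXXETA_TOKEN:-your_exxeta_jwt_token_here}

# Write the key (overwrites an existing .env) and make it readable by the owner only.
printf 'API_KEY=%s\n' "$token" > .env
chmod 600 .env

# Add .env to .gitignore unless an exact ".env" line is already there.
if ! grep -qxF '.env' .gitignore 2>/dev/null; then
  # Start on a new line if the file does not end with a newline.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    echo >> .gitignore
  fi
  echo '.env' >> .gitignore
fi
```

Running the snippet twice does not add a duplicate `.env` line to `.gitignore`.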

### 3. Start

Start the service with `python app.py`.

## Running as a Docker Container

### 1. Build

```bash
docker build -t exxeta-gpt .
```

### 2. Run

```bash
docker run -p 5050:5050 --env-file .env exxeta-gpt
```

## Example Request

Send the contents of a local `text-per-page.json` file as the JSON request body:

```bash
curl -X POST http://localhost:5050/extract \
  -H "Content-Type: application/json" \
  -d @text-per-page.json
```

@ -1,20 +0,0 @@

# SpaCy Microservice

## Running the Service in a Docker Container

### 1. Build

```bash
docker build -t spacy-service .
```

### 2. Run

```bash
docker run -p 5050:5050 spacy-service
```

## Example Request

Send the contents of a local `text-per-page.json` file as the JSON request body:

```bash
curl -X POST http://localhost:5050/extraction \
  -H "Content-Type: application/json" \
  -d @text-per-page.json
```