# PSE2 - Pitchbook Extraction Webapplication
A microservices platform that processes pitchbook PDFs with OCR and entity extraction services. It combines SpaCy NLP with GPT-based (ExxetaGPT) extraction of KPIs from pitchbooks.
## Quick Start
### 1. Environment Setup
Create a `.env` file in the project root:
```env
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password
# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
```
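Before starting the stack, it can help to confirm that every variable listed above has a value. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser assumes plain `KEY=value` lines without quoting or variable expansion.

```python
# Hypothetical helper, not part of this repository.
# Assumes simple KEY=value lines; quoting and expansion are not handled.
REQUIRED_KEYS = ("DATABASE_URL", "POSTGRES_USER", "POSTGRES_PASSWORD", "API_KEY")


def parse_env(text: str) -> dict:
    """Parse .env-style text into a dict, skipping comments and blank lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    from pathlib import Path

    env_file = Path(".env")
    if env_file.exists():
        print("Missing keys:", missing_keys(env_file.read_text()))
    else:
        print(".env not found in the current directory")
```

Run it from the project root; an empty list means all four variables are set.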
### 2. Start Application
```bash
# Build and start all services
docker-compose up --build
# Run in background
docker-compose up --build -d
# Stop services
docker-compose down
```
### 3. Access Application
- Frontend: http://localhost:8080
- API: http://localhost:5050
## Services Overview
| Service | Port | Purpose |
|---|---|---|
| Frontend | 8080 | React UI for file upload and results display |
| Coordinator | 5050 | Main API, file storage, database management |
| OCR | 5051 | PDF text extraction using OCRmyPDF |
| ExxetaGPT | 5053 | AI entity extraction using GPT-4o-mini |
| SpaCy | 5052 | NLP entity extraction using custom model |
| Validate | 5054 | Merges and validates results from both extractors |
| Database | 5432 | PostgreSQL for data persistence |
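To see quickly which services are up, the ports in the table can be probed with a plain TCP connect. This is a rough sketch under one assumption: that every port in the table is published to the host, which may not hold if `docker-compose.yml` only exposes some of them. The function names are illustrative.

```python
import socket

# Ports copied from the table above. Whether each one is published to the
# host depends on docker-compose.yml, so treat this mapping as an assumption.
SERVICE_PORTS = {
    "Frontend": 8080,
    "Coordinator": 5050,
    "OCR": 5051,
    "SpaCy": 5052,
    "ExxetaGPT": 5053,
    "Validate": 5054,
    "Database": 5432,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_services(host: str = "localhost", ports=SERVICE_PORTS, timeout: float = 1.0) -> list:
    """Return the names of services whose port does not accept connections."""
    return [name for name, port in ports.items() if not is_port_open(host, port, timeout)]


if __name__ == "__main__":
    down = unreachable_services()
    print("All ports reachable" if not down else f"Not reachable: {', '.join(down)}")
```

An open port only shows that something is listening; it does not confirm the service behind it is healthy.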
## Usage Flow
- Upload PDF via web interface
- OCR service extracts text from PDF
- The ExxetaGPT and SpaCy services each extract KPI entities from the text
- Validate service merges and validates results
- View the extracted KPIs and the original PDF side by side
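The merge step in the flow above could look roughly like the sketch below. This README does not describe the Validate service's actual rules, so everything here is an assumption: the function name `merge_extractions`, the status labels, and the KPI keys in the usage example are made up for illustration.

```python
# Illustrative only: the real Validate service's merge rules are not
# documented here. Field names and status labels are hypothetical.
def merge_extractions(gpt: dict, spacy: dict) -> dict:
    """Combine two extractor outputs and flag agreement per KPI."""
    merged = {}
    for key in sorted(set(gpt) | set(spacy)):
        gpt_value = gpt.get(key)
        spacy_value = spacy.get(key)
        if gpt_value is not None and spacy_value is not None:
            status = "agreed" if gpt_value == spacy_value else "conflict"
        else:
            status = "single_source"
        merged[key] = {"exxeta_gpt": gpt_value, "spacy": spacy_value, "status": status}
    return merged


if __name__ == "__main__":
    # Made-up KPI names and values, purely for demonstration.
    gpt_result = {"fund_size": "500M EUR", "target_irr": "15%"}
    spacy_result = {"fund_size": "500M EUR", "vintage_year": "2021"}
    for kpi, info in merge_extractions(gpt_result, spacy_result).items():
        print(kpi, info["status"])
```

Keeping both source values next to a status flag lets a reviewer resolve conflicts against the original PDF instead of trusting either extractor blindly.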
## Troubleshooting
Services won't start:
```bash
# Check logs
docker-compose logs
```
ExxetaGPT errors:
- Ensure `API_KEY` is set in the `.env` file
- Check API key validity and network access
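One common cause of an invalid key is an expired token. The placeholder in the environment setup suggests `API_KEY` is a JWT; assuming it is a standard three-part JWT with an `exp` claim (not confirmed by this README), the expiry can be read locally without contacting the service. The sketch below does not verify the signature, and the function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    The payload is only decoded; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload_b64 = parts[1]
    # Restore the base64 padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token carries an `exp` claim that has passed."""
    exp = jwt_expiry(token)
    if exp is None:
        return False
    current = time.time() if now is None else now
    return current >= exp
```

A token that is not expired can still be rejected for other reasons, so this only rules out one failure mode.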
Database connection issues:
- Wait for database health check to pass
- Verify `DATABASE_URL` format in `.env`
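A quick way to check the `DATABASE_URL` format is to parse it with the standard library. This sketch assumes the usual PostgreSQL URL shape `postgresql://user:password@host:port/dbname`; the exact form the Coordinator expects is not stated in this README, and the host `db` and database `pitchbooks` in the example are placeholders.

```python
from urllib.parse import urlsplit


def database_url_problems(url: str) -> list:
    """Return a list of human-readable problems found in a PostgreSQL URL."""
    problems = []
    parts = urlsplit(url)
    # Accept driver suffixes such as "postgresql+psycopg2".
    base_scheme = parts.scheme.split("+")[0]
    if base_scheme not in ("postgres", "postgresql"):
        problems.append("scheme should be postgresql://")
    if not parts.hostname:
        problems.append("host is missing")
    if not parts.username:
        problems.append("user is missing")
    if not parts.path.strip("/"):
        problems.append("database name is missing")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute the URL from your own .env file.
    example = "postgresql://admin:password@db:5432/pitchbooks"
    print(database_url_problems(example) or "URL looks well-formed")
```

A well-formed URL can still point at the wrong host or use wrong credentials, so check the database container logs as well.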