# PSE2 - Pitchbook Extraction Web Application

A microservices platform for processing pitchbook PDFs using OCR and entity extraction services. It combines SpaCy NLP and GPT-based (ExxetaGPT) extraction of KPIs from pitchbooks.

## Quick Start

### 1. Environment Setup

Create a `.env` file in the project root:

```
# Database
DATABASE_URL=url
POSTGRES_USER=admin
POSTGRES_PASSWORD=password

# API Key (required for ExxetaGPT service)
API_KEY=your_exxeta_jwt_token_here
```

### 2. Start Application

```bash
# Build and start all services
docker-compose up --build

# Run in background
docker-compose up --build -d

# Stop services
docker-compose down
```

### 3. Access Application

- **Frontend:** http://localhost:8080
- **API:** http://localhost:5050

## Services Overview

| Service | Port | Purpose |
|---------|------|---------|
| **Frontend** | 8080 | React UI for file upload and results display |
| **Coordinator** | 5050 | Main API, file storage, database management |
| **OCR** | 5051 | PDF text extraction using OCRmyPDF |
| **ExxetaGPT** | 5053 | AI entity extraction using GPT-4o-mini |
| **SpaCy** | 5052 | NLP entity extraction using custom model |
| **Validate** | 5054 | Merges and validates results from both extractors |
| **Database** | 5432 | PostgreSQL for data persistence |

## Usage Flow

1. Upload a PDF via the web interface
2. The OCR service extracts text from the PDF
3. The ExxetaGPT and SpaCy services each extract KPI entities from the text
4. The Validate service merges and validates the results
5. View the extracted KPIs and the original PDF side by side

## Troubleshooting

**Services won't start:**

```bash
# Check logs
docker-compose logs
```

**ExxetaGPT errors:**

- Ensure `API_KEY` is set in the `.env` file
- Check that the API key is valid and that the service has network access

**Database connection issues:**

- Wait for the database health check to pass
- Verify the `DATABASE_URL` format in `.env`
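
The `.env` checks from the troubleshooting notes can also be scripted. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, the key list is taken from the Environment Setup section, and the parser assumes plain `KEY=VALUE` lines with `#` comments. It only reports keys that are absent or empty; it does not validate the `DATABASE_URL` format or the API key itself.

```python
from pathlib import Path

# Keys listed in the Environment Setup section of this README.
REQUIRED_KEYS = ("DATABASE_URL", "POSTGRES_USER", "POSTGRES_PASSWORD", "API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or empty, in the order listed."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: the project root
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        missing = missing_keys(env_path.read_text(encoding="utf-8"))
        print("Missing or empty:", ", ".join(missing) if missing else "none")
```

Run it from the project root before `docker-compose up --build`; an empty result means all four variables are present, not that their values are correct.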