Document Extractions
Extract structured data from documents with AI. Digitize invoices, manuals, purchase orders, and maintenance reports automatically.
Extractions use AI to analyze documents (PDFs, images, invoices, technical manuals) and pull out the relevant data automatically, in structured form. Instead of manually transcribing data from an invoice or an equipment manual, you upload the document and the AI does the work in seconds.
What is it for?
In an industrial plant, there is an enormous amount of information in paper or unstructured documents: supplier invoices, spare parts purchase orders, scanned inspection reports, technical manuals, calibration certificates. That information is valuable but "trapped" in the document — no one can easily query it.
Rela AI Extractions convert those documents into queryable data:
- Maintenance invoices become records with supplier, amount, parts purchased, date
- Technical manuals become equipment sheets with operating parameters
- Purchase orders become inventory records
- Scanned inspection reports become history records
Once the records are extracted and saved, WhatsApp and email agents can query them in conversations. Ask "What was the cost of repairs on pump B-07 this year?" and the agent searches the records and responds.
How does it work?
The extraction process has 4 steps:
- Template (Collection): You define what fields you want to extract from the document. Example for an invoice: supplier, invoice number, date, line items, total.
- Document: You upload the file (PDF, image) or provide the URL.
- AI analysis: The system analyzes the document using computer vision and OCR, identifies the template fields in the document, and extracts the values.
- Review and save: You see the extracted values, correct them if needed, and save. The record becomes available for queries.
The AI automatically selects the best method based on the document:
- Vision: for scanned PDFs or images
- Text: for PDFs with selectable text (faster and more precise)
- Combined: vision + OCR for maximum precision on complex documents
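The selection among the three methods can be pictured with a small sketch. This is only an illustration of the three cases listed above, not Rela AI's actual logic: the `DocumentInfo` type, its two flags, and the rule that a document with both a text layer and images counts as "complex" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DocumentInfo:
    """Hypothetical summary of an uploaded document (not a Rela AI type)."""
    has_text_layer: bool  # PDF with selectable text
    has_images: bool      # scanned pages or embedded images


def choose_method(doc: DocumentInfo) -> str:
    """Pick an extraction method for the three cases described above."""
    if doc.has_text_layer and doc.has_images:
        return "combined"  # assumed meaning of "complex": text plus images
    if doc.has_text_layer:
        return "text"      # selectable text: faster and more precise
    return "vision"        # scanned PDFs or images


method = choose_method(DocumentInfo(has_text_layer=True, has_images=False))
```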
How to use it?
Step 1: Prepare the template (Collection)
First you need to have a Collection that defines the fields to extract. Go to Data > Collections and create a collection with the fields for the type of document you want to digitize.
Example collection "Maintenance Invoices":
- Invoice number (text)
- Supplier (text)
- Issue date (date)
- Work description (long text)
- Included spare parts (list)
- Subtotal (number)
- Tax (number)
- Total (number)
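To make the field types concrete, the example collection can be written as a simple schema with a check that a record matches it. This is a sketch only: the dictionary layout, the snake_case field names, and `validate_record` are hypothetical and do not reflect how Rela AI stores collections.

```python
from datetime import date
from typing import List

# Hypothetical schema for "Maintenance Invoices": field name -> accepted types.
MAINTENANCE_INVOICE_FIELDS = {
    "invoice_number": (str,),
    "supplier": (str,),
    "issue_date": (date,),
    "work_description": (str,),
    "spare_parts": (list,),
    "subtotal": (int, float),
    "tax": (int, float),
    "total": (int, float),
}


def validate_record(record: dict) -> List[str]:
    """Return a list of problems: missing fields or values of the wrong type."""
    problems = []
    for name, accepted in MAINTENANCE_INVOICE_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], accepted):
            problems.append(f"wrong type for {name}")
    return problems
```

A record that passes this check has every field of the template filled with a value of the expected kind, which is the state a reviewed extraction should reach before saving.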
Step 2: Start the extraction
- Go to Data > Extractions.
- Click New Extraction.
- Select the collection that defines the fields to extract.
Step 3: Upload the document
You have 3 options:
| Option | When to use it |
|---|---|
| File | Upload the PDF or image directly from your computer (max 25 MB) |
| URL | The document is on a server or external system: paste the direct URL |
| Text | You already have the document's plain text, so no computer vision is needed |
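If you prepare documents with a script before uploading, a pre-check along these lines can catch problems early. It is a minimal sketch: the function name and arguments are hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional

MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB limit; binary megabytes assumed


def pick_input_option(file_size: Optional[int] = None,
                      url: Optional[str] = None,
                      text: Optional[str] = None) -> str:
    """Return "file", "url", or "text" when exactly one input is provided."""
    provided = [v is not None for v in (file_size, url, text)]
    if sum(provided) != 1:
        raise ValueError("provide exactly one of file_size, url, or text")
    if file_size is not None:
        if file_size > MAX_FILE_BYTES:
            raise ValueError("file exceeds the 25 MB upload limit")
        return "file"
    if url is not None:
        if not url.startswith(("http://", "https://")):
            raise ValueError("expected a direct http(s) URL")
        return "url"
    return "text"
```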
Step 4: Review the extracted data
The AI will process the document (generally in 3-15 seconds) and show:
- Extracted field: the name of the template field
- Found value: what the AI read from the document
- Page: which page of the PDF it was found on (useful for multi-page PDFs)
Review each value. If the AI made a reading error (e.g., mistook an 8 for a 0), correct it directly.
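A quick arithmetic check is one way to spot the kind of misread described above. The sketch below models the three review columns and tests whether subtotal plus tax matches the total. The `ExtractedValue` class and both helper functions are hypothetical, not part of the product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExtractedValue:
    """One row of the review screen (hypothetical class)."""
    field: str  # name of the template field
    value: str  # what the AI read from the document
    page: int   # page of the PDF where the value was found


def apply_corrections(rows: List[ExtractedValue],
                      corrections: Dict[str, str]) -> List[ExtractedValue]:
    """Return new rows with reviewer corrections applied by field name."""
    return [
        ExtractedValue(r.field, corrections.get(r.field, r.value), r.page)
        for r in rows
    ]


def totals_consistent(rows: List[ExtractedValue], tolerance: float = 0.01) -> bool:
    """Check that Subtotal + Tax matches Total within a small tolerance."""
    values = {r.field: float(r.value) for r in rows}
    return abs(values["Subtotal"] + values["Tax"] - values["Total"]) <= tolerance


rows = [
    ExtractedValue("Subtotal", "1000.00", 1),
    ExtractedValue("Tax", "180.00", 1),
    ExtractedValue("Total", "1100.00", 1),  # the printed 1180.00 read with 8 as 0
]
needs_review = not totals_consistent(rows)
fixed = apply_corrections(rows, {"Total": "1180.00"})
```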
Step 5: Save the record
Click Save. The record is stored in the collection, queryable by search and by AI agents.
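Conceptually, a question such as "What was the cost of repairs on pump B-07 this year?" reduces to a filter and a sum over the saved records. The sketch below shows that idea only. The record layout, the `asset` field, and `repair_cost` are assumptions and say nothing about how the agents are actually implemented.

```python
from datetime import date
from typing import List


def repair_cost(records: List[dict], asset: str, year: int) -> float:
    """Sum invoice totals for one asset in one calendar year."""
    return sum(
        r["total"]
        for r in records
        if r["asset"] == asset and r["issue_date"].year == year
    )


# Hypothetical saved records from a "Maintenance Invoices" collection.
records = [
    {"asset": "B-07", "issue_date": date(2024, 2, 10), "total": 1200.0},
    {"asset": "B-07", "issue_date": date(2024, 6, 3), "total": 300.5},
    {"asset": "C-01", "issue_date": date(2024, 6, 3), "total": 900.0},
    {"asset": "B-07", "issue_date": date(2023, 11, 20), "total": 450.0},
]
cost_2024 = repair_cost(records, "B-07", 2024)
```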
Batch extraction
For documents with multiple items (an invoice with 20 line items, a spare parts catalog):
- Upload the document.
- The system automatically extracts all items as separate records.
- Review the list of extracted items.
- Save all at once.
Ideal for: inventory catalogs, staff lists, reports covering multiple assets.
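The idea of turning one document into many records can be sketched as follows. The `line_items` key and the `split_line_items` helper are hypothetical names used for illustration, and the real system may structure batch results differently.

```python
from typing import List


def split_line_items(invoice: dict) -> List[dict]:
    """Turn one extracted invoice into one record per line item.

    Document-level fields (supplier, invoice number) are copied onto
    every record so each one can be queried on its own.
    """
    shared = {k: v for k, v in invoice.items() if k != "line_items"}
    return [{**shared, **item} for item in invoice.get("line_items", [])]


invoice = {
    "supplier": "Acme Parts",
    "invoice_number": "F-2002",
    "line_items": [
        {"part_number": "BRG-6204", "quantity": 4, "price": 12.5},
        {"part_number": "SEAL-35", "quantity": 2, "price": 8.0},
    ],
}
item_records = split_line_items(invoice)
```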
Enable extraction audit
If you activate the audit option before extracting, the system includes:
- The original document with bounding boxes marking where each data point was found
- The extraction method used
- The exact coordinates of each value in the document
Useful for verifying precision and for documenting the process in compliance projects.
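An audit entry can be thought of as an extracted value plus its location. The sketch below is one possible shape for such a record. The `AuditEntry` class, the (x0, y0, x1, y1) box order, and coordinates normalized to the 0..1 range are all assumptions, since the actual audit format is not described here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AuditEntry:
    """Hypothetical audit record for one extracted value."""
    field: str
    value: str
    method: str  # "vision", "text", or "combined"
    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized 0..1 (assumed)


def bbox_to_pixels(entry: AuditEntry, page_width: int,
                   page_height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized bounding box to pixel coordinates for drawing."""
    x0, y0, x1, y1 = entry.bbox
    return (
        round(x0 * page_width),
        round(y0 * page_height),
        round(x1 * page_width),
        round(y1 * page_height),
    )


entry = AuditEntry("Total", "1180.00", "vision", 1, (0.5, 0.25, 0.75, 0.5))
pixels = bbox_to_pixels(entry, page_width=1000, page_height=800)
```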
Key benefits
- Document digitization in seconds, without manual transcription
- Support for scanned PDFs, images, and text documents
- Review before saving — you can always correct before confirming
- Batch extraction for documents with multiple records
- Extracted data becomes a knowledge base queryable by AI agents
- Reduction of manual transcription errors
- Complete extraction history for traceability
Common use cases
Scenario 1: Maintenance invoice digitization
At the end of each month, the maintenance coordinator receives 30-40 invoices from repair and parts suppliers. Instead of transcribing them manually, they upload each PDF to Rela AI. The AI extracts supplier, invoice number, date, line items, and total in seconds. At the end of the process, all the data is in a queryable collection. The manager can ask the WhatsApp agent: "How much did we spend on maintenance for compressor C-01 in the first quarter?" and receives the answer immediately.
Scenario 2: New equipment technical manual
The technical manual for a new generator arrives (120-page PDF). The coordinator extracts the key sections: operating parameters, alarm limits, recommended maintenance schedule, critical spare parts list. Each section is saved as a record in the "Equipment Manuals" collection. The technician can ask the WhatsApp agent: "What is the maximum operating temperature for generator GEN-02?" without having to search through the manual.
Scenario 3: Spare parts purchase orders
The warehouse receives paper purchase orders for the spare parts inventory. The person in charge photographs them with their phone and uploads them to Rela AI. The AI extracts the part number, description, quantity, and price from each line. The data is automatically saved in the inventory collection, updating available stock without typing anything.