Extract data from PDFs and create a searchable collection

Learn to create extraction templates, process PDF documents, and enable AI-powered queries on extracted data.

What you will achieve

By the end of this tutorial you will have a complete flow: upload a PDF, extract structured data with AI, and allow an agent to answer questions based on that information. Estimated time: 20 minutes.

Prerequisites

  • An active Rela AI account with dashboard access
  • A test PDF file (e.g., equipment data sheet, purchase order, or certificate)
  • A configured agent (optional, for the query step)

Step 1: Create an extraction template

  1. In the sidebar, go to Data > Templates.
  2. Click New Template.
  3. Define the fields you want to extract:

     | Field              | Type   | Description             |
     |--------------------|--------|-------------------------|
     | serial_number      | text   | Equipment serial number |
     | manufacturer       | text   | Manufacturer name       |
     | manufacturing_date | date   | Manufacturing date      |
     | power_kw           | number | Rated power in kW       |
     | voltage            | number | Operating voltage       |

  4. Assign a name to the template: Equipment Data Sheet.
  5. Click Save.

You should see the template listed in the table with the 5 configured fields.

You can add optional fields by marking them as "not required". The AI will attempt to extract them but will not generate an error if it cannot find them.

Use correct field types. If you define a field as number but the PDF contains text like "N/A", the extraction will fail for that field. Use text if the value may not be numeric.
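
The type rule above can be illustrated with a short local sketch. It is not a Rela AI API: the `Field` class, the `required` flag, and the `coerce` helper are hypothetical names used only to model the five template fields and to show why a value like "N/A" does not fit a number field. It also assumes dates arrive in ISO format, as in the Step 3 preview.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Field:
    """One template field (hypothetical local model, not a Rela AI class)."""
    name: str
    type: str  # "text", "number", or "date"
    description: str
    required: bool = True


# The five fields defined in Step 1.
EQUIPMENT_DATA_SHEET = [
    Field("serial_number", "text", "Equipment serial number"),
    Field("manufacturer", "text", "Manufacturer name"),
    Field("manufacturing_date", "date", "Manufacturing date"),
    Field("power_kw", "number", "Rated power in kW"),
    Field("voltage", "number", "Operating voltage"),
]


def coerce(field, raw):
    """Convert a raw string to the field's type; raise ValueError if it does not fit."""
    if field.type == "number":
        return float(raw)
    if field.type == "date":
        return date.fromisoformat(raw)  # assumes ISO dates such as 2024-03-15
    return raw


power = EQUIPMENT_DATA_SHEET[3]
print(coerce(power, "75"))  # prints 75.0
try:
    coerce(power, "N/A")
except ValueError:
    print("N/A does not fit a number field")
```

Under this model, declaring `power_kw` as text would accept "N/A" unchanged, which is the trade-off to weigh when a value may not always be numeric.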

Step 2: Upload a PDF document

  1. Go to Data > Extractions.
  2. Click New Extraction.
  3. Select the template Equipment Data Sheet.
  4. Drag your PDF file to the upload area or click to select it.
  5. Click Start Extraction.

You should see a progress indicator while the AI analyzes the document.

The AI works best with digitally generated PDFs. Scanned documents with handwritten text may produce OCR errors. If you experience issues, try a higher-quality PDF.

Step 3: The AI extracts the fields

  1. Wait for the status to change to Completed (usually 10-30 seconds).
  2. Review the results in the preview:
{
  "serial_number": "SN-2024-00847",
  "manufacturer": "Siemens",
  "manufacturing_date": "2024-03-15",
  "power_kw": 75,
  "voltage": 480
}

You should see each extracted field with its value and a confidence indicator.

If a field has low confidence, the AI marks it in yellow. You can correct it manually before saving.
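
As a rough illustration of the review step, the sketch below picks out the fields whose confidence falls under a cutoff. The tutorial does not say how confidence is represented, so the 0 to 1 scores, the 0.8 threshold, and the function name are all assumptions made for this example.

```python
# Assumed cutoff; the source does not state the value Rela AI uses.
LOW_CONFIDENCE_THRESHOLD = 0.8


def fields_needing_review(confidences, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Return the names of fields scored below the threshold, alphabetically."""
    return sorted(name for name, score in confidences.items() if score < threshold)


# Made-up scores, for illustration only.
example_scores = {
    "serial_number": 0.98,
    "manufacturer": 0.95,
    "manufacturing_date": 0.74,
    "power_kw": 0.91,
    "voltage": 0.62,
}

print(fields_needing_review(example_scores))  # prints ['manufacturing_date', 'voltage']
```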

Step 4: Verify and save the data

  1. Review each extracted field and correct any errors.
  2. If everything looks correct, click Approve & Save.
  3. The data is stored as a record within a collection.

You should see the saved record in Data > Collections inside the collection associated with the template.
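
The verify-and-save step can be pictured as a completeness check followed by an append. The sketch below is a local model only: `missing_fields`, `approve_and_save`, and the plain list standing in for a collection are hypothetical, and it assumes all five fields were left as required.

```python
import json

# The extraction preview from Step 3, as a JSON string.
PREVIEW = """
{
  "serial_number": "SN-2024-00847",
  "manufacturer": "Siemens",
  "manufacturing_date": "2024-03-15",
  "power_kw": 75,
  "voltage": 480
}
"""

EXPECTED_FIELDS = (
    "serial_number",
    "manufacturer",
    "manufacturing_date",
    "power_kw",
    "voltage",
)


def missing_fields(record):
    """List template fields that are absent or empty in the record."""
    return [name for name in EXPECTED_FIELDS if record.get(name) in (None, "")]


def approve_and_save(record, collection):
    """Append a complete record to the collection and return the new record count."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("cannot save, missing fields: " + ", ".join(missing))
    collection.append(dict(record))
    return len(collection)


collection = []  # stand-in for the collection associated with the template
record = json.loads(PREVIEW)
print(approve_and_save(record, collection))  # prints 1
```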

Step 5: Create a query tool

  1. Go to Tools > New Tool.
  2. Select the type Collection Query.
  3. Configure:
    • Name: Query Equipment Sheets
    • Collection: select the collection for the Equipment Data Sheet template
    • Description for AI: Use this tool to look up technical equipment information such as serial number, manufacturer, power, and voltage.
  4. Click Create.

You should see the tool listed and available for assignment to agents.

Step 6: The agent answers with extracted data

  1. Go to Agents and select your agent.
  2. In the Tools section, add Query Equipment Sheets.
  3. Save the changes.
  4. Send a message to the agent: What is the power rating for equipment SN-2024-00847?

You should see a response like: "Equipment SN-2024-00847 manufactured by Siemens has a rated power of 75 kW."

The AI uses semantic search to find the correct record, so the user does not need to type the exact serial number. Phrases like "the Siemens equipment" also work.
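
To give a feel for why a partial phrase can still find the right record, here is a deliberately simplified stand-in. It ranks records by plain keyword overlap, which is not semantic search and not how Rela AI is documented to work. The second record is invented for contrast, and `tokens` and `best_match` are hypothetical helper names.

```python
import re

RECORDS = [
    {
        "serial_number": "SN-2024-00847",
        "manufacturer": "Siemens",
        "manufacturing_date": "2024-03-15",
        "power_kw": 75,
        "voltage": 480,
    },
    {  # invented record, present only so there is something to rank against
        "serial_number": "SN-0000-00001",
        "manufacturer": "ExampleCorp",
        "manufacturing_date": "2023-01-10",
        "power_kw": 15,
        "voltage": 230,
    },
]


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query, records):
    """Return the record sharing the most tokens with the query, or None if none overlap."""
    query_tokens = tokens(query)

    def score(record):
        record_tokens = tokens(" ".join(str(value) for value in record.values()))
        return len(query_tokens & record_tokens)

    best = max(records, key=score)
    return best if score(best) > 0 else None


match = best_match("the Siemens equipment", RECORDS)
print(match["serial_number"])  # prints SN-2024-00847
```

A real semantic search would typically compare meaning rather than exact tokens, so it could also handle paraphrases that share no words with the record.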

Summary

| Step | Action           | Result                        |
|------|------------------|-------------------------------|
| 1    | Create template  | 5-field structure defined     |
| 2    | Upload PDF       | Document loaded for extraction |
| 3    | AI extraction    | Fields extracted automatically |
| 4    | Verify and save  | Record stored in collection   |
| 5    | Create tool      | Collection query available    |
| 6    | Query via agent  | Response based on real data   |
