Skip to main content

Extraction

AI-powered structured data extraction

This guide covers the Extraction API, which lets you define named metadata fields and have the AI extract their values from your indexed documents. Files must be indexed as EliseFiles first — see File Indexation.

How It Works

Extraction runs as an asynchronous job, following the same pattern as Question Answering. You define metadata fields to extract, target a set of documents, and retrieve the structured results once the job completes.

The key difference from QA: instead of natural-language questions, you define metadata fields (metas) with a name, a description, and an expected data type. The platform extracts a typed value for each field from each document.

Creating an Extraction Job

curl -X POST "https://<api-domain>/api/core/extraction/jobs" \
-H "Authorization: Bearer <your-pat>" \
-H "Content-Type: application/json" \
-d '{
"files": {
"fileIds": ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"],
"collectionIds": []
},
"metas": [
{
"meta": "document_title",
"description": "The full title of the document",
"answerType": { "dataType": "STRING", "multiValued": false }
},
{
"meta": "study_year",
"description": "The year the study was conducted or published",
"answerType": { "dataType": "INT", "multiValued": false }
},
{
"meta": "risk_level",
"description": "The assessed risk level",
"answerType": {
"dataType": "ENUM",
"multiValued": false,
"enumValues": ["LOW", "MEDIUM", "HIGH"]
}
}
]
}'

Defining Metadata Fields

Each field in the metas array is an EliseMetaInput:

FieldRequiredDescription
metaYesMachine-readable field name (e.g. "document_title")
answerTypeYesExpected data type — see Answer Types
descriptionNoHuman-readable description to guide the AI extraction

Answer Types

The answerType.dataType field specifies the extracted value format:

dataTypeDescriptionPopulated field in DataValue
STRINGFree textstrValue
INTInteger numberlongValue
FLOATDecimal numberdoubleValue
BOOLBoolean valueboolValue
DATEDate or datetimedateValue
ENUMOne of a fixed set of valuesstrValue

Set multiValued: true to extract a list of values (e.g. multiple authors). The list variant of the corresponding field is then populated (strListValue, longListValue, etc.).

Polling Job Status

The job runs asynchronously. Poll until status is SUCCESS or FAILED.

curl -s "https://<api-domain>/api/core/extraction/jobs/${JOB_ID}" \
-H "Authorization: Bearer <your-pat>"

Retrieving Results

Once the job is SUCCESS, retrieve the extracted values.

curl -s "https://<api-domain>/api/core/extraction/jobs/${JOB_ID}/results" \
-H "Authorization: Bearer <your-pat>"

Understanding the Result Fields

Each EliseMetaResult contains:

FieldDescription
metaThe field name as defined in the job input
answerTyped value — see DataValue below
rawValueThe raw extracted value as returned by the AI engine (untyped)
explanationWhy the AI extracted this value
referenceIdsAnnotation IDs pointing to source passages

The answer field is a DataValue object. Only one field within it is populated, depending on the answerType.dataType defined when the job was created:

{
"meta": "study_year",
"answer": { "longValue": 2023 },
"rawValue": 2023,
"explanation": "The document states 'published in 2023' on page 1."
}
Multi-valued results

When multiValued: true is set, the list variant of the corresponding field is populated instead (strListValue, longListValue, etc.).

Retrieving Inputs

Retrieve the original field definitions and file targets submitted to the job.

curl -s "https://<api-domain>/api/core/extraction/jobs/${JOB_ID}/inputs" \
-H "Authorization: Bearer <your-pat>"

Retrieving Annotations

Each extraction result includes a referenceIds field identifying the exact document passages the AI used to extract each value. Use the annotations endpoint to resolve those IDs into full objects with text excerpts, document names, and precise positions.

curl -s "https://<api-domain>/api/core/extraction/jobs/${JOB_ID}/annotations" \
-H "Authorization: Bearer <your-pat>"

For the full annotation data model, position types, and lookup patterns, see the Annotations guide.

Listing Extraction Jobs

List all extraction jobs for the current user.

curl -s "https://<api-domain>/api/core/extraction/jobs?page=0&pageSize=20" \
-H "Authorization: Bearer <your-pat>"

Next Steps

  • Annotations — understand the full annotation data model and position types
  • Question Answering for natural-language questions instead of structured fields
  • Collections to organise files before running jobs on them
  • API Reference for complete endpoint documentation