MatsuDB REST API
REST API gateway for MatsuDB services providing document intelligence with vector search
List all corpus documents
Get a list of all corpus documents in the namespace
Authorizations:
query Parameters
| page_size | integer, default: 20 | Page size |
| page_token | string | Page token for pagination |
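The two query parameters suggest token-based pagination. The following is a minimal sketch of a client loop, not the service's own client: the source does not give the endpoint path, so the HTTP call is left to a caller-supplied `fetch_page` function (a hypothetical name), and the stop condition (a missing or empty `next_page_token`) is an assumption.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical signature: takes query parameters, returns the decoded JSON body.
FetchPage = Callable[[Dict[str, str]], dict]


def iter_corpora(fetch_page: FetchPage, page_size: int = 20) -> Iterator[dict]:
    """Yield every corpus by following next_page_token from page to page."""
    page_token: Optional[str] = None
    while True:
        params = {"page_size": str(page_size)}
        if page_token:
            params["page_token"] = page_token
        body = fetch_page(params)
        yield from body.get("corpora", [])
        page_token = body.get("next_page_token")
        if not page_token:  # assumption: a missing or empty token marks the last page
            break
```

Injecting `fetch_page` keeps the loop independent of any HTTP library and makes it easy to exercise with canned pages.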
Responses
Response samples
- 200
- 400
- 500
{
  "corpora": [
    {
      "blob_uri": "s3://bucket/key",
      "corpus_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
      "corpus_id": "12345",
      "created_at": "2025-01-01T12:00:00Z",
      "indexation_status": "completed",
      "mime_type": "application/pdf",
      "original_name": "document.pdf"
    }
  ],
  "next_page_token": "eyJwYWdlIjoxfQ==",
  "total_count": 42
}
Upload a new corpus document
Upload a document to create a new corpus. Optimized for large file streaming.
Authorizations:
Request Body schema: multipart/form-data (required)
| file (required) | string <binary> | Document file to upload |
| original_name | string | Original filename (optional, inferred from file if not provided) |
| key | string | Custom storage key (optional, auto-generated if not provided) |
| mime_type | string | MIME type (optional, inferred if not provided) |
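Because the upload targets large files, a client may want to hash the file in chunks and compare the result with the `corpus_checksum` returned in the response. The sketch below assumes the checksum is the lowercase hex SHA-256 of the uploaded bytes. The 64-character sample value is consistent with that, but the source does not confirm it. The function names are illustrative.

```python
import hashlib
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in fixed-size chunks so it never loads fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(stream: BinaryIO, corpus: dict) -> bool:
    # Assumption: corpus_checksum is the lowercase hex SHA-256 of the uploaded bytes.
    return sha256_stream(stream) == corpus.get("corpus_checksum", "").lower()
```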
Responses
Response samples
- 201
- 400
- 413
- 422
- 500
{
  "corpus": {
    "blob_uri": "s3://bucket/key",
    "corpus_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
    "corpus_id": "12345",
    "created_at": "2025-01-01T12:00:00Z",
    "indexation_status": "completed",
    "mime_type": "application/pdf",
    "original_name": "document.pdf"
  },
  "ok": true
}
Get corpus document by ID
Get details of a specific corpus document by its ID
Authorizations:
path Parameters
| corpus_id (required) | string | Corpus ID |
Responses
Response samples
- 200
- 400
- 404
- 500
{
  "corpus": {
    "blob_uri": "s3://bucket/key",
    "corpus_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
    "corpus_id": "12345",
    "created_at": "2025-01-01T12:00:00Z",
    "indexation_status": "completed",
    "mime_type": "application/pdf",
    "original_name": "document.pdf"
  }
}
Force reindexation of a corpus
Triggers a forced reindexation of the specified corpus document
Authorizations:
path Parameters
| corpus_id (required) | string | Corpus ID |
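Reindexation is asynchronous (the endpoint answers 202 with a `pending` status), so a client will typically poll the corpus afterwards. The sketch below assumes that the corpus's `indexation_status` eventually reads `"completed"`, as in the sample responses. Other status values, including failure states, are not documented in the source. `get_corpus` is a hypothetical caller-supplied function that wraps "Get corpus document by ID" and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_until_indexed(
    get_corpus: Callable[[str], dict],
    corpus_id: str,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the corpus until indexation_status is "completed" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        corpus = get_corpus(corpus_id)["corpus"]
        # Assumption: "completed" is the terminal success value, as in the samples.
        if corpus.get("indexation_status") == "completed":
            return corpus
        if time.monotonic() >= deadline:
            raise TimeoutError(f"corpus {corpus_id} not indexed after {timeout_s}s")
        sleep(interval_s)
```

The `sleep` parameter exists only so the loop can be driven without real delays.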
Responses
Response samples
- 202
- 400
- 404
- 500
{
  "corpus_id": "12345",
  "message": "Reindexation triggered successfully",
  "status": "pending",
  "success": true
}
List child nodes
Get a list of child nodes for a given root node (e.g. a corpus)
Authorizations:
query Parameters
| root_node_id (required) | string | Root Node ID (e.g. a corpus id) |
| page_size | integer, default: 20 | Page size |
| page_token | string | Page token for pagination |
| node_types | string | Comma-separated list of node types to filter |
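Since `node_types` is a single comma-separated string rather than a repeated parameter, the query string needs a small amount of care. A minimal sketch of building it with the standard library follows. The function name is illustrative, and `TABLE` in the usage note is a hypothetical node type (only `TEXT` appears in the samples).

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def child_nodes_query(
    root_node_id: str,
    page_size: int = 20,
    page_token: Optional[str] = None,
    node_types: Optional[Iterable[str]] = None,
) -> str:
    """Build the query string for the child-node listing."""
    params = {"root_node_id": root_node_id, "page_size": page_size}
    if page_token:
        params["page_token"] = page_token
    if node_types:
        # The API expects one comma-separated value, not repeated keys.
        params["node_types"] = ",".join(node_types)
    return urlencode(params)
```

For example, `child_nodes_query("12345", node_types=["TEXT", "TABLE"])` percent-encodes the comma, which a standards-compliant server decodes back to `TEXT,TABLE`.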
Responses
Response samples
- 200
- 400
- 500
{
  "next_page_token": "eyJwYWdlIjoxfQ==",
  "nodes": [
    {
      "blob_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
      "blob_uri": "s3://bucket/key",
      "classifications": ["[application/pdf]"],
      "created_at": "2025-01-01T12:00:00Z",
      "hierarchical_path": "1.2.3",
      "metadata": {},
      "namespace_id": "namespace-123",
      "next_node_id": "12346",
      "node_id": "12345",
      "node_type": "TEXT",
      "parent_node_id": "12344",
      "positions": [
        {
          "path": "1,2,3",
          "payload": "(x0,y0,x1,y1)",
          "position_type": "bbox"
        }
      ],
      "root_node_id": "12345",
      "sparse_vec": "{0:0.5,1:0.3}/250002",
      "tags": ["string"],
      "text_content": "Document content",
      "text_hash": "abc123",
      "token_count": 1000,
      "vec": [0]
    }
  ],
  "total_count": 42
}
Get a node by ID
Get details of a specific node by its ID
Authorizations:
path Parameters
| node_id (required) | string | Node ID |
query Parameters
| root_node_id (required) | string | Root Node ID (e.g. a corpus id) |
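The node object returned here carries a `sparse_vec` string such as `"{0:0.5,1:0.3}/250002"`. The source does not define this format. The sketch below assumes it is a brace-delimited list of `index:value` pairs followed by `/` and the vector dimension, and parses it into a dictionary. The function name is illustrative.

```python
from typing import Dict, Tuple


def parse_sparse_vec(text: str) -> Tuple[Dict[int, float], int]:
    """Parse "{i:v,j:w}/dim" into ({i: v, j: w}, dim). The format is an assumption."""
    body, _, dim = text.strip().rpartition("/")
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"unexpected sparse vector format: {text!r}")
    entries: Dict[int, float] = {}
    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            index, _, value = pair.partition(":")
            entries[int(index)] = float(value)
    return entries, int(dim)
```

Applied to the sample value, this yields the entries `{0: 0.5, 1: 0.3}` and the dimension `250002`.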
Responses
Response samples
- 200
- 400
- 404
- 500
{
  "node": {
    "blob_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
    "blob_uri": "s3://bucket/key",
    "classifications": ["[application/pdf]"],
    "created_at": "2025-01-01T12:00:00Z",
    "hierarchical_path": "1.2.3",
    "metadata": {},
    "namespace_id": "namespace-123",
    "next_node_id": "12346",
    "node_id": "12345",
    "node_type": "TEXT",
    "parent_node_id": "12344",
    "positions": [
      {
        "path": "1,2,3",
        "payload": "(x0,y0,x1,y1)",
        "position_type": "bbox"
      }
    ],
    "root_node_id": "12345",
    "sparse_vec": "{0:0.5,1:0.3}/250002",
    "tags": ["string"],
    "text_content": "Document content",
    "text_hash": "abc123",
    "token_count": 1000,
    "vec": [0]
  }
}
Create or update a rule
Create or update a rule for a specific trigger
Authorizations:
Request Body schema: application/json (required)
Rule upsert request
| filters (required) | Array of integers | Filter configuration as JSON object |
| trigger_id (required) | string | Trigger ID to configure |
Responses
Request samples
- Payload
{
  "filters": [0],
  "trigger_id": "corpus_parsing"
}
Response samples
- 200
- 201
- 400
- 500
{
  "created": true,
  "updated": false
}
Get a rule
Get a specific rule by trigger ID
Authorizations:
path Parameters
| trigger_id (required) | string | Trigger ID |
Responses
Response samples
- 200
- 400
- 404
- 500
{
  "rule": {
    "filters": [0],
    "namespace_id": "my-namespace",
    "trigger_id": "corpus_parsing",
    "updated_at": "2025-01-01T12:00:00Z"
  }
}
Get trigger schema
Get detailed schema information for a specific trigger
Authorizations:
path Parameters
| trigger_id (required) | string | Trigger ID |
Responses
Response samples
- 200
- 400
- 404
- 500
{
  "schema": {
    "filter_config_type": "matsu.rules.v1.CorpusParsingFilterConfig",
    "trigger_id": "corpus_parsing"
  }
}
Perform dense vector search from text
Search nodes using semantic text search (auto-generates dense embeddings)
Authorizations:
Request Body schema: application/json (required)
Dense search request
| min_similarity | number [0 .. 1] | Minimum similarity threshold (0.0 to 1.0) |
| node_types | Array of strings | Node types to filter by |
| query_text (required) | string | Text query to convert to dense embeddings |
| root_node_ids | Array of strings | Root node IDs to search within (as strings) |
| similarity_metric | string, enum: "cosine", "inner_product", "l2" | Similarity metric to use (cosine, inner_product, l2). Defaults to inner_product |
| top_k | integer >= 1 | Maximum number of results to return |
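The constraints in this table (required `query_text`, `min_similarity` in [0, 1], `top_k` of at least 1, and a three-value metric enum) can be checked on the client before a request is sent. The following is a minimal sketch of such a payload builder. The function name is illustrative, and omitting unset optional fields is an assumption about what the server accepts.

```python
from typing import List, Optional

ALLOWED_METRICS = {"cosine", "inner_product", "l2"}


def dense_search_payload(
    query_text: str,
    top_k: Optional[int] = None,
    min_similarity: Optional[float] = None,
    similarity_metric: Optional[str] = None,
    node_types: Optional[List[str]] = None,
    root_node_ids: Optional[List[str]] = None,
) -> dict:
    """Validate the documented constraints and return the JSON request body."""
    if not query_text:
        raise ValueError("query_text is required")
    payload: dict = {"query_text": query_text}
    if top_k is not None:
        if top_k < 1:
            raise ValueError("top_k must be >= 1")
        payload["top_k"] = top_k
    if min_similarity is not None:
        if not 0.0 <= min_similarity <= 1.0:
            raise ValueError("min_similarity must be within [0, 1]")
        payload["min_similarity"] = min_similarity
    if similarity_metric is not None:
        if similarity_metric not in ALLOWED_METRICS:
            raise ValueError(f"similarity_metric must be one of {sorted(ALLOWED_METRICS)}")
        payload["similarity_metric"] = similarity_metric
    if node_types:
        payload["node_types"] = list(node_types)
    if root_node_ids:
        # The table asks for root node IDs as strings.
        payload["root_node_ids"] = [str(r) for r in root_node_ids]
    return payload
```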
Responses
Request samples
- Payload
{
  "min_similarity": 0.7,
  "node_types": ["string"],
  "query_text": "What is machine learning?",
  "root_node_ids": ["string"],
  "similarity_metric": "inner_product",
  "top_k": 10
}
Response samples
- 200
- 400
- 500
{
  "results": [
    {
      "node": {
        "blob_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
        "blob_uri": "s3://bucket/key",
        "classifications": ["[application/pdf]"],
        "created_at": "2025-01-01T12:00:00Z",
        "hierarchical_path": "1.2.3",
        "metadata": {},
        "namespace_id": "namespace-123",
        "next_node_id": "12346",
        "node_id": "12345",
        "node_type": "TEXT",
        "parent_node_id": "12344",
        "positions": [
          {
            "path": "1,2,3",
            "payload": "(x0,y0,x1,y1)",
            "position_type": "bbox"
          }
        ],
        "root_node_id": "12345",
        "sparse_vec": "{0:0.5,1:0.3}/250002",
        "tags": ["string"],
        "text_content": "Document content",
        "text_hash": "abc123",
        "token_count": 1000,
        "vec": [0]
      },
      "score": 0.95
    }
  ],
  "total_count": 42
}
Perform exact text search
Search nodes using exact text matching (ILIKE)
Authorizations:
Request Body schema: application/json (required)
Exact search request
| node_types | Array of strings | Node types to filter by |
| query (required) | string | Text query for exact matching |
| root_node_ids | Array of strings | Root node IDs to search within (as strings) |
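The description names ILIKE, the case-insensitive pattern-matching operator in PostgreSQL, where `%` matches any run of characters and `_` matches exactly one. To make those semantics concrete, here is a small local emulation. It is only an illustration: the source does not say whether the service wraps `query` in `%...%` or escapes wildcards in it, and backslash escapes are not handled here.

```python
import re


def ilike(text: str, pattern: str) -> bool:
    """Emulate SQL ILIKE: % matches any run of characters, _ matches one, ignoring case."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # The pattern must cover the whole string, as in SQL.
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None
```

Under this reading, `ilike("A Search Term here", "%search term%")` matches, while `ilike("search terms", "search term")` does not because the pattern has no trailing wildcard.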
Responses
Request samples
- Payload
{
  "node_types": ["string"],
  "query": "search term",
  "root_node_ids": ["string"]
}
Response samples
- 200
- 400
- 500
{
  "results": [
    {
      "node": {
        "blob_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
        "blob_uri": "s3://bucket/key",
        "classifications": ["[application/pdf]"],
        "created_at": "2025-01-01T12:00:00Z",
        "hierarchical_path": "1.2.3",
        "metadata": {},
        "namespace_id": "namespace-123",
        "next_node_id": "12346",
        "node_id": "12345",
        "node_type": "TEXT",
        "parent_node_id": "12344",
        "positions": [
          {
            "path": "1,2,3",
            "payload": "(x0,y0,x1,y1)",
            "position_type": "bbox"
          }
        ],
        "root_node_id": "12345",
        "sparse_vec": "{0:0.5,1:0.3}/250002",
        "tags": ["string"],
        "text_content": "Document content",
        "text_hash": "abc123",
        "token_count": 1000,
        "vec": [0]
      },
      "score": 0.95
    }
  ],
  "total_count": 42
}
Perform sparse vector search from text
Search nodes using text search with sparse embeddings (auto-generates sparse vectors)
Authorizations:
Request Body schema: application/json (required)
Sparse search request
| min_similarity | number [0 .. 1] | Minimum similarity threshold (0.0 to 1.0) |
| node_types | Array of strings | Node types to filter by |
| query_text (required) | string | Text query to convert to sparse embeddings |
| root_node_ids | Array of strings | Root node IDs to search within (as strings) |
| similarity_metric | string, enum: "cosine", "inner_product", "l2" | Similarity metric to use (cosine, inner_product, l2). Defaults to inner_product |
| top_k | integer >= 1 | Maximum number of results to return |
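For readers unfamiliar with the three metric names, the sketch below computes each one on sparse vectors stored as `{index: value}` dictionaries. These are the textbook definitions only. The source does not say how the service turns an `l2` distance into a score, nor how `min_similarity` applies to unnormalized inner products.

```python
import math
from typing import Dict

SparseVec = Dict[int, float]


def inner_product(a: SparseVec, b: SparseVec) -> float:
    """Sum of products over shared indices; iterate over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(index, 0.0) for index, value in a.items())


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Inner product divided by the product of the norms; 0.0 if either vector is zero."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


def l2_distance(a: SparseVec, b: SparseVec) -> float:
    """Euclidean distance, treating missing indices as zero."""
    indices = a.keys() | b.keys()
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in indices))
```

Only indices present in both vectors contribute to the inner product, which is why sparse search stays cheap even at a dimension as large as the 250002 shown in the samples.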
Responses
Request samples
- Payload
{
  "min_similarity": 0.5,
  "node_types": ["string"],
  "query_text": "machine learning algorithms",
  "root_node_ids": ["string"],
  "similarity_metric": "inner_product",
  "top_k": 10
}
Response samples
- 200
- 400
- 500
{
  "results": [
    {
      "node": {
        "blob_checksum": "7cc1b5cf8a72caa6fc7f6f8aa984f4f834b4b1dc6db2d13ba6be4ad837398a62",
        "blob_uri": "s3://bucket/key",
        "classifications": ["[application/pdf]"],
        "created_at": "2025-01-01T12:00:00Z",
        "hierarchical_path": "1.2.3",
        "metadata": {},
        "namespace_id": "namespace-123",
        "next_node_id": "12346",
        "node_id": "12345",
        "node_type": "TEXT",
        "parent_node_id": "12344",
        "positions": [
          {
            "path": "1,2,3",
            "payload": "(x0,y0,x1,y1)",
            "position_type": "bbox"
          }
        ],
        "root_node_id": "12345",
        "sparse_vec": "{0:0.5,1:0.3}/250002",
        "tags": ["string"],
        "text_content": "Document content",
        "text_hash": "abc123",
        "token_count": 1000,
        "vec": [0]
      },
      "score": 0.95
    }
  ],
  "total_count": 42
}