Quick Start Guide

This guide walks you through uploading a document, understanding the processing lifecycle, and performing your first search. For API fundamentals, see the Introduction guide.

Uploading Your First Document

Uploading a document creates a CORPUS node that represents the original file. The upload endpoint accepts multipart form data containing the document file and optional metadata such as the original filename and MIME type. The system streams uploads, so large files are handled efficiently.

After upload, the API returns a corpus identifier and metadata. The system immediately begins processing the document through parsing workflows that extract content and create structured node hierarchies. Processing is asynchronous, so the document becomes searchable only after processing completes.

The upload response includes the corpus identifier, which you'll use to reference the document in subsequent operations. The response also includes the blob checksum, which enables the system to recognize duplicate uploads and reuse existing parsing results.

cURL

curl -X POST "https://<your_domain>/v1/corpus" \
  -H "Authorization: Bearer <your-token>" \
  -F "file=@document.pdf" \
  -F "original_name=document.pdf"

Python

import requests

headers = {"Authorization": "Bearer <your-token>"}

# Open the file in binary mode so requests can send it as multipart form data.
with open("document.pdf", "rb") as f:
    files = {"file": f}
    data = {"original_name": "document.pdf"}
    response = requests.post(
        "https://<your_domain>/v1/corpus",
        headers=headers,
        files=files,
        data=data,
    )

# Fail loudly on HTTP errors, then inspect the returned identifier and metadata.
response.raise_for_status()
print(response.json())

Processing is Asynchronous

Document processing happens asynchronously. The upload completes immediately, but parsing and enrichment occur in the background. Use status tracking to monitor when processing completes.

See the Document Management guide for more details on uploading and managing documents.

Understanding the Processing Lifecycle

When you upload a document, the system creates a CORPUS node representing the original file. Processing workflows then create an ARTIFACT node representing the parsed document structure, followed by child nodes representing paragraphs, images, tables, and other document components.

Processing occurs asynchronously through workflows that parse content, generate embeddings, and enrich nodes with additional information. The system tracks processing status for each node, enabling you to monitor progress and understand when content becomes searchable.

You can query node status to understand processing state. Status progresses through PENDING, RUNNING, and COMPLETED states, with FAILED or CANCELLED states indicating problems. Status information includes error messages when failures occur, enabling you to diagnose and resolve processing issues.
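The status progression described above lends itself to a simple polling loop. The following is a minimal sketch, not the official client: this guide does not name the status endpoint, so the lookup is passed in as a hypothetical `fetch_status` callable, and the `status` and `error_message` keys are assumed field names that may differ in the real response.

```python
import time
from typing import Callable

FAILURE_STATES = ("FAILED", "CANCELLED")


def wait_for_processing(
    fetch_status: Callable[[], dict],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until processing reaches COMPLETED, fails, or times out.

    fetch_status is a caller-supplied function (hypothetical) that returns
    the current status document for a node as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")  # assumed key name
        if state == "COMPLETED":
            return status
        if state in FAILURE_STATES:
            # "error_message" is an assumed key name for the failure detail.
            detail = status.get("error_message", "no error message provided")
            raise RuntimeError(f"Processing ended in {state}: {detail}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {state} after {timeout} seconds")
        # PENDING or RUNNING: wait before asking again.
        sleep(poll_interval)
```

In practice, `fetch_status` would wrap whichever status request the Status Tracking guide describes for your corpus identifier. A longer `poll_interval` reduces load when processing large documents.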

Learn More

For detailed information about the processing lifecycle:

Workflows: How automated processing pipelines work
Status Tracking: Monitoring processing progress
Node Types: Understanding document structure

Your First Search

Once processing completes, nodes become searchable through text search operations. The simplest search uses exact text matching to find nodes containing specific terms. More sophisticated searches use semantic embeddings to find conceptually similar content regardless of exact terminology.

To perform a semantic text search, send a POST request to /v1/search/dense with your query text. The system automatically generates embeddings for your query and finds nodes with similar semantic meaning. Results include similarity scores that indicate how closely each result matches your query.

Request

POST /v1/search/dense
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "climate change impacts",
  "min_similarity": 0.7,
  "top_k": 10
}

Response

{
  "results": [
    {
      "node_id": "node-123",
      "root_node_id": "corpus-456",
      "similarity_score": 0.89,
      "content": "The effects of global warming on ecosystems...",
      "node_type": "TEXT"
    }
  ],
  "total_count": 15
}

Search results return nodes with their content, positions, and metadata. You can use these results to navigate document structures, retrieve full content, or perform additional searches based on discovered information.
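As a rough illustration of working with these results in Python, the sketch below sends the dense search request shown above and groups the hits by `root_node_id`, so you can see which uploaded documents matched. The helper names `dense_search` and `group_by_document` are hypothetical, the `post` argument is assumed to behave like `requests.post`, and the response shape is taken from the example above rather than from a full schema.

```python
from collections import defaultdict
from typing import Callable


def dense_search(
    post: Callable,
    base_url: str,
    token: str,
    query_text: str,
    min_similarity: float = 0.7,
    top_k: int = 10,
) -> dict:
    """Send a dense search request and return the decoded JSON body.

    post is any callable with the same keyword interface as requests.post.
    """
    response = post(
        f"{base_url}/v1/search/dense",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "query_text": query_text,
            "min_similarity": min_similarity,
            "top_k": top_k,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def group_by_document(search_response: dict) -> dict:
    """Group hits by root_node_id, best similarity score first."""
    grouped = defaultdict(list)
    for hit in search_response.get("results", []):
        grouped[hit["root_node_id"]].append(hit)
    for hits in grouped.values():
        hits.sort(key=lambda h: h["similarity_score"], reverse=True)
    return dict(grouped)
```

A typical call would be `group_by_document(dense_search(requests.post, "https://<your_domain>", "<your-token>", "climate change impacts"))`. Passing the HTTP function in as an argument keeps the helpers easy to exercise without a live server.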

Search Modes

MatsuDB supports multiple search modes:

Dense search: Semantic similarity using dense embeddings
Sparse search: Lexical precision using sparse embeddings
Exact search: Literal text matching

See the Search guide for detailed information about all search modes.

Next Steps

Now that you've uploaded a document and run your first search, explore these guides:

Document Management: Upload, list, and manage documents
Search Operations: Advanced search techniques and filtering
Node Navigation: Explore document structures
Automation: Configure automated processing
Common Patterns: Real-world usage patterns

Related Concepts

Deepen your understanding with these concepts:

Node: The fundamental unit of information
Embeddings: How semantic search works
Workflows: Automated processing pipelines
