Common Usage Patterns
This guide presents common usage patterns that combine multiple API operations to solve real-world problems. These patterns demonstrate complete workflows from document upload to search and navigation.
Complete Document Processing Workflow
A typical workflow begins by uploading documents, which creates corpus nodes and triggers parsing workflows. Monitor processing status to know when parsing completes and nodes become available, then run searches to discover relevant content and navigate the document structure to reach specific nodes.
Each step builds on the previous one: uploads create content, processing makes it discoverable, and searches provide access to it. Understanding this lifecycle is the key to integrating MatsuDB into your applications.
Status tracking tells you when content becomes searchable. Query node status to determine the processing state and proceed with searches once processing completes; polling in this way lets your application adapt to processing timing.
- Python Example
- JavaScript Example
import requests
import time

headers = {"Authorization": "Bearer <your-token>"}

# 1. Upload document
with open("document.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(
        "https://<your_domain>/v1/corpus",
        headers=headers,
        files=files
    )
corpus = response.json()
corpus_id = corpus["corpus_id"]

# 2. Monitor processing
while True:
    response = requests.get(
        f"https://<your_domain>/v1/corpus/{corpus_id}",
        headers=headers
    )
    corpus_info = response.json()
    if corpus_info["processing_status"] == "COMPLETED":
        break
    time.sleep(5)  # Poll every 5 seconds

# 3. Search content
search_response = requests.post(
    "https://<your_domain>/v1/search/dense",
    headers=headers,
    json={
        "query_text": "methodology",
        "max_results": 10
    }
)
results = search_response.json()

# 4. Navigate to first result
if results["results"]:
    node_id = results["results"][0]["node_id"]
    node_response = requests.get(
        f"https://<your_domain>/v1/nodes/{node_id}",
        headers=headers,
        params={"root_node_id": corpus_id}
    )
    node = node_response.json()
    print(f"Found: {node['content']}")
// 1. Upload document
const formData = new FormData();
formData.append("file", fileInput.files[0]);

const uploadResponse = await fetch("https://<your_domain>/v1/corpus", {
  method: "POST",
  headers: { Authorization: "Bearer <your-token>" },
  body: formData,
});
const corpus = await uploadResponse.json();
const corpusId = corpus.corpus_id;

// 2. Monitor processing
const pollStatus = async () => {
  while (true) {
    const response = await fetch(
      `https://<your_domain>/v1/corpus/${corpusId}`,
      { headers: { Authorization: "Bearer <your-token>" } }
    );
    const corpusInfo = await response.json();
    if (corpusInfo.processing_status === "COMPLETED") {
      break;
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
};
await pollStatus();

// 3. Search and navigate
const searchResponse = await fetch("https://<your_domain>/v1/search/dense", {
  method: "POST",
  headers: {
    Authorization: "Bearer <your-token>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query_text: "methodology",
    max_results: 10,
  }),
});
const results = await searchResponse.json();

if (results.results.length > 0) {
  const nodeId = results.results[0].node_id;
  const nodeResponse = await fetch(
    `https://<your_domain>/v1/nodes/${nodeId}?root_node_id=${corpusId}`,
    { headers: { Authorization: "Bearer <your-token>" } }
  );
  const node = await nodeResponse.json();
  console.log("Found:", node.content);
}
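In production, the monitoring step should not poll forever: processing can fail or take longer than expected. Below is a minimal sketch (in Python) of a more defensive polling helper; the FAILED status value is an assumption, not confirmed by this guide, so check the API reference for the exact failure states.
# Polling helper with a timeout and an assumed FAILED failure state.
def wait_for_processing(corpus_id, timeout_seconds=300, poll_interval=5):
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        response = requests.get(
            f"https://<your_domain>/v1/corpus/{corpus_id}",
            headers=headers
        )
        status = response.json()["processing_status"]
        if status == "COMPLETED":
            return
        if status == "FAILED":  # assumed failure state name
            raise RuntimeError(f"Processing failed for corpus {corpus_id}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Corpus {corpus_id} not processed within {timeout_seconds}s")

wait_for_processing(corpus_id)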
Semantic Search Across Documents
Semantic search discovers conceptually similar content across document boundaries. Upload multiple documents to create separate corpus nodes, then run text-based dense searches that span all documents simultaneously; results can include nodes from different documents.
This pattern enables knowledge discovery where related content appears in different documents. A query about "climate change" might find relevant paragraphs in scientific papers, policy documents, and news articles, even when terminology differs. Semantic search transcends document boundaries, creating connections that literal search would miss.
Filter by root node identifiers to restrict searches to specific documents when needed, or omit filters to search across all documents. This flexibility enables both document-scoped and cross-document search strategies that match your discovery needs.
- Cross-Document Search
- Multi-Document Search
# Search across all documents
response = requests.post(
    "https://<your_domain>/v1/search/dense",
    headers=headers,
    json={
        "query_text": "climate change impacts",
        "max_results": 20
        # No root_node_ids filter = search all documents
    }
)
results = response.json()

# Group results by document
by_document = {}
for result in results["results"]:
    doc_id = result["root_node_id"]
    if doc_id not in by_document:
        by_document[doc_id] = []
    by_document[doc_id].append(result)
# Search specific documents
response = requests.post(
    "https://<your_domain>/v1/search/dense",
    headers=headers,
    json={
        "query_text": "methodology",
        "root_node_ids": ["corpus-123", "corpus-456", "corpus-789"],
        "max_results": 10
    }
)
results = response.json()
For detailed information about search modes, see the Search guide. To understand how semantic search works, see the Embeddings concept.
Automated Processing Configuration
Configure rules to customize automated processing for your namespace. List the available triggers to see what can be automated, retrieve a trigger's schema to learn its filter configuration structure, then create rules that associate triggers with filter configurations matching your organizational needs.
This pattern customizes processing without system modifications: configure parsing triggers to process only PDF documents, embedding triggers to process only text nodes, or augmentation triggers to process only specific content types, focusing resources on relevant content.
Monitor rule effectiveness by observing workflow execution and adjust filter configurations as needed; iterative refinement keeps the processing pipeline aligned with your goals.
- Setup Automation
- Monitor Effectiveness
# 1. Discover available triggers
triggers_response = requests.get(
    "https://<your_domain>/v1/rules/triggers",
    headers=headers
)
triggers = triggers_response.json()

# 2. Get trigger schema
schema_response = requests.get(
    "https://<your_domain>/v1/rules/triggers/corpus_parsing/schema",
    headers=headers
)
schema = schema_response.json()

# 3. Configure rule
rule_response = requests.post(
    "https://<your_domain>/v1/rules",
    headers=headers,
    json={
        "trigger_id": "corpus_parsing",
        "filter": {
            "mime_types": ["application/pdf"]
        }
    }
)
# Monitor workflow execution through status tracking
status_response = requests.get(
    "https://<your_domain>/v1/status",
    headers=headers,
    # Query all states for this operation so the success rate is meaningful
    params={"operation": "corpus_parsing"}
)
statuses = status_response.json()["statuses"]

# Analyze processing success rate
total = len(statuses)
successful = sum(1 for s in statuses if s["state"] == "COMPLETED")
success_rate = successful / total if total > 0 else 0
print(f"Success rate: {success_rate:.2%}")
Hierarchical Document Exploration
Navigate document structures by retrieving corpus nodes and listing their children to discover document organization. Retrieve specific nodes to access content, use parent-child relationships to move through hierarchies, and follow sequential links to read content in order.
This pattern explores document structure without loading the complete hierarchy: progressive navigation discovers structure incrementally, which keeps exploration of large documents efficient. Follow reading order for sequential access, or follow the hierarchy for structural understanding.
Combine navigation with search to discover entry points, then navigate from the discovered nodes to related content; this hybrid approach supports both discovery and exploration (a sketch follows the navigation examples below).
- Hierarchical Navigation
- Sequential Navigation
def explore_hierarchy(root_node_id, current_node_id=None, depth=0):
    """Recursively explore document hierarchy"""
    if current_node_id is None:
        current_node_id = root_node_id

    # Retrieve current node
    node_response = requests.get(
        f"https://<your_domain>/v1/nodes/{current_node_id}",
        headers=headers,
        params={"root_node_id": root_node_id}
    )
    node = node_response.json()
    print(" " * depth + f"{node['node_type']}: {node['content'][:50]}...")

    # List children
    children_response = requests.get(
        "https://<your_domain>/v1/nodes/children",
        headers=headers,
        params={
            "root_node_id": root_node_id,
            "parent_node_id": current_node_id
        }
    )
    children = children_response.json()

    # Recursively explore children
    for child in children["nodes"]:
        explore_hierarchy(root_node_id, child["node_id"], depth + 1)

# Start exploration from the corpus
explore_hierarchy(corpus_id)
def read_sequentially(root_node_id, start_node_id):
    """Read document in sequential order"""
    current_node_id = start_node_id
    while current_node_id:
        node_response = requests.get(
            f"https://<your_domain>/v1/nodes/{current_node_id}",
            headers=headers,
            params={"root_node_id": root_node_id}
        )
        node = node_response.json()
        print(node["content"])
        # Follow sequential link
        current_node_id = node.get("next_node_id")

# Start from the first node (first_node_id is the node_id of the document's
# first content node, discovered through search or hierarchical navigation)
read_sequentially(corpus_id, first_node_id)
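To combine search with navigation, as described above, use a dense search to find an entry point and then explore outward from the hit. A minimal sketch reusing the search endpoint and the explore_hierarchy helper defined above:
# Find an entry point with search, then explore the subtree around it.
search_response = requests.post(
    "https://<your_domain>/v1/search/dense",
    headers=headers,
    json={"query_text": "methodology", "max_results": 1}
)
hits = search_response.json()["results"]
if hits:
    hit = hits[0]
    # Explore the matching node and its descendants within its document
    explore_hierarchy(hit["root_node_id"], hit["node_id"])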
Multi-Corpus Management
Manage multiple corpus documents by listing corpora to discover your collection, then retrieving specific corpora for their metadata. Search across multiple corpora by omitting root node filters, or restrict a search to specific corpora by listing their root node identifiers.
This pattern supports collection-level operations: list operations give a collection overview, search operations discover content across document boundaries, and filters scope queries to individual documents when needed.
Status tracking provides visibility into processing across the whole collection. Query status by root node to follow document-level processing, or by operation to follow a workflow across documents (a hedged sketch of a document-level status query follows the examples below).
- Collection Overview
- Cross-Corpus Search
# List all corpora
corpora_response = requests.get(
    "https://<your_domain>/v1/corpus",
    headers=headers,
    params={"page_size": 100}
)
corpora = corpora_response.json()

# Get processing status for each
for corpus in corpora["corpora"]:
    corpus_info = requests.get(
        f"https://<your_domain>/v1/corpus/{corpus['corpus_id']}",
        headers=headers
    ).json()
    print(f"{corpus['original_name']}: {corpus_info['processing_status']}")
# Search across all corpora
all_corpus_ids = [c["corpus_id"] for c in corpora["corpora"]]
search_response = requests.post(
    "https://<your_domain>/v1/search/dense",
    headers=headers,
    json={
        "query_text": "research findings",
        "root_node_ids": all_corpus_ids,  # Search specific corpora
        "max_results": 50
    }
)
results = search_response.json()

# Analyze results by corpus
by_corpus = {}
for result in results["results"]:
    corpus_id = result["root_node_id"]
    if corpus_id not in by_corpus:
        by_corpus[corpus_id] = []
    by_corpus[corpus_id].append(result)
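For document-level visibility, as mentioned above, status can also be queried per root node. The sketch below assumes the status endpoint accepts a root_node_id parameter; that parameter name is an assumption here and should be confirmed against the API reference.
# Document-level status query (the root_node_id parameter name is assumed,
# not confirmed by this guide; check the API reference for the exact name).
for corpus in corpora["corpora"]:
    status_response = requests.get(
        "https://<your_domain>/v1/status",
        headers=headers,
        params={"root_node_id": corpus["corpus_id"]}  # assumed parameter
    )
    statuses = status_response.json()["statuses"]
    print(f"{corpus['original_name']}: {len(statuses)} tracked operations")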
Next Steps
Now that you understand common patterns, explore:
- API Reference: Complete endpoint documentation
- Concepts Documentation: Deep dive into system architecture
- Other Guides: Additional usage patterns and techniques
These concepts are central to the patterns described:
- Node: The fundamental unit of information
- Workflows: Automated processing pipelines
- Embeddings: Semantic search capabilities
- Rules & Triggers: Automation configuration