What is MatsuDB?
This guide introduces MatsuDB and its core concepts. For hands-on examples, see the Getting Started guide.
The Problem with Traditional Document Storage
Traditional document storage treats files as black boxes. When you upload a PDF, it remains a single, indivisible unit. To find information, you must download the entire file and search manually. There's no way to link a specific paragraph to related content in another document, or to search across thousands of documents semantically.
MatsuDB solves these problems.
How MatsuDB Works
MatsuDB uses a Document Parsing and Knowledge Graph Engine to transform your documents into a structured, searchable knowledge tree. When you upload a document, MatsuDB automatically:
- Decomposes the document into atomic units called nodes
- Preserves the hierarchical structure (sections, paragraphs, tables)
- Generates semantic embeddings for intelligent search
- Connects related content across document boundaries (knowledge graph)
Each piece of information (a paragraph, an image, a table cell) becomes an independent, searchable entity while keeping its relationship to the original context.
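To make the idea of atomic nodes that still know their context concrete, here is a minimal sketch. It is not MatsuDB's parser or data model: the `Node` record, the `decompose` and `context` functions, and the splitting rule (a line starting with `# ` opens a SECTION, every other non-blank line becomes a TEXT node) are all hypothetical. Only the node type names come from this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record: one atomic unit of a document."""
    id: int                  # equals the node's index in the list below
    type: str                # "ARTIFACT", "SECTION" or "TEXT" in this sketch
    text: str
    parent_id: Optional[int]  # None only for the root


def decompose(title: str, source: str) -> list[Node]:
    """Split a tiny markdown-like document into parent-linked nodes (assumed rules)."""
    nodes = [Node(0, "ARTIFACT", title, None)]
    current_parent = 0  # TEXT before any heading hangs off the artifact itself
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            node = Node(len(nodes), "SECTION", line[2:], 0)
            current_parent = node.id
        else:
            node = Node(len(nodes), "TEXT", line, current_parent)
        nodes.append(node)
    return nodes


def context(nodes: list[Node], node_id: int) -> list[str]:
    """Follow parent links from a node up to the root, returned root-first."""
    path = []
    node: Optional[Node] = nodes[node_id]
    while node is not None:
        path.append(node.text)
        node = nodes[node.parent_id] if node.parent_id is not None else None
    return list(reversed(path))


doc = """
# Introduction
Ocean temperatures are rising.
# Methods
We sampled 40 reef sites.
"""
nodes = decompose("paper.pdf", doc)
print(nodes[4].type, nodes[4].text)  # TEXT We sampled 40 reef sites.
print(context(nodes, 4))  # ['paper.pdf', 'Methods', 'We sampled 40 reef sites.']
```

The paragraph at index 4 can be fetched on its own, yet its parent links still recover where it came from.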
From Document to Knowledge Tree
Here's how a simple research paper transforms into a MatsuDB knowledge structure:
For a complete reference of all node types (CORPUS, ARTIFACT, SECTION, TEXT, IMAGE, TABLE, FORMULA, FORM, etc.), see the Node Types documentation.
What This Enables
This atomic approach unlocks powerful capabilities:
- Semantic Search: Search across all your documents by meaning, not just keywords. Find "climate impacts on marine ecosystems" even when documents use different terminology, such as "ocean warming effects."
- Precise Navigation: Navigate directly to a specific paragraph, image, or table cell. There is no need to download entire files; access exactly the content you need.
- Cross-Document Links: Discover related content across document boundaries. A concept in one paper links to its elaboration in another, creating a knowledge network.
- Automated Processing: Configure workflows that trigger when specific content types are created. Automatically enrich, translate, or analyze content as documents are processed.
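Semantic search, as described above, generally works by comparing embedding vectors rather than matching words. The sketch below ranks nodes by cosine similarity, a common choice for this; whether MatsuDB uses cosine similarity is an assumption. The vectors are hand-written 3-dimensional stand-ins, and the node IDs and the `semantic_search` function are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the IDs of the top_k nodes whose embeddings point closest to the query."""
    ranked = sorted(index, key=lambda node_id: cosine(query_vec, index[node_id]), reverse=True)
    return ranked[:top_k]


# Toy "embeddings" written by hand; real embedding models produce vectors with
# hundreds of dimensions. Axes here are roughly (marine, climate/heat, finance).
index = {
    "ocean-warming-para": [0.9, 0.8, 0.0],
    "coral-bleaching-para": [0.8, 0.6, 0.1],
    "quarterly-earnings-para": [0.0, 0.1, 0.95],
}
query = [0.85, 0.75, 0.0]  # stand-in for "climate impacts on marine ecosystems"

print(semantic_search(query, index))  # ['ocean-warming-para', 'coral-bleaching-para']
```

Neither matching paragraph needs to contain the query's words; they rank highly because their vectors point in nearly the same direction as the query vector.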
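Cross-document links turn individual nodes into a graph, so finding related content becomes a graph traversal. The sketch below uses breadth-first search to collect every node reachable within a given number of links. The link data, the `artifact#node` ID format, and the `related` function are hypothetical illustrations, not MatsuDB's storage format or API.

```python
from collections import deque

# Hypothetical cross-document links: node ID -> IDs of related nodes.
# IDs are written "artifact#node" so the document boundary is visible.
LINKS = {
    "paper-a.pdf#p3": ["paper-b.pdf#p7"],
    "paper-b.pdf#p7": ["paper-a.pdf#p3", "paper-c.pdf#t1"],
    "paper-c.pdf#t1": ["paper-b.pdf#p7"],
    "paper-d.pdf#p1": [],
}


def related(links: dict[str, list[str]], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search: map each node within max_hops links to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting node is not "related" to itself
    return seen


print(related(LINKS, "paper-a.pdf#p3"))  # {'paper-b.pdf#p7': 1, 'paper-c.pdf#t1': 2}
```

Bounding the search by hop count keeps results close to the starting concept; an unbounded traversal could eventually return most of a densely linked corpus.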
The REST API
The MatsuDB REST API provides programmatic access to these capabilities:
| Operation | Description |
|---|---|
| Upload documents | Create corpus nodes that trigger automatic parsing |
| Search content | Find nodes using semantic, lexical, or exact search |
| Navigate structures | Traverse hierarchical relationships between nodes |
| Configure automation | Set up rules and triggers for custom workflows |
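The "Configure automation" operation in the table above pairs rules with triggers. One way to picture a trigger that fires when a node of a given type is created is a registry of handlers keyed by node type. The sketch below is a hypothetical in-process model of that idea; `TriggerRegistry`, `on_create`, and `node_created` are invented names, not MatsuDB's automation API, and the handlers only return a description of the action they would take.

```python
from typing import Callable


class TriggerRegistry:
    """Hypothetical rule registry: run handlers when a node of a given type is created."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], str]]] = {}

    def on_create(self, node_type: str, handler: Callable[[dict], str]) -> None:
        """Register a handler for newly created nodes of node_type."""
        self._handlers.setdefault(node_type, []).append(handler)

    def node_created(self, node: dict) -> list[str]:
        """Run every handler registered for this node's type, in registration order."""
        return [handler(node) for handler in self._handlers.get(node["type"], [])]


registry = TriggerRegistry()
registry.on_create("TABLE", lambda n: f"summarize {n['id']}")
registry.on_create("TEXT", lambda n: f"translate {n['id']}")

print(registry.node_created({"id": "t1", "type": "TABLE"}))  # ['summarize t1']
print(registry.node_created({"id": "i1", "type": "IMAGE"}))  # []
```

Keying rules on node type means a single upload can fan out into several independent tasks, one per matching node, rather than one task per document.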
All operations authenticate with a Bearer token that carries your namespace context, so each request can only reach data in your own namespace.
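With Bearer authentication, the token travels in the standard `Authorization: Bearer <token>` HTTP header on every request. The sketch below builds, but does not send, such a request with Python's standard library. The base URL `https://matsudb.example.com` and the `/nodes/123` path are placeholders, not MatsuDB's real host or endpoints; see the API Reference for those.

```python
import urllib.request

# Placeholder host: NOT MatsuDB's real base URL.
BASE_URL = "https://matsudb.example.com"


def build_request(token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


req = build_request("YOUR_API_TOKEN", "/nodes/123")  # hypothetical path
print(req.full_url)                     # https://matsudb.example.com/nodes/123
print(req.get_header("Authorization"))  # Bearer YOUR_API_TOKEN
# To actually send it: urllib.request.urlopen(req)
```

Because the namespace context lives in the token, the request itself needs no namespace parameter.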
For complete endpoint documentation, see the API Reference.
Next Steps
- Start Building
  - Upload your first document and see the transformation in action
  - Perform your first search to discover content semantically
  - Navigate document structures to explore the knowledge tree
- Understand Concepts
  - Node: Deep dive into the fundamental unit of information
  - Namespace: How data isolation works
  - Embeddings: How semantic search works
  - Workflows: Automated processing pipelines