Search Operations

Search Capabilities

MatsuDB provides text-based search: you send query text, and for dense and sparse search MatsuDB generates the embeddings from that text automatically. This guide covers dense semantic search, sparse lexical search, and exact text search. For search fundamentals, see Getting Started.

Search Modes

MatsuDB provides three text-based search modes that serve different query intents:

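Dense search: Uses semantic embeddings to find conceptually similar content, even when it is phrased in different terms. MatsuDB generates the dense embedding from your query text, so you don't need to provide pre-computed vectors. For example, a query for "marine protection" can find content about "ocean conservation".
Sparse search: Uses sparse embeddings of weighted tokens to rank content by lexical overlap with your query, favoring precise terminology.
Exact search: Skips embeddings entirely and matches your query text literally (case-insensitive), which suits identifiers and exact phrases.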

Choosing a Search Mode

Dense search: Best for conceptual discovery and finding related content
Sparse search: Best for precise terminology matching and technical content
Exact search: Best for finding specific identifiers or exact phrases

For detailed information about how embeddings work, see the Embeddings concept documentation.

Dense Semantic Search

Dense search generates a dense embedding from your query text and returns nodes with similar semantic meaning. Embedding generation happens inside MatsuDB, so you only supply the query text.

Dense search supports three similarity metrics: cosine similarity measures the angle between vectors, inner product measures their alignment weighted by magnitude, and L2 distance measures geometric (Euclidean) proximity. Cosine similarity gives scale-invariant ranking, while inner product gives magnitude-aware ranking.
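To make the difference between the metrics concrete, the Python sketch below computes all three on two small vectors that point in the same direction but differ in length. It illustrates the standard definitions only; it is not MatsuDB's internal code, and this guide does not specify how MatsuDB turns an L2 distance into a result score.

```python
import math

# Standard definitions of the three metrics for small dense vectors
# represented as plain lists of floats (illustration only).

def cosine_similarity(a, b):
    """Angle-based similarity: ignores vector length (scale-invariant)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def inner_product(a, b):
    """Alignment that also grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: smaller values mean closer vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
doc = [2.0, 0.0]  # same direction as the query, twice the length

print(cosine_similarity(query, doc))  # 1.0
print(inner_product(query, doc))      # 2.0
print(l2_distance(query, doc))        # 1.0
```

Cosine similarity rates the two vectors as identical because it ignores length, while inner product rewards the longer vector. L2 distance is a distance, so smaller values mean closer vectors.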

A search request contains your query text, a minimum similarity threshold (min_similarity), a maximum result count (top_k), and optional filters for root nodes (root_node_ids) and node types (node_types). Each result carries a similarity score indicating how closely it matches the query, which you can use to judge relevance and filter by quality.

Request:

POST /v1/search/dense
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "climate change impacts",
  "min_similarity": 0.7,
  "top_k": 10,
  "similarity_metric": "cosine",
  "root_node_ids": ["corpus-123"],
  "node_types": ["TEXT"]
}

Response:

{
  "results": [
    {
      "node": {
        "node_id": "node-456",
        "root_node_id": "corpus-123",
        "text_content": "Climate change impacts on ecosystems...",
        "node_type": "TEXT"
      },
      "score": 0.89
    }
  ],
  "total_count": 15
}

See the API Reference for complete endpoint documentation.
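A minimal sketch of building the dense request above from Python with only the standard library. The base URL and the build_dense_search_request helper are hypothetical; the path, headers, and body fields are taken from the example request.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your MatsuDB deployment's address.
BASE_URL = "https://matsudb.example.com"

def build_dense_search_request(token, query_text, min_similarity=0.7, top_k=10,
                               similarity_metric="cosine",
                               root_node_ids=None, node_types=None):
    """Build (but do not send) a POST /v1/search/dense request."""
    body = {
        "query_text": query_text,
        "min_similarity": min_similarity,
        "top_k": top_k,
        "similarity_metric": similarity_metric,
    }
    # Filters are optional; leave them out to search the whole namespace.
    if root_node_ids:
        body["root_node_ids"] = list(root_node_ids)
    if node_types:
        body["node_types"] = list(node_types)
    return urllib.request.Request(
        BASE_URL + "/v1/search/dense",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_dense_search_request(
    "<your-token>", "climate change impacts",
    root_node_ids=["corpus-123"], node_types=["TEXT"],
)
print(req.get_method(), req.full_url)
# Sending it: urllib.request.urlopen(req) returns an HTTP response whose
# body can be read with .read() and decoded with json.loads().
```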

Sparse Lexical Search

Sparse search generates a sparse embedding from your query text and returns nodes whose token weights overlap with it, ranked by lexical similarity. As with dense search, you only supply the query text.

Sparse vectors represent text as weighted tokens drawn from a large vocabulary. Only non-zero weights are stored, so sparse vectors stay memory-efficient even though the vocabulary is large. Because similarity is computed from overlapping tokens, sparse search provides lexical precision that complements semantic search.

Sparse search supports the same similarity metrics as dense search, so you can apply a consistent ranking strategy across both modes. The request includes your query text, a similarity threshold, a result limit, and optional filters. Result scores reflect token overlap, indicating lexical relevance.

Request:

POST /v1/search/sparse
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "machine learning algorithm",
  "min_similarity": 0.6,
  "top_k": 10
}

Response:

{
  "results": [
    {
      "node": {
        "node_id": "node-789",
        "text_content": "The machine learning algorithm processes...",
        "node_type": "TEXT"
      },
      "score": 0.85
    }
  ],
  "total_count": 8
}
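The sketch below models sparse vectors as Python dicts that map tokens to weights, so only non-zero entries exist, and scores two of them with an inner product over shared tokens. The tokens and weights are invented for illustration; real sparse embeddings come from the model MatsuDB uses, and this is not MatsuDB's scoring code.

```python
# Sparse vectors as {token: weight} dicts: only non-zero weights are stored.
# Tokens and weights are made up; real ones come from the embedding model.

def sparse_dot(a, b):
    """Inner product computed only over tokens the two vectors share."""
    # Iterate over the smaller dict so cost scales with non-zero entries,
    # not with the size of the vocabulary.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)

query = {"machine": 0.8, "learning": 0.7, "algorithm": 0.5}
doc = {"machine": 0.6, "learning": 0.9, "processes": 0.4}

# Only "machine" and "learning" overlap: 0.8*0.6 + 0.7*0.9
print(sparse_dot(query, doc))
```

Tokens that appear in only one vector contribute nothing, which is why sparse scores track lexical overlap rather than meaning.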

Exact Text Search

Exact search bypasses embeddings entirely and matches text literally. Use it for queries that need exact phrase or pattern matching, or as a fallback when vector search is inappropriate.

Exact search uses case-insensitive pattern matching to find nodes whose text contains your query. Because it scans text content directly, it needs no embedding generation or vector calculations. It excels at finding specific identifiers, exact phrases, and patterns that require precise text matching.

The request contains the query text (in the query field, as in the example below) and optional filters for root nodes and node types. Results are ranked by relevance, with exact matches appearing first. Because no embedding has to be generated, this mode avoids that latency, which makes it well suited to simple text queries.

Request:

POST /v1/search/exact
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query": "Section 3.2",
  "root_node_ids": ["corpus-123"]
}

Response:

{
  "results": [
    {
      "node": {
        "node_id": "node-321",
        "text_content": "Section 3.2 describes the methodology...",
        "node_type": "TEXT"
      },
      "score": 1.0
    }
  ],
  "total_count": 1
}
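For intuition about what case-insensitive pattern matching returns, the sketch below applies case-insensitive substring matching to a few local node records shaped like the response above. It assumes substring semantics, which this guide does not spell out; it is not MatsuDB's implementation, and nodes node-322 and node-323 are hypothetical.

```python
# Local illustration of case-insensitive substring matching over node text.
# Assumes "contains" semantics; not MatsuDB's actual implementation.

def exact_match(nodes, query):
    needle = query.casefold()  # casefold() is a stronger lower() for matching
    return [n for n in nodes if needle in n["text_content"].casefold()]

nodes = [
    {"node_id": "node-321", "text_content": "Section 3.2 describes the methodology..."},
    {"node_id": "node-322", "text_content": "See SECTION 3.2 for details."},       # hypothetical
    {"node_id": "node-323", "text_content": "Section 3.20 covers appendices."},    # hypothetical
]
print([n["node_id"] for n in exact_match(nodes, "Section 3.2")])
```

Under this assumption, "Section 3.20" also matches the query "Section 3.2", so check the returned text when your query is a prefix of other identifiers.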

Filtering and Scoping

All search modes support filtering by root node IDs and node types. Root node filtering restricts a search to specific documents, so you can scope a query to particular corpus boundaries. Node type filtering restricts a search to specific content types, for example only text nodes, only image nodes, or only table nodes.

You can combine both filters: search for text nodes within specific documents, find images across multiple documents, or locate tables in a particular corpus.

Filters are optional, so you can search across all documents and all node types when appropriate. When filters are omitted, searches explore the complete namespace content, enabling discovery across document boundaries and content types.

Document-scoped:

{
  "query_text": "methodology",
  "root_node_ids": ["corpus-123"],
  "top_k": 20
}

Searches only within the specified document.

Type-filtered:

{
  "query_text": "data analysis",
  "node_types": ["TEXT"],
  "top_k": 20
}

Searches only text nodes across all documents.

Combined filters:

{
  "query_text": "results",
  "root_node_ids": ["corpus-123", "corpus-456"],
  "node_types": ["TEXT", "TABLE"],
  "top_k": 20
}

Searches text and table nodes in specific documents.
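The combined example suggests that a node must belong to one of the listed documents and have one of the listed types, and that an omitted filter applies no restriction. The sketch below models that assumed behavior client-side; the passes_filters helper and the node records (including the IMAGE type name) are hypothetical, so confirm the server's exact semantics in the API Reference.

```python
# Assumed filter semantics: ANY listed root node AND ANY listed node type;
# a filter that is None (omitted) places no restriction.

def passes_filters(node, root_node_ids=None, node_types=None):
    if root_node_ids is not None and node["root_node_id"] not in root_node_ids:
        return False
    if node_types is not None and node["node_type"] not in node_types:
        return False
    return True

nodes = [
    {"node_id": "n1", "root_node_id": "corpus-123", "node_type": "TEXT"},
    {"node_id": "n2", "root_node_id": "corpus-456", "node_type": "TABLE"},
    {"node_id": "n3", "root_node_id": "corpus-789", "node_type": "TEXT"},   # wrong document
    {"node_id": "n4", "root_node_id": "corpus-123", "node_type": "IMAGE"},  # wrong type
]

# Mirrors the "Combined filters" request body above.
selected = [n["node_id"] for n in nodes
            if passes_filters(n, ["corpus-123", "corpus-456"], ["TEXT", "TABLE"])]
print(selected)
```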

Understanding Search Results

Search results contain the matched nodes, each with a similarity score indicating relevance. Higher scores indicate closer matches, so you can judge result quality and apply relevance thresholds. Results are ordered by score, with the best matches first.

Each result includes the complete node information: content, positions, metadata, and relationships. You can use this information to understand context, navigate to related content, or retrieve additional details. Result nodes include their hierarchical position, enabling you to understand where content appears within document structures.

Search responses also include a total_count of how many nodes matched your query. This can exceed the number of results returned: in the dense example above, total_count is 15 while top_k is 10.
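A sketch of reading a response in Python: it parses the dense example response, applies an additional client-side score threshold on top of min_similarity, and reports how many results were returned against total_count. The summarize helper is hypothetical.

```python
import json

# Response body copied from the dense search example above.
raw = """
{
  "results": [
    {"node": {"node_id": "node-456", "root_node_id": "corpus-123",
              "text_content": "Climate change impacts on ecosystems...",
              "node_type": "TEXT"},
     "score": 0.89}
  ],
  "total_count": 15
}
"""

def summarize(response, min_score=0.0):
    """Return (node_id, score) pairs at or above min_score, best first."""
    hits = [(r["node"]["node_id"], r["score"])
            for r in response["results"] if r["score"] >= min_score]
    # Results should already arrive best-first; sorting again is a cheap safeguard.
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits

response = json.loads(raw)
print(summarize(response, min_score=0.8))
print(f"showing {len(response['results'])} of {response['total_count']} matches")
```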

Result Quality

Use min_similarity thresholds to filter low-quality results
Higher similarity scores indicate better matches
Consider combining multiple search modes for comprehensive results
Use filters to focus searches on relevant content types
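One way to combine dense and sparse results is reciprocal rank fusion (RRF), which merges ranked lists by rank position rather than raw score, because scores from different modes may not be directly comparable. RRF is a general technique, not a MatsuDB feature described in this guide; the node IDs below are hypothetical, and k=60 is a commonly used default.

```python
# Reciprocal rank fusion: each list contributes 1 / (k + rank) per node,
# so nodes ranked highly by several modes rise to the top.

def reciprocal_rank_fusion(ranked_lists, k=60):
    fused = {}
    for ranking in ranked_lists:
        for rank, node_id in enumerate(ranking, start=1):
            fused[node_id] = fused.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by node_id for stable output.
    return sorted(fused, key=lambda nid: (-fused[nid], nid))

dense_ids = ["node-456", "node-111", "node-789"]   # hypothetical dense ranking
sparse_ids = ["node-789", "node-456", "node-222"]  # hypothetical sparse ranking
print(reciprocal_rank_fusion([dense_ids, sparse_ids]))
```

Nodes that appear in both lists (node-456 and node-789 here) outrank nodes found by only one mode.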

Next Steps

Now that you understand search operations, explore:

Node Navigation: Navigate from search results to related content
Common Patterns: Real-world search workflows
Embeddings Concept: Deep dive into how semantic search works

Related Concepts

Understanding these concepts will help you use search effectively:

Embeddings: How semantic search works
Node: What you're searching for
Namespace: Search scope boundaries
