
Search Operations

Search Capabilities

MatsuDB provides text-based search over your content. For dense and sparse search, MatsuDB automatically generates embeddings from your query text, so you never need to supply vectors yourself. This guide covers dense semantic search, sparse lexical search, and exact text search. For search fundamentals, see Getting Started.

Search Modes

MatsuDB provides three text-based search modes that serve different query intents: dense, sparse, and exact search.

Dense search uses semantic embeddings to find conceptually similar content, so you can discover relevant material even when it uses different terminology. The system generates dense embeddings from your query text automatically, so you don't need to provide pre-computed vectors. For example, a query for "marine protection" can find content about "ocean conservation" through semantic similarity.

Choosing a Search Mode
  • Dense search: Best for conceptual discovery and finding related content
  • Sparse search: Best for precise terminology matching and technical content
  • Exact search: Best for finding specific identifiers or exact phrases

For detailed information about how embeddings work, see the Embeddings concept documentation.

Dense Search

Dense search generates a dense embedding from your query text and finds nodes with similar semantic meaning. Embedding generation happens inside MatsuDB, so you only supply the query text.

Dense search supports configurable similarity metrics, selected with the similarity_metric field: cosine similarity measures the angle between vectors, inner product measures their alignment weighted by magnitude, and L2 distance measures the straight-line (Euclidean) distance between them. Cosine similarity gives scale-invariant ranking because it ignores vector length, inner product gives magnitude-aware ranking, and L2 distance ranks by geometric proximity.
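To illustrate how the three metrics differ, here is a small pure-Python sketch (not MatsuDB code) that scores one query vector against two document vectors pointing in the same direction but with different lengths. The vectors are made up for illustration, and how MatsuDB converts an L2 distance into a score for min_similarity is not covered in this guide.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Angle-based score; ignores vector length. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inner_product(a: list[float], b: list[float]) -> float:
    """Magnitude-aware score: longer vectors pointing the same way score higher."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller values mean the vectors are closer."""
    return math.dist(a, b)


query = [1.0, 1.0]
short_doc = [1.0, 1.0]
long_doc = [3.0, 3.0]  # same direction as short_doc, three times the length

print(cosine_similarity(query, short_doc), cosine_similarity(query, long_doc))
print(inner_product(query, short_doc), inner_product(query, long_doc))
print(l2_distance(query, short_doc), l2_distance(query, long_doc))
```

Cosine similarity scores both documents at about 1.0, inner product rates the longer vector higher (6.0 versus 2.0), and L2 distance places the longer vector farther away (about 2.83 versus 0.0).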

A dense search request contains your query text (query_text), a minimum similarity threshold (min_similarity), a maximum result count (top_k), the similarity metric (similarity_metric), and optional filters for root nodes (root_node_ids) and node types (node_types). Each result carries a similarity score indicating how closely it matches your query, which you can use to judge relevance and filter out weak matches.

POST /v1/search/dense
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "climate change impacts",
  "min_similarity": 0.7,
  "top_k": 10,
  "similarity_metric": "cosine",
  "root_node_ids": ["corpus-123"],
  "node_types": ["TEXT"]
}

See the API Reference for complete endpoint documentation.
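For calling these endpoints from code, the following is a minimal sketch using only Python's standard library. It reproduces the method, headers, and body of the HTTP example above and maps each search mode to the endpoint path used in this guide. The base URL https://matsudb.example.com is a hypothetical placeholder, and send assumes the response body is JSON, since this guide does not document the response schema.

```python
import json
import urllib.request

# Hypothetical base URL: substitute the address of your MatsuDB deployment.
BASE_URL = "https://matsudb.example.com"

# Endpoint paths for the three search modes described in this guide.
SEARCH_PATHS = {
    "dense": "/v1/search/dense",
    "sparse": "/v1/search/sparse",
    "exact": "/v1/search/exact",
}


def build_search_request(mode: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the search endpoints."""
    if mode not in SEARCH_PATHS:
        raise ValueError(f"unknown search mode: {mode!r}")
    return urllib.request.Request(
        url=BASE_URL + SEARCH_PATHS[mode],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_search_request(
    "dense",
    {
        "query_text": "climate change impacts",
        "min_similarity": 0.7,
        "top_k": 10,
        "similarity_metric": "cosine",
        "root_node_ids": ["corpus-123"],
        "node_types": ["TEXT"],
    },
    token="<your-token>",
)
# results = send(request)  # requires a reachable MatsuDB server
```

The payload is passed through unchanged, so the same helper works for the sparse and exact request bodies shown later in this guide.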

Sparse Search

Sparse search generates a sparse embedding from your query text and finds nodes with overlapping token weights, ranking results by lexical similarity. As with dense search, embedding generation happens inside MatsuDB, so you only supply the query text.

Sparse vectors represent text as weighted tokens drawn from a large vocabulary. Because only non-zero weights are stored, sparse vectors stay memory-efficient even though the vocabulary is large. Sparse similarity depends on token overlap, providing lexical precision that complements semantic search.
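To make token overlap concrete, here is a minimal sketch of a sparse vector as a mapping from token to weight, with an inner product that only touches tokens both vectors share. The weights below are invented for illustration; real weights come from MatsuDB's sparse embedding model, which this guide does not specify.

```python
# A sparse vector stores only its non-zero entries, keyed by token.
SparseVector = dict[str, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product over shared tokens; every other term would be zero."""
    # Iterate over the smaller vector and look tokens up in the larger one.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[token] for token, weight in a.items() if token in b)


query = {"machine": 0.8, "learning": 0.9, "algorithm": 0.6}
doc = {"learning": 0.5, "algorithm": 0.7, "sorting": 0.4}
unrelated = {"ocean": 0.9, "conservation": 0.8}

print(sparse_dot(query, doc))        # only "learning" and "algorithm" contribute
print(sparse_dot(query, unrelated))  # no shared tokens
```

The score for doc is 0.9 × 0.5 + 0.6 × 0.7 ≈ 0.87, while unrelated scores 0 because it shares no tokens with the query, even though a dense model might still judge it related.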

Sparse search supports the same similarity metrics as dense search, so you can use consistent ranking strategies across both modes. The request includes your query text (query_text), a similarity threshold (min_similarity), a result limit (top_k), and optional filters. Result scores reflect token overlap, indicating lexical relevance.

POST /v1/search/sparse
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "machine learning algorithm",
  "min_similarity": 0.6,
  "top_k": 10
}
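The min_similarity and top_k fields work together: the threshold discards weak matches and the limit caps how many remain. The sketch below is a client-side model of that combination, not MatsuDB's server implementation; it assumes higher scores are better, that the threshold is inclusive, and uses hypothetical node IDs.

```python
from typing import NamedTuple


class ScoredNode(NamedTuple):
    node_id: str  # hypothetical identifier
    score: float


def apply_threshold_and_limit(
    candidates: list[ScoredNode], min_similarity: float, top_k: int
) -> list[ScoredNode]:
    """Keep candidates scoring at least min_similarity, best first, at most top_k."""
    kept = [c for c in candidates if c.score >= min_similarity]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:top_k]


candidates = [
    ScoredNode("n1", 0.55),
    ScoredNode("n2", 0.91),
    ScoredNode("n3", 0.62),
    ScoredNode("n4", 0.74),
]
print(apply_threshold_and_limit(candidates, min_similarity=0.6, top_k=2))
```

Here n1 falls below the 0.6 threshold, and top_k=2 keeps only the two strongest of the remaining matches, n2 and n4. A high threshold can therefore return fewer than top_k results, or none at all.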

Exact Search

Exact search bypasses embeddings entirely and matches literal text. Use it for queries that need exact phrase or pattern matching, or as a fallback when vector search is inappropriate.

Exact search uses case-insensitive pattern matching to find nodes containing your query text. The search scans text content directly, enabling fast literal matching without embedding generation or vector calculations. This mode excels at finding specific identifiers, exact phrases, or patterns that require precise text matching.
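As a rough model of case-insensitive matching, the sketch below treats it as substring containment after case folding. This is a simplification under stated assumptions: MatsuDB's actual pattern rules (for example wildcard handling or text normalization) and its relevance ranking are not described in this guide, and the node IDs and texts are hypothetical.

```python
def exact_matches(nodes: dict[str, str], query: str) -> list[str]:
    """Return IDs of nodes whose text contains query, ignoring case."""
    needle = query.casefold()  # casefold handles more Unicode cases than lower()
    return [node_id for node_id, text in nodes.items() if needle in text.casefold()]


nodes = {
    "n1": "See Section 3.2 for details.",
    "n2": "section 3.2 summarizes the results",
    "n3": "Section 4.1 covers the methodology",
}
print(exact_matches(nodes, "Section 3.2"))
```

Both n1 and n2 match despite the different capitalization, while n3 does not contain the phrase. No embedding or vector math is involved, only a scan of the text.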

The request includes the query text and optional filters for root nodes and node types. Results are ranked by relevance, with exact matches appearing first. Because no embedding is generated, this mode avoids that overhead, making it well suited to simple text queries.

POST /v1/search/exact
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query": "Section 3.2",
  "root_node_ids": ["corpus-123"]
}

Filtering and Scoping

All search modes support filtering by root node identifiers (root_node_ids) and node types (node_types). Root node filtering restricts a search to specific documents, so you can find content within particular corpus boundaries. Node type filtering restricts a search to specific content types, for example only text nodes, only image nodes, or only table nodes.

The two filters can be combined: you can search for text nodes within specific documents, find images across multiple documents, or locate tables in a particular corpus.

Filters are optional. When you omit them, the search covers all content in the namespace, across document boundaries and content types.

{
  "query_text": "methodology",
  "root_node_ids": ["corpus-123"],
  "top_k": 20
}

Searches only within the specified document.
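The filter behavior described above can be modeled as a per-node predicate. This sketch assumes that an omitted filter places no restriction, that a node passes a list filter when its value appears anywhere in the list, and that the two filters combine with AND. The Node class, its root_node_id field, and the "IMAGE" type value are hypothetical; only "TEXT" appears in this guide's examples.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    node_id: str
    root_node_id: str  # hypothetical: ID of the document the node belongs to
    node_type: str     # e.g. "TEXT"


def passes_filters(
    node: Node,
    root_node_ids: Optional[Sequence[str]] = None,
    node_types: Optional[Sequence[str]] = None,
) -> bool:
    """An omitted (None) filter places no restriction; given filters must all match."""
    if root_node_ids is not None and node.root_node_id not in root_node_ids:
        return False
    if node_types is not None and node.node_type not in node_types:
        return False
    return True


nodes = [
    Node("a", "corpus-123", "TEXT"),
    Node("b", "corpus-123", "IMAGE"),  # hypothetical type value
    Node("c", "corpus-456", "TEXT"),
]
print([n.node_id for n in nodes if passes_filters(n, ["corpus-123"], ["TEXT"])])
print([n.node_id for n in nodes if passes_filters(n, node_types=["TEXT"])])
print([n.node_id for n in nodes if passes_filters(n)])
```

Combining both filters keeps only text nodes in corpus-123, filtering by type alone finds text nodes across every document, and omitting both filters keeps everything.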

Understanding Search Results

Search results include matched nodes with similarity scores that indicate relevance. Higher scores indicate closer matches, enabling you to rank results by quality and filter by relevance thresholds. Results are ordered by similarity, with best matches appearing first.

Each result includes the complete node information: content, positions, metadata, and relationships. You can use this information to understand context, navigate to related content, or retrieve additional details. Result nodes include their hierarchical position, enabling you to understand where content appears within document structures.

Search responses include total counts indicating how many nodes matched your query, enabling you to understand result set sizes.

Result Quality
  • Use min_similarity thresholds to filter low-quality results
  • Higher similarity scores indicate better matches
  • Consider combining multiple search modes for comprehensive results
  • Use filters to focus searches on relevant content types
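When combining search modes, raw scores from dense and sparse search are not necessarily on comparable scales, so merging by rank is often safer than merging by score. The sketch below uses reciprocal rank fusion (RRF), a general rank-merging technique rather than a MatsuDB feature; k=60 is a commonly used default. It assumes you have already extracted ordered lists of node IDs from each search response, and the IDs shown are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of node IDs; each list adds 1 / (k + rank) per node."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by node ID for a stable order.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))


dense_ids = ["n2", "n1", "n3"]   # ordered results from a dense search
sparse_ids = ["n1", "n4"]        # ordered results from a sparse search
print(reciprocal_rank_fusion([dense_ids, sparse_ids]))
```

Node n1 ranks first overall because both modes returned it, even though dense search placed it second; nodes found by only one mode follow in order of their individual ranks.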

Next Steps

Now that you understand search operations, explore:

  1. Node Navigation: Navigate from search results to related content
  2. Common Patterns: Real-world search workflows
  3. Embeddings Concept: Deep dive into how semantic search works
Related Concepts

Understanding these concepts will help you use search effectively: