Search Operations
MatsuDB provides text-based search: you supply query text, and for dense and sparse search the system generates the embeddings for you. This guide covers dense semantic search, sparse lexical search, and exact text search. For search fundamentals, see Getting Started.
Search Modes
MatsuDB provides three text-based search modes that serve different query intents:
Dense Search: Uses semantic embeddings to find conceptually similar content, even when the wording differs. The system automatically generates dense embeddings from your query text, so you don't need to provide pre-computed vectors. For example, a query for "marine protection" can find content about "ocean conservation".
Sparse Search: Uses lexical representations to find content with matching terminology, which suits technical terms and phrases. The system automatically generates sparse embeddings from your query text. For example, a query for "machine learning algorithm" matches content that uses those same terms.
Exact Search: Uses direct text matching for literal queries, such as the phrase "Section 3.2" or a specific identifier. Bypasses embeddings entirely.
- Dense search: Best for conceptual discovery and finding related content
- Sparse search: Best for precise terminology matching and technical content
- Exact search: Best for finding specific identifiers or exact phrases
For detailed information about how embeddings work, see the Embeddings concept documentation.
Dense Semantic Search
Dense search generates a dense embedding from your query text and finds nodes with similar semantic meaning. Embedding generation happens internally, so you provide only the query text.
Dense search supports configurable similarity metrics: cosine similarity measures vector angles, inner product measures vector alignment, and L2 distance measures geometric proximity. Each metric serves different use cases, with cosine similarity providing scale-invariant ranking and inner product providing magnitude-aware ranking.
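To make the difference between the three metrics concrete, here is a minimal sketch in plain Python. It is not MatsuDB's implementation; the function names and the two-dimensional vectors are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def inner_product(a, b):
    # Sum of element-wise products; grows with vector magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Inner product divided by both vector lengths; depends on angle only.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def l2_distance(a, b):
    # Straight-line distance between the two points; smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
orthogonal = [0.0, 1.0]      # unrelated direction

print(cosine_similarity(query, same_direction))  # 1.0
print(inner_product(query, same_direction))      # 3.0
print(l2_distance(query, same_direction))        # 2.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

The vector that points the same way as the query gets a perfect cosine score regardless of its length, while inner product and L2 distance both react to the difference in magnitude.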
The search request includes your query text, a minimum similarity threshold (min_similarity), a maximum result count (top_k), the similarity metric, and optional filters for root nodes and node types. Each result carries a similarity score that indicates how closely it matches your query, so you can judge relevance and filter by quality.
Request:
POST /v1/search/dense
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "climate change impacts",
  "min_similarity": 0.7,
  "top_k": 10,
  "similarity_metric": "cosine",
  "root_node_ids": ["corpus-123"],
  "node_types": ["TEXT"]
}
Response:
{
  "results": [
    {
      "node": {
        "node_id": "node-456",
        "root_node_id": "corpus-123",
        "text_content": "Climate change impacts on ecosystems...",
        "node_type": "TEXT"
      },
      "score": 0.89
    }
  ],
  "total_count": 15
}
See the API Reference for complete endpoint documentation.
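As a rough illustration of issuing the dense search request from a client, the sketch below builds it with Python's standard library. The base URL and the helper name build_dense_search_request are hypothetical; the path, headers, and body fields are taken from the example above. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_dense_search_request(base_url, token, *, query_text, min_similarity,
                               top_k, similarity_metric,
                               root_node_ids=None, node_types=None):
    """Build (but do not send) a POST request for /v1/search/dense."""
    body = {
        "query_text": query_text,
        "min_similarity": min_similarity,
        "top_k": top_k,
        "similarity_metric": similarity_metric,
    }
    # Filters are optional, so include them only when provided.
    if root_node_ids is not None:
        body["root_node_ids"] = list(root_node_ids)
    if node_types is not None:
        body["node_types"] = list(node_types)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/search/dense",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# "https://matsudb.example.com" is a placeholder host, not a real endpoint.
request = build_dense_search_request(
    "https://matsudb.example.com",
    "<your-token>",
    query_text="climate change impacts",
    min_similarity=0.7,
    top_k=10,
    similarity_metric="cosine",
    root_node_ids=["corpus-123"],
    node_types=["TEXT"],
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON response body.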
Sparse Lexical Search
Sparse search generates a sparse embedding from your query text and finds nodes with overlapping token weights, ranking results by lexical similarity. As with dense search, embedding generation happens internally, so you provide only the query text.
Sparse vectors represent text as weighted token mappings within a large vocabulary space. Only non-zero weights are stored, which keeps sparse vectors memory-efficient despite the size of the vocabulary. Sparse similarity calculations focus on token overlap, providing lexical precision that complements semantic search.
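The idea of weighted token mappings can be sketched with ordinary dictionaries. This is an illustration only: the tokens, the weights, and the sparse_dot helper are hypothetical, and the tokenization and weighting MatsuDB actually uses are not described here.

```python
def sparse_dot(query_weights, node_weights):
    """Inner product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller mapping; tokens absent from either side
    # contribute nothing, which is why only non-zero weights need storing.
    if len(query_weights) > len(node_weights):
        query_weights, node_weights = node_weights, query_weights
    return sum(
        weight * node_weights[token]
        for token, weight in query_weights.items()
        if token in node_weights
    )

query = {"machine": 0.5, "learning": 0.5, "algorithm": 1.0}
node_a = {"machine": 1.0, "learning": 1.0, "algorithm": 2.0, "processes": 0.5}
node_b = {"ocean": 1.0, "conservation": 1.0}

print(sparse_dot(query, node_a))  # 3.0 (three shared tokens)
print(sparse_dot(query, node_b))  # 0 (no shared tokens)
```

A node scores above zero only when it shares tokens with the query, which is the lexical behavior that distinguishes sparse search from dense search.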
Sparse search supports the same similarity metrics as dense search, enabling consistent ranking strategies across both search modes. The search request includes your query text, similarity threshold, result limits, and optional filters. Results include similarity scores based on token overlap, enabling you to understand lexical relevance.
Request:
POST /v1/search/sparse
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query_text": "machine learning algorithm",
  "min_similarity": 0.6,
  "top_k": 10
}
Response:
{
  "results": [
    {
      "node": {
        "node_id": "node-789",
        "text_content": "The machine learning algorithm processes...",
        "node_type": "TEXT"
      },
      "score": 0.85
    }
  ],
  "total_count": 8
}
Exact Text Search
Exact search bypasses embeddings entirely, using text matching for queries requiring literal string matching. This mode serves queries that need exact phrase matching or pattern recognition, providing a fallback when vector search is inappropriate.
Exact search uses case-insensitive pattern matching to find nodes containing your query text. The search scans text content directly, enabling fast literal matching without embedding generation or vector calculations. This mode excels at finding specific identifiers, exact phrases, or patterns that require precise text matching.
The search request includes the query text and optional filters for root nodes and node types. Results are ranked by relevance, with exact matches appearing first. Because no embeddings are generated, this mode suits simple text queries.
Request:
POST /v1/search/exact
Content-Type: application/json
Authorization: Bearer <your-token>

{
  "query": "Section 3.2",
  "root_node_ids": ["corpus-123"]
}
Response:
{
  "results": [
    {
      "node": {
        "node_id": "node-321",
        "text_content": "Section 3.2 describes the methodology...",
        "node_type": "TEXT"
      },
      "score": 1.0
    }
  ],
  "total_count": 1
}
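Case-insensitive matching on node text can be pictured with the following sketch. It assumes the simplest interpretation, a substring check after case folding; the server-side matching rules may differ, and the helper name and the second sample node are hypothetical.

```python
def matches_exact(text_content, query):
    """Return True if query occurs in text_content, ignoring case."""
    return query.casefold() in text_content.casefold()

nodes = [
    {"node_id": "node-321", "text_content": "Section 3.2 describes the methodology..."},
    {"node_id": "node-322", "text_content": "Section 4.1 covers the results..."},
]

# Lowercase query still matches because both sides are case-folded.
hits = [n["node_id"] for n in nodes if matches_exact(n["text_content"], "section 3.2")]
print(hits)  # ['node-321']
```

No vectors are involved at any point, which is why this mode works well for identifiers and literal phrases.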
Filtering and Scoping
All search modes support filtering by root node identifiers and node types. Root node filtering restricts searches to specific documents, enabling document-scoped queries that find content within particular corpus boundaries. Node type filtering restricts searches to specific content types, enabling queries that find only text nodes, only image nodes, or only table nodes.
The two filters can be combined. You can search for text nodes within specific documents, find images across multiple documents, or locate tables in a particular corpus.
Filters are optional, so you can search across all documents and all node types when appropriate. When filters are omitted, searches explore the complete namespace content, enabling discovery across document boundaries and content types.
Document-Scoped:
{
  "query_text": "methodology",
  "root_node_ids": ["corpus-123"],
  "top_k": 20
}
Searches only within the specified document.
Type-Filtered:
{
  "query_text": "data analysis",
  "node_types": ["TEXT"],
  "top_k": 20
}
Searches only text nodes across all documents.
Combined Filters:
{
  "query_text": "results",
  "root_node_ids": ["corpus-123", "corpus-456"],
  "node_types": ["TEXT", "TABLE"],
  "top_k": 20
}
Searches text and table nodes in specific documents.
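The way the filters narrow a search can be modeled with a small in-memory sketch. It assumes that a node passes when it matches any value listed within a filter and satisfies every filter that is present, which is consistent with the descriptions above but is not confirmed as the server's exact semantics. The passes_filters helper, the sample nodes, and the "IMAGE" type value are hypothetical.

```python
def passes_filters(node, root_node_ids=None, node_types=None):
    """Apply optional root-node and node-type filters to one node."""
    # An omitted filter (None) places no restriction on the node.
    if root_node_ids is not None and node["root_node_id"] not in root_node_ids:
        return False
    if node_types is not None and node["node_type"] not in node_types:
        return False
    return True

nodes = [
    {"node_id": "n1", "root_node_id": "corpus-123", "node_type": "TEXT"},
    {"node_id": "n2", "root_node_id": "corpus-123", "node_type": "TABLE"},
    {"node_id": "n3", "root_node_id": "corpus-456", "node_type": "IMAGE"},
    {"node_id": "n4", "root_node_id": "corpus-789", "node_type": "TEXT"},
]

combined = [
    n["node_id"] for n in nodes
    if passes_filters(n, root_node_ids=["corpus-123", "corpus-456"],
                      node_types=["TEXT", "TABLE"])
]
unfiltered = [n["node_id"] for n in nodes if passes_filters(n)]

print(combined)    # ['n1', 'n2']
print(unfiltered)  # ['n1', 'n2', 'n3', 'n4']
```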
Understanding Search Results
Search results include matched nodes with similarity scores that indicate relevance. Higher scores indicate closer matches, enabling you to rank results by quality and filter by relevance thresholds. Results are ordered by similarity, with best matches appearing first.
Each result includes the complete node information: content, positions, metadata, and relationships. You can use this information to understand context, navigate to related content, or retrieve additional details. Result nodes include their hierarchical position, enabling you to understand where content appears within document structures.
Search responses include total counts indicating how many nodes matched your query, enabling you to understand result set sizes.
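A short sketch of reading a response on the client side follows. The response dictionary mirrors the shape of the dense search example, but the second result and the summarize_results helper are hypothetical. It reports how many results were returned next to total_count and applies a stricter score cutoff locally.

```python
# Sample data shaped like a search response; the second result is made up.
response = {
    "results": [
        {"node": {"node_id": "node-456", "node_type": "TEXT"}, "score": 0.89},
        {"node": {"node_id": "node-457", "node_type": "TEXT"}, "score": 0.72},
    ],
    "total_count": 15,
}

def summarize_results(response, min_score):
    """Order results by score and keep those at or above min_score."""
    results = sorted(response["results"], key=lambda r: r["score"], reverse=True)
    kept = [r for r in results if r["score"] >= min_score]
    return {
        "returned": len(results),
        "total_count": response["total_count"],
        "kept_node_ids": [r["node"]["node_id"] for r in kept],
    }

summary = summarize_results(response, min_score=0.8)
print(summary)
```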
- Use min_similarity thresholds to filter low-quality results
- Higher similarity scores indicate better matches
- Consider combining multiple search modes for comprehensive results
- Use filters to focus searches on relevant content types
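One simple way to combine modes is to run each search separately and merge the result lists by node_id, keeping the highest score seen for each node. The sketch below shows that strategy; it is an assumption, not a built-in MatsuDB feature, and scores from different modes are not necessarily on a comparable scale, so treat the merged ordering as a heuristic. The merge_results helper and the sample results are hypothetical.

```python
def merge_results(*result_lists):
    """Merge result lists by node_id, keeping each node's highest score."""
    best = {}
    for results in result_lists:
        for result in results:
            node_id = result["node"]["node_id"]
            if node_id not in best or result["score"] > best[node_id]["score"]:
                best[node_id] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

dense_results = [
    {"node": {"node_id": "node-456"}, "score": 0.89},
    {"node": {"node_id": "node-789"}, "score": 0.71},
]
sparse_results = [
    {"node": {"node_id": "node-789"}, "score": 0.85},
    {"node": {"node_id": "node-321"}, "score": 0.64},
]

merged = merge_results(dense_results, sparse_results)
print([(r["node"]["node_id"], r["score"]) for r in merged])
# [('node-456', 0.89), ('node-789', 0.85), ('node-321', 0.64)]
```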
Next Steps
Now that you understand search operations, explore:
- Node Navigation: Navigate from search results to related content
- Common Patterns: Real-world search workflows
- Embeddings Concept: Deep dive into how semantic search works
Understanding these concepts will help you use search effectively:
- Embeddings: How semantic search works
- Node: What you're searching for
- Namespace: Search scope boundaries