Workflows

Automated Processing Pipelines

Definition

Workflows in MatsuDB represent automated processing pipelines that transform and enrich document content. When a document is uploaded or nodes are created, workflows execute sequences of operations that parse content, generate embeddings, and augment nodes with additional information. These workflows operate asynchronously, enabling responsive system behavior while complex processing occurs in the background.

Workflows are triggered automatically by the rules and triggers system, responding to events in the node lifecycle. They orchestrate multiple activities—discrete processing steps that perform specific operations—coordinating their execution, managing retries, and tracking progress. This orchestration ensures reliable processing even when individual steps encounter temporary failures.

Core Philosophy: Orchestrated Processing

Workflows embody the bonsai philosophy by enabling selective, configurable processing. Different workflows handle different aspects of content transformation: parsing workflows decompose documents into nodes, embedding workflows generate semantic representations, and augmentation workflows enrich nodes with additional context. Each workflow can be configured independently, allowing organizations to customize their processing pipelines according to their needs.

The workflow system separates orchestration from execution. Workflows define the sequence and coordination of operations, while activities perform the actual work. This separation enables workflows to manage retries, handle failures gracefully, and track progress without coupling to specific implementation details. Workflows can be monitored, cancelled, and resumed, providing visibility and control over long-running processing operations.
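
The separation between orchestration and execution can be illustrated with a minimal sketch. Everything named here (the Workflow class, the activity functions, the context dictionary) is hypothetical and is not MatsuDB's actual API; it only shows a workflow that knows the order of its steps while the activities do the work.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout; MatsuDB's real workflow API may differ.
Activity = Callable[[dict], dict]  # an activity takes a context and returns an updated one


@dataclass
class Workflow:
    """Orchestration only: knows the order of steps, not how they work."""
    name: str
    activities: list[Activity]
    cancelled: bool = False
    completed: list[str] = field(default_factory=list)

    def cancel(self) -> None:
        self.cancelled = True

    def run(self, context: dict) -> dict:
        for activity in self.activities:
            if self.cancelled:
                break  # stop before starting the next step
            context = activity(context)
            self.completed.append(activity.__name__)  # progress tracking
        return context


def extract_text(ctx: dict) -> dict:
    """Activity: normalize the raw document text."""
    return {**ctx, "text": ctx["raw"].strip()}


def split_paragraphs(ctx: dict) -> dict:
    """Activity: break the text into paragraph strings."""
    return {**ctx, "paragraphs": [p for p in ctx["text"].split("\n\n") if p]}


parsing = Workflow("parsing", [extract_text, split_paragraphs])
result = parsing.run({"raw": "  First paragraph.\n\nSecond paragraph.  "})
print(result["paragraphs"])
print(parsing.completed)
```

Because the workflow only holds a list of callables, an activity can be swapped or reimplemented without touching the orchestration logic.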

Asynchronous Processing

Because workflows run in the background, the system stays responsive and users can continue working while documents are being processed.

Workflow Types

MatsuDB provides several workflow types, each addressing a specific aspect of document processing:

Parsing Workflows
Embedding Workflows
Augmentation Workflows

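Parsing workflows transform uploaded documents into structured node hierarchies. When a corpus node is created, parsing workflows extract content, identify structural elements, and create artifact nodes with child nodes representing paragraphs, images, tables, and other document components.

Parsing workflows support multiple formats: PDF, CSV, spreadsheets, and text files.

Embedding workflows generate semantic representations (embeddings) for text nodes, and augmentation workflows enrich specialized node types, such as formula nodes, with additional context.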

Workflow Lifecycle

Workflows progress through distinct phases from initiation to completion:

Initiation: When triggered, the workflow receives input parameters specifying which nodes to process and what operations to perform.
Configuration: The workflow loads configuration settings that determine timeouts, retry policies, and processing parameters.
Execution: The workflow coordinates multiple activities, each performing a specific processing step. Activities execute sequentially or in parallel as required by the workflow logic.
Status Tracking: The workflow tracks the status of each node being processed, updating status information as activities complete or fail.
Completion: When processing finishes, the workflow updates node statuses to reflect success or failure, enabling downstream systems and users to understand processing outcomes.

Workflows handle failures through retry mechanisms, attempting failed activities multiple times before marking nodes as failed. This resilience ensures that temporary issues—network interruptions, service unavailability, or transient errors—do not cause permanent processing failures. Workflows can be cancelled or terminated, gracefully stopping processing and updating node statuses accordingly.
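
The retry and status-tracking behavior described above can be sketched as follows. The status values, the process_nodes function, and the default of three attempts are assumptions made for illustration; the source does not specify MatsuDB's actual status names or retry limits.

```python
from enum import Enum
from typing import Callable


class NodeStatus(Enum):  # hypothetical status values
    PENDING = "pending"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def process_nodes(
    node_ids: list[str],
    activity: Callable[[str], None],
    max_attempts: int = 3,
) -> dict[str, NodeStatus]:
    """Run one activity per node, retrying before marking a node as failed."""
    statuses = {node_id: NodeStatus.PENDING for node_id in node_ids}
    for node_id in node_ids:
        statuses[node_id] = NodeStatus.PROCESSING
        for attempt in range(1, max_attempts + 1):
            try:
                activity(node_id)
                statuses[node_id] = NodeStatus.SUCCEEDED
                break
            except Exception:
                # Only give up once every attempt has been used.
                if attempt == max_attempts:
                    statuses[node_id] = NodeStatus.FAILED
    return statuses


# Simulated activity: "node-b" fails once, "node-c" fails on every attempt.
calls: dict[str, int] = {}


def flaky_embed(node_id: str) -> None:
    calls[node_id] = calls.get(node_id, 0) + 1
    if node_id == "node-b" and calls[node_id] < 2:
        raise ConnectionError("temporary outage")
    if node_id == "node-c":
        raise ConnectionError("service unavailable")


statuses = process_nodes(["node-a", "node-b", "node-c"], flaky_embed)
for node_id, status in statuses.items():
    print(node_id, status.value, "attempts:", calls[node_id])
```

In this sketch the transient failure on node-b is absorbed by a second attempt, while node-c is marked as failed only after all attempts are exhausted.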

Workflow Chaining

Workflows naturally chain together as processing progresses. A parsing workflow creates nodes, which triggers embedding workflows for text nodes and augmentation workflows for specialized node types. These workflows execute independently, enabling parallel processing of different node types while maintaining coordination through status tracking.

The chaining occurs automatically through the rules and triggers system. When a parsing workflow creates text nodes, embedding triggers detect the new nodes and initiate embedding workflows. When formula nodes are created, formula augmentation triggers initiate augmentation workflows. This automatic chaining enables end-to-end processing pipelines without manual intervention.
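
As a rough illustration of this chaining, the sketch below models triggers as a lookup from node type to workflow name. The rule table, the workflow names, and the node types each workflow creates are all hypothetical; MatsuDB's rules and triggers system is not specified at this level of detail in the text.

```python
from collections import deque

# Hypothetical rule table: node type -> workflow started when such a node is created.
TRIGGERS = {
    "corpus": "parsing",
    "text": "embedding",
    "formula": "formula_augmentation",
}

# Hypothetical workflow outputs: which node types each workflow creates.
CREATES = {
    "parsing": ["text", "formula", "image"],
    "embedding": [],
    "formula_augmentation": [],
}


def run_chain(initial_node_type: str) -> list[tuple[str, str]]:
    """Return (node_type, workflow) pairs in the order they are triggered."""
    started = []
    queue = deque([initial_node_type])
    while queue:
        node_type = queue.popleft()
        workflow = TRIGGERS.get(node_type)
        if workflow is None:
            continue  # no rule matches this node type in this sketch
        started.append((node_type, workflow))
        queue.extend(CREATES[workflow])  # new nodes may fire further triggers
    return started


chain = run_chain("corpus")
for node_type, workflow in chain:
    print(f"{node_type} node created -> start {workflow} workflow")
```

No workflow calls another directly here: each one only creates nodes, and the rule table decides what runs next, which is what lets the chain extend without changes to the parsing step.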

Workflow chaining respects namespace boundaries and configuration settings. Each workflow in the chain operates within its namespace context, applying namespace-specific configurations. This ensures that processing pipelines remain isolated and customizable per organization.
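
One way to picture namespace-scoped configuration is a set of defaults with per-namespace overrides. The field names, the namespace names, and the fallback behavior below are assumptions for illustration only, not MatsuDB's configuration schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkflowConfig:  # hypothetical fields
    timeout_seconds: int = 300
    max_attempts: int = 3
    embedding_enabled: bool = True


DEFAULTS = WorkflowConfig()

# Hypothetical per-namespace overrides; unspecified fields fall back to DEFAULTS.
NAMESPACE_OVERRIDES = {
    "acme": {"timeout_seconds": 900},
    "globex": {"embedding_enabled": False, "max_attempts": 5},
}


def config_for(namespace: str) -> WorkflowConfig:
    """Resolve the effective configuration for one namespace."""
    return replace(DEFAULTS, **NAMESPACE_OVERRIDES.get(namespace, {}))


print(config_for("acme"))
print(config_for("globex"))
print(config_for("unknown-namespace"))
```

Each workflow in a chain would resolve its configuration from its own namespace, so an override for one organization never leaks into another's pipeline.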

Effective Usage Principles

Workflow configuration should align with organizational processing requirements and resource constraints. Timeout settings should account for expected processing times while preventing workflows from running indefinitely. Retry policies should balance reliability with resource consumption, retrying transient failures without exhausting resources on permanent errors.
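
A retry policy that balances reliability against resource consumption might look like the sketch below. The exception classes, the field names, and the exponential backoff schedule are assumptions chosen for illustration; the source does not describe how MatsuDB expresses retry policies.

```python
from dataclasses import dataclass


class TransientError(Exception):
    """Worth retrying, for example a network interruption."""


class PermanentError(Exception):
    """Retrying will not help, for example an unsupported file format."""


@dataclass(frozen=True)
class RetryPolicy:  # hypothetical fields
    max_attempts: int = 4
    initial_delay: float = 1.0
    backoff: float = 2.0
    max_delay: float = 30.0

    def delay_before(self, attempt: int) -> float:
        """Seconds to wait before the given attempt (attempt 1 runs immediately)."""
        if attempt <= 1:
            return 0.0
        return min(self.initial_delay * self.backoff ** (attempt - 2), self.max_delay)

    def should_retry(self, error: Exception, attempt: int) -> bool:
        """Retry only transient errors, and only while attempts remain."""
        return isinstance(error, TransientError) and attempt < self.max_attempts


policy = RetryPolicy()
delays = [policy.delay_before(n) for n in range(1, 5)]
print(delays)  # [0.0, 1.0, 2.0, 4.0]
print(policy.should_retry(TransientError("timeout"), attempt=1))   # True
print(policy.should_retry(PermanentError("bad file"), attempt=1))  # False
```

Capping both the attempt count and the delay keeps the total time spent on a failing activity bounded, which complements a workflow-level timeout.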

Monitoring workflow execution and node statuses enables proactive identification of processing issues. Failed workflows should be investigated to understand root causes, whether configuration issues, resource constraints, or external service problems. Status information provides the visibility needed to diagnose and resolve processing problems.

Workflow design should consider idempotency and retryability. Activities should be designed to handle repeated execution safely, enabling workflows to retry failed operations without causing duplicate processing or data corruption. This design enables reliable processing even when failures occur.

Relationship to Other Concepts

Workflows are triggered by the rules and triggers system, responding to events in the node lifecycle. They operate within namespace boundaries, ensuring that processing configurations and statuses are isolated per organization. Workflows process nodes, updating their content, embeddings, and metadata as processing progresses.

Workflows integrate with the status tracking system, updating node statuses throughout execution. They use configuration scoped by namespace, enabling organizational customization of processing behavior. Workflows embody the bonsai philosophy by enabling selective, configurable processing that adapts to organizational needs.

Knowledge Platform Transformation

The workflow system enables MatsuDB to transform static documents into living, queryable knowledge through automated processing pipelines. Without workflows, documents would remain unprocessed, and nodes would lack embeddings and enrichment. With workflows, the system automatically extracts structure, generates semantic representations, and enriches content, enabling the sophisticated querying and analysis capabilities that make MatsuDB a knowledge platform.
