Position
"Where" proves as fundamental as "what." Matsu Position provides a universal, multimodal abstraction for describing location across vastly different coordinate systems: pages, documents, hierarchies, temporal flows, and beyond.
Definition
An image occupies pixel coordinates while residing at a tree depth. A cell claims a row-column address and a spatial extent. A sentence has character offsets and a hierarchical level. These varied expressions of "where" resist reduction to a single form, yet systems must navigate between them fluidly. Position is more than metadata: it is essential for context, relationships, navigation, and connections. Rich positional information transforms abstract nodes into situated knowledge that knows its origins and neighbors.
Core Philosophy: Multimodal Representation
Information exists in multiple coordinate systems simultaneously. A figure has page coordinates, a position in the logical structure, a place in the sequence, and relationships to surrounding text, and all of these are valid at once. This enables fluid movement between location modes: use spatial coordinates when finding nearby content, hierarchical position when understanding a node's role in the document, and ordering relationships when seeking the next item in sequence. Because every mode is exposed through one consistent interface, the underlying coordinate system stays transparent to the caller.
This approach aligns with human cognition. "Third paragraph on page five" plus "methodology section" plus "after the blue chart" plus "near the beginning"—these framings coexist naturally. Matsu Position mirrors human and agent understanding rather than imposing artificial constraints.
Universal Structure
| Component | Purpose | Examples |
|---|---|---|
| Type | Declares coordinate system | bbox, cell, line, hierarchical |
| Path | Integer sequence tracing route | [1, 2, 3], [7, 3, 2] |
| Geometry | Spatial extent when relevant | Bounding boxes (x0,y0,x1,y1), cell coordinates (row,col), line ranges (start,end) |
The same mechanism locates pixel regions, spreadsheet cells, video frames, document elements, and genomic sequences. Consider a gene node at path [7, 3, 2] with geometric data encoding base pair start/end positions. Systems manipulate typed positions without understanding every coordinate system's specifics. A heading exists in the document tree and occupies page coordinates, enabling both structural and spatial queries simultaneously.
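The three components in the table can be pictured as one small record. The sketch below is illustrative only: the class name `Position`, its field names, the `"sequence"` type label, and all coordinate values are assumptions made for this example, not Matsu's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Position:
    """Hypothetical record mirroring the Type / Path / Geometry table."""
    type: str                                     # coordinate system, e.g. "bbox", "cell", "line"
    path: Tuple[int, ...]                         # integer route through the hierarchy
    geometry: Optional[Tuple[float, ...]] = None  # extent, interpreted according to `type`


# The gene example from the text: path [7, 3, 2] plus a base-pair range.
# The "sequence" label and the start/end values are invented for illustration.
gene = Position(type="sequence", path=(7, 3, 2), geometry=(10_500, 12_250))

# A heading that is located both in the tree (path) and on the page (geometry).
heading = Position(type="bbox", path=(1, 2), geometry=(72.0, 90.0, 540.0, 120.0))


def depth(pos: Position) -> int:
    """Tree depth can be read from the path alone, whatever the type is."""
    return len(pos.path)
```

Code that only needs the path, such as `depth`, never has to interpret the geometry, which is one way to read the claim that systems can handle typed positions without knowing every coordinate system.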
Position Types
Bounding Box
Structure: (x0, y0, x1, y1) + page_number
Use Cases: Visual elements on 2D surfaces
Operations: Overlap, distance, containment, spatial relationships
Bounding boxes represent rectangular regions on pages, enabling spatial queries and layout analysis.
Cell
Structure: sheet_name + row + col
Use Cases: Tabular data grids
Operations: Sum columns, compare rows, trace formulas
Cell positions enable spreadsheet-style navigation and operations on tabular data.
Line
Structure: line_number + column_start + column_stop
Use Cases: Code, structured text
Operations: Version tracking, diff operations, nested structure
Line positions enable precise text location and code analysis.
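The three structures listed for bounding box, cell, and line positions could be written down as plain records. This is a minimal sketch: the field names follow the "Structure" entries, while the class names, the helper methods, and the choice to treat `column_stop` as exclusive are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBoxGeometry:
    x0: float
    y0: float
    x1: float
    y1: float
    page_number: int

    def area(self) -> float:
        # Degenerate or inverted boxes are clamped to zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


@dataclass(frozen=True)
class CellGeometry:
    sheet_name: str
    row: int
    col: int

    def same_column(self, other: "CellGeometry") -> bool:
        # Useful for column-wise operations such as summing.
        return self.sheet_name == other.sheet_name and self.col == other.col


@dataclass(frozen=True)
class LineGeometry:
    line_number: int
    column_start: int
    column_stop: int  # assumed exclusive, like a Python slice end

    def length(self) -> int:
        return self.column_stop - self.column_start
```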
Queries and Operations
Spatial queries answer questions about geometric relationships: distance, overlap, containment, proximity. Users or agents think in terms of "chart in upper left" or "text next to image," and the system supports these spatial intuitions. This enables layout analysis including column detection, reading order determination, and alignment recognition.
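The geometric relationships named above reduce to a few comparisons on box corners. The functions below are a sketch under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples with `x0 <= x1` and `y0 <= y1`, both boxes lie on the same page, and the function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def gap_distance(a: Box, b: Box) -> float:
    """Shortest distance between the edges of two boxes; 0.0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


chart = (0.0, 0.0, 100.0, 100.0)
caption = (0.0, 110.0, 100.0, 130.0)
```

A query such as "text next to image" could then be phrased as a threshold on `gap_distance`, with the threshold itself being a layout-specific choice.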
Structural queries enable descendant finding, sibling identification, and ancestor tracing. Because each position carries its own path, these relationships can be read from the paths themselves, without traversing the tree. This respects the document's intended organization. Combined with spatial queries, it creates multidimensional navigation.
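One way structural queries can avoid walking the tree is to compare paths directly. The sketch below assumes that a path lists child indices from the root downward, so that an ancestor's path is a proper prefix of its descendants' paths; that reading of "path", and every function name here, is an assumption of the example.

```python
from typing import Iterable, List, Tuple

Path = Tuple[int, ...]


def is_ancestor(a: Path, b: Path) -> bool:
    """True when `a` is a proper prefix of `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def is_sibling(a: Path, b: Path) -> bool:
    """True when two distinct paths share the same parent prefix."""
    return len(a) == len(b) and len(a) > 0 and a != b and a[:-1] == b[:-1]


def descendants(root: Path, paths: Iterable[Path]) -> List[Path]:
    """Filter a flat collection of paths down to those beneath `root`."""
    return [p for p in paths if is_ancestor(root, p)]


def ancestors(p: Path) -> List[Path]:
    """Every proper, non-empty prefix of `p`, from shallowest to deepest."""
    return [p[:i] for i in range(1, len(p))]


all_paths = [(1,), (1, 2), (1, 2, 3), (1, 4), (7, 3, 2)]
```

Each check is a tuple comparison on stored positions, so no parent or child pointers need to be followed.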
Temporal positions encode time offsets for video frames, timestamps for versions, and sequence numbers for streams, enabling time-aware navigation that tracks layout changes across versions and observes structural evolution.
A single node can have multiple positions simultaneously, each representing a different coordinate system. This enables flexible navigation and querying across different dimensions.
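A node holding several positions at once might be sketched as a mapping from position type to coordinates. The `Node` class, its fields, and the sample values below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    content: str
    # Maps a position type name to the coordinates in that system.
    positions: Dict[str, Tuple[float, ...]] = field(default_factory=dict)

    def position(self, kind: str) -> Optional[Tuple[float, ...]]:
        """Return this node's coordinates in one system, or None if it has none."""
        return self.positions.get(kind)


def with_position(nodes: List[Node], kind: str) -> List[Node]:
    """Select the nodes that can be located in the given coordinate system."""
    return [n for n in nodes if kind in n.positions]


# A figure located on page 5 and also in the document tree.
figure = Node("Figure 3", {
    "bbox": (72.0, 300.0, 540.0, 520.0, 5),  # x0, y0, x1, y1, page_number
    "hierarchical": (2, 4, 1),
})
# A footnote that is known only structurally.
note = Node("Footnote", {"hierarchical": (2, 4, 2)})
```

A spatial query would consider only `with_position(nodes, "bbox")`, while a structural query could use every node that carries a `"hierarchical"` entry.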
Applications
Rich positional information enables faithful document reconstruction. Positions serve as assembly instructions, placing content back where it originally appeared while respecting layout, organization, and sequence. This enables traditional document views despite internal decomposition into nodes.
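As a rough illustration of positions acting as assembly instructions, decomposed content can be put back in order by sorting on its page and box coordinates. This sketch assumes a single-column layout with the origin at the top left of the page; multi-column pages would need a real reading-order step, and the names used here are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Placed = Tuple[str, int, Box]            # (content, page_number, bbox)


def reading_order(items: List[Placed]) -> List[Placed]:
    """Order items by page, then top edge, then left edge."""
    return sorted(items, key=lambda it: (it[1], it[2][1], it[2][0]))


scattered = [
    ("body", 1, (72.0, 200.0, 540.0, 400.0)),
    ("next page", 2, (72.0, 50.0, 540.0, 90.0)),
    ("title", 1, (72.0, 50.0, 540.0, 90.0)),
]
rebuilt = [content for content, _, _ in reading_order(scattered)]
```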
Positional context determines processing workflows. Content near images receives different treatment than isolated text, aligning with the bonsai philosophy of selective treatment based on location. Cross-document correlation emerges when similar positions across documents reveal templates, standards, and patterns. Layout conventions and structural regularities become visible at the population level.
Relationship to Nodes
Positions and nodes form an inseparable partnership. Multiple positions per node situate it in different coordinate systems—not mere metadata, but fundamental grounding. Without positions, nodes float in informational space, connected only through parent-child and sequential relationships. With positions, nodes gain spatial, geometric, and structural grounding. They know what they contain, what contains them, and where they exist across multiple dimensions.
The position framework bridges abstract information and concrete instantiation, embedding the node graph in multiple coordinate spaces. Positions give nodes their place in the world.