Position

Universal Location System

"Where" proves as fundamental as "what." Matsu Position provides a universal, multimodal abstraction for describing location across vastly different coordinate systems: pages, documents, hierarchies, temporal flows, and beyond.

Definition

An image occupies pixel coordinates while also sitting at some depth in a tree. A cell has a row-column address and a spatial extent. A sentence has character offsets and a hierarchical level. These varied expressions of "where" resist reduction to a single form, yet systems must move between them fluidly. Position is more than metadata: it is essential for context, relationships, navigation, and connections. Rich positional information turns abstract nodes into situated knowledge that knows its origins and neighbors.

Core Philosophy: Multimodal Representation

Information exists in multiple coordinate systems at once. A figure has page coordinates, a position in the logical structure, a place in the sequence, and relationships to surrounding text, and all of them are valid at the same time. This allows fluid movement between location modes: use spatial coordinates to find nearby content, hierarchical position to understand an element's role in the document, and ordering relationships to find the next item in sequence. A consistent interface makes the choice of coordinate system transparent.

Human Cognition Alignment

This approach aligns with human cognition. "Third paragraph on page five" plus "methodology section" plus "after the blue chart" plus "near the beginning"—these framings coexist naturally. Matsu Position mirrors human and agent understanding rather than imposing artificial constraints.

Universal Structure

| Component | Purpose | Examples |
| --- | --- | --- |
| Type | Declares the coordinate system | `bbox`, `cell`, `line`, `hierarchical` |
| Path | Integer sequence tracing a route | `[1, 2, 3]`, `[7, 3, 2]` |
| Geometry | Spatial extent when relevant | Bounding boxes `(x0,y0,x1,y1)`, cell coordinates `(row,col)`, line ranges `(start,end)` |

The same mechanism locates pixel regions, spreadsheet cells, video frames, document elements, and genomic sequences. Consider a gene node at path [7, 3, 2] with geometric data encoding base pair start/end positions. Systems manipulate typed positions without understanding every coordinate system's specifics. A heading exists in the document tree and occupies page coordinates, enabling both structural and spatial queries simultaneously.
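The three components can be pictured as one small record. The following is a minimal sketch rather than the actual Matsu API: the `Position` class, its field names, and the `describe` helper are hypothetical, the gene's `line` type is an assumption (the text only says its geometry encodes base pair start/end positions), and all numeric values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Position:
    """Hypothetical record with the three components from the table above."""
    type: str                                     # coordinate system, e.g. "bbox", "cell", "line", "hierarchical"
    path: Tuple[int, ...]                         # integer sequence tracing the route to the node
    geometry: Optional[Tuple[float, ...]] = None  # spatial extent, only when relevant


def describe(position: Position) -> str:
    """Format a position using only its type and path, ignoring the geometry."""
    return f"{position.type}@{'.'.join(str(step) for step in position.path)}"


# A gene node at path [7, 3, 2]; the (start, end) base pair range is made up.
gene = Position(type="line", path=(7, 3, 2), geometry=(1200, 1850))

# The same heading located twice: once in the tree, once on the page.
heading_in_tree = Position(type="hierarchical", path=(1, 2, 3))
heading_on_page = Position(type="bbox", path=(1, 2, 3), geometry=(72.0, 90.0, 540.0, 120.0))

print(describe(gene))
print(describe(heading_on_page))
```

Because `describe` reads only `type` and `path`, it works for any coordinate system, which is the sense in which code can handle typed positions without knowing the details of each one.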

Position Types

Bounding Box

Structure: (x0, y0, x1, y1) + page_number

Use Cases: Visual elements on 2D surfaces

Operations: Overlap, distance, containment, spatial relationships

Bounding boxes represent rectangular regions on pages, enabling spatial queries and layout analysis.

Cell

Structure: sheet_name + row + col

Use Cases: Tabular data grids

Operations: Sum columns, compare rows, trace formulas

Cell positions enable spreadsheet-style navigation and operations on tabular data.

Line

Structure: line_number + column_start + column_stop

Use Cases: Code, structured text

Operations: Version tracking, diff operations, nested structure

Line positions enable precise text location and code analysis.
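To make the bounding box operations concrete, here is a minimal sketch of overlap, containment, and distance on the `(x0, y0, x1, y1) + page_number` structure. The `BBox` class and its method names are hypothetical, not the Matsu API. Treating boxes on different pages as never overlapping and infinitely far apart, and measuring distance as the gap between nearest edges, are assumptions made for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical bounding box: a rectangle on a given page."""
    x0: float
    y0: float
    x1: float
    y1: float
    page_number: int

    def overlaps(self, other: "BBox") -> bool:
        """True if the rectangles share interior area on the same page."""
        if self.page_number != other.page_number:
            return False
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

    def contains(self, other: "BBox") -> bool:
        """True if `other` lies entirely inside this box on the same page."""
        return (self.page_number == other.page_number
                and self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)

    def distance(self, other: "BBox") -> float:
        """Gap between nearest edges; 0.0 if the boxes touch or overlap."""
        if self.page_number != other.page_number:
            return math.inf
        dx = max(other.x0 - self.x1, self.x0 - other.x1, 0.0)
        dy = max(other.y0 - self.y1, self.y0 - other.y1, 0.0)
        return math.hypot(dx, dy)


chart = BBox(0, 0, 100, 100, page_number=1)
label = BBox(20, 20, 40, 40, page_number=1)      # sits inside the chart
caption = BBox(10, 110, 90, 130, page_number=1)  # sits 10 units from the chart

print(chart.contains(label), chart.overlaps(caption), chart.distance(caption))
```

Cell and line positions could follow the same pattern, with comparisons on `(row, col)` or on `(line_number, column_start, column_stop)` in place of rectangle arithmetic.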

Queries and Operations

Spatial queries answer questions about geometric relationships: distance, overlap, containment, proximity. Users or agents think in terms of "chart in upper left" or "text next to image," and the system supports these spatial intuitions. This enables layout analysis including column detection, reading order determination, and alignment recognition.

Structural queries find descendants, identify siblings, and trace ancestors, all without tree traversal. Because they follow the hierarchy, they respect the document's intended organization. Combined with spatial queries, they enable multidimensional navigation.
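One way structural queries can avoid tree traversal is by comparing paths directly. The sketch below assumes that a node's path extends its parent's path by one integer, which the text does not state explicitly. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Path = Tuple[int, ...]


def is_ancestor(candidate: Path, path: Path) -> bool:
    """True if `candidate` is a strict prefix of `path`."""
    return len(candidate) < len(path) and path[:len(candidate)] == candidate


def are_siblings(a: Path, b: Path) -> bool:
    """True if two distinct, non-empty paths share the same parent prefix."""
    return a != b and len(a) == len(b) and len(a) > 0 and a[:-1] == b[:-1]


def ancestors(path: Path) -> List[Path]:
    """All strict, non-empty prefixes of `path`, from outermost to nearest."""
    return [path[:i] for i in range(1, len(path))]


def descendants(root: Path, paths: Sequence[Path]) -> List[Path]:
    """Filter a flat collection of paths down to those below `root`."""
    return [p for p in paths if is_ancestor(root, p)]


all_paths = [(1,), (1, 2), (1, 2, 3), (1, 3), (7, 3, 2)]

print(descendants((1,), all_paths))
print(are_siblings((1, 2), (1, 3)))
print(ancestors((7, 3, 2)))
```

Each query is a tuple comparison over a flat list of paths, so no parent or child pointers need to be followed.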

Temporal positions encode time offsets for video frames, timestamps for versions, and sequence numbers for streams, enabling time-aware navigation that tracks layout changes across versions and observes structural evolution.

Multiple Positions Per Node

A single node can have multiple positions simultaneously, each representing a different coordinate system. This enables flexible navigation and querying across different dimensions.
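A simple way to picture this is a node that keeps one position per coordinate system, keyed by type. This is a sketch under stated assumptions: the `Node` and `Position` classes, the `add_position` and `position` methods, and the one-position-per-type restriction are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Position:
    """Hypothetical position record: type, path, optional geometry."""
    type: str
    path: Tuple[int, ...]
    geometry: Optional[Tuple[float, ...]] = None


@dataclass
class Node:
    """Hypothetical node holding at most one position per coordinate system."""
    content: str
    positions: Dict[str, Position] = field(default_factory=dict)

    def add_position(self, position: Position) -> None:
        # Keyed by type, so a later position of the same type replaces the earlier one.
        self.positions[position.type] = position

    def position(self, type_name: str) -> Optional[Position]:
        """Return the position in the requested coordinate system, if any."""
        return self.positions.get(type_name)


heading = Node(content="Methodology")
heading.add_position(Position("hierarchical", (1, 2, 3)))
heading.add_position(Position("bbox", (1, 2, 3), (72.0, 90.0, 540.0, 120.0)))

print(heading.position("bbox"))
print(heading.position("cell"))  # this node has no cell position
```

A caller picks the coordinate system that suits the question, for example `bbox` for "what is nearby" and `hierarchical` for "which section is this in".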

Applications

Rich positional information enables faithful document reconstruction. Positions serve as assembly instructions, placing content back where it originally appeared while respecting layout, organization, and sequence. This enables traditional document views despite internal decomposition into nodes.
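As an illustration of positions acting as assembly instructions, the sketch below puts text fragments back in reading order using only their page number and bounding box. It assumes a single-column layout with y increasing down the page. The `Fragment` tuple and the `reassemble` function are hypothetical, and real layouts with columns or floats would need a more careful ordering.

```python
from typing import List, Tuple

# (text, page_number, (x0, y0, x1, y1)); a hypothetical fragment shape for this sketch
Fragment = Tuple[str, int, Tuple[float, float, float, float]]


def reassemble(fragments: List[Fragment]) -> List[str]:
    """Order fragments by page, then top to bottom, then left to right."""
    ordered = sorted(fragments, key=lambda f: (f[1], f[2][1], f[2][0]))
    return [text for text, _, _ in ordered]


fragments = [
    ("Conclusion", 2, (72.0, 80.0, 540.0, 100.0)),
    ("Body", 1, (72.0, 130.0, 540.0, 400.0)),
    ("Title", 1, (72.0, 72.0, 540.0, 100.0)),
]

print(reassemble(fragments))
```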

Positional context determines processing workflows. Content near images receives different treatment than isolated text, aligning with the bonsai philosophy of selective treatment based on location. Cross-document correlation emerges when similar positions across documents reveal templates, standards, and patterns. Layout conventions and structural regularities become visible at the population level.

Relationship to Nodes

Positions and nodes form an inseparable partnership. Multiple positions per node situate it in different coordinate systems—not mere metadata, but fundamental grounding. Without positions, nodes float in informational space, connected only through parent-child and sequential relationships. With positions, nodes gain spatial, geometric, and structural grounding. They know what they contain, what contains them, and where they exist across multiple dimensions.

The position framework bridges abstract information and concrete instantiation, embedding the node graph in multiple coordinate spaces. Positions give nodes their place in the world.

Learn More

To understand how nodes use positions, see the Node concept documentation. Positions are also namespace-scoped, ensuring spatial queries operate within namespace boundaries.
