TOAQ GROUP © 2024 - 2026

Released under the MIT License.

AST Node Design

Last Updated March 24, 2026

Once the lexer has tokenized the input string, the parser’s job is to build the Abstract Syntax Tree (AST).

In Omni-Core, the AST is the ultimate source of truth. It is a strictly typed, recursive data structure in Rust that guarantees every host language (Python, TypeScript) receives exactly the same logical representation of the document.

AST Representation

Input: $$ E=mc^2 $$

AstNode
  node_type: "root"
  children: [
    AstNode
      node_type: "BlockMath"
      content: null
      children: [
        AstNode (virtual child)
          node_type: "text"
          content: "E=mc^2"
          children: []
      ]
  ]

The Unified AstNode Structure

To keep the OCP Binary Protocol fast and the memory layout predictable, Omni-Core uses a single, unified AstNode struct for everything—from paragraphs to JSX components and mathematical formulas.

rust
// The core Rust structure
use std::collections::HashMap;

pub struct AstNode {
    pub node_type: String,                      // e.g., "p", "BlockMath", "MyComponent"
    pub content: Option<String>,                // Only populated for raw "text" nodes
    pub attributes: HashMap<String, AttrValue>, // JSX or HTML props
    pub children: Vec<AstNode>,                 // Nested elements
    pub self_closing: bool,                     // True for self-closing tags
}

This structural simplicity is powerful. A high-level SDK only needs to implement one deserialization function to traverse the entire document.
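
As a rough illustration of that point, the sketch below mirrors the struct in a Python dataclass and visits every node with one recursive generator. The `Node` class, the `walk` helper, and the sample tree are assumptions made for this example; they are not the omni-mdx Python API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical Python-side mirror of the Rust AstNode struct."""
    node_type: str
    content: Optional[str] = None
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    self_closing: bool = False


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Sample tree: a paragraph, a math block, and a self-closing component.
doc = Node("root", children=[
    Node("p", children=[Node("text", content="Energy: ")]),
    Node("BlockMath", children=[Node("text", content="E=mc^2")]),
    Node("MyComponent", self_closing=True),
])

# One generator serves every node kind; no per-type branches are needed.
type_counts = Counter(node.node_type for _, node in walk(doc))
max_depth = max(depth for depth, _ in walk(doc))
```

Because every node has the same fields, the same `walk` generator can back a table of contents, a linter, or a node counter without knowing which node types exist.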

The “Virtual Text Child” Pattern

Handling complex formats like LaTeX within MDX introduces a significant architectural challenge: where should the math string live in the AST?

Early versions of Omni-MDX stored the formula directly in the content field of the InlineMath or BlockMath node. However, this led to critical bugs in tree-walking algorithms (like word counters or summary extractors). If a script recursively asked for the text of all children, it would skip the math nodes because it expected text to only live inside node_type: "text".

To solve this, Omni-Core enforces the Virtual Text Child pattern:
Non-text nodes (like BlockMath or JSX components) never use their own content field to store raw inner text. Instead, the Rust parser automatically injects a virtual child node of type "text".

ℹ️ Information

If you write $$ E=mc^2 $$, the AST does not look like this:
{ node_type: "BlockMath", content: "E=mc^2" }

It looks like this:
{ node_type: "BlockMath", children: [ { node_type: "text", content: "E=mc^2" } ] }
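
The difference can be sketched in a few lines of Python. The `Node` dataclass and the `collect_text` helper below are stand-ins written for this illustration (not the library's own types); the two trees reproduce the legacy layout and the virtual-child layout described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in with only the fields this example needs."""
    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def collect_text(node: Node) -> str:
    """A typical walker: it reads content only from "text" nodes."""
    if node.node_type == "text":
        return node.content or ""
    return "".join(collect_text(child) for child in node.children)


# Legacy layout: the formula is stored on the math node itself.
legacy = Node("root", children=[Node("BlockMath", content="E=mc^2")])

# Virtual-child layout: the formula is wrapped in a "text" child.
current = Node("root", children=[
    Node("BlockMath", children=[Node("text", content="E=mc^2")]),
])

legacy_text = collect_text(legacy)    # "" (the formula is silently skipped)
current_text = collect_text(current)  # "E=mc^2"
```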

Universal Traversal

Because of this rule, a text_content() function in Python or JavaScript needs only one base case and one recursive case:

python
def text_content(node):
    # Base case: it's a pure text node
    if node.node_type == "text":
        return node.content or ""
    
    # Recursive case: gather text from all children
    return "".join(text_content(child) for child in node.children)

Whether the node is a standard p tag, a complex <Accordion> component, or a BlockMath formula, the recursive extraction works perfectly without needing special if/else checks for every new node type.
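
A short usage sketch, assuming a minimal `Node` dataclass in place of the real bindings and a made-up sample document, shows the same function feeding a word counter across a paragraph, an inline formula, and a JSX component:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for the node objects exposed to Python."""
    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def text_content(node: Node) -> str:
    if node.node_type == "text":
        return node.content or ""
    return "".join(text_content(child) for child in node.children)


doc = Node("root", children=[
    Node("p", children=[
        Node("text", content="Mass-energy equivalence: "),
        Node("InlineMath", children=[Node("text", content="E=mc^2")]),
    ]),
    Node("Accordion", children=[
        Node("p", children=[Node("text", content="Hidden details.")]),
    ]),
])

# Extract per top-level block: text_content() joins with "" and adds no
# separator, so counting block by block keeps adjacent words apart.
per_block = [text_content(block) for block in doc.children]
word_count = sum(len(block.split()) for block in per_block)
```

Here `per_block` is `["Mass-energy equivalence: E=mc^2", "Hidden details."]` and `word_count` is 5. The formula and the component body are counted without any branch that names `InlineMath` or `Accordion`.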

Attribute System

JSX attributes are stored in a typed HashMap. To support the full React ecosystem, AttrValue is an enum that can hold plain strings (prop="hello"), booleans (isActive), raw JS expressions (count={42}), or even entire nested ASTs when a component is passed as a prop.
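
One way to picture those four cases on the Python side is a small tagged union. The class names `Str`, `Bool`, `Expr`, and `Ast`, along with the `describe` helper, are assumptions chosen for this sketch; the actual variant names of the Rust enum are not given here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:
    value: str      # plain string, e.g. prop="hello"


@dataclass
class Bool:
    value: bool     # bare boolean flag, e.g. isActive


@dataclass
class Expr:
    source: str     # raw JS expression kept as source text, e.g. 42


@dataclass
class Ast:
    nodes: List[object]  # nested AST nodes passed as a prop


# Hypothetical Python counterpart of the AttrValue enum.
AttrValue = Union[Str, Bool, Expr, Ast]


def describe(name: str, value: AttrValue) -> str:
    """Render one attribute back to JSX-like source text."""
    if isinstance(value, Str):
        return f'{name}="{value.value}"'
    if isinstance(value, Bool):
        return name if value.value else f"{name}={{false}}"
    if isinstance(value, Expr):
        return f"{name}={{{value.source}}}"
    if isinstance(value, Ast):
        return f"{name}={{<{len(value.nodes)} nested node(s)>}}"
    raise TypeError(f"unsupported attribute value: {value!r}")


attrs = {"prop": Str("hello"), "isActive": Bool(True), "count": Expr("42")}
rendered = [describe(key, val) for key, val in attrs.items()]
```

With the sample attributes above, `rendered` is `['prop="hello"', 'isActive', 'count={42}']`. Keeping expressions as source text, rather than evaluating them, leaves the decision about execution to the host renderer.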
