
Why We Built AXIOM

The motivation, science, and engineering behind hyperdimensional computing for modern AI systems. What it is, how it works, and where it fits.

The Problem with Traditional Embeddings

Modern NLP relies on embedding models (Word2Vec, BERT, OpenAI ada-002, Cohere Embed) to convert text into dense floating-point vectors. These work well but come with significant trade-offs:

Expensive to Generate

Dense embeddings require GPU compute or paid API calls. OpenAI charges per token. At scale (millions of documents), costs add up fast. You pay every time you re-embed.

Opaque and Uninterpretable

Each dimension in a dense embedding has no clear meaning. You cannot inspect a vector and understand what it represents. Debugging is guesswork.

Vendor Lock-in

Your vectors only work with the model that generated them. Switch providers? Re-embed your entire corpus. Provider goes down? Your search breaks.

Non-Deterministic

Some embedding models produce slightly different vectors for the same input across different API versions or hardware. This breaks caching and reproducibility.

What Is Hyperdimensional Computing?

Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents information as high-dimensional binary vectors called hypervectors. Instead of learning representations from data (like neural networks do), HDC constructs them using algebraic operations.

The Key Insight

In very high dimensions (10,000+), randomly generated vectors are nearly orthogonal to each other. This means:

  • You can combine many vectors without interference (they don't collide)
  • You can retrieve original components from a combined vector
  • Similarity between vectors directly reflects semantic similarity
  • Operations are purely binary (XOR, shift) -- extremely fast on any CPU

AXIOM Specifics

Dimensions: 10,048 bits per vector
Storage: 1,256 bytes per vector (157 u64 words)
Encoding: binary -- each dimension is 0 or 1
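The storage figures follow directly from the dimension count (10,048 / 64 = 157 words, 10,048 / 8 = 1,256 bytes). A small stdlib sketch (names illustrative) of packing such a vector into u64 words and back:

```python
# Sketch: pack a 10,048-bit vector (a Python int) into 157 u64 words,
# then restore it -- mirroring the stated storage layout.
import random

DIM, WORD = 10_048, 64
rng = random.Random(0)
vec = rng.getrandbits(DIM)

# Pack into 157 little-endian u64 words
words = [(vec >> (WORD * i)) & 0xFFFFFFFFFFFFFFFF for i in range(DIM // WORD)]

# Unpack back to the original integer
restored = sum(w << (WORD * i) for i, w in enumerate(words))

print(len(words), DIM // 8)  # 157 1256
print(restored == vec)       # True
```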

How AXIOM Encodes Text

AXIOM uses character-level compositional encoding. Here is how text becomes a vector, step by step:

Step 1: Character Atoms

Each printable ASCII character (letters, digits, punctuation, space) gets a unique random 10,048-bit vector called an "atom." These are generated once and remain fixed.

atom('a') = [1,0,1,1,0,0,1,0,1,1,...] (10,048 random bits)
atom('b') = [0,1,0,1,1,0,0,1,1,0,...] (10,048 random bits)
atom('c') = [1,1,0,0,1,1,0,1,0,0,...] (10,048 random bits)
// 95 total atoms for printable ASCII
Step 2: Position Encoding (Permute)

Each character's atom is cyclically rotated by the character's position index. This encodes word order -- "cat" and "act" produce different vectors because the same characters sit at different positions.

For "cat":
  position 1: permute(atom('c'), 1)  → shift all bits right by 1
  position 2: permute(atom('a'), 2)  → shift all bits right by 2
  position 3: permute(atom('t'), 3)  → shift all bits right by 3

For "act":
  position 1: permute(atom('a'), 1)  → differs from "cat"'s permute(atom('a'), 2)
  position 2: permute(atom('c'), 2)  → differs from "cat"'s permute(atom('c'), 1)
  position 3: permute(atom('t'), 3)  → same as "cat" position 3
Step 3: Binding (XOR)

All positioned character vectors are XOR-combined into a single vector. XOR is the fundamental binding operation in HDC -- it creates a new vector that is dissimilar to both inputs but can be reversed.

vector("cat") = permute(atom('c'), 1) XOR
                permute(atom('a'), 2) XOR
                permute(atom('t'), 3)

Properties of XOR binding:
  A XOR B XOR B = A          (self-inverse: you can recover A)
  A XOR B ≠ A and ≠ B        (result is orthogonal to inputs)
  A XOR B = B XOR A           (commutative)
Step 4: A Deterministic Fingerprint

The final vector is a 10,048-bit binary fingerprint of the input text. It captures character composition, character order, and n-gram patterns. The same input always produces the exact same vector. Similar texts produce similar vectors.
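The four steps above can be sketched end to end in stdlib Python. This is illustrative only -- the seed, atom table, and rotation direction are assumptions, not AXIOM's internals -- but it demonstrates the determinism and order sensitivity:

```python
# Sketch of the pipeline: fixed random atoms per character, cyclic
# rotation by position, XOR binding into one fingerprint.
import random
import string

DIM = 10_048
MASK = (1 << DIM) - 1

# Step 1: one fixed random atom per printable ASCII character
_rng = random.Random(1234)  # hypothetical seed, not AXIOM's
ATOMS = {c: _rng.getrandbits(DIM) for c in string.printable[:95]}

def permute(v: int, n: int) -> int:
    """Step 2: cyclic right-rotation by n bit positions."""
    n %= DIM
    return ((v >> n) | (v << (DIM - n))) & MASK

def encode(text: str) -> int:
    """Steps 3-4: XOR-bind the positioned atoms into one fingerprint."""
    vec = 0
    for i, ch in enumerate(text, start=1):
        vec ^= permute(ATOMS[ch], i)
    return vec

print(encode("cat") == encode("cat"))  # True: deterministic
print(encode("cat") == encode("act"))  # False: order matters
```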

Vector Operations

HDC defines three fundamental operations. AXIOM uses all three internally and they are the building blocks for any hyperdimensional application.

BIND (XOR)

Associates two concepts together. The result is dissimilar to both inputs. Self-inverse: binding twice with the same vector undoes the operation.

Use cases:
  - Associate a word with its position: bind(word, position)
  - Create key-value pairs: bind(key, value)
  - Encode relationships: bind(subject, role)

Math: C = A XOR B
  similarity(C, A) ≈ 0.5  (orthogonal -- C hides A)
  similarity(C, B) ≈ 0.5  (orthogonal -- C hides B)
  A = C XOR B              (recover A from C using B)

PERMUTE (Cyclic Shift)

Rotates all bits cyclically. Encodes sequence position. Each nonzero shift amount produces a vector nearly orthogonal to the original.

Use cases:
  - Encode word order: permute(word, position_index)
  - Encode time steps: permute(event, timestamp)
  - Create sequence representations

Math: B = permute(A, n)
  similarity(A, B) ≈ 0.5   (different shift = orthogonal)
  A = unpermute(B, n)       (perfectly reversible)

BUNDLE (Superposition)

Combines multiple vectors using majority voting. The result is similar to all inputs. This is how you represent sets or collections.

Use cases:
  - Represent a set of concepts: bundle(concept1, concept2, concept3)
  - Aggregate document vectors: bundle(sentence1, sentence2, ...)
  - Create category prototypes: bundle(all examples of "sports")

Math: C = bundle(A, B, D)
  similarity(C, A) > 0.55   (C is similar to A)
  similarity(C, B) > 0.55   (C is similar to B)
  similarity(C, D) > 0.55   (C is similar to D)
  Can bundle ~20-30 vectors before signal degrades

How Similarity Works

AXIOM uses normalized Hamming distance to compute similarity between two binary hypervectors:

similarity = 1.0 - (differing_bits / total_bits)

Algorithm:
  1. XOR the two vectors: diff = vector_a XOR vector_b
  2. Count set bits in diff: differing_bits = popcount(diff)
  3. Normalize: similarity = 1.0 - (differing_bits / 10048)

Example:
  If 2,000 out of 10,048 bits differ:
  similarity = 1.0 - (2000 / 10048) = 0.801

  If 5,024 bits differ (half):
  similarity = 1.0 - (5024 / 10048) = 0.500 (orthogonal/random)

Performance:
  - XOR + popcount on 157 u64 words = ~157 CPU operations
  - Sub-microsecond per comparison
  - No floating-point math needed

AXIOM vs Traditional Embeddings

Aspect           | AXIOM HDC                          | OpenAI / BERT / Cohere
-----------------|------------------------------------|------------------------------
Type             | Binary (0/1 per bit)               | Dense floats (32-bit per dim)
Dimensions       | 10,048 bits                        | 768 - 4,096 floats
Vector Size      | 1,256 bytes                        | 3,072 - 16,384 bytes
Training         | None required                      | Massive datasets + GPU
Compute          | CPU only (XOR + popcount)          | GPU required for generation
Deterministic    | Always (same input = same output)  | Varies by provider/version
Latency          | < 5ms per encode                   | 50-500ms per API call
Cost             | Self-hosted, no per-token fee      | $0.02-0.10 per 1M tokens
Vendor Lock-in   | None                               | Tied to embedding model
Interpretability | Compositional (inspect operations) | Black box
Semantic Quality | Good (character-level)             | Excellent (contextual)
Composability    | Algebraic (bind, permute, bundle)  | Concatenation only
Offline Capable  | Yes (no network needed)            | Requires API calls

When to Use AXIOM

Semantic Search

Index documents once, search by meaning. Perfect for knowledge bases, FAQs, documentation sites. Encode your corpus, store vectors, compare at query time.

RAG Pipelines

Retrieve relevant context for LLMs. Encode documents and queries into vectors, find top-K matches, pass to GPT/Claude as context. Deterministic retrieval.

Content Deduplication

Find near-duplicate documents, articles, or support tickets. Compare vectors pairwise, flag pairs above a similarity threshold.

Recommendation Systems

Recommend similar content based on semantic similarity. No cold start -- new items can be encoded instantly without retraining.

Document Clustering

Group similar documents using vector similarity. Use K-means or hierarchical clustering on AXIOM vectors.

Text Classification

Classify text by comparing against prototype vectors for each category. Bundle examples of each class, compare new text against prototypes.

Edge / Offline Deployment

Run on devices without internet. AXIOM is CPU-only, no GPU, no cloud API needed. Encode locally on mobile or IoT devices.

Privacy-Preserving NLP

Process text locally without sending to third-party APIs. No data leaves your infrastructure. Vectors don't leak the original text.

When NOT to Use AXIOM

AXIOM is not a replacement for large language models or deep embedding models. Here is where traditional approaches are better:

Deep Semantic Understanding

If you need to understand that "the bank of the river" and "the financial bank" are different meanings of "bank," use contextual embeddings (BERT, GPT). AXIOM encodes at character level and doesn't capture deep contextual meaning.

Cross-Language Similarity

AXIOM works on character sequences. "cat" in English and "gato" in Spanish produce dissimilar vectors. For multilingual applications, use multilingual embedding models.

Synonym Understanding

"automobile" and "car" share no characters and will have low similarity in AXIOM. If synonym matching is critical, use trained embeddings that learn semantic equivalence.

Architecture and Design Principles

Stateless Compute

AXIOM operates as a pure function: text in, vector out. No sessions, no database, no state to manage. This enables:

  • Horizontal scaling: add instances freely
  • Zero-downtime deployments
  • Any instance handles any request
  • No data migration between versions

Deterministic by Design

The same text input always produces the exact same vector output. This is fundamental: you can cache vectors, reproduce results, version control your embeddings, and distribute computation across nodes without synchronization.

Binary Efficiency

All operations are bitwise (XOR, shift, popcount). This means:

  • No floating-point math anywhere in the pipeline
  • CPU-only -- no GPU required
  • 157 XOR operations to compare two vectors (sub-microsecond)
  • 1,256 bytes per vector vs 3-16 KB for dense embeddings

Use Cases with Code

content_classifier.py -- Zero-shot text classification
from axiom_client import AxiomClient

axiom = AxiomClient("https://your-instance.com", "your-api-key")

# Step 1: Define categories with example texts
categories = {
    "sports": [
        "football match score",
        "basketball championship",
        "tennis tournament results",
    ],
    "technology": [
        "software development tools",
        "artificial intelligence research",
        "cloud computing infrastructure",
    ],
    "finance": [
        "stock market trading",
        "investment portfolio management",
        "cryptocurrency exchange rates",
    ],
}

# Step 2: Encode category prototypes
# (Each prototype is the "average" of its examples)
prototypes = {}
for category, examples in categories.items():
    batch = axiom.batch_encode(examples)
    # Store all example vectors for this category
    prototypes[category] = [v["vector"] for v in batch["vectors"]]

# Step 3: Classify new text
def classify(text: str) -> tuple[str, float]:
    """Classify text into the best matching category."""
    query_vec = axiom.encode(text)["vector"]

    best_category = ""
    best_score = 0.0

    for category, example_vecs in prototypes.items():
        # Average similarity across all examples in category
        scores = [axiom.similarity(query_vec, ev) for ev in example_vecs]
        avg_score = sum(scores) / len(scores)

        if avg_score > best_score:
            best_score = avg_score
            best_category = category

    return best_category, best_score

# Test
category, confidence = classify("new GPU released for gaming")
print(f"Category: {category}, Confidence: {confidence:.4f}")
# Output: Category: technology, Confidence: 0.5823

faq_bot.ts -- FAQ matching chatbot
// FAQ Chatbot: match user questions to known answers

interface FAQ {
  question: string;
  answer: string;
  vector: number[];
}

const faqData = [
  { q: "How do I reset my password?", a: "Go to Settings > Security > Reset Password." },
  { q: "What payment methods do you accept?", a: "We accept Visa, Mastercard, and PayPal." },
  { q: "How do I cancel my subscription?", a: "Go to Settings > Billing > Cancel Plan." },
  { q: "What is your refund policy?", a: "Full refund within 30 days of purchase." },
  { q: "How do I contact support?", a: "Email support@example.com or use live chat." },
];

// Index: encode all FAQ questions
async function indexFAQs(): Promise<FAQ[]> {
  const batch = await fetch("/api/batch-encode", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ texts: faqData.map((f) => f.q) }),
  }).then((r) => r.json());

  return batch.vectors.map((v: any, i: number) => ({
    question: faqData[i].q,
    answer: faqData[i].a,
    vector: v.vector,
  }));
}

// Search: find best matching FAQ
async function findAnswer(userQuestion: string, faqs: FAQ[]): Promise<string> {
  const queryVec = await fetch("/api/encode", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: userQuestion }),
  }).then((r) => r.json());

  let bestMatch = { answer: "Sorry, I don't have an answer for that.", score: 0 };

  for (const faq of faqs) {
    const { similarity } = await fetch("/api/similarity", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ vector_a: queryVec.vector, vector_b: faq.vector }),
    }).then((r) => r.json());

    if (similarity > bestMatch.score) {
      bestMatch = { answer: faq.answer, score: similarity };
    }
  }

  // Only return answer if confidence is high enough
  if (bestMatch.score < 0.55) {
    return "I'm not sure about that. Please contact support.";
  }

  return bestMatch.answer;
}

// Usage
const faqs = await indexFAQs();
const answer = await findAnswer("how can I change my password?", faqs);
console.log(answer); // "Go to Settings > Security > Reset Password."

content_recommender.py -- Content recommendation engine
from axiom_client import AxiomClient

axiom = AxiomClient("https://your-instance.com", "your-api-key")

# Article database
articles = [
    {"id": 1, "title": "Introduction to Machine Learning", "content": "..."},
    {"id": 2, "title": "Deep Learning with PyTorch", "content": "..."},
    {"id": 3, "title": "Natural Language Processing Basics", "content": "..."},
    {"id": 4, "title": "Computer Vision with CNNs", "content": "..."},
    {"id": 5, "title": "Reinforcement Learning Explained", "content": "..."},
    {"id": 6, "title": "Web Development with React", "content": "..."},
    {"id": 7, "title": "Database Design Patterns", "content": "..."},
]

# Encode all article titles
batch = axiom.batch_encode([a["title"] for a in articles])
for i, v in enumerate(batch["vectors"]):
    articles[i]["vector"] = v["vector"]

def recommend(article_id: int, top_k: int = 3) -> list:
    """Recommend similar articles based on title similarity."""
    source = next(a for a in articles if a["id"] == article_id)

    results = []
    for article in articles:
        if article["id"] == article_id:
            continue
        sim = axiom.similarity(source["vector"], article["vector"])
        results.append({"title": article["title"], "score": sim})

    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:top_k]

# "Readers who liked 'Introduction to Machine Learning' also liked:"
recs = recommend(1)
for r in recs:
    print(f"  {r['score']:.4f} | {r['title']}")
# Output:
#   0.6234 | Deep Learning with PyTorch
#   0.5891 | Reinforcement Learning Explained
#   0.5654 | Natural Language Processing Basics
