AXIOM Integration Guide

AXIOM + AI Pipelines

AXIOM is a character-level similarity engine, not a semantic one. It encodes text into 10,048-bit binary vectors using trigram bundling — fast, deterministic, and CPU-only. It complements embeddings; it does not replace them.

What AXIOM adds to an AI pipeline

Semantic embeddings (OpenAI, Cohere, BERT) are expensive and slow to call. AXIOM runs in microseconds on a single CPU core and catches structurally similar text — typos, near-duplicates, and reworded copies — before you spend money on an embedding API. Think of AXIOM as the cheap pre-filter that shrinks the problem, with embeddings handling the semantic nuance on whatever remains.

  • < 5 ms per encode
  • 10,048-bit binary vector
  • CPU-only (no GPU needed)
  • Deterministic (same input = same output)

Real similarity scores

Measured from actual API responses. AXIOM captures surface-level character overlap, not meaning. Notice "dog" vs "cat" is essentially random noise.

Input A                | Input B                | Score  | What this means
hello world            | hello world            | 1.0000 | Exact match
hello world            | hello worl             | 0.8649 | One char dropped
the cat sat on the mat | the cat sat on the hat | 0.8547 | One word differs
machine learning       | machine learning model | 0.8252 | Added word shares trigrams
rust programming       | python programming     | 0.7481 | Shared suffix, different prefix
dog                    | cat                    | 0.5012 | Effectively random — no semantic overlap
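The scores above follow mechanically from trigram overlap. AXIOM's actual encoder is not shown in this guide, but the general technique it names (character trigrams bundled into a wide binary vector by bitwise majority vote, compared by bit agreement) can be sketched in a few lines. Treat this as an illustration of the idea, not AXIOM's implementation; the hash seeding and per-trigram patterns are assumptions.

```python
import random

DIM = 10_048  # AXIOM's stated vector width


def trigrams(text: str) -> list[str]:
    """Sliding character trigrams: 'hello' -> ['hel', 'ell', 'llo']."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def trigram_bits(tg: str) -> list[int]:
    """A fixed pseudo-random DIM-bit pattern per trigram (string-seeded RNG)."""
    rng = random.Random(tg)  # string seeding is deterministic in CPython
    return [rng.getrandbits(1) for _ in range(DIM)]


def encode(text: str) -> list[int]:
    """Bundle all trigram patterns with a bitwise majority vote."""
    patterns = [trigram_bits(tg) for tg in trigrams(text)]
    half = len(patterns) / 2
    return [1 if sum(col) > half else 0 for col in zip(*patterns)]


def similarity(a: list[int], b: list[int]) -> float:
    """Bit-agreement rate: 1.0 for identical input, ~0.5 for unrelated text."""
    return sum(x == y for x, y in zip(a, b)) / DIM
```

Even this toy version reproduces the shape of the table: identical strings score 1.0, a dropped character stays high because almost all trigrams survive, and "dog" vs "cat" lands near 0.5 because two unrelated random bit patterns agree on about half their bits.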

Integration Patterns

Browse the patterns below — pick the ones relevant to your stack. Each includes a complete code example and honest guidance on when it helps and when it does not.

Pattern 1

Pre-filtering for RAG

Use AXIOM to deduplicate and near-match before calling an embedding API — so you only embed the candidates that survive the structural filter.

When to use

  • High-volume RAG with many similar documents
  • When embedding API cost is a concern
  • User queries are textually close to stored docs

When NOT to use

  • Queries use synonyms or paraphrases
  • Multilingual retrieval
  • Intent-based or conceptual queries

rag_prefilter.ts

// RAG pre-filter: AXIOM first, embedding API only for survivors

const AXIOM_URL = "https://api.axiom.dev";
const AXIOM_KEY = process.env.AXIOM_API_KEY!;

async function axiomEncode(text: string): Promise<number[]> {
  const res = await fetch(`${AXIOM_URL}/api/encode`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
    body: JSON.stringify({ text }),
  });
  return (await res.json()).vector;
}

async function axiomSimilarity(a: number[], b: number[]): Promise<number> {
  const res = await fetch(`${AXIOM_URL}/api/similarity`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
    body: JSON.stringify({ vector_a: a, vector_b: b }),
  });
  return (await res.json()).similarity;
}

// ─────────────────────────────────────────────────────────────────────
// Step 1: At index time, encode every document with AXIOM (fast & free)
// Step 2: At query time, AXIOM narrows 10,000 docs → 50 candidates
// Step 3: Only call the embedding API for those 50 candidates
// ─────────────────────────────────────────────────────────────────────

interface Doc { id: string; text: string; axiomVector: number[] }

async function buildIndex(docs: { id: string; text: string }[]): Promise<Doc[]> {
  const res = await fetch(`${AXIOM_URL}/api/batch-encode`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
    body: JSON.stringify({ texts: docs.map(d => d.text) }),
  });
  const { vectors } = await res.json();
  return docs.map((d, i) => ({ ...d, axiomVector: vectors[i].vector }));
}

async function prefilter(
  query: string,
  index: Doc[],
  topK = 50,
  threshold = 0.62
): Promise<Doc[]> {
  const queryVec = await axiomEncode(query);

  // One similarity call per doc: rate-limit or batch these for large corpora.
  const scored = await Promise.all(
    index.map(async doc => ({
      doc,
      score: await axiomSimilarity(queryVec, doc.axiomVector),
    }))
  );

  return scored
    .filter(s => s.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.doc);
}

// Usage
const index = await buildIndex(myDocs);  // do once at startup

const candidates = await prefilter("how do I reset my password?", index);
console.log(`AXIOM narrowed ${myDocs.length} docs → ${candidates.length} candidates`);
// "AXIOM narrowed 10000 docs → 12 candidates"

// Now call your embedding API ONLY for the 12 survivors
// (assumes an OpenAI client: import OpenAI from "openai"; const openai = new OpenAI())
const embeddings = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: candidates.map(c => c.text),
});
// ~800x fewer embedding calls (12 instead of 10,000)

Expected output

AXIOM narrowed 10000 docs → 12 candidates

# Embedding API called 12 times instead of 10,000

Pattern 2

Training Data Deduplication

Clean near-duplicate examples from fine-tuning datasets before training. Duplicate data inflates loss curves and wastes GPU hours.

When to use

  • Datasets scraped from the web (high redundancy)
  • RLHF data with repeated human-written prompts
  • Any dataset larger than 10k examples

When NOT to use

  • Paraphrase datasets (different surface, same meaning)
  • When you want stylistic diversity regardless of overlap

dedup_dataset.py

import requests
from itertools import combinations

AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}


def batch_encode(texts: list[str]) -> list[list[int]]:
    """Encode texts in chunks of 50 (API batch limit)."""
    all_vectors = []
    for i in range(0, len(texts), 50):
        chunk = texts[i : i + 50]
        r = requests.post(
            f"{AXIOM_URL}/api/batch-encode",
            headers=HEADERS,
            json={"texts": chunk},
        )
        all_vectors.extend(v["vector"] for v in r.json()["vectors"])
    return all_vectors


def get_similarity(vec_a: list[int], vec_b: list[int]) -> float:
    r = requests.post(
        f"{AXIOM_URL}/api/similarity",
        headers=HEADERS,
        json={"vector_a": vec_a, "vector_b": vec_b},
    )
    return r.json()["similarity"]


def deduplicate(examples: list[dict], threshold: float = 0.82) -> list[dict]:
    """
    Remove near-duplicate training examples.
    threshold=0.82 catches typos and minor rewording while
    preserving genuinely different phrasings.
    """
    texts = [ex["prompt"] for ex in examples]
    vectors = batch_encode(texts)

    duplicates: set[int] = set()
    # NOTE: this compares every pair, O(n^2) similarity calls. Fine for a few
    # thousand examples; pre-bucket or sample for much larger datasets.
    for (i, vi), (j, vj) in combinations(enumerate(vectors), 2):
        if i in duplicates or j in duplicates:
            continue
        sim = get_similarity(vi, vj)
        if sim >= threshold:
            print(f"Dup (sim={sim:.4f})")
            print(f"  KEEP   [{i}]: {texts[i][:70]}")
            print(f"  REMOVE [{j}]: {texts[j][:70]}")
            duplicates.add(j)

    kept = [ex for i, ex in enumerate(examples) if i not in duplicates]
    print(f"\nRemoved {len(duplicates)} duplicates.")
    print(f"Dataset: {len(examples)} → {len(kept)} examples")
    return kept


# ── Usage ──────────────────────────────────────────────────────────────
import json

with open("raw_finetune.jsonl") as f:
    raw = [json.loads(line) for line in f]

clean = deduplicate(raw, threshold=0.82)

with open("clean_finetune.jsonl", "w") as f:
    for ex in clean:
        f.write(json.dumps(ex) + "\n")

Expected output

Dup (sim=0.8547)

KEEP [0]: the cat sat on the mat

REMOVE [4]: the cat sat on the hat

Removed 312 duplicates.

Dataset: 5000 → 4688 examples

Pattern 4

Cache Key Generation

AXIOM vectors are deterministic — identical input always produces the same bit pattern. Use this as a cache key to avoid re-embedding identical or near-identical inputs.

When to use

  • High-traffic LLM endpoints with repeated queries
  • Avoiding duplicate embedding API charges
  • Response caching for chatbots

When NOT to use

  • Queries where context or user identity matters
  • When you need exact-string matching only (a plain hash is simpler)
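For reference, the plain-hash alternative is a few lines and needs no AXIOM call at all, but it tolerates nothing beyond trivial normalisation: any typo or rewording is a cache miss. A minimal sketch (the helper name and normalisation are illustrative, not part of any API):

```python
import hashlib


def exact_cache_key(text: str) -> str:
    """Exact-string cache key: case and whitespace are normalised,
    but any typo or rewording produces a different key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:32]


k1 = exact_cache_key("How do I reset my password?")
k2 = exact_cache_key("how do I  reset my password?")   # case/space differences
k3 = exact_cache_key("how do i resett my pasword?")    # typos
print(k1 == k2)   # True
print(k1 == k3)   # False
```

If that False is acceptable for your traffic, skip the vector step entirely; the AXIOM-based key below earns its keep only when near-identical queries are common.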

cache_key.py

import requests
import hashlib
import redis

AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}

cache = redis.Redis(host="localhost", port=6379, db=0)


def axiom_vector(text: str) -> list[int]:
    r = requests.post(
        f"{AXIOM_URL}/api/encode",
        headers=HEADERS,
        json={"text": text},
    )
    return r.json()["vector"]


def vector_to_cache_key(vector: list[int]) -> str:
    """
    Hash the binary vector into a short cache key.
    Identical inputs → identical vectors → identical keys.
    """
    raw = bytes(vector)
    return hashlib.sha256(raw).hexdigest()[:32]


def get_similarity(va: list[int], vb: list[int]) -> float:
    r = requests.post(
        f"{AXIOM_URL}/api/similarity",
        headers=HEADERS,
        json={"vector_a": va, "vector_b": vb},
    )
    return r.json()["similarity"]


def cached_embed_and_respond(query: str, embed_fn, llm_fn) -> str:
    """
    1. Encode with AXIOM (fast, deterministic)
    2. Check cache for an exact or near-identical query
    3. On miss: call embed_fn + llm_fn, store in cache
    """
    vec = axiom_vector(query)
    exact_key = vector_to_cache_key(vec)

    # Exact cache hit (same query, same vector, same key)
    cached = cache.get(exact_key)
    if cached:
        print(f"Cache HIT (exact): {exact_key[:12]}...")
        return cached.decode()

    # Near-duplicate check: scan recent cache entries
    # (in production, use a small in-memory recent-vector store)
    recent_keys = cache.keys("vec:*")
    for rk in recent_keys[:200]:          # cap the scan at 200 entries (KEYS order is arbitrary)
        stored_vec = list(cache.hget(rk, "vector") or b"")
        if not stored_vec:
            continue
        sim = get_similarity(vec, stored_vec)
        if sim >= 0.92:                   # nearly identical text
            cached_response = cache.hget(rk, "response")
            print(f"Cache HIT (near-dup, sim={sim:.4f})")
            return cached_response.decode()

    # Cache miss — call the expensive functions
    print("Cache MISS — calling embed + LLM")
    embedding = embed_fn(query)
    response  = llm_fn(query, embedding)

    # Store result
    cache.set(exact_key, response, ex=3600)
    cache.hset(f"vec:{exact_key}", mapping={
        "vector": bytes(vec),
        "response": response,
    })
    cache.expire(f"vec:{exact_key}", 3600)

    return response


# ── Usage ──────────────────────────────────────────────────────────────
r1 = cached_embed_and_respond("How do I reset my password?", embed, llm)
# Cache MISS — calling embed + LLM

r2 = cached_embed_and_respond("How do I reset my password?", embed, llm)
# Cache HIT (exact): a3f8b2c1d9e4...

r3 = cached_embed_and_respond("how do i resett my pasword?", embed, llm)
# Cache HIT (near-dup, sim=0.9312)

Expected output

Cache MISS — calling embed + LLM

Cache HIT (exact): a3f8b2c1d9e4...

Cache HIT (near-dup, sim=0.9312)

# Third call (typo) avoided both embed API + LLM call entirely

Pattern 5

Content Drift Monitoring

Compare new content batches against a stored baseline to detect when a document has changed significantly — useful for re-embedding triggers, version control, or data-quality alerts.

When to use

  • Monitoring scraped web content for changes
  • Deciding when to re-embed a document
  • Detecting document tampering or unexpected edits

When NOT to use

  • Detecting meaning shifts caused by replacing synonyms
  • When you need diff-level granularity (use a diff tool)

drift_monitor.py

import requests
import json
from pathlib import Path

AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}
BASELINE_FILE = Path("baseline_vectors.json")


def encode(text: str) -> list[int]:
    r = requests.post(
        f"{AXIOM_URL}/api/encode", headers=HEADERS, json={"text": text}
    )
    return r.json()["vector"]


def similarity(va: list[int], vb: list[int]) -> float:
    r = requests.post(
        f"{AXIOM_URL}/api/similarity",
        headers=HEADERS,
        json={"vector_a": va, "vector_b": vb},
    )
    return r.json()["similarity"]


def save_baseline(docs: dict[str, str]) -> None:
    """Encode and store baseline vectors for a set of named documents."""
    baseline = {}
    for doc_id, text in docs.items():
        baseline[doc_id] = {"text": text[:120], "vector": encode(text)}
    BASELINE_FILE.write_text(json.dumps(baseline))
    print(f"Baseline saved: {len(baseline)} documents")


def check_drift(
    new_docs: dict[str, str],
    threshold: float = 0.85,
) -> list[dict]:
    """
    Compare current content against baseline.
    Returns documents that have drifted below the threshold.
    """
    baseline = json.loads(BASELINE_FILE.read_text())
    alerts = []

    for doc_id, new_text in new_docs.items():
        if doc_id not in baseline:
            alerts.append({"id": doc_id, "status": "new", "score": None})
            continue

        baseline_vec = baseline[doc_id]["vector"]
        new_vec = encode(new_text)
        score = similarity(baseline_vec, new_vec)

        status = "stable" if score >= threshold else "drifted"
        if status == "drifted":
            alerts.append({
                "id": doc_id,
                "status": status,
                "score": round(score, 4),
                "baseline_preview": baseline[doc_id]["text"][:60],
                "new_preview": new_text[:60],
            })
            print(f"DRIFT [{doc_id}] sim={score:.4f}")
            print(f"  was: {baseline[doc_id]['text'][:60]}")
            print(f"  now: {new_text[:60]}")

    return alerts


# ── Usage ──────────────────────────────────────────────────────────────

# Day 1: Save baseline
save_baseline({
    "pricing": "Our free tier includes 100 API calls per day.",
    "support": "Email us at support@example.com for help.",
    "terms":   "By using this service you agree to our terms.",
})

# Day 7: Compare new versions
alerts = check_drift({
    "pricing": "Our free tier includes 100 API calls per day.",     # unchanged
    "support": "Contact our support team via the in-app chat.",     # changed
    "terms":   "By using this platform you agree to our terms.",   # minor edit
})

# DRIFT [support] sim=0.6823
#   was: Email us at support@example.com for help.
#   now: Contact our support team via the in-app chat.

Expected output

Baseline saved: 3 documents

DRIFT [support] sim=0.6823

was: Email us at support@example.com for help.

now: Contact our support team via the in-app chat.

# "pricing" sim=1.0000 (unchanged, no alert)

# "terms" sim=0.8912 (above threshold, no alert)

Pattern 6

Edge / Offline Processing

For environments with no internet access, no GPU, and no cloud API budget: a self-hosted AXIOM runs on a single CPU core and needs no external services.

When to use

  • Air-gapped systems (healthcare, defence, finance)
  • Mobile apps that cannot call an embedding API
  • IoT devices or embedded systems
  • Environments where data must not leave the device

When NOT to use

  • When semantic understanding is required (embeddings are better)
  • If you have GPU access and latency tolerances of 200 ms+

edge_search.py — self-hosted AXIOM, no internet required

# Self-hosted AXIOM on a local machine or Raspberry Pi.
# Requires: Docker + ~256 MB RAM. No GPU, no internet, no API key.
#
#   docker run -p 8080:8080 axiom-public:latest
#
# Then point all calls at http://localhost:8080

import requests

LOCAL_URL = "http://localhost:8080"
# No API key needed for self-hosted instance with no auth configured


def encode_local(text: str) -> list[int]:
    r = requests.post(
        f"{LOCAL_URL}/api/encode",
        json={"text": text},
        timeout=2,
    )
    return r.json()["vector"]


def similarity_local(va: list[int], vb: list[int]) -> float:
    r = requests.post(
        f"{LOCAL_URL}/api/similarity",
        json={"vector_a": va, "vector_b": vb},
        timeout=2,
    )
    return r.json()["similarity"]


# ── Offline document search ────────────────────────────────────────────

class OfflineSearch:
    def __init__(self):
        self.index: list[tuple[str, list[int]]] = []

    def add(self, text: str) -> None:
        vec = encode_local(text)
        self.index.append((text, vec))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        qv = encode_local(query)
        scored = [
            (text, similarity_local(qv, vec))
            for text, vec in self.index
        ]
        return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


# No network calls beyond localhost — works fully air-gapped
search = OfflineSearch()
search.add("Patient admitted with chest pain and shortness of breath.")
search.add("MRI scan scheduled for Tuesday morning.")
search.add("Discharge summary: patient stable, follow-up in 2 weeks.")

results = search.search("chest and breathing problems")
for text, score in results:
    print(f"{score:.4f} | {text}")

Expected output

0.7841 | Patient admitted with chest pain and shortness of breath.

0.5120 | Discharge summary: patient stable, follow-up in 2 weeks.

0.4388 | MRI scan scheduled for Tuesday morning.

# "shortness of breath" → "breathing problems": 0.7841 (shared trigrams)

# All calls go to localhost. Zero external traffic.

Honest comparison: AXIOM vs embeddings

These are not competing products. They excel at different tasks.

Task                               | AXIOM          | Semantic Embeddings
Exact duplicate detection          | Excellent      | Overkill
Near-duplicate / typo detection    | Excellent      | Good
Synonym matching (buy vs purchase) | Poor           | Excellent
Cross-language similarity          | No             | Excellent (multilingual models)
Paraphrase detection               | Poor           | Excellent
Latency per encode                 | < 5 ms, CPU    | 100–500 ms, GPU API call
Cost at 10M ops                    | Fixed VPS cost | ~$1–10 per 1M tokens
Works offline / air-gapped         | Yes            | No (unless self-hosted model)
Deterministic output               | Always         | Usually (model version dependent)
Sentiment / intent classification  | No             | Yes (fine-tuned models)

The hybrid approach

The best production pipelines use both. AXIOM handles the structural pre-filter; embeddings handle the semantic re-ranking.

Recommended pipeline for RAG

1. AXIOM — encode query (~3 ms, CPU, free)
   Produces a 10,048-bit vector deterministically.

2. AXIOM — pre-filter corpus (~5 ms for 10k docs)
   Narrows 10,000 candidates to ~50 structural matches.

3. Embedding API — re-rank 50 (~200 ms, 50 API calls)
   Semantic understanding reduces 50 candidates to the top 5.

4. LLM — generate answer (1–5 s)
   GPT / Claude produces the final response from the top-5 context docs.

Total: roughly 1–5 seconds for a 10,000-document corpus, dominated by the LLM call. Without AXIOM pre-filtering you would need 10,000 embedding API calls per query instead of 50 — approximately 200x more expensive.
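The cost multiplier is simple arithmetic. A quick sanity check, using the per-stage latencies from the steps above and assuming a 3 s LLM call (all figures illustrative, not benchmarks):

```python
# Back-of-envelope numbers for the four-stage hybrid pipeline.
corpus_size = 10_000
survivors = 50                      # candidates left after the AXIOM pre-filter

embed_calls_saved = corpus_size / survivors
print(f"{embed_calls_saved:.0f}x fewer embedding calls per query")

# Stage latencies in milliseconds (assumed, per the pipeline description).
stage_ms = {"axiom encode": 3, "axiom pre-filter": 5,
            "embedding re-rank": 200, "llm generate": 3_000}
total_s = sum(stage_ms.values()) / 1000
print(f"end-to-end ~ {total_s:.1f} s (dominated by the LLM call)")
```

The pre-filter stages add about 8 ms to a multi-second pipeline, which is why they are effectively free from a latency standpoint.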

hybrid_rag.ts — AXIOM pre-filter + OpenAI re-rank

import OpenAI from "openai";

const AXIOM = "https://api.axiom.dev";
const KEY   = process.env.AXIOM_API_KEY!;
const openai = new OpenAI();

const post = (path: string, body: object) =>
  fetch(AXIOM + path, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": KEY },
    body: JSON.stringify(body),
  }).then(r => r.json());

interface Doc { id: string; text: string; axiomVec?: number[] }

// ── Stage 1: AXIOM structural pre-filter ──────────────────────────────

async function axiomPrefilter(query: string, corpus: Doc[], topK = 50): Promise<Doc[]> {
  const { vector: qv } = await post("/api/encode", { text: query });

  const scored = await Promise.all(
    corpus.map(async doc => {
      const { similarity } = await post("/api/similarity", {
        vector_a: qv,
        vector_b: doc.axiomVec,
      });
      return { doc, score: similarity as number };
    })
  );

  return scored
    .filter(s => s.score >= 0.60)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.doc);
}

// ── Stage 2: Semantic re-rank with OpenAI embeddings ──────────────────

async function semanticRerank(query: string, candidates: Doc[], topK = 5): Promise<Doc[]> {
  const inputs  = [query, ...candidates.map(c => c.text)];
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: inputs,
  });

  const [qEmbed, ...docEmbeds] = data.map(d => d.embedding);

  const cosineSim = (a: number[], b: number[]) => {
    const dot  = a.reduce((s, v, i) => s + v * b[i], 0);
    const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
    const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
    return dot / (magA * magB);
  };

  return candidates
    .map((doc, i) => ({ doc, score: cosineSim(qEmbed, docEmbeds[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.doc);
}

// ── Full hybrid pipeline ───────────────────────────────────────────────

async function hybridSearch(query: string, corpus: Doc[]): Promise<Doc[]> {
  console.time("axiom-prefilter");
  const candidates = await axiomPrefilter(query, corpus, 50);
  console.timeEnd("axiom-prefilter");
  console.log(`AXIOM: ${corpus.length} → ${candidates.length} candidates`);

  if (candidates.length === 0) return [];

  console.time("openai-rerank");
  const top5 = await semanticRerank(query, candidates, 5);
  console.timeEnd("openai-rerank");

  return top5;
}

// Usage
const results = await hybridSearch("password reset instructions", myCorpus);
// axiom-prefilter: 8ms
// AXIOM: 10000 → 23 candidates
// openai-rerank: 312ms   (23 calls, not 10000)

Limitations — read before using in production

AXIOM does character-level trigram matching. This is useful but narrow. The following tasks are outside its scope — using AXIOM for them will produce incorrect or misleading results.

Synonyms and paraphrases

"buy shoes" vs "purchase footwear" → ~0.42 (near-random). AXIOM shares no trigrams across these words.
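You can see the synonym failure without calling the API at all: the trigram sets of the two phrases simply do not intersect, while a one-word edit leaves most trigrams shared. A toy check using plain sliding trigrams (not AXIOM's exact tokenisation):

```python
def trigram_set(text: str) -> set[str]:
    """Set of sliding character trigrams in a string."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


# Synonyms: zero shared trigrams, so any trigram-based score is near random.
a, b = trigram_set("buy shoes"), trigram_set("purchase footwear")
print(a & b)                     # set()

# One-word edit: most trigrams survive, so overlap stays high.
x = trigram_set("the cat sat on the mat")
y = trigram_set("the cat sat on the hat")
print(len(x & y) / len(x | y))   # 0.7
```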

Semantic similarity

"dog" vs "cat" → 0.5012. AXIOM has no concept of meaning. Do not use it for concept search.

Cross-language

"hello" vs "hola" → ~0.30. No shared character patterns across language families.

Sentiment / intent

"I love this product" vs "I hate this product" → high similarity (~0.81). Structurally similar, semantically opposite.

Long-document summarisation matching

A 10-sentence summary vs a 1-page article will score low even if they convey the same information.

Replacing embeddings entirely

AXIOM is a pre-filter. For any task requiring semantic understanding, you still need an embedding model.

Ready to add AXIOM to your pipeline?

Try the API in the playground or read the full API reference.