Integration Guide
AXIOM + AI Pipelines
AXIOM is a character-level similarity engine, not a semantic one. It encodes text into 10,048-bit binary vectors using trigram bundling — fast, deterministic, and CPU-only. It complements embeddings; it does not replace them.
What AXIOM adds to an AI pipeline
Semantic embeddings (OpenAI, Cohere, BERT) are expensive and slow to call. AXIOM runs in microseconds on a single CPU core and catches structurally similar text — typos, near-duplicates, and reworded copies — before you spend money on an embedding API. Think of AXIOM as the cheap pre-filter that shrinks the problem, with embeddings handling the semantic nuance on whatever remains.
Real similarity scores
Measured from actual API responses. AXIOM captures surface-level character overlap, not meaning. Notice "dog" vs "cat" is essentially random noise.
| Input A | Input B | Score | What this means |
|---|---|---|---|
| hello world | hello world | 1.0000 | Exact match |
| hello world | hello worl | 0.8649 | One char dropped |
| the cat sat on the mat | the cat sat on the hat | 0.8547 | One word differs |
| machine learning | machine learning model | 0.8252 | Added word shares trigrams |
| rust programming | python programming | 0.7481 | Shared suffix, different prefix |
| dog | cat | 0.5012 | Effectively random — no semantic overlap |
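The shape of these scores, exactly 1.0 for identical strings and roughly 0.5 for unrelated ones, is what bit-agreement over dense binary vectors produces. The sketch below is a hypothetical reconstruction of a trigram-bundling encoder in that style. It is not AXIOM's implementation: the SHA-256 hashing scheme, the majority-vote bundling, and the bit-agreement metric are all assumptions, and its scores will only match the table qualitatively.

```python
import hashlib

N_BITS = 10_048  # vector width quoted in this guide

def _trigram_hypervector(tri: str) -> list[int]:
    """Deterministic pseudo-random 10,048-bit pattern for one trigram.
    The same trigram always yields the same bits (SHA-256 bit stream)."""
    bits: list[int] = []
    counter = 0
    while len(bits) < N_BITS:
        digest = hashlib.sha256(f"{tri}:{counter}".encode()).digest()
        for byte in digest:
            for k in range(8):
                bits.append((byte >> k) & 1)
        counter += 1
    return bits[:N_BITS]

def encode(text: str) -> list[int]:
    """Bundle all character trigrams into one vector by per-bit majority vote."""
    padded = f"  {text.lower()}  "
    tris = [padded[i:i + 3] for i in range(len(padded) - 2)]
    sums = [0] * N_BITS
    for tri in tris:
        for i, bit in enumerate(_trigram_hypervector(tri)):
            sums[i] += bit
    half = len(tris) / 2
    return [1 if s > half else 0 for s in sums]

def similarity(a: list[int], b: list[int]) -> float:
    """Fraction of agreeing bits: 1.0 for identical input, ~0.5 for unrelated."""
    return sum(x == y for x, y in zip(a, b)) / N_BITS

print(similarity(encode("hello world"), encode("hello world")))  # 1.0
```

Under this reconstruction, a typo changes only a few of the bundled trigrams, so most bits still agree (high score), while unrelated strings bundle disjoint trigram patterns and agree on about half the bits by chance, which is why the dog/cat row sits near 0.5.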
Integration Patterns
Browse the patterns below — pick the ones relevant to your stack. Each includes a complete code example and honest guidance on when it helps and when it does not.
Pre-filtering for RAG
Use AXIOM to deduplicate and near-match before calling an embedding API — so you only embed the candidates that survive the structural filter.
When to use
- High-volume RAG with many similar documents
- When embedding API cost is a concern
- User queries are textually close to stored docs
When NOT to use
- Queries use synonyms or paraphrases
- Multilingual retrieval
- Intent-based or conceptual queries
rag_prefilter.ts
// RAG pre-filter: AXIOM first, embedding API only for survivors
import OpenAI from "openai";

const AXIOM_URL = "https://api.axiom.dev";
const AXIOM_KEY = process.env.AXIOM_API_KEY!;
const openai = new OpenAI(); // used at the end to embed the survivors
async function axiomEncode(text: string): Promise<number[]> {
const res = await fetch(`${AXIOM_URL}/api/encode`, {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
body: JSON.stringify({ text }),
});
return (await res.json()).vector;
}
async function axiomSimilarity(a: number[], b: number[]): Promise<number> {
const res = await fetch(`${AXIOM_URL}/api/similarity`, {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
body: JSON.stringify({ vector_a: a, vector_b: b }),
});
return (await res.json()).similarity;
}
// ─────────────────────────────────────────────────────────────────────
// Step 1: At index time, encode every document with AXIOM (fast & free)
// Step 2: At query time, AXIOM narrows 10,000 docs → 50 candidates
// Step 3: Only call the embedding API for those 50 candidates
// ─────────────────────────────────────────────────────────────────────
interface Doc { id: string; text: string; axiomVector: number[] }
async function buildIndex(docs: { id: string; text: string }[]): Promise<Doc[]> {
const res = await fetch(`${AXIOM_URL}/api/batch-encode`, {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": AXIOM_KEY },
body: JSON.stringify({ texts: docs.map(d => d.text) }),
});
const { vectors } = await res.json();
return docs.map((d, i) => ({ ...d, axiomVector: vectors[i].vector }));
}
async function prefilter(
query: string,
index: Doc[],
topK = 50,
threshold = 0.62
): Promise<Doc[]> {
const queryVec = await axiomEncode(query);
const scored = await Promise.all(
index.map(async doc => ({
doc,
score: await axiomSimilarity(queryVec, doc.axiomVector),
}))
);
return scored
.filter(s => s.score >= threshold)
.sort((a, b) => b.score - a.score)
.slice(0, topK)
.map(s => s.doc);
}
// Usage
const index = await buildIndex(myDocs); // do once at startup
const candidates = await prefilter("how do I reset my password?", index);
console.log(`AXIOM narrowed ${myDocs.length} docs → ${candidates.length} candidates`);
// "AXIOM narrowed 10000 docs → 12 candidates"
// Now call your embedding API ONLY for the 12 survivors
const embeddings = await openai.embeddings.create({
model: "text-embedding-3-small",
input: candidates.map(c => c.text),
});
// ~800x fewer embedding calls for this query
Expected output
AXIOM narrowed 10000 docs → 12 candidates
# Embedding API called 12 times instead of 10,000
Training Data Deduplication
Remove near-duplicate examples from fine-tuning datasets before training: duplicated data distorts loss curves and wastes GPU hours.
When to use
- Datasets scraped from the web (high redundancy)
- RLHF data with repeated human-written prompts
- Any dataset larger than 10k examples
When NOT to use
- Paraphrase datasets (different surface, same meaning)
- When you want stylistic diversity regardless of overlap
dedup_dataset.py
import requests
from itertools import combinations
AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}
def batch_encode(texts: list[str]) -> list[list[int]]:
"""Encode texts in chunks of 50 (API batch limit)."""
all_vectors = []
for i in range(0, len(texts), 50):
chunk = texts[i : i + 50]
r = requests.post(
f"{AXIOM_URL}/api/batch-encode",
headers=HEADERS,
json={"texts": chunk},
)
all_vectors.extend(v["vector"] for v in r.json()["vectors"])
return all_vectors
def get_similarity(vec_a: list[int], vec_b: list[int]) -> float:
r = requests.post(
f"{AXIOM_URL}/api/similarity",
headers=HEADERS,
json={"vector_a": vec_a, "vector_b": vec_b},
)
return r.json()["similarity"]
def deduplicate(examples: list[dict], threshold: float = 0.82) -> list[dict]:
"""
Remove near-duplicate training examples.
threshold=0.82 catches typos and minor rewording while
preserving genuinely different phrasings.
"""
texts = [ex["prompt"] for ex in examples]
vectors = batch_encode(texts)
duplicates: set[int] = set()
for (i, vi), (j, vj) in combinations(enumerate(vectors), 2):
if i in duplicates or j in duplicates:
continue
sim = get_similarity(vi, vj)
if sim >= threshold:
print(f"Dup (sim={sim:.4f})")
print(f" KEEP [{i}]: {texts[i][:70]}")
print(f" REMOVE [{j}]: {texts[j][:70]}")
duplicates.add(j)
kept = [ex for i, ex in enumerate(examples) if i not in duplicates]
print(f"\nRemoved {len(duplicates)} duplicates.")
print(f"Dataset: {len(examples)} → {len(kept)} examples")
return kept
# ── Usage ──────────────────────────────────────────────────────────────
import json
with open("raw_finetune.jsonl") as f:
raw = [json.loads(line) for line in f]
clean = deduplicate(raw, threshold=0.82)
with open("clean_finetune.jsonl", "w") as f:
for ex in clean:
        f.write(json.dumps(ex) + "\n")
Expected output
Dup (sim=0.8547)
KEEP [0]: the cat sat on the mat
REMOVE [4]: the cat sat on the hat
Removed 312 duplicates.
Dataset: 5000 → 4688 examples
Typo-Tolerant Search Layer
Run AXIOM as a first pass to catch typos and near-identical queries. Only forward structurally dissimilar queries to your semantic search layer.
When to use
- Search bars with user-typed queries (typos are common)
- E-commerce product search
- Support ticket routing
When NOT to use
- — "buy shoes" should match "purchase footwear" (synonym gap)
- — Cross-language search
- — Conceptual or intent-based queries
typo_search.ts
// Two-stage search: AXIOM (typo tolerance) → semantic (meaning)
const AXIOM = "https://api.axiom.dev";
const KEY = process.env.AXIOM_API_KEY!;
const post = (path: string, body: object) =>
fetch(AXIOM + path, {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": KEY },
body: JSON.stringify(body),
}).then(r => r.json());
// ─── Build a simple in-memory index ───────────────────────────────────
interface Entry { text: string; vector: number[]; metadata: Record<string, string> }
async function buildTypoIndex(
items: { text: string; metadata: Record<string, string> }[]
): Promise<Entry[]> {
const { vectors } = await post("/api/batch-encode", {
texts: items.map(i => i.text),
});
return items.map((item, i) => ({
text: item.text,
vector: vectors[i].vector,
metadata: item.metadata,
}));
}
// ─── Stage 1: AXIOM catches typos and near-matches ────────────────────
async function axiomSearch(
query: string,
index: Entry[],
threshold = 0.72
): Promise<Entry[]> {
const { vector: qv } = await post("/api/encode", { text: query });
const results: Array<Entry & { score: number }> = [];
for (const entry of index) {
const { similarity } = await post("/api/similarity", {
vector_a: qv,
vector_b: entry.vector,
});
if (similarity >= threshold) results.push({ ...entry, score: similarity });
}
return results.sort((a, b) => b.score - a.score);
}
// ─── Stage 2: Semantic search for structurally dissimilar queries ──────
async function twoStageSearch(query: string, index: Entry[]) {
// AXIOM pass
const axiomHits = await axiomSearch(query, index, 0.72);
if (axiomHits.length > 0) {
console.log(`AXIOM found ${axiomHits.length} match(es) — skipping semantic`);
return axiomHits;
}
// Fall through to semantic search (your existing embedding pipeline)
console.log("AXIOM found nothing — forwarding to semantic search");
return semanticSearch(query); // your existing function
}
// ─── Demo ─────────────────────────────────────────────────────────────
const catalog = await buildTypoIndex([
{ text: "wireless bluetooth headphones", metadata: { id: "p1" } },
{ text: "running shoes size 10", metadata: { id: "p2" } },
{ text: "mechanical keyboard rgb", metadata: { id: "p3" } },
]);
// Typo query → AXIOM catches it
const r1 = await twoStageSearch("wireles bluethooth headphons", catalog);
// AXIOM found 1 match — skipping semantic
// score=0.8134 | wireless bluetooth headphones
// Synonym query → falls to semantic
const r2 = await twoStageSearch("noise cancelling ear buds", catalog);
// AXIOM found nothing — forwarding to semantic search
Expected output (typo query)
AXIOM found 1 match — skipping semantic
score=0.8134 | wireless bluetooth headphones
# "wireles bluethooth headphons" → 0.8134 (catches the typos)
# "noise cancelling ear buds" → 0.4819 (below threshold, semantic takes over)
Cache Key Generation
AXIOM vectors are deterministic — identical input always produces the same bit pattern. Use this as a cache key to avoid re-embedding identical or near-identical inputs.
When to use
- High-traffic LLM endpoints with repeated queries
- Avoiding duplicate embedding API charges
- Response caching for chatbots
When NOT to use
- Queries where context or user identity matters
- When you need exact-string matching only (a plain hash is simpler)
cache_key.py
import requests
import hashlib
import redis
AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}
cache = redis.Redis(host="localhost", port=6379, db=0)
def axiom_vector(text: str) -> list[int]:
r = requests.post(
f"{AXIOM_URL}/api/encode",
headers=HEADERS,
json={"text": text},
)
return r.json()["vector"]
def vector_to_cache_key(vector: list[int]) -> str:
"""
Hash the binary vector into a short cache key.
Identical inputs → identical vectors → identical keys.
"""
raw = bytes(vector)
return hashlib.sha256(raw).hexdigest()[:32]
def get_similarity(va: list[int], vb: list[int]) -> float:
r = requests.post(
f"{AXIOM_URL}/api/similarity",
headers=HEADERS,
json={"vector_a": va, "vector_b": vb},
)
return r.json()["similarity"]
def cached_embed_and_respond(query: str, embed_fn, llm_fn) -> str:
"""
1. Encode with AXIOM (fast, deterministic)
2. Check cache for an exact or near-identical query
3. On miss: call embed_fn + llm_fn, store in cache
"""
vec = axiom_vector(query)
exact_key = vector_to_cache_key(vec)
# Exact cache hit (same query, same vector, same key)
cached = cache.get(exact_key)
if cached:
print(f"Cache HIT (exact): {exact_key[:12]}...")
return cached.decode()
    # Near-duplicate check: scan stored vectors.
    # (cache.keys is unordered and O(n); in production, keep a small
    # in-memory recent-vector store instead)
    recent_keys = cache.keys("vec:*")
    for rk in recent_keys[:200]:  # cap the scan at 200 entries
stored_vec = list(cache.hget(rk, "vector") or b"")
if not stored_vec:
continue
sim = get_similarity(vec, stored_vec)
if sim >= 0.92: # nearly identical text
cached_response = cache.hget(rk, "response")
print(f"Cache HIT (near-dup, sim={sim:.4f})")
return cached_response.decode()
# Cache miss — call the expensive functions
print("Cache MISS — calling embed + LLM")
embedding = embed_fn(query)
response = llm_fn(query, embedding)
# Store result
cache.set(exact_key, response, ex=3600)
cache.hset(f"vec:{exact_key}", mapping={
"vector": bytes(vec),
"response": response,
})
cache.expire(f"vec:{exact_key}", 3600)
return response
# ── Usage ──────────────────────────────────────────────────────────────
r1 = cached_embed_and_respond("How do I reset my password?", embed, llm)
# Cache MISS — calling embed + LLM
r2 = cached_embed_and_respond("How do I reset my password?", embed, llm)
# Cache HIT (exact): a3f8b2c1d9e4...
r3 = cached_embed_and_respond("how do i resett my pasword?", embed, llm)
# Cache HIT (near-dup, sim=0.9312)
Expected output
Cache MISS — calling embed + LLM
Cache HIT (exact): a3f8b2c1d9e4...
Cache HIT (near-dup, sim=0.9312)
# Third call (typo) avoided both embed API + LLM call entirely
Content Drift Monitoring
Compare new content batches against a stored baseline to detect when a document has changed significantly — useful for re-embedding triggers, version control, or data-quality alerts.
When to use
- Monitoring scraped web content for changes
- Deciding when to re-embed a document
- Detecting document tampering or unexpected edits
When NOT to use
- Detecting meaning shifts caused by replacing synonyms
- When you need diff-level granularity (use a diff tool)
drift_monitor.py
import requests
import json
from pathlib import Path
AXIOM_URL = "https://api.axiom.dev"
HEADERS = {"X-API-Key": "your-api-key", "Content-Type": "application/json"}
BASELINE_FILE = Path("baseline_vectors.json")
def encode(text: str) -> list[int]:
r = requests.post(
f"{AXIOM_URL}/api/encode", headers=HEADERS, json={"text": text}
)
return r.json()["vector"]
def similarity(va: list[int], vb: list[int]) -> float:
r = requests.post(
f"{AXIOM_URL}/api/similarity",
headers=HEADERS,
json={"vector_a": va, "vector_b": vb},
)
return r.json()["similarity"]
def save_baseline(docs: dict[str, str]) -> None:
"""Encode and store baseline vectors for a set of named documents."""
baseline = {}
for doc_id, text in docs.items():
baseline[doc_id] = {"text": text[:120], "vector": encode(text)}
BASELINE_FILE.write_text(json.dumps(baseline))
print(f"Baseline saved: {len(baseline)} documents")
def check_drift(
new_docs: dict[str, str],
threshold: float = 0.85,
) -> list[dict]:
"""
Compare current content against baseline.
Returns documents that have drifted below the threshold.
"""
baseline = json.loads(BASELINE_FILE.read_text())
alerts = []
for doc_id, new_text in new_docs.items():
if doc_id not in baseline:
alerts.append({"id": doc_id, "status": "new", "score": None})
continue
baseline_vec = baseline[doc_id]["vector"]
new_vec = encode(new_text)
score = similarity(baseline_vec, new_vec)
status = "stable" if score >= threshold else "drifted"
if status == "drifted":
alerts.append({
"id": doc_id,
"status": status,
"score": round(score, 4),
"baseline_preview": baseline[doc_id]["text"][:60],
"new_preview": new_text[:60],
})
print(f"DRIFT [{doc_id}] sim={score:.4f}")
print(f" was: {baseline[doc_id]['text'][:60]}")
print(f" now: {new_text[:60]}")
return alerts
# ── Usage ──────────────────────────────────────────────────────────────
# Day 1: Save baseline
save_baseline({
"pricing": "Our free tier includes 100 API calls per day.",
"support": "Email us at support@example.com for help.",
"terms": "By using this service you agree to our terms.",
})
# Day 7: Compare new versions
alerts = check_drift({
"pricing": "Our free tier includes 100 API calls per day.", # unchanged
"support": "Contact our support team via the in-app chat.", # changed
"terms": "By using this platform you agree to our terms.", # minor edit
})
# DRIFT [support] sim=0.6823
# was: Email us at support@example.com for help.
# now: Contact our support team via the in-app chat.
Expected output
Baseline saved: 3 documents
DRIFT [support] sim=0.6823
was: Email us at support@example.com for help.
now: Contact our support team via the in-app chat.
# "pricing" sim=1.0000 (unchanged, no alert)
# "terms" sim=0.8912 (above threshold, no alert)
Edge / Offline Processing
For environments with no internet access, no GPU, and no cloud API budget: AXIOM runs on a single CPU core and needs no external dependencies.
When to use
- Air-gapped systems (healthcare, defence, finance)
- Mobile apps that cannot call an embedding API
- IoT devices or embedded systems
- Environments where data must not leave the device
When NOT to use
- When semantic understanding is required (embeddings are better)
- If you have GPU access and latency tolerances of 200 ms+
edge_search.py — self-hosted AXIOM, no internet required
# Self-hosted AXIOM on a local machine or Raspberry Pi.
# Requires: Docker + ~256 MB RAM. No GPU, no internet, no API key.
#
# docker run -p 8080:8080 axiom-public:latest
#
# Then point all calls at http://localhost:8080
import requests
LOCAL_URL = "http://localhost:8080"
# No API key needed for self-hosted instance with no auth configured
def encode_local(text: str) -> list[int]:
r = requests.post(
f"{LOCAL_URL}/api/encode",
json={"text": text},
timeout=2,
)
return r.json()["vector"]
def similarity_local(va: list[int], vb: list[int]) -> float:
r = requests.post(
f"{LOCAL_URL}/api/similarity",
json={"vector_a": va, "vector_b": vb},
timeout=2,
)
return r.json()["similarity"]
# ── Offline document search ────────────────────────────────────────────
class OfflineSearch:
def __init__(self):
self.index: list[tuple[str, list[int]]] = []
def add(self, text: str) -> None:
vec = encode_local(text)
self.index.append((text, vec))
def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
qv = encode_local(query)
scored = [
(text, similarity_local(qv, vec))
for text, vec in self.index
]
return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
# No network calls beyond localhost — works fully air-gapped
search = OfflineSearch()
search.add("Patient admitted with chest pain and shortness of breath.")
search.add("MRI scan scheduled for Tuesday morning.")
search.add("Discharge summary: patient stable, follow-up in 2 weeks.")
results = search.search("chest and breathing problems")
for text, score in results:
print(f"{score:.4f} | {text}")Expected output
0.7841 | Patient admitted with chest pain and shortness of breath.
0.5120 | Discharge summary: patient stable, follow-up in 2 weeks.
0.4388 | MRI scan scheduled for Tuesday morning.
# "shortness of breath" → "breathing problems": 0.7841 (shared trigrams)
# All calls go to localhost. Zero external traffic.
Honest comparison: AXIOM vs embeddings
These are not competing products. They excel at different tasks.
| Task | AXIOM | Semantic Embeddings |
|---|---|---|
| Exact duplicate detection | Excellent | Overkill |
| Near-duplicate / typo detection | Excellent | Good |
| Synonym matching (buy vs purchase) | Poor | Excellent |
| Cross-language similarity | No | Excellent (multilingual models) |
| Paraphrase detection | Poor | Excellent |
| Latency per encode | < 5 ms, CPU | 100–500 ms, GPU API call |
| Cost at 10M ops | Fixed VPS cost | ~$1–10 per 1M tokens |
| Works offline / air-gapped | Yes | No (unless self-hosted model) |
| Deterministic output | Always | Usually (model version dependent) |
| Sentiment / intent classification | No | Yes (fine-tuned models) |
The hybrid approach
The best production pipelines use both. AXIOM handles the structural pre-filter; embeddings handle the semantic re-ranking.
Recommended pipeline for RAG
1. AXIOM encode: produces a 10,048-bit vector deterministically
2. AXIOM pre-filter: narrows 10,000 candidates to ~50 structural matches
3. Embedding re-rank: semantic understanding reduces 50 to the top 5
4. LLM generation: GPT / Claude produces the final response from the top 5 context docs
Total: ~6 seconds for a 10,000-document corpus. Without AXIOM pre-filtering you would need 10,000 embedding API calls instead of 50 — approximately 200x more expensive.
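The 200x figure is straight division. A back-of-envelope sketch, using a purely illustrative per-call price (not a quoted rate):

```python
# Back-of-envelope cost of one query over a 10,000-document corpus.
# PRICE_PER_CALL is a hypothetical rate for illustration only.
CORPUS_SIZE = 10_000
AXIOM_SURVIVORS = 50
PRICE_PER_CALL = 0.0001  # $ per embedding call (assumed)

naive = CORPUS_SIZE * PRICE_PER_CALL       # embed every doc per query
hybrid = AXIOM_SURVIVORS * PRICE_PER_CALL  # embed only the AXIOM survivors

print(f"naive:  ${naive:.2f}")    # naive:  $1.00
print(f"hybrid: ${hybrid:.4f}")   # hybrid: $0.0050
print(f"ratio:  {naive / hybrid:.0f}x")  # ratio:  200x
```

Whatever the real per-call price, it cancels out of the ratio: the saving depends only on how aggressively the pre-filter shrinks the candidate set.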
hybrid_rag.ts — AXIOM pre-filter + OpenAI re-rank
import OpenAI from "openai";
const AXIOM = "https://api.axiom.dev";
const KEY = process.env.AXIOM_API_KEY!;
const openai = new OpenAI();
const post = (path: string, body: object) =>
fetch(AXIOM + path, {
method: "POST",
headers: { "Content-Type": "application/json", "X-API-Key": KEY },
body: JSON.stringify(body),
}).then(r => r.json());
interface Doc { id: string; text: string; axiomVec?: number[] }
// ── Stage 1: AXIOM structural pre-filter ──────────────────────────────
async function axiomPrefilter(query: string, corpus: Doc[], topK = 50): Promise<Doc[]> {
const { vector: qv } = await post("/api/encode", { text: query });
const scored = await Promise.all(
corpus.map(async doc => {
const { similarity } = await post("/api/similarity", {
vector_a: qv,
vector_b: doc.axiomVec,
});
return { doc, score: similarity as number };
})
);
return scored
.filter(s => s.score >= 0.60)
.sort((a, b) => b.score - a.score)
.slice(0, topK)
.map(s => s.doc);
}
// ── Stage 2: Semantic re-rank with OpenAI embeddings ──────────────────
async function semanticRerank(query: string, candidates: Doc[], topK = 5): Promise<Doc[]> {
const inputs = [query, ...candidates.map(c => c.text)];
const { data } = await openai.embeddings.create({
model: "text-embedding-3-small",
input: inputs,
});
const [qEmbed, ...docEmbeds] = data.map(d => d.embedding);
const cosineSim = (a: number[], b: number[]) => {
const dot = a.reduce((s, v, i) => s + v * b[i], 0);
const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
return dot / (magA * magB);
};
return candidates
.map((doc, i) => ({ doc, score: cosineSim(qEmbed, docEmbeds[i]) }))
.sort((a, b) => b.score - a.score)
.slice(0, topK)
.map(s => s.doc);
}
// ── Full hybrid pipeline ───────────────────────────────────────────────
async function hybridSearch(query: string, corpus: Doc[]): Promise<Doc[]> {
console.time("axiom-prefilter");
const candidates = await axiomPrefilter(query, corpus, 50);
console.timeEnd("axiom-prefilter");
console.log(`AXIOM: ${corpus.length} → ${candidates.length} candidates`);
if (candidates.length === 0) return [];
console.time("openai-rerank");
const top5 = await semanticRerank(query, candidates, 5);
console.timeEnd("openai-rerank");
return top5;
}
// Usage
const results = await hybridSearch("password reset instructions", myCorpus);
// axiom-prefilter: 8ms
// AXIOM: 10000 → 23 candidates
// openai-rerank: 312ms (23 calls, not 10000)
Limitations — read before using in production
AXIOM does character-level trigram matching. This is useful but narrow. The following tasks are outside its scope — using AXIOM for them will produce incorrect or misleading results.
Synonyms and paraphrases
"buy shoes" vs "purchase footwear" → ~0.42 (near-random). AXIOM shares no trigrams across these words.
Semantic similarity
"dog" vs "cat" → 0.5012. AXIOM has no concept of meaning. Do not use it for concept search.
Cross-language
"hello" vs "hola" → ~0.30. No shared character patterns across language families.
Sentiment / intent
"I love this product" vs "I hate this product" → high similarity (~0.81). Structurally similar, semantically opposite.
Long-document summarisation matching
A 10-sentence summary vs a 1-page article will score low even if they convey the same information.
Replacing embeddings entirely
AXIOM is a pre-filter. For any task requiring semantic understanding, you still need an embedding model.
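The synonym and sentiment rows above can be reproduced with a crude local proxy: plain Jaccard overlap of character trigrams. This is not AXIOM's metric (its unrelated-pair scores sit near 0.5 rather than 0), but it makes the failure mode concrete.

```python
def trigram_set(text: str) -> set[str]:
    """Padded, lowercased character trigrams of a string."""
    padded = f"  {text.lower()}  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def overlap(a: str, b: str) -> float:
    """Jaccard overlap of trigram sets: a crude proxy for surface
    similarity, zero when no character patterns are shared."""
    ta, tb = trigram_set(a), trigram_set(b)
    return len(ta & tb) / len(ta | tb)

# Synonyms share almost no trigrams, so the structural score collapses:
print(f"{overlap('buy shoes', 'purchase footwear'):.2f}")        # 0.00
# Opposite meanings can share most of their trigrams:
print(f"{overlap('I love this product', 'I hate this product'):.2f}")  # 0.62
```

If either of these cases matters for your workload, route those queries to an embedding model and use AXIOM only for the structural pass.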
Ready to add AXIOM to your pipeline?
Try the API in the playground or read the full API reference.