🔓 Going fully open source — source code is now public

We don't trust anything. We verify everything.

Most intelligence platforms trust the LLM, trust the database, trust the network. AI hallucinations pollute their graph. A single SQL update rewrites history. A compromised auth provider opens every door. Enki is built on the opposite assumption: nothing is trusted until it's cryptographically verified. Six gates stand between a signal arriving and a fact appearing in your knowledge graph.

The discipline layer

Six cryptographic gates.

Signed at the device

Pulse app or optional hardware key

Governed at the vocabulary

Pack-signed, fail-closed active set

Extracted via the LDU

Open mouth — raw candidates, never canonical

Admitted via the GDU

Locked jaw — 5-layer collapse, base-60 address

Hash-chained at admission

Tamper one row, the chain breaks

Encrypted at rest

LUKS volumes + AES-256-GCM key envelopes

Gate 1

Signed at the device.

Every admin action and every signal-ingest call carries a hardware-rooted Ed25519 signature. Compromise the server remotely and you still can't enroll devices, push updates, or admit signals without approval from the Pulse app — or the optional hardware key.

What attackers can't do

Pair a rogue device without the device owner approving it in Pulse
Forge an admin push by stealing a session token — challenge-response requires the live private key
Replay an old signed request — every signature is bound to a server-issued challenge with a timestamp

Backing it up

Algorithm: Ed25519, RFC 8032
Key storage: AES-256-GCM envelope, PBKDF2-HMAC-SHA256 with 600k iterations
Anti-replay: challenge_id | peer_node_id | timestamp, canonicalized per RFC 8785
Optional hardware key: an air-gapped signing device; the private key never leaves it
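A minimal Python sketch of the anti-replay check, assuming a server-side store of issued challenges. `ChallengeStore`, `verify_request`, the 60-second challenge lifetime and the 30-second clock-skew window are hypothetical names and values, not Enki's API. The Ed25519 check is passed in as `verify_sig`; in a real deployment that could be `Ed25519PublicKey.verify` from the `cryptography` package, wrapped to return a boolean. `canonical_bytes` only approximates RFC 8785 for flat objects holding strings and integers.

```python
import json
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict


def canonical_bytes(obj: dict) -> bytes:
    # Approximation of RFC 8785 (JCS) for flat objects of strings and integers:
    # keys sorted, no insignificant whitespace, UTF-8 output.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


@dataclass
class ChallengeStore:
    """Hypothetical server-side record of challenges issued and not yet used."""
    ttl_seconds: float = 60.0
    _issued: Dict[str, float] = field(default_factory=dict)

    def issue(self, now: float) -> str:
        challenge_id = secrets.token_hex(16)
        self._issued[challenge_id] = now
        return challenge_id

    def consume(self, challenge_id: str, now: float) -> bool:
        # pop() makes every challenge single-use: a replayed request finds nothing.
        issued_at = self._issued.pop(challenge_id, None)
        return issued_at is not None and now - issued_at <= self.ttl_seconds


def verify_request(
    store: ChallengeStore,
    challenge_id: str,
    peer_node_id: str,
    timestamp: int,
    signature: bytes,
    verify_sig: Callable[[bytes, bytes], bool],
    now: float,
    max_skew: float = 30.0,
) -> bool:
    # The signature covers exactly these three fields, canonicalized.
    message = canonical_bytes(
        {"challenge_id": challenge_id, "peer_node_id": peer_node_id, "timestamp": timestamp}
    )
    if abs(now - timestamp) > max_skew:
        return False  # stale or future-dated request
    if not verify_sig(message, signature):
        return False  # not signed by the enrolled device key
    return store.consume(challenge_id, now)  # unknown, expired or already used -> False
```

Checking the signature before consuming the challenge means a forged request cannot burn a legitimate device's pending challenge.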

Gate 2

Governed at the vocabulary.

The AI can't invent entity types. Every category, every subtype, every relationship the system will ever recognize is defined in a signed pack. The boot loader refuses to start in production unless every pack's content hash matches the signed active set. Tamper with a vocab JSON file and the system fails closed — it doesn't run.

What attackers can't do

Slip a new entity type into the system without going through the pack activation ceremony
Modify the resolution rules without invalidating the active-set hash
Force the loader into a degraded fallback — production mode is fail-closed

Backing it up

Active set: allow-list of admitted packs with version pins + SHA-256 content hashes
Namespace registry: prevents semantic collisions between packs
Canonical hash: RFC 8785 JCS + Unicode NFC normalization before SHA-256
Activation ceremony: explicit human acceptance via immutable activation records
Phase 4.2 enforcement: direct PackLoader() usage forbidden without GDU context
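The canonical-hash and fail-closed rules can be sketched in Python as follows, under assumptions: packs are already-parsed JSON objects, `canonical_hash`, `check_active_set` and `ActiveSetError` are hypothetical names, and verifying the signature over the active set itself is left out. `json.dumps(sort_keys=True, separators=(",", ":"))` only approximates RFC 8785, which also fixes number formatting and orders keys by UTF-16 code units.

```python
import hashlib
import json
import unicodedata
from typing import Any, Dict, Tuple


def _nfc(value: Any) -> Any:
    # Apply Unicode NFC to every string, including object keys, before hashing.
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [_nfc(v) for v in value]
    if isinstance(value, dict):
        return {_nfc(k): _nfc(v) for k, v in value.items()}
    return value


def canonical_hash(pack: Any) -> str:
    # NFC, then a JCS-style serialization, then SHA-256.
    text = json.dumps(_nfc(pack), sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ActiveSetError(RuntimeError):
    """Raised to stop boot; there is no degraded fallback."""


def check_active_set(
    loaded: Dict[str, Tuple[str, Any]],      # pack name -> (version, parsed content)
    active_set: Dict[str, Tuple[str, str]],  # pack name -> (pinned version, SHA-256 hex)
) -> None:
    unexpected = set(loaded) - set(active_set)
    missing = set(active_set) - set(loaded)
    if unexpected or missing:
        raise ActiveSetError(f"pack set mismatch: unexpected={sorted(unexpected)} missing={sorted(missing)}")
    for name, (version, content) in loaded.items():
        pinned_version, pinned_hash = active_set[name]
        if version != pinned_version:
            raise ActiveSetError(f"{name}: version {version} is not the pinned {pinned_version}")
        if canonical_hash(content) != pinned_hash:
            raise ActiveSetError(f"{name}: content hash does not match the active set")
```

NFC before hashing means a composed "é" and an "e" plus combining accent hash identically, so a harmless re-encoding of a vocab file does not trip the check while any real edit does.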

Gate 3

Extracted via the LDU.

The Local Definition Universe is the open mouth. The AI extracts freely — people, organizations, relationships, facts, contradictions — into a provisional working space. Nothing here is canonical. Nothing here is in your knowledge graph yet. Vision models run the same pipeline on photos: faces, text, objects, intelligence value.

What the LDU does

Tier 1: deterministic keyword + form classifier (~10K forms/sec, no LLM)
Tier 2: LLM extraction (Gemma 4 default, Llama / Mistral / Qwen swappable)
Vision: LLaVA-7B + MediaPipe face detection + PaddleOCR with Ollama fallback
Intelligence classifier flags faces / IDs / weapons / vessels / aircraft / crime scenes
Redaction-region detection on OCR — flagged for compliance, never logged as content
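Tier 1 is described only as a deterministic keyword and form classifier, so this Python sketch fills in the mechanics with assumptions: a lookup table from known surface forms to vocabulary types (in Enki that vocabulary would come from the signed packs), matched case-insensitively on word boundaries, longest form first. `ProvisionalCandidate`, `tier1_classify` and the sample table are hypothetical. Every result is flagged non-canonical, because LDU output never goes straight into the graph.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvisionalCandidate:
    surface: str
    entity_type: str
    span: Tuple[int, int]
    canonical: bool = False  # LDU output is never canonical


def tier1_classify(text: str, form_table: Dict[str, str]) -> List[ProvisionalCandidate]:
    """Deterministic: the same text and table always give the same candidates, no model involved."""
    if not form_table:
        return []
    by_lower = {form.lower(): etype for form, etype in form_table.items()}
    # Longest forms first, so a longer form wins over a shorter form it contains.
    alternatives = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE)
    return [
        ProvisionalCandidate(m.group(0), by_lower[m.group(0).lower()], m.span())
        for m in pattern.finditer(text)
    ]
```

For example, with a table mapping "MMSI" to a vessel identifier and "tail number" to an aircraft identifier, the sentence "The MMSI sat next to a Tail Number." yields two provisional candidates, both still outside the graph.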

Why open mouth, locked jaw

If the AI were allowed to write directly to the knowledge graph, every hallucination would become a permanent fact. The LDU/GDU split lets the LLM be creative in extraction while the graph stays disciplined. The LDU is full of provisional things. The GDU only admits what resolves cleanly.

Gate 4

Admitted via the GDU.

The Governed Definition Universe is the locked jaw. Every candidate entity passes through five resolution layers before it can become a node. 47 articles about the same company produce one canonical node with 47 sources attached — not 47 duplicates. Every admitted node gets a permanent base-60 address that never changes, never gets reused, and works as an audit-stable handle across the federation.

The 5-layer collapse

Hard identifier match (CIK, MMSI, ICAO hex, VIN, plate)
Normalized name (Unicode NFC, case, whitespace, punctuation stripped)
Alias lookup against known aliases for promoted entities
AI-powered similarity against the form-distribution model (base-60 cyclical LM)
Spatial proximity (PostGIS distance within type-specific tolerance)
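A minimal Python sketch of the collapse, assuming the layers run in the listed order and the first match wins. Layers 1 to 3 are implemented directly; layers 4 and 5 (the form-distribution model and the PostGIS distance query) are stand-in callables named `similar` and `near`. The `Graph`, `Node` and `Candidate` types and the placeholder address counter are hypothetical and are not the base-60 allocation.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


def normalize_name(name: str) -> str:
    # Layer 2 normalization: NFC, case-folded, punctuation stripped, whitespace collapsed.
    text = unicodedata.normalize("NFC", name).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


@dataclass
class Candidate:
    name: str
    source: str
    identifiers: Dict[str, str] = field(default_factory=dict)  # e.g. {"mmsi": "..."}


@dataclass
class Node:
    address: str
    name: str
    identifiers: Dict[str, str] = field(default_factory=dict)
    aliases: Set[str] = field(default_factory=set)
    sources: List[str] = field(default_factory=list)


Hook = Callable[[Candidate, List[Node]], Optional[Node]]


class Graph:
    def __init__(self, similar: Optional[Hook] = None, near: Optional[Hook] = None):
        self.nodes: List[Node] = []
        self.similar = similar  # stand-in for layer 4 (model similarity)
        self.near = near        # stand-in for layer 5 (spatial proximity)

    def resolve(self, cand: Candidate) -> Optional[Node]:
        # Layer 1: any shared hard identifier (CIK, MMSI, ICAO hex, VIN, plate).
        for node in self.nodes:
            if any(node.identifiers.get(k) == v for k, v in cand.identifiers.items()):
                return node
        key = normalize_name(cand.name)
        # Layer 2: normalized name.
        for node in self.nodes:
            if normalize_name(node.name) == key:
                return node
        # Layer 3: known aliases.
        for node in self.nodes:
            if key in {normalize_name(a) for a in node.aliases}:
                return node
        # Layers 4 and 5: consulted only when the cheaper layers found nothing.
        for hook in (self.similar, self.near):
            match = hook(cand, self.nodes) if hook else None
            if match is not None:
                return match
        return None

    def admit(self, cand: Candidate) -> Node:
        node = self.resolve(cand)
        if node is None:
            # Placeholder address; not the real base-60 allocation.
            node = Node(address=f"tmp-{len(self.nodes)}", name=cand.name)
            self.nodes.append(node)
        for k, v in cand.identifiers.items():
            node.identifiers.setdefault(k, v)
        node.sources.append(cand.source)
        return node
```

Feeding 47 candidates named "ACME Inc" and "Acme, Inc." through `admit` leaves one node carrying 47 sources, which is the behavior the paragraph above describes.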

Why base-60 addresses

The address space is sexagesimal (base 60, as in the Sumerian system), and 60 has 12 divisors. Two sexagesimal digits give 60 × 60 = 3,600 possible addresses; only 135 of them are valid positions, spread across 14 top-level categories. That makes 96.25% of addresses invalid by construction, a strong hallucination filter for any predictor trying to invent a new type.

e.g. Jeffrey Epstein → 0.0.171025 · aircraft type → 0.3.1832 · form "PBL" → 11.4.202
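The arithmetic behind the base-60 filter, plus a prefix check, as a short Python sketch. It assumes, which the page does not spell out, that the first two dot-separated fields of an address are the two sexagesimal digits (category and subtype, each 0 to 59) and the third is an opaque serial. `EXAMPLE_PREFIXES` holds only the three prefixes from the examples; the real registry of 135 valid positions is not reproduced.

```python
from typing import FrozenSet, List, Tuple

BASE = 60
TWO_DIGIT_SPACE = BASE * BASE  # 3,600 possible two-digit addresses
VALID_POSITIONS = 135          # valid positions across the 14 top categories


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


# Share of the two-digit space that does not correspond to any registered type.
INVALID_FRACTION = 1 - VALID_POSITIONS / TWO_DIGIT_SPACE

# Hypothetical registry: only the prefixes that appear in the examples.
EXAMPLE_PREFIXES: FrozenSet[Tuple[int, int]] = frozenset({(0, 0), (0, 3), (11, 4)})


def parse_address(address: str) -> Tuple[int, int, int]:
    fields = address.split(".")
    if len(fields) != 3 or not all(f.isdigit() for f in fields):
        raise ValueError(f"malformed address: {address!r}")
    category, subtype, serial = (int(f) for f in fields)
    if category >= BASE or subtype >= BASE:
        raise ValueError(f"not a sexagesimal digit: {address!r}")
    return category, subtype, serial


def is_admissible(address: str, valid_prefixes: FrozenSet[Tuple[int, int]]) -> bool:
    # Structurally valid AND the (category, subtype) pair is registered.
    try:
        category, subtype, _ = parse_address(address)
    except ValueError:
        return False
    return (category, subtype) in valid_prefixes
```

Under these assumptions, a predictor emitting a well-formed but random two-digit prefix lands on a registered type only 135 times in 3,600.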

Gate 5

Hash-chained at admission.

Every admission is recorded in an append-only log with a SHA-256 link to the previous row. Tamper with any historical record and the chain breaks at that point — every subsequent hash is wrong. Evidence you can defend under courtroom-grade scrutiny.

What attackers can't do

Modify an admitted fact without breaking the chain at that row and all rows after
Insert a back-dated row — the previous-hash linkage anchors temporal order
Quietly delete history — gaps are detectable by sequence and hash
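A minimal Python sketch of an append-only, hash-linked log and a verifier that reports the first broken row. The names and row layout (`Row`, `append`, `first_broken_row`, the all-zero genesis hash) are assumptions for illustration. Detection also assumes the current head hash is recorded somewhere an attacker cannot rewrite; otherwise someone with write access could recompute every later hash.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional

GENESIS = "0" * 64  # prev_hash of the very first row


@dataclass(frozen=True)
class Row:
    seq: int
    payload: dict
    prev_hash: str
    row_hash: str


def row_digest(seq: int, payload: dict, prev_hash: str) -> str:
    # The hash covers the sequence number, the content and the previous row's hash.
    body = json.dumps({"seq": seq, "payload": payload, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: List[Row], payload: dict) -> Row:
    seq = len(log)
    prev_hash = log[-1].row_hash if log else GENESIS
    row = Row(seq, payload, prev_hash, row_digest(seq, payload, prev_hash))
    log.append(row)
    return row


def first_broken_row(log: List[Row]) -> Optional[int]:
    """Index of the first row that fails verification, or None if the chain is intact."""
    expected_prev = GENESIS
    for index, row in enumerate(log):
        if (row.seq != index                       # gap or reordering
                or row.prev_hash != expected_prev  # link to the previous row broken
                or row.row_hash != row_digest(row.seq, row.payload, row.prev_hash)):  # content edited
            return index
        expected_prev = row.row_hash
    return None
```

Editing row 2 without fixing its hash is caught at row 2; recomputing row 2's hash moves the break to row 3, and repairing that cascades all the way to the anchored head.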

Deterministic AI for audit

Extraction runs at temperature 0.0. Same input, same output, every time. Re-run an extraction from a year-old log entry on the same model version and get a byte-identical result. Required for regulated environments. Required for any case that ends up in court.

Gate 6

Encrypted at rest.

The database volume is LUKS-encrypted. The hardware-rooted signing key is wrapped in an AES-256-GCM envelope with PBKDF2 600,000-iteration key derivation. Container processes drop privileges and run with read-only source mounts. Pull the drive and you get ciphertext.
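The key-derivation half of the envelope, sketched with Python's standard library. `hashlib.pbkdf2_hmac` is the real stdlib call; `new_envelope_params` and `rederive_key` are hypothetical names, and the 16-byte salt is an assumption. The AES-256-GCM encryption of the signing key itself needs a third-party library (for example `AESGCM` from the `cryptography` package) and is not shown.

```python
import hashlib
import os
from typing import Dict, Tuple

ITERATIONS = 600_000  # the iteration count quoted for the envelope
KEY_BYTES = 32        # 256-bit key for AES-256-GCM
NONCE_BYTES = 12      # 96-bit nonce, the usual size for AES-GCM


def derive_wrapping_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=KEY_BYTES)


def new_envelope_params(passphrase: str, iterations: int = ITERATIONS) -> Tuple[Dict[str, object], bytes]:
    # Everything except the passphrase is stored next to the ciphertext,
    # so the same key can be re-derived at unlock time.
    salt = os.urandom(16)
    params = {
        "kdf": "pbkdf2-hmac-sha256",
        "iterations": iterations,
        "salt": salt.hex(),
        "nonce": os.urandom(NONCE_BYTES).hex(),
    }
    return params, derive_wrapping_key(passphrase, salt, iterations)


def rederive_key(passphrase: str, params: Dict[str, object]) -> bytes:
    return derive_wrapping_key(passphrase, bytes.fromhex(str(params["salt"])), int(params["iterations"]))
```

Each PBKDF2 iteration multiplies the cost of every guess against a stolen envelope; storing `iterations` in the envelope lets the count be raised later without breaking existing envelopes.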

Hardening floor

UEFI Secure Boot enabled at the firmware level (G1 prod)
LUKS encryption on the volume holding the PostgreSQL data directory
Container `no-new-privileges:true` + per-service CPU/memory limits
All non-public ports bound to 127.0.0.1, not 0.0.0.0
Source-code mounts read-only: the API process cannot rewrite its own code
Daily encrypted pg_dump backup with integrity verification on restore

Access control

RFC 6238 TOTP MFA with bcrypt-hashed single-use backup codes
Per-IP rate-limit tracking on auth attempts
Optional Cloudflare Access edge policy with email + service-token rules
Air-gap mode — node fully operational with zero external connectivity
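RFC 6238 TOTP is small enough to show in full with Python's standard library. This is a generic sketch of the standard (HMAC-SHA1, 30-second step, 6 digits, one step of clock drift tolerated), not Enki's own code; the drift window is an assumption. Authenticator apps usually exchange the secret as Base32, which `base64.b32decode` turns into the raw bytes used here.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of whole steps since the Unix epoch.
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    now = time.time() if at is None else at
    counter = int(now // step)
    for drift in range(-window, window + 1):
        c = counter + drift
        # compare_digest avoids leaking how many leading digits matched.
        if c >= 0 and hmac.compare_digest(hotp(secret, c, digits), submitted):
            return True
    return False
```

A production verifier would also remember the last accepted counter per user, since RFC 6238 asks verifiers to reject a second use of the same code.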

Why most platforms can't do this.

Built-for-trust isn't a feature you can add later. It's a substrate choice. Once a system has decided the AI's output is canonical, that decision propagates through every table, every API, every report. Enki was built with the opposite choice from the first commit.

Most intelligence platforms

AI output written directly to the canonical DB
Schema mutable by anyone with SQL access
No append-only chain — historical edits invisible
Cloud-only — your data lives on someone else's hardware
Auth = "trust the IdP"

Enki

AI output enters a provisional space (LDU), not the canonical graph
Schema bound to signed packs — no silent mutation
Hash-chained admission — tampering is mathematically detectable
Run on your own hardware — full air-gap supported
Auth = device signature + optional physical key + TOTP MFA

Two ways in.

Browse the public federation for free — read what we've already pulled from FBI, CIA, USAF, AARO, PACER, SEC. Or get your own node and feed it your own data.

Browse the open library
Get your own node