🔓 Going fully open source in 13d 4h — full server, Pulse, desktop & Sentinel source public July 13, 2026
← Enki Systems · read as Markdown · Field Report (applied results) · indexed & published by ENKI · enkisystems.com

Enki: A Deterministic, Governed Substrate for Composable Entity and Sense Resolution

Architecture Paper — defines the governed substrate and its guarantees (determinism, abstention, auditability, composability). For one empirical result the substrate makes possible, see the Field Report: Offline Federated Extraction.

Author: Brad Harris (Founder & Chief Architect, Enki Systems) · Status: draft · Companion docs: SYSTEM · EVIDENCE · PROVENANCE_ETHICS · FIGURES

Motivation

Two analysts — or a node and its federated peer — ingest the same case file and must reach the same graph of who-is-who, be able to show why each link was made, undo any of them, and merge their findings without retraining a model. Today's learned linkers fail all four: reorder the inputs and the answer drifts, the model can't explain a merge, and two models don't combine. And because this is fusion over real, named people, those four properties — determinism, audit, reversibility, mergeability — are requirements, not niceties. That is the problem Enki addresses.

Abstract

We present Enki, a substrate for all-source information fusion whose design goal is not state-of-the-art accuracy but four systems properties opaque learned systems cannot provide: determinism (identical inputs yield byte-identical graphs on every node), auditability (every resolution is a reversible, attributable record), composability (nodes merge knowledge by addition, not retraining), and abstention (ambiguous evidence is parked, not guessed — a safety property). Enki resolves entities (and, architecturally, word senses) through a single governed scoring policy — a log-linear scorer with a hard veto and a reject-option threshold — over a content-addressed grid; because addresses and policy are content-hashed, two installations assign the same referent the same address and combine by summing sufficient statistics. On real captured data (64k entities from 44k signals) we show: determinism by construction (reproducible fingerprints across shuffled orders) where a textbook streaming linker produces eight different clusterings of the same 1,499 mentions and Enki produces one; composition that is exact and order-independent, with the predicted failure modes (incompatible policy hash, overlapping evidence) demonstrated as negative controls, and erasure that composes under remove-wins semantics. On accuracy we report an honest negative: identifier resolution is precise (98%) but covers 0.8% of entities, and name-evidence selective prediction has no high-precision operating point on any slice (≤71% precision, no ≥90% knee) — which is why the architecture is identifier-first and abstains. The resolution math is deliberately standard; the contribution is the architecture and its guarantees, reported with the experiment that could have refuted them.

1. Contributions

  1. Determinism by content-addressing. Identical inputs produce byte-identical

graphs on every node, by construction (addresses are content hashes; scores use a stable axis order + canonical tiebreak). A textbook streaming linker drifts to 8 clusterings on reordered input; Enki gives 1 (§6 of EVIDENCE).

  1. Composability-by-summation, stated precisely (sufficient statistics; sum at the

evidence level; valid iff policy hashes match; dedup required; vetoes union separately) and proven with negative controls — including that erasure composes under remove-wins (§7–§8 of EVIDENCE).

  1. Governance as substrate: fail-closed canonicalization, reversible auditable

decisions, and right-to-be-forgotten erasure as a first-class primitive (§5 of SYSTEM; PROVENANCE_ETHICS).

  1. An honest characterization of resolution accuracy — a reported negative:

identifier resolution is precise but sparse, and name-evidence selective prediction has no high-precision knee, which motivates the identifier-first + abstention design (§3c of EVIDENCE). The engine is architecturally one governed scorer for both entity and sense resolution; we demonstrate it on entity resolution — the sense path is wired live and harness-validated but not yet measured on a real corpus, so we make the weaker, supportable claim.

2. The system

The end-to-end chain — capture → LDU (permissive admission) → extraction (LLM proposes) → GDU gateway (single fail-closed write path) → collapse (the equation) → content-addressed grid → governance → federation — is described in SYSTEM.md, with the substrate, the equation, a worked collapse, and the governance spine. The defining stance is the asymmetry: permissive capture, fail-closed canonicalization.

3. Evaluation

All results are on real captured data and reproducible; see EVIDENCE.md. We report determinism fingerprints, the determinism-drift baseline, the two-node composability + erase×merge experiments with negative controls, confluence (the same graph fingerprint under randomized processing order, and reproduced on a node rebuilt from scratch — so independent miners converge regardless of order or grouping), the risk-coverage curve (the decisive accuracy test — reported as a negative), and honest data-quality measurements (identifier sparsity and contamination). Scale is framed as prototype feasibility, not a scale result; accuracy is explicitly not claimed as a strength.

4. Limitations

Stated, not hidden: weights are hand-set (learnable from the decision log via max-entropy fitting, since the prior channel already accumulates the sufficient statistics); name-evidence selective prediction has no high-precision operating point (§3c), so high-precision resolution depends on identifiers, which cover ~0.8% of entities and are ~7–10% contaminated; abstention is a safety property and needs an absolute-confidence rule to fully deliver it; the sense path is integrated but unmeasured on a real corpus; the determinism-drift baseline needs a strong deterministic comparator + VI calibration; cross-node erasure is best-effort for non-cooperative peers; and the tombstone address is a hash of PII (a confirmation oracle, not anonymization — see PROVENANCE_ETHICS).

5. Related work

Enki is an assembly of well-understood components; we name them so the architectural contribution is clear by contrast.

  • Fellegi, I. P., & Sunter, A. B. (1969). A Theory for Record Linkage. JASA.

identifier-first matching and the match / possible-match / non-match split (our collapse / park / create).

  • Chow, C. K. (1970). On Optimum Recognition Error and Reject Tradeoff. *IEEE

Trans. Information Theory.* — the threshold-plus-abstain decision.

  • Berger, A., Della Pietra, S. & Della Pietra, V. (1996). A Maximum Entropy

Approach to Natural Language Processing. Computational Linguistics. — the Σ w·e log-linear scorer.

  • Yarowsky, D. (1995). Unsupervised Word Sense Disambiguation Rivaling Supervised

Methods. ACL. — bootstrapping / feedback priors.

  • Navigli, R. (2009). Word Sense Disambiguation: A Survey. *ACM Computing

Surveys.* — the WSD feature landscape.

  • Collins, A. M., & Loftus, E. F. (1975). A Spreading-Activation Theory of

Semantic Processing. Psychological Review. — expansion / relation walk.

  • Miller, G. A. (1995). WordNet: A Lexical Database for English. CACM; Fellbaum

(1998). — the ontology.

  • Merkle, R. (1987); Git; IPFS. — content-addressed storage.
  • Pitman–Koopman–Darmois (exponential families / sufficient statistics) — the

formal basis for "counts add."

6. Conclusion

Enki shows that an information-fusion substrate can be deterministic, abstaining, auditable, and composable-by-summation using only standard resolution math, by making addresses and policy content-hashed data and treating every decision as a governed, reversible record. The math is old; insisting it all be the same governed scorer over a content-addressed grid — and the determinism, composability, and erasure guarantees that follow — is the contribution.


Enki: A Deterministic, Governed Substrate for Composable Resolution

Thesis. Enki is a deterministic, content-addressed, governed substrate that makes entity and sense resolution abstaining, auditable, and composable-by- summation across federated nodes.

It deliberately uses standard, well-understood resolution math so that the substrate stays deterministic and auditable. It does not claim state-of-the-art accuracy; it claims that the same inputs always produce the same graph, that every decision is reversible and attributable, and that two nodes' knowledge combines by addition rather than retraining. This document traces one signal end to end — the chain — and names every governance boundary it crosses.


1. Why a substrate, not a model

All-source intelligence fusion has a requirement most NLP systems do not: two analysts (or two installations) reading the same documents must reach the same conclusions, and must be able to show why and undo it. A neural entity linker is opaque (cannot explain a link), non-reproducible (reorder the inputs, get a different answer), and uncombinable (parameter averaging is non-linear). Enki trades raw accuracy for four properties that opacity cannot provide:

  1. Determinism — same inputs → byte-identical graph, on every node.
  2. Abstention — when the evidence is ambiguous, it parks rather than guesses.
  3. Auditability — every resolution is a reversible, attributable record.
  4. Composability — nodes merge by summation under a shared governing policy.

These are systems properties. The resolution math underneath is intentionally ordinary (see §6) so that the substrate's guarantees hold.


2. The chain — one signal, end to end

CAPTURE          signed signals from connectors / paired devices
  │              (signal_type, source_domain, observables, device_signature)
  ▼
LDU  "open mouth"   permissive admission — every form gets a home immediately,
  │                 nothing is rejected; provisional until governed
  ▼
EXTRACTION       LLM → mapper → typed candidates (entities, claims, events, places)
  │              boundary: the LLM PROPOSES; it never writes to the canonical store
  ▼
GDU GATEWAY  "locked jaw"   the SINGLE authorized write path; content-hashed packs
  │                          govern; fail-closed — ungoverned input is parked
  ▼
COLLAPSE         one governed scoring policy:  S = Σ wₖ·eₖ, veto, θ/δ → commit | park
  │              entities (id/exact/loose) AND senses (register/domain/collocation)
  ▼
BASE-60 GRID     three content-addressed layers, deterministic addresses:
  │                FORM 11.<hash(surface)>  →  CONCEPT 12.<pos>.<hash(lib:sense)>
  │                                         →  INSTANCE <cat.sub…> (entities)
  ▼
GOVERNANCE       reversible collapse_decisions; hash-chained admission log;
  │              right-to-be-forgotten erasure (PII removed; tombstone retires address)
  ▼
FEDERATION       nodes compose by SUMMATION under a shared policy hash
                 (union grids, sum priors; Ed25519-signed deltas; tombstones propagate)

Each boundary is a deliberate constraint. LDU is permissive (capture loses nothing); GDU is fail-closed (nothing becomes canonical unless a content-hashed pack governs it); the LLM is advisory (it proposes candidates, the governed gateway decides). The asymmetry — permissive capture, fail-closed canonicalization — is the core engineering stance.


3. The substrate: a content-addressed grid

Every surface form is a distribution over addresses in a base-60 grid. Three layers, all addressed by a hash of their content, never by a node-local counter:

layeraddressexample
FORM11.0.<sha(surface)>every string ("lead", "Boeing")
CONCEPT12.<pos>.<sha(library:sense_id)>every meaning (lead-the-metal, lead-the-sales-role)
INSTANCE<cat.sub.…> + canonical_entity_keyevery real-world referent

Because an address is a pure function of content, two nodes that load the same library place the same meaning at the same address — which is exactly what makes determinism hold by construction and composition possible (§7). The base-60 radix is presentational; the determinism comes from the hashing.


4. Collapse and expansion — one equation

For a form f in context C with candidate addresses A:

S(a | f, C) = Σₖ wₖ · eₖ(a, f, C), subject to veto if any enabled gⱼ fires;
collapse to argmaxₐ S iff S(a\) ≥ θ and S(a\) − S(a₂) ≥ δ, else PARK.

The evidence axes eₖ ∈ [0,1] are feature functions; the weights wₖ, the vetoes, and the thresholds θ/δ all come from a content-hashed policy pack, not code. Entity collapse uses id/exact/loose with an identifier_conflict veto; sense collapse uses register/domain/collocation/frequency/pinfrequency is the dominant-sense prior (1/sense_rank from WordNet's sense ordering, the most-frequent-sense default absent other evidence) and pin is a domain-conditional override (a domain pack asserting the correct sense, weighted above the frequency default, below only id); the engine is the same. The inverse, expansion, reads the distribution back out — the meaning-fan (softmax of S) and a relation walk over the concept graph.

This is presented as a single governed scoring policy reused across decision types — design discipline, not mathematical novelty (§6).

A worked collapse (synthetic, fully reproducible)

Document: "The mine increased lead output." Form lead binds to two senses — lead(metal, domain=chemistry) and lead(sales-role, domain=business). Context domain ≈ mining/chemistry. Weight vector (from the policy pack): w = {id:10, exact:3, loose:1.5, register:1, domain:1, collocation:2, prior:1}; thresholds θ=1.0, δ=0.5. Each eₖ∈[0,1]; absent-signal axes are neutral 0.5.

candidateid·10exact·3domain·1collocation·2other (reg .5,prior 0)S
lead-metal 12.0.f205…03.01.01.00.55.5
lead-role 12.0.1c0c…03.00.00.00.53.5

Two-part decision, both must hold: θ-test S(a\)=5.5 ≥ θ=1.0 ✓ (enough absolute evidence); δ-test margin 5.5−3.5 = 2.0 ≥ δ=0.5 ✓ (a clear winner) → collapse to lead-metal, one reversible decision written. Strip the domain context and both fall to 3.5; the margin → 0 < δ → the δ-test fails → park (the system knows it doesn't know). For an entity: "Boeing" carrying CIK 12927 scores id=1 (weight 10) against an existing "The Boeing Company" with the same CIK → S≈10 → merge; a second "Acme" with a different* CIK triggers the identifier_conflict veto → that candidate is eliminated, never merged.

Why "byte-identical" holds. The graph's identity is its content-addressed addresses (hashes), not the float scores — fingerprints hash addresses. The score itself is computed deterministically: axes are evaluated in the policy's (insertion-ordered) key order, and ties break on a canonical key (sort by −score, then address), so neither summation order nor a score tie can perturb the result.


5. Governance: the spine

  • Fail-closed. A form no library governs is parked, never invented.
  • Reversible decisions. Every collapse is a collapse_decision row

(evidence-for, score, tier) — auditable and undoable; their aggregate is the prior evidence axis (the cyclical feedback that lets the model improve without retraining).

  • Right-to-be-forgotten. erase_entity deletes an entity's PII-bearing row and

writes a tombstone with audit fields and the retired content-address. Crucially the hash-chained admission log stores references (entity id, address, type, mutation hashes), not the raw name/identifiers, so deleting the row does not break the chain — there is no hard immutability-vs-deletion conflict (the chain hashes its own records, not the entity row). The admission path refuses to re-create a tombstoned address (re-creation blocked across re-ingestion), and erasure composes across nodes under remove-wins semantics (§8 of EVIDENCE). Two residuals are stated, not glossed: the address is a hash of PII (a confirmation oracle, not anonymization) and cross-node erasure is best-effort for non-cooperative peers — see PROVENANCE_ETHICS.


6. Lineage (we own it)

The scorer is a log-linear / maximum-entropy model (Berger, Della Pietra & Della Pietra 1996). The commit/park decision is a reject-option classifier (Chow 1970) and mirrors the match / possible-match / non-match of Fellegi–Sunter (1969) record linkage — our "park" is their clerical-review zone. The sense features are textbook WSD (Yarowsky 1995; Navigli 2009); expansion is spreading activation (Collins & Loftus 1975) over WordNet (Miller 1995). Content-addressing is the Merkle/Git/IPFS pattern; "counts add" is the sufficient-statistics property of exponential families. There is no new equation. The contribution is the architecture: one governed, abstaining, auditable, composable scorer used uniformly across resolution decisions.


7. Composability (the load-bearing claim)

Because policy weights and addresses are content-hashed data, the equation is identical on every node. Therefore models add. Precisely:

Within a shared content-hashed policy, Enki composes by **unioning the
content-addressed grid and summing decision counts at the evidence level** —
mathematically clean because counts are sufficient statistics — **provided evidence
is deduplicated and veto rules are unioned separately.** This is accumulation of
sufficient statistics, not gradient training: weaker than retraining, but exact,
associative, and auditable.

Two caveats are part of the claim, not exceptions to it: summation is valid iff hash(policy_A) == hash(policy_B) (counts under different policies are not comparable), and it requires disjoint evidence (overlapping corpora double-count; the grid, a set, composes idempotently regardless). Both are demonstrated as negative controls in the evaluation.


8. Evaluation & limitations

Full results in EVIDENCE.md — all on real captured data. Headlines: determinism fingerprints reproducible across shuffled orders; a textbook streaming linker produces 8 different clusterings of the same 1,499 mentions while Enki produces one; two nodes merge to bit-identical state with the predicted limits demonstrated, and erasure composes under remove-wins.

On accuracy we report an honest negative (§3c of EVIDENCE): identifier resolution is precise (98%) but covers 0.8% of entities, and name-evidence selective prediction has no high-precision operating point (≤71% precision, no ≥90% knee at any threshold). This is why the architecture is identifier-first and abstains — abstention is a safety property (it avoids false-association harm), not a high-precision-commit claim. The low name-precision is evidence for the design, and we publish the experiment that could have refuted the thesis.

Stated limitations: weights are hand-set (learnable from the decision log); the sense path is now measured on a real corpus (§9 of EVIDENCE — 25/25 reproducible harness, dominant-sense coverage ~25-30%→~50%, cross-domain structuring, entity-aware co-occurrence graph), though a gold-labelled sense-accuracy number and context-resolution of always-ambiguous pos-collision forms (pos-priors/EM) remain open; scale is prototype feasibility, not a scale result. The contribution is the substrate's guarantees, not the numbers.


Enki Whitepaper — Empirical Evidence (real captured data)

Raw material for the evaluation section. Every number here is from real data captured and processed by the live G1 node, measured in an isolated read-only snapshot (enki_harness). Methodology and caveats are stated so the paper can be honest about exactly what each figure does and does not show.

Thesis it serves: Enki is a deterministic, content-addressed, governed substrate that makes entity and sense resolution auditable, composable-by-summation, and abstaining. Accuracy is reported honestly but is not the claim: name-evidence selective prediction has no high-precision operating point (§3c), so abstention is a safety property (avoid false associations), not a high-precision-commit claim.


1. The corpus (real, captured)

quantityvalue
signals (raw captured records — documents/events)43,930
entities (resolved; one signal yields many mentions → entities)64,206
entities with a strong identifier (CIK/QID/LEI)542 (0.8%)
entities promoted / canonical8,382
entity typesorganization 22k · person 19k · product 3.9k · facility 3k · place 2.4k · gov 2.3k · …

(A signal is a captured record — a document or event — not a single mention; one signal produces many extracted mentions, which resolve into entities. So entities > signals is expected. In §6, 1,499 mentions yield 190 entities — the mention→entity ratio.)

Honest framing: this is feasibility at prototype scale, NOT a scale result. We do not invite a Spanner/BigTable scale comparison; the contributions are determinism, auditability, composability — none of which need large numbers.

2. Identifier sparsity & contamination (real data quality)

id keyentitiesdistinct valuesin contaminated group (shared id)
cik31930723 (7.2%)
wikidata_id22320822 (9.9%)
lei0

Two facts the paper must own: strong identifiers are sparse (0.8% of entities), so most resolution leans on name evidence; and ~7–10% of the ids that exist are contaminated (same value on distinct entities — e.g. two unrelated orgs sharing CIK 1472494; three agencies sharing the Pentagon's Q11208). This both motivates the identifier_conflict veto and exposes its limit: a shared wrong id cannot be vetoed. A verified gold slice (unique id) is therefore required for any honest accuracy number.

3. Held-out resolution (generalisation, leave-one-out)

Protocol: for a multi-variant entity, hold out one real surface form, delete it from that entity's alias memory, normalise it exactly as the live gateway does, and resolve against the full real grid. Ground truth = the held-out form's true entity. RECALL CEILING = was the true entity even retrieved (separates retrieval from scoring). Non-destructive (transaction rolled back).

3a. Full set (4k/3k sample, labels contain known contamination)

evidencecoverageprecisionscoring-on-retrievable
name-only20.7%45.9%78.5%
name+id22.2%51.9%82.7%

3b. Identifier-path sanity check (NOT an accuracy headline — it is circular)

evidencerecallcoverageprecision
name-only1.8%12.7%6.9%
name+id100.0%100.0%98.2%

This is a sanity check, not a resolution result, and we label it so. The slice is defined by entities with a unique strong id, then "name+id" resolves them using that id — the name-only 1.8% → name+id 100% jump shows the id does all the work. It confirms the id path is clean (the 1.8% residual is contamination); it says nothing about generalization. The honest accuracy result is the risk-coverage curve (§3c).

3c. Risk–coverage — the decisive test (name evidence, id WITHHELD)

Does selective prediction hold — is there an operating point with high precision at non-trivial coverage when name evidence does the work? The id is used only to label verified-distinct entities; it is withheld from the resolver. A commit to the wrong entity is an error, including when the true entity was never retrieved (so margin-based abstention must catch that or it is not selective prediction).

slicerecall ceilingbest precision @ coveragehigh-precision (≥90%) knee?
unique-id entities (228 q)1.8%8.6% @ 15%none
unique-name entities (4,568 q)33.4%71.5% @ 39%none

The negative is fundamental. Even on unambiguous names, name evidence tops out at ~71% precision with no ≥90% operating point; raising the threshold lowers precision (the θ≥2 cliff to 0% is a lone exact-match to a wrong same-named entity committing with no competitor to trigger the margin). Structural cause: a margin (δ) cannot detect "the true entity is not in the candidate set."

Honest reading. Identifiers deliver precision (98%) but cover 0.8%; name evidence reaches ~70% at ~40% coverage with no high-precision knee. This is why the architecture is identifier-first and abstains — and we report it as a measured negative, not a hidden one. Abstention is therefore a safety property (it avoids false-association harm) rather than a high-precision-commit claim; making it deliver even that fully needs an absolute-confidence rule (abstain on lone weak/ambiguous matches), noted as future work. scripts/eval_risk_coverage.py, scripts/eval_rc_uniquename.py.

4. A real defect found and fixed by the evaluation

wikidata_id key mismatch: 223 entities carry their QID under key wikidata_id (enricher spelling) but IDENTIFIER_PRECEDENCE + the canonical_entity_key only knew qid/wikidata — so the live gateway's strong-id collapse and id-folding were silently skipped for them. Fixed in Python + the plpgsql trigger (migration 127), proven byte-identical (2e8f3891…).

gold slice, name+idbeforeafter
id-recall28.9%100.0%
precision77.6%98.2%

This is the value of held-out evaluation on real data — it found a real production bug, and the fix is part of the "works fully before publish" bar. (Caveat: the fix was found and confirmed on the same gold slice later reported in §3b — mild test-set reuse; the before/after lift is a fix demonstration, not an independent accuracy claim.)

5. Determinism (the load-bearing property — reproducible artifact)

Stable fingerprints across shuffled input orders (anyone can re-run to the same hex):

gatewhatresult
language collapselibrary → grid, 6 shuffled ordersidentical 165c00aa…
entity collapseresolver, 8 orders (fwd/rev/6 shuffles)identical 84434ff0…
full integration40 libraries, 120k senses, collapse+expansion, cross-libraryidentical a4e10c04…

Addresses are content-derived hashes, so determinism holds by construction, not by luck — this is what makes composition-by-summation possible.


6. Determinism-drift baseline (the differentiator, real mentions)

Same task — cluster 1,499 real surface mentions (190 true entities) into entities — two methods, 8 shuffled input orders:

VI calibration anchors (so the numbers mean something): all-singletons = 2.47, all-one-cluster = 4.84 — these bracket the scale.

methodunique clusterings / 8 ordersmean pairwise VI (drift)accuracy-to-truth (VI)
Enki (deterministic components)10.00001.209 (zero drift)
greedy (sorted input → deterministic)10.00001.316 (zero drift)
greedy (arrival order — deployed practice)80.52991.351–1.518 (drifts 0.167)

Two honest readings, not one:

  1. Drift is real and is the deployed-practice failure mode. Greedy single-link on

arrival order — the standard streaming linker — produces 8 different clusterings of the same 1,499 mentions merely reordered. Enki produces one.

  1. **Single-node determinism is cheap — and we say so.** Feed the same greedy

linker a canonically sorted order and it too gives one partition (VI 1.316). So the contribution is not "we are deterministic" (sorting achieves that on one node). The contribution is that content-addressing is deterministic AND composes across nodes: every node assigns the same address to the same referent, so two nodes' results merge by summation (§7) — a sorted node-local clustering cannot (node A and node B sort different data into different clusters). Determinism that composes is the property; single-node determinism is table stakes.

On quality: Enki's VI 1.209 sits well below both trivial anchors (2.47 / 4.84) and edges out even the deterministic sorted-greedy (1.316), so the clustering is genuinely informative, not merely "less bad." scripts/eval_determinism_drift.py (pure, reproducible). Remaining: a learned strong baseline (correlation clustering / HAC) would tighten the quality comparison further.

7. Composability-by-summation + negative controls (the headline)

Two nodes ingest DISJOINT halves of the 1,499 real mentions; each builds its grid (content-addressed canonical_entity_key set) and priors (per-key decision counts). Uses the REAL identity function — the actual mechanism.

claimresult
union(grid_A, grid_B) == grid_ABTrue (1,253 keys)
sum(priors_A, priors_B) == priors_ABTrue (1,499 decisions)
merged fingerprint == single-node(A∪B)True (5a2b44b7…)

Negative controls (the predicted limits, demonstrated):

  • Different policy hash → merge REJECTED. Counts under different policy versions

aren't comparable; addition is valid iff hash(policy_A) == hash(policy_B).

  • Overlapping corpora → naive sum DOUBLE-COUNTS (exactly the 100 shared mentions:

1599 vs 1499). Counts need disjoint evidence / dedup; the grid (a set) composes idempotently regardless — only the count channel needs care.

(The 1,253 keys vs §6's 190 true entities is not a contradiction: this experiment groups by exact canonical_entity_key to test the composition mechanism — union and sum — not resolution quality. Exact-name grouping over-fragments variants into singletons by design here; resolution quality is §3/§6's concern, composition arithmetic is this one's.)

Precise statement: within a shared content-hashed policy, Enki composes by unioning the content-addressed grid and summing decision counts at the evidence level — mathematically clean because counts are sufficient statistics — provided evidence is deduplicated and veto rules are unioned separately. Accumulation of sufficient statistics, not gradient training: weaker than retraining, but exact, associative, auditable. scripts/eval_federation_summation.py.

7b. Live cross-node reproduction (real, not synthetic)

Beyond the disjoint-halves arithmetic above, the full sense-collapse relationship graph was rebuilt from scratch on an independent database — fresh schema, the same corpus (30,826 paragraphs / 18,733 mentions), the same content-hashed lexicon packs — and produced a byte-identical fingerprint: a4a718bbcf6118c5 (9,216 edges), reproduced twice (a manual recipe and a packaged one-shot runner). Because every address is a hash of content and the policy is content-hashed, an independent node must land on the same graph.

This is now demonstrated across two genuinely different physical machines. Node A is the production server (G1: dedicated host, RTX 5060 GPU, full Docker stack). Node B is an ordinary laptop with no GPU, running the collapse from a native Python environment (not the app container) against a standalone Postgres rebuilt from the same image Dockerfile. A portable bundle carried the corpus envelope and the gitignored WordNet / sense-frequency packs; the four remaining lexicon inputs were verified byte-identical before the run (collapsepolicy 8ad54ddf, function_words 25259097, general 949f8c87, legal 4c0f210f). Node B produced:

node B (laptop, no GPU, native) : a4a718bbcf6118c5   (9,216 edges, max raw count 178)
node A (G1, docker stack, GPU)   : a4a718bbcf6118c5   (9,216 edges, max raw count 178)

Not only the final fingerprint but every intermediate (edge count, max raw count) matched. Note that no GPU is involved on either side — the collapse is deterministic arithmetic over content-addressed inputs; a GPU only participates in the upstream (non-deterministic) extraction, which federation deliberately does not re-run. A no-GPU machine reproducing the server's graph is therefore the strongest form of the claim: the §7 composability property holds on commodity hardware, with nothing shared but the corpus + packs + content-hashed policy. scripts/run_federation.sh, scripts/fp_cooccur.py.

8. Erase × merge convergence (erasure composes too)

Add-only composition (§7) is associative and exact, but erasure is subtractive — the two must be shown to commute or they contradict. Modeling the tombstone set as a remove-wins CRDT (a tombstone dominates any incoming counts for that address):

claimresult
merge(A,B) == merge(B,A) with an erasureTrue, bit-for-bit
erased-on-A entity stays erased after merging B (which carries counts for it)True (tombstone dominates)
negative control: naive add-only merge (no remove-wins)resurrects the erased entity

So federated erasure and exact composition are reconciled: the count channel is add-only (sufficient statistics); the tombstone set is remove-wins. Order-independent either way. scripts/eval_erase_merge.py. Residual (honest): this proves the semantics converge; a peer that is offline or declines the tombstone is not reached (federated deletion is best-effort for non-cooperative peers).


9. Sense layer — measured on a real corpus (the gap §8 named, now closed)

The entity layer (§§1–8) and the sense layer share one equation; until now only the entity layer carried numbers. The sense layer is now measured on real corpora (Epstein depositions in enki_research; COVID-origins House testimony), the lexicon loaded as reference data: WordNet oewn-2023 (120,135 senses / 212,033 form→sense bindings / 228,300 relations), a sense-frequency pack (212,071 per-lemma sense ranks), a morphology exception pack (4,416 irregulars), a function-word pack (165), and curated domain packs (general / legal / biomed).

9a. Reproducible validation harness — 25/25 PASS

scripts/validate_language_layer.py (read-only over the loaded lexicon) asserts, and all pass: governed/deterministic policy; abstain-on-ambiguity (pos-collision forms like use/research park); frequency dominant-sense correctness (fire→burning, deny→"declare untrue", admit→concede, increase/difference/evidence); morphology (using→use, went→go, mice→mouse; using no longer mis-collapses to "exploit"); pin-overrides-frequency (circumvent bare→"surround"/military vs +FOIA-pin→"avoid answering"); curated pins resolve (verified-bound, not inert); determinism (same form → same collapse).

9b. Frequency dominant-sense prior — coverage up, precision up

Adding the dominant-sense prior (1/sense_rank — the most-frequent-sense baseline) roughly doubled general-noun term coverage on real statements (~25-30% → ~50%) while raising precision: forms that previously mis-collapsed to obscure senses now take the common sense or safely park. The statement-promoter (sentence → agent + predicate-sense + polarity + resolved terms) structures 8/8 COVID statements (~52% term coverage) and 8/11 depositions (~35%; the rest are verbless descriptive lines, correctly unstructured) with one unchanged mechanism across legal and medical text — the cross-domain generality claim, measured.

9c. Entity-aware sense resolution (the decisive before/after)

The sense layer and the entity layer partition a text — proper nouns are entities and must not resolve to WordNet common senses. Corpus co-occurrence over the Epstein depositions makes the point:

top co-occurrence edges
blind (sense layer on entity-laden text)garbage — Epstein→"British sculptor", Maxwell→"physicist", palm→"hand", united→adjective
entity-aware (mask mention spans + capitalization guard) + prose-only + domain pins9,216 clean edgesquestion~answer~ask, remember~know~state~think~tell~see, objection~foundation, give~massage

9d. Honest negatives (the boundary, measured)

  • Joint-context via WordNet taxonomy does not work. The collocation axis with

relation-neighbours as context is inert when restricted to be safe (0/97 parked terms resolved on real corpora) and harmful when naive (a low-degree obscure sense wins on one coincidental match in a document soup). Reliable joint-context needs learned corpus co-occurrence, not taxonomy intersection.

  • Always-ambiguous pos-collision forms park unless pinned; resolving them from

context needs part-of-speech priors / EM (open — parking is safe by design).

Full reference + reproduction: docs/LANGUAGE_INTERLINGUA_VALIDATION_2026-06-25.md.


10. Text-quality validation — keeping garbage out of the meaning graph

Real corpora carry garbage: OCR errors and corrupted source PDF text layers. On enki_research, 8.3% of paragraphs the classifier had labelled prose were doubled-character OCR garbage (CCaassee 11::1155… = "Case 1:15-…" with every char doubled) — and most came from pdfplumber reading bad embedded text, not our OCR worker, so the fix is source-agnostic: judge the TEXT, not the provenance.

The lexicon is the oracle. core/documents/text_quality.py scores a chunk deterministically — no LLM — on four signals: dict_ratio (lemmatized fraction of tokens that are real lexicon words; OCR gibberish → ~0), func_ratio (function-word fraction; tabular records have ~none), has_predicate (a real verb resolves), and structure sanity (doubled-char runs, alpha ratio). It then routes into four lanes:

lanemeaninghandling
passreal sentenceinto the meaning graph (auto)
recordtabular / low function-wordsto the record extractor (auto)
dropgarbage / no real wordsraw, searchable evidence only (auto)
flagreal content, incomplete structurehuman review queue

Validation (component): OCR garbage 29/30 rejected, flight-log records 30/40 rejected, real deposition prose 58/60 accepted — dict_ratio is 0.00 on garbage vs 0.4-0.7 on real sentences.

Corpus run (the evidence): scored 25,315 candidate paragraphs — pass 72% · record 7% · drop 12% · flag 7%. So 92% routes deterministically and only 7% needs a human eye. Garbage (court-reporter footers, doubled-char scans) drops; records (police-report headers, legal citations) route away; the flag queue holds the genuinely ambiguous (date-heavy report lines, citations, form text). The collapse worker now gates sense resolution on this verdict, and the verdict is persisted (validation_score, review_status; migration 131) so the human reviews only the flagged middle — a band that shrinks as the lexicon/frames/co-occurrence improve. This is the abstain-rather-than-mis-map discipline applied to text quality: the non-deterministic part is flagged, never guessed, and never silently admitted.


11. What is human-validated vs. machine-deterministic (the trust question)

A fair reader asks: which of these numbers did a human check? Stated plainly:

  • Human-reviewed: the contradiction findings are surfaced automatically but read

and confirmed by a human against the two source quotes (e.g. Fauci/Collins "did not provide edits" vs. the drafting evidence); the gold-precision slice (§3b) is defined by human-verified unique-identifier entities; the OCR flag band (§10) is the explicit human-in-the-loop lane. Every public investigation finding is a verbatim quote copied character-for-character from the cited primary document and checked against it before publishing — see the real examples with sources at /explore (the Epstein record) and /covid (COVID origins), each line linking the official document.

  • Machine-deterministic (reproducible by anyone, no human in the loop): the

determinism fingerprints (§5–6), composability-by-summation (§7), erase/merge convergence (§8), the sense harness (§9, 25/25), and the text-quality band distribution (§10, 25,315 paragraphs). These are not opinions — they are functions of content-hashed inputs and re-run to the same result.

So the trust model is two-layered: determinism/auditability is proven by construction (anyone re-runs and gets the same hashes), and factual findings are human-checked against primary sources (and the uncertain residue is flagged for a human, never guessed). The reference implementation is in Appendix A (below) — the actual code, inline, no repository access required.


Caveats (stated, not hidden)

  • Accuracy is not the claim. Name-evidence selective prediction has no

high-precision knee (§3c); identifier precision is high but covers 0.8%. The claims are determinism, composability, auditability, and abstention-as-safety.

  • Weights are hand-set (id=10, exact=3, loose=1.5); learnable from the decision

log (the prior channel already accumulates the sufficient statistics).

  • Sense-layer collapse is now measured on a real corpus (§9) — 25/25 reproducible

harness checks, dominant-sense coverage ~25-30%→~50%, cross-domain statement structuring (legal + medical), and an entity-aware corpus relationship graph (9,216 clean edges). What remains unquantified is a gold-labelled sense-accuracy number (no sense-tagged gold set yet) and context-driven resolution of always-ambiguous pos-collision forms (needs pos-priors/EM; §9d). The entity+sense unification is demonstrated on both layers now.

  • Determinism-drift includes a deterministic comparator (sorted-greedy) and VI

calibration anchors (§6); a learned strong baseline (correlation clustering / HAC) would tighten quality further. The gold/risk-coverage slices are small where labels are clean (verified labels cost coverage).

  • Erase×merge convergence is proven for the semantics (remove-wins); cross-node

enforcement is best-effort (offline/non-cooperative peers).


Figures

Portable text figures for the whitepaper (render anywhere markdown does; promote to SVG/PNG at typeset time). Every number traces to EVIDENCE.md.


Figure 1 — The chain (one signal, end to end)

   CAPTURE                signed signals (connectors / paired devices)
      │                   source_domain · timestamp · device_signature
      ▼
 ┌─ LDU ──────────┐       permissive admission — every form gets a home;
 │  "open mouth"  │       nothing rejected; provisional until governed
 └────────────────┘
      ▼
   EXTRACTION            LLM → mapper → typed candidates    (LLM PROPOSES,
      │                  (entities · claims · events · places)   never writes)
      ▼
 ┌─ GDU GATEWAY ──┐      the SINGLE authorized write path;
 │  "locked jaw"  │      content-hashed packs govern; FAIL-CLOSED
 └────────────────┘      (ungoverned → parked)
      ▼
   COLLAPSE              S = Σ wₖ·eₖ , veto , θ/δ  →  COMMIT | PARK
      │                  one policy: entities (id/exact/loose) & senses
      ▼
   BASE-60 GRID          FORM 11.<hash> → CONCEPT 12.<pos>.<hash> → INSTANCE
      │                  content-addressed → identical on every node
      ▼
   GOVERNANCE            reversible decisions · hash-chained log · erasure
      │
      ▼
   FEDERATION            union grids + sum priors (shared policy hash);
                         remove-wins tombstones; Ed25519 deltas

   ── permissive capture ───────────────► fail-closed canonicalization ──

Figure 2 — The content-addressed grid (three layers)

                       "lead"  (a surface form)
                          │
            ┌─────────────┴─────────────┐  polysemy fan-out
   FORM     │  11.0.<sha("lead")>        │  (form_concept_bindings)
            └─────────────┬─────────────┘
                ┌─────────┴──────────┐
   CONCEPT  12.0.<sha(lib:lead-metal)>   12.0.<sha(lib:lead-role)>
            (domain=chemistry)           (domain=business)
                │           relation edges (concept_relations) → expansion
                │  hyponym/synonym…
   INSTANCE  <entity: "Lead, the element"> ……… canonical_entity_key
            (a concept may promote to a real-world referent)

   address = hash(content), never a node-local counter
   ⇒ same input → same address on EVERY node ⇒ models SUM (Fig. 5)

Figure 3 — Risk–coverage: the honest negative (name evidence, id withheld)

 precision
  100% ┤·············································  (ideal)
   90% ┤- - - - - - - - - - - - - - - - - - - - -   ← target knee (never reached)
   80% ┤
   70% ┤                         ● unique-name 71.5% @ 39%
   60% ┤
   50% ┤        full-set ~46% ●
   40% ┤
   30% ┤
   20% ┤
   10% ┤   ● unique-id 8.6% @ 15%
    0% ┼────┬────┬────┬────┬────┬────┬────┬────┬──►
        0%  5%  10%  15%  20%  25%  30%  35%  40%   coverage

   No operating point reaches ≥90% precision by NAME evidence on any slice.
   Raising θ LOWERS precision (lone exact-match to a wrong same-named entity).
   ⇒ identifiers carry precision (98%, but 0.8% coverage); the system ABSTAINS.
     Abstention is a SAFETY property, not a high-precision-commit claim.

Figure 4 — One worked collapse (synthetic, reproducible)

 "The mine increased  lead  output."        w = {id:10 exact:3 loose:1.5
                       └── form                   register:1 domain:1
                                                  collocation:2 prior:1}
                                              θ = 1.0   δ = 0.5
 candidate              id·10 exact·3 domain·1 colloc·2 reg·.5  →  S
 ─────────────────────────────────────────────────────────────────────
 lead-metal 12.0.f205…    0     3.0     1.0      1.0     0.5    = 5.5  ◄ winner
 lead-role  12.0.1c0c…    0     3.0     0.0      0.0     0.5    = 3.5

 θ-test: S* = 5.5 ≥ θ=1.0           ✓ (enough absolute evidence)
 δ-test: 5.5 − 3.5 = 2.0 ≥ δ=0.5    ✓ (a clear winner)
                                     ⇒ COLLAPSE → lead-metal  (reversible decision)

 strip domain context → both = 3.5 → margin 0 < δ → PARK (knows it doesn't know)
 entity case: same CIK ⇒ id=1·10 ⇒ merge;  different CIK ⇒ veto ⇒ never merge

Figure 5 — Composability: union grids + sum priors (and erase × merge)

   node A                      node B
   grid_A  {a,b,c}             grid_B  {c,d}            (c = shared content-address)
   priors  a:2 b:1 c:2         priors  c:2 d:3

        merge (shared policy hash) ─ commutative, exact:
        ┌───────────────────────────────────────────────┐
        │  grid   = grid_A ∪ grid_B      = {a,b,c,d}      │
        │  priors = priors_A + priors_B  = a:2 b:1 c:4 d:3│  (counts = suff. stats)
        └───────────────────────────────────────────────┘
        == a single node that ingested A∪B           (bit-identical, fp 5a2b44b7)

   negative controls (proven):  policy_hash(A) ≠ policy_hash(B) → REJECT
                                overlapping evidence            → double-counts

   erase × merge (remove-wins):  A erases c   ⇒  T = {c}
        final grid = (grid_A ∪ grid_B) − T ,  priors drop c
        merge(A,B) == merge(B,A)  with erasure   (tombstone dominates B's c:2)
        naive add-only (no remove-wins)          → RESURRECTS c   (why remove-wins)

Provenance, Privacy, and Dual-Use

Enki resolves real, named people and organizations into permanent, content-addressed, federated, summable records. That combination — identifiable individuals × permanent addresses × cross-node accumulation — demands an explicit account of where the data comes from, what the privacy basis is, and what the surveillance implications are. This section states them plainly, including the tensions the architecture does not fully resolve.

1. Data provenance

The corpus measured in this work is drawn from open sources and public records — news, regulatory filings (SEC), public registries (Wikidata), court and FOIA documents — captured through signed connectors. Every signal carries its source (source_domain), capture time, and a device signature, and every derived record is traceable to the signals that produced it through the hash-chained admission log. Provenance is therefore queryable: for any entity or claim, the system can list the exact source documents behind it. This is a precondition for the auditability claim, not a nice-to-have.

"Public record" is not "privacy-free." Court filings and FOIA releases routinely contain third-party PII — witnesses, victims, bystanders — who are not public figures and did not consent. Open availability is not a lawful basis for arbitrary processing of those individuals; minimization and the erasure path (§2) apply to them as much as to anyone, and a deployment over such sources inherits that obligation.

For this paper, public-facing examples use synthetic or public-record subjects, not private individuals. The worked example in the system document is synthetic; the quantitative results are reported in aggregate.

2. Privacy and legal basis

The legal basis for processing depends on the deployment, not the software: an open-source-intelligence deployment over public records, a law-enforcement deployment under warrant/authority, and a corporate-investigations deployment under contract are different bases with different obligations. Enki's design positions it to support compliance rather than presuppose it:

  • Provenance + audit make data-subject access requests answerable (what is held,

and on what source basis).

  • Right-to-be-forgotten is a first-class operation (erase_entity): it deletes

the entity's PII-bearing row, writes a tombstone, and blocks re-creation of that content-address across re-ingestion. The hash-chained audit log stores references (id, address, type, mutation hashes), not raw PII, so erasure does not break the chain (no immutability conflict). Across nodes it composes under remove-wins (a tombstone dominates incoming counts), proven order-independent. Two residuals, stated honestly, not glossed:

  • The tombstone's address is a hash of the entity's identity. A hash of personal

data is pseudonymized, not anonymized: a party already holding the name can recompute the hash and confirm membership — so the tombstone is a confirmation oracle, not a clean forget. Mitigation is a salted/keyed address for the tombstone set (at a cost to cross-node determinism that must be confronted); absent that, the address reveals nothing on its own but does not preclude confirmation. We also scrub free-text fields on erasure so a name cannot survive incidentally.

  • Cross-node erasure is best-effort. Once a signed delta has propagated, a peer

that is offline or declines to honor the tombstone keeps the data. Federated deletion is enforceable only for cooperative, reachable peers — a hard limit of any federated system, which we name rather than imply away.

  • Fail-closed canonicalization means ungoverned, speculative inferences are

parked, not asserted — the system does not manufacture claims about people.

What the software cannot do is establish a lawful basis or bound a deployment's scope; those are operator obligations. We state this rather than imply the software discharges them.

3. Dual-use and surveillance implications (the honest tension)

Enki is an intelligence-fusion substrate. The same properties that make it trustworthy for a warranted investigation — permanent addresses, cross-node accumulation, composability-by-summation — also make it a powerful surveillance instrument if deployed without authority. We do not claim the architecture neutralizes this; we claim it makes misuse visible and constrained in ways opaque systems do not:

  • Auditability over opacity. Every link is an attributable, reversible record. A

neural dossier cannot say why two records were merged or who caused it; Enki can. Accountability requires this.

  • Abstention over fabrication. The system parks on ambiguous evidence instead of

asserting a guess, reducing the false-association harm that is most damaging to individuals.

  • Governed, content-hashed policy. What the system is permitted to assert is

pinned to a versioned, inspectable policy — changes are diffs, not silent drift.

  • Erasure as a primitive, not an afterthought (§2).

These are mitigations, not guarantees — and one contingency must be explicit, because it's the uncomfortable core: auditability constrains an operator only if that operator is subject to an external auditor. The operator controls the policy packs and the audit log, so a deployment that removes external oversight is self-auditing surveillance — the architecture cannot, by itself, hold an unaccountable operator accountable. What it provides is the substrate for accountability (attributable, reversible, inspectable records); the accountability itself requires an external party with authority to inspect. We state this as a precondition, not an implied property. Likewise, "addresses are permanent" is in genuine tension with deletion — mitigated by erasure-as-primitive (with the residuals in §2), but deployment governance (access control, authority, retention limits, independent oversight) carries the rest, and is a requirement on deployments, not a feature of the software.

4. Positioning

Enki is offered for authorized use — open-source intelligence over public records, law enforcement under proper authority, and corporate investigations under contract — invite-only, not a general public surveillance product. We are explicit that "invite-only" is a business assurance, not an architectural one — every dual-use vendor says it, and it carries no technical weight. The architectural choices (audit, abstention, governance, erasure) make authorized uses accountable to an overseer; they are deliberately not sufficient, by themselves, to make unauthorized use safe. That gap is closed only by external governance — and we say so rather than let the architecture take credit for accountability it can only enable, not enforce.


Appendix A — Reference implementation (the load-bearing code, inline)

The substrate's claims rest on small, deterministic functions, reproduced here so a reader (or an agent) can verify the algorithm without repository access — this is the readable code, not a link out. Each is the real function, lightly condensed.

A.1 The one governed scoring equation (collapse)

# core/collapse/equation.py — the single equation reused for entity AND sense collapse.
def score(c, x, policy):
    for name, enabled in policy.vetoes.items():            # hard eliminations first
        if enabled and VETO_FNS.get(name) and VETO_FNS[name](c, x):
            return Scored(c, NEG_INF, {}, vetoed=True, reason=name)
    s = 0.0
    for axis, w in policy.weights.items():                 # weighted evidence, policy-driven
        fn = EVIDENCE_FNS.get(axis)
        if fn is None: continue
        s += w * fn(c, x)
    return Scored(c, s, ...)

def collapse(candidates, x, policy):
    live = sorted((score(c, x, policy) for c in candidates if not vetoed),
                  key=lambda s: (-s.score, s.candidate.address))   # deterministic tiebreak
    top = live[0]; second = live[1].score if len(live) > 1 else -inf
    if top.score >= policy.theta and top.score - second >= policy.delta:
        return CollapseResult("collapse", top, ...)        # confident → commit
    return CollapseResult("park", None, ...)               # ambiguous → abstain

Weights, vetoes, and the thresholds θ/δ live in a content-hashed policy pack, not in code — so the equation is identical (and deterministic) on every node, which is what makes composition-by-summation valid.

A.2 The two evidence axes that drive sense disambiguation

@evidence_axis("frequency")          # the dominant-sense prior
def _e_frequency(c, x):
    # 1/sense_rank from WordNet's frequency ordering — the most-frequent-sense default
    # absent other evidence (fire→burning, admit→concede). 0 when unranked.
    return max(0.0, min(1.0, float(c.freq)))

@evidence_axis("pin")                # domain-conditional override
def _e_pin(c, x):
    # a domain/curated pack asserts THIS sense for the form in context (circumvent→
    # "avoid answering" in FOIA, not the dominant "surround"). Weighted above all soft
    # evidence, below only the hard identifier.
    return max(0.0, min(1.0, float(x.pins.get(c.address, 0.0))))

A.3 The text-quality gate (keeping OCR/garbage out of the meaning graph)

# core/documents/text_quality.py — deterministic, lexicon-backed; no LLM.
def chunk_quality(cur, text, min_dict_ratio=0.5, min_func_ratio=0.12):
    toks = words(text); distinct = set(toks)
    # dict_ratio: fraction of tokens that are real lexicon words (lemmatized).
    #   OCR gibberish ("ccaassee") → ~0; real sentences → 0.4–0.9.
    known = {f for f in lexicon_forms(distinct | lemmas(distinct))}
    dict_ratio = sum(t in known for t in distinct) / len(distinct)
    # func_ratio: function words (the/was/and/…). Sentences flow through them; records ~0.
    func_ratio = sum(t in FUNCTION_WORDS for t in toks) / len(toks)
    has_predicate = any_token_binds_to_a_verb(distinct)
    structure_ok = not ocr_doubled(text) and alpha_ratio(text) >= 0.55
    is_real = dict_ratio >= min_dict_ratio and func_ratio >= min_func_ratio \
              and has_predicate and structure_ok
    return {...}

def band(v):                          # four lanes; the human sees only "flag"
    if v["is_real_sentence"]: return "pass"      # → meaning graph (auto)
    if v["dict_ratio"] < 0.2 or not v["structure_ok"]: return "drop"   # garbage → raw evidence
    if v["func_ratio"] < 0.08: return "record"   # tabular → record extractor (auto)
    return "flag"                                 # real-but-incomplete → human review

A.4 Reproduce the validation

The validation harness (validate_language_layer) asserts 25 properties over the loaded lexicon and prints 25/25 checks PASSED. It is deterministic: same lexicon + same content-hashed policy → same result on any node. The governed policy pack, the WordNet pack, and the sense-frequency pack are the only inputs.

Architecture paper by Brad Harris · Enki Systems · enkisystems.com. The empirical companion is the Field Report.