# Enki: A Deterministic, Governed Substrate for Composable Entity and Sense Resolution

**Architecture Paper** — defines the governed substrate and its guarantees (determinism,
abstention, auditability, composability). For one empirical result the substrate makes
possible, see the [Field Report: Offline Federated Extraction](../WHITEPAPER_OFFLINE_FEDERATED_EXTRACTION_2026-06-23.md).

**Author:** Brad Harris (Founder & Chief Architect, Enki Systems)  ·  **Status:** draft  ·  **Companion docs:**
[SYSTEM](SYSTEM.md) · [EVIDENCE](EVIDENCE.md) · [PROVENANCE_ETHICS](PROVENANCE_ETHICS.md) · [FIGURES](FIGURES.md)

## Motivation

Two analysts — or a node and its federated peer — ingest the same case file and must
reach the *same* graph of who-is-who, be able to *show why* each link was made, *undo*
any of them, and *merge* their findings without retraining a model. Today's learned
linkers fail all four: reorder the inputs and the answer drifts, the model can't
explain a merge, and two models don't combine. And because this is fusion over real,
named people, those four properties — determinism, audit, reversibility, mergeability
— are *requirements*, not niceties. That is the problem Enki addresses.

## Abstract

We present Enki, a substrate for all-source information fusion whose design goal is
**not** state-of-the-art accuracy but four systems properties opaque learned systems
cannot provide: **determinism** (identical inputs yield byte-identical graphs on every
node), **auditability** (every resolution is a reversible, attributable record),
**composability** (nodes merge knowledge by *addition*, not retraining), and
**abstention** (ambiguous evidence is parked, not guessed — a safety property). Enki
resolves entities (and, architecturally, word senses) through a *single governed
scoring policy* — a log-linear scorer with a hard veto and a reject-option threshold —
over a content-addressed grid; because addresses and policy are content-hashed, two
installations assign the same referent the same address and combine by **summing
sufficient statistics**. On real captured data (64k entities from 44k signals) we
show: **determinism by construction** (reproducible fingerprints across shuffled
orders) where a textbook streaming linker produces eight different clusterings of the
same 1,499 mentions and Enki produces one; **composition that is exact and
order-independent**, with the predicted failure modes (incompatible policy hash,
overlapping evidence) demonstrated as negative controls, and **erasure that composes**
under remove-wins semantics. On accuracy we report an honest **negative**: identifier
resolution is precise (98%) but covers 0.8% of entities, and name-evidence selective
prediction has **no high-precision operating point** on any slice (≤71% precision, no
≥90% knee) — which is *why* the architecture is identifier-first and abstains. The
resolution math is deliberately standard; the contribution is the architecture and its
guarantees, reported with the experiment that could have refuted them.

## 1. Contributions

1. **Determinism by content-addressing.** Identical inputs produce byte-identical
   graphs on every node, by construction (addresses are content hashes; scores use a
   stable axis order + canonical tiebreak). A textbook streaming linker drifts to 8
   clusterings on reordered input; Enki gives 1 (§6 of [EVIDENCE](EVIDENCE.md)).
2. **Composability-by-summation**, stated precisely (sufficient statistics; sum at the
   evidence level; valid iff policy hashes match; dedup required; vetoes union
   separately) and **proven with negative controls** — including that erasure composes
   under remove-wins (§7–§8 of [EVIDENCE](EVIDENCE.md)).
3. **Governance as substrate**: fail-closed canonicalization, reversible auditable
   decisions, and right-to-be-forgotten erasure as a first-class primitive
   (§5 of [SYSTEM](SYSTEM.md); [PROVENANCE_ETHICS](PROVENANCE_ETHICS.md)).
4. **An honest characterization of resolution accuracy** — a *reported negative*:
   identifier resolution is precise but sparse, and name-evidence selective prediction
   has no high-precision knee, which motivates the identifier-first + abstention design
   (§3c of [EVIDENCE](EVIDENCE.md)). The engine is *architecturally* one governed
   scorer for both entity and sense resolution; we demonstrate it on **entity**
   resolution — the sense path is wired live and harness-validated but **not yet
   measured on a real corpus**, so we make the weaker, supportable claim.

## 2. The system

The end-to-end chain — capture → LDU (permissive admission) → extraction (LLM
proposes) → GDU gateway (single fail-closed write path) → collapse (the equation) →
content-addressed grid → governance → federation — is described in
[SYSTEM.md](SYSTEM.md), with the substrate, the equation, a worked collapse, and the
governance spine. The defining stance is the asymmetry: **permissive capture,
fail-closed canonicalization.**

## 3. Evaluation

All results are on real captured data and reproducible; see [EVIDENCE.md](EVIDENCE.md).
We report determinism fingerprints, the determinism-drift baseline, the two-node
composability + erase×merge experiments with negative controls, **confluence**
(the same graph fingerprint under randomized processing order, and reproduced on a
node rebuilt from scratch — so independent miners converge regardless of order or
grouping), the **risk-coverage curve** (the decisive accuracy test — reported as a
negative), and honest data-quality measurements (identifier sparsity and contamination). Scale is framed as *prototype
feasibility*, not a scale result; accuracy is explicitly **not** claimed as a strength.

## 4. Limitations

Stated, not hidden: weights are hand-set (learnable from the decision log via
max-entropy fitting, since the `prior` channel already accumulates the sufficient
statistics); name-evidence selective prediction has **no high-precision operating
point** (§3c), so high-precision resolution depends on identifiers, which cover ~0.8%
of entities and are ~7–10% contaminated; abstention is a *safety* property and needs an
absolute-confidence rule to fully deliver it; the sense path is integrated but
unmeasured on a real corpus; the determinism-drift baseline needs a strong
deterministic comparator + VI calibration; cross-node erasure is best-effort for
non-cooperative peers; and the tombstone address is a hash of PII (a confirmation
oracle, not anonymization — see [PROVENANCE_ETHICS](PROVENANCE_ETHICS.md)).

## 5. Related work

Enki is an assembly of well-understood components; we name them so the architectural
contribution is clear by contrast.

- **Fellegi, I. P., & Sunter, A. B. (1969).** A Theory for Record Linkage. *JASA.* —
  identifier-first matching and the match / possible-match / non-match split (our
  collapse / park / create).
- **Chow, C. K. (1970).** On Optimum Recognition Error and Reject Tradeoff. *IEEE
  Trans. Information Theory.* — the threshold-plus-abstain decision.
- **Berger, A., Della Pietra, S. & Della Pietra, V. (1996).** A Maximum Entropy
  Approach to Natural Language Processing. *Computational Linguistics.* — the `Σ w·e`
  log-linear scorer.
- **Yarowsky, D. (1995).** Unsupervised Word Sense Disambiguation Rivaling Supervised
  Methods. *ACL.* — bootstrapping / feedback priors.
- **Navigli, R. (2009).** Word Sense Disambiguation: A Survey. *ACM Computing
  Surveys.* — the WSD feature landscape.
- **Collins, A. M., & Loftus, E. F. (1975).** A Spreading-Activation Theory of
  Semantic Processing. *Psychological Review.* — expansion / relation walk.
- **Miller, G. A. (1995).** WordNet: A Lexical Database for English. *CACM*; Fellbaum
  (1998). — the ontology.
- **Merkle, R. (1987); Git; IPFS.** — content-addressed storage.
- **Pitman–Koopman–Darmois** (exponential families / sufficient statistics) — the
  formal basis for "counts add."

## 6. Conclusion

Enki shows that an information-fusion substrate can be deterministic, abstaining,
auditable, and composable-by-summation using only standard resolution math, by making
addresses and policy content-hashed data and treating every decision as a governed,
reversible record. The math is old; insisting it all be the *same* governed scorer over
a content-addressed grid — and the determinism, composability, and erasure guarantees
that follow — is the contribution.

---

# Enki: A Deterministic, Governed Substrate for Composable Resolution

**Thesis.** *Enki is a deterministic, content-addressed, governed substrate that
makes entity and sense resolution **abstaining, auditable, and composable-by-
summation** across federated nodes.*

It deliberately uses standard, well-understood resolution math so that the substrate
stays deterministic and auditable. It does **not** claim state-of-the-art accuracy;
it claims that the *same* inputs always produce the *same* graph, that every decision
is reversible and attributable, and that two nodes' knowledge combines by **addition**
rather than retraining. This document traces one signal end to end — the chain — and
names every governance boundary it crosses.

---

## 1. Why a substrate, not a model

All-source intelligence fusion has a requirement most NLP systems do not: **two
analysts (or two installations) reading the same documents must reach the same
conclusions, and must be able to show why and undo it.** A neural entity linker is
opaque (cannot explain a link), non-reproducible (reorder the inputs, get a different
answer), and uncombinable (parameter averaging is non-linear). Enki trades raw
accuracy for four properties that opacity cannot provide:

1. **Determinism** — same inputs → byte-identical graph, on every node.
2. **Abstention** — when the evidence is ambiguous, it parks rather than guesses.
3. **Auditability** — every resolution is a reversible, attributable record.
4. **Composability** — nodes merge by summation under a shared governing policy.

These are *systems* properties. The resolution math underneath is intentionally
ordinary (see §6) so that the substrate's guarantees hold.

---

## 2. The chain — one signal, end to end

```
CAPTURE          signed signals from connectors / paired devices
  │              (signal_type, source_domain, observables, device_signature)
  ▼
LDU  "open mouth"   permissive admission — every form gets a home immediately,
  │                 nothing is rejected; provisional until governed
  ▼
EXTRACTION       LLM → mapper → typed candidates (entities, claims, events, places)
  │              boundary: the LLM PROPOSES; it never writes to the canonical store
  ▼
GDU GATEWAY  "locked jaw"   the SINGLE authorized write path; content-hashed packs
  │                          govern; fail-closed — ungoverned input is parked
  ▼
COLLAPSE         one governed scoring policy:  S = Σ wₖ·eₖ, veto, θ/δ → commit | park
  │              entities (id/exact/loose) AND senses (register/domain/collocation)
  ▼
BASE-60 GRID     three content-addressed layers, deterministic addresses:
  │                FORM 11.<hash(surface)>  →  CONCEPT 12.<pos>.<hash(lib:sense)>
  │                                         →  INSTANCE <cat.sub…> (entities)
  ▼
GOVERNANCE       reversible collapse_decisions; hash-chained admission log;
  │              right-to-be-forgotten erasure (PII removed; tombstone retires address)
  ▼
FEDERATION       nodes compose by SUMMATION under a shared policy hash
                 (union grids, sum priors; Ed25519-signed deltas; tombstones propagate)
```

Each boundary is a deliberate constraint. **LDU is permissive** (capture loses
nothing); **GDU is fail-closed** (nothing becomes canonical unless a content-hashed
pack governs it); **the LLM is advisory** (it proposes candidates, the governed
gateway decides). The asymmetry — permissive capture, fail-closed canonicalization —
is the core engineering stance.

---

## 3. The substrate: a content-addressed grid

Every surface form is a **distribution over addresses** in a base-60 grid. Three
layers, all addressed by a hash of their *content*, never by a node-local counter:

| layer | address | example |
|---|---|---|
| FORM | `11.0.<sha(surface)>` | every string ("lead", "Boeing") |
| CONCEPT | `12.<pos>.<sha(library:sense_id)>` | every meaning (lead-the-metal, lead-the-sales-role) |
| INSTANCE | `<cat.sub.…>` + `canonical_entity_key` | every real-world referent |

Because an address is a pure function of content, **two nodes that load the same
library place the same meaning at the same address** — which is exactly what makes
determinism hold by construction and composition possible (§7). The base-60 radix is
presentational; the determinism comes from the hashing.

---

## 4. Collapse and expansion — one equation

For a form *f* in context *C* with candidate addresses *A*:

> **S(a | f, C) = Σₖ wₖ · eₖ(a, f, C)**, subject to veto if any enabled gⱼ fires;
> **collapse to argmaxₐ S** iff S(a\*) ≥ θ **and** S(a\*) − S(a₂) ≥ δ, **else PARK.**

The evidence axes eₖ ∈ [0,1] are feature functions; the weights wₖ, the vetoes, and
the thresholds θ/δ all come from a **content-hashed policy pack**, not code. Entity
collapse uses `id`/`exact`/`loose` with an `identifier_conflict` veto; sense collapse
uses `register`/`domain`/`collocation`/`frequency`/`pin` — `frequency` is the
dominant-sense prior (1/sense_rank from WordNet's sense ordering, the most-frequent-sense
default absent other evidence) and `pin` is a domain-conditional override (a domain pack
asserting the correct sense, weighted above the frequency default, below only `id`); the
engine is the same. The inverse,
**expansion**, reads the distribution back out — the meaning-fan (softmax of S) and a
relation walk over the concept graph.

This is presented as **a single governed scoring policy reused across decision
types** — design discipline, not mathematical novelty (§6).

### A worked collapse (synthetic, fully reproducible)

Document: *"The mine increased **lead** output."* Form `lead` binds to two senses —
`lead`(metal, domain=chemistry) and `lead`(sales-role, domain=business). Context
domain ≈ mining/chemistry. Weight vector (from the policy pack):
`w = {id:10, exact:3, loose:1.5, register:1, domain:1, collocation:2, prior:1}`;
thresholds `θ=1.0, δ=0.5`. Each eₖ∈[0,1]; absent-signal axes are neutral 0.5.

| candidate | id·10 | exact·3 | domain·1 | collocation·2 | other (reg .5,prior 0) | **S** |
|---|---|---|---|---|---|---|
| lead-metal `12.0.f205…` | 0 | 3.0 | 1.0 | 1.0 | 0.5 | **5.5** |
| lead-role `12.0.1c0c…` | 0 | 3.0 | 0.0 | 0.0 | 0.5 | **3.5** |

Two-part decision, both must hold: **θ-test** S(a\*)=5.5 ≥ θ=1.0 ✓ (enough absolute
evidence); **δ-test** margin 5.5−3.5 = 2.0 ≥ δ=0.5 ✓ (a clear winner) →
**collapse to lead-metal**, one reversible decision written. Strip the domain context
and both fall to 3.5; the margin → 0 < δ → the δ-test fails → **park** (the system
knows it doesn't know). For an entity: "Boeing" carrying CIK 12927 scores `id`=1
(weight 10) against an existing "The Boeing Company" with the same CIK → S≈10 → merge;
a second "Acme" with a *different* CIK triggers the `identifier_conflict` veto → that
candidate is eliminated, never merged.

**Why "byte-identical" holds.** The graph's identity is its *content-addressed
addresses* (hashes), not the float scores — fingerprints hash addresses. The score
itself is computed deterministically: axes are evaluated in the policy's
(insertion-ordered) key order, and ties break on a canonical key (`sort by −score,
then address`), so neither summation order nor a score tie can perturb the result.

---

## 5. Governance: the spine

- **Fail-closed.** A form no library governs is parked, never invented.
- **Reversible decisions.** Every collapse is a `collapse_decision` row
  (evidence-for, score, tier) — auditable and undoable; their aggregate is the
  `prior` evidence axis (the cyclical feedback that lets the model improve without
  retraining).
- **Right-to-be-forgotten.** `erase_entity` deletes an entity's PII-bearing row and
  writes a tombstone with audit fields and the retired content-address. Crucially the
  hash-chained admission log stores **references** (entity id, address, type, mutation
  hashes), **not the raw name/identifiers**, so deleting the row does **not** break the
  chain — there is no hard immutability-vs-deletion conflict (the chain hashes its own
  records, not the entity row). The admission path refuses to re-create a tombstoned
  address (re-creation blocked across re-ingestion), and erasure composes across nodes
  under **remove-wins** semantics (§8 of [EVIDENCE](EVIDENCE.md)). Two residuals are
  stated, not glossed: the address is a *hash of PII* (a confirmation oracle, not
  anonymization) and cross-node erasure is best-effort for non-cooperative peers — see
  [PROVENANCE_ETHICS](PROVENANCE_ETHICS.md).

---

## 6. Lineage (we own it)

The scorer is a log-linear / maximum-entropy model (Berger, Della Pietra & Della
Pietra 1996). The commit/park decision is a reject-option classifier (Chow 1970) and
mirrors the match / possible-match / non-match of Fellegi–Sunter (1969) record
linkage — our "park" is their clerical-review zone. The sense features are textbook
WSD (Yarowsky 1995; Navigli 2009); expansion is spreading activation (Collins &
Loftus 1975) over WordNet (Miller 1995). Content-addressing is the Merkle/Git/IPFS
pattern; "counts add" is the sufficient-statistics property of exponential families.
**There is no new equation.** The contribution is the *architecture*: one governed,
abstaining, auditable, composable scorer used uniformly across resolution decisions.

---

## 7. Composability (the load-bearing claim)

Because policy weights and addresses are content-hashed data, the equation is
identical on every node. **Therefore models add.** Precisely:

> Within a shared content-hashed policy, Enki composes by **unioning the
> content-addressed grid** and **summing decision counts at the evidence level** —
> mathematically clean because counts are sufficient statistics — **provided evidence
> is deduplicated and veto rules are unioned separately.** This is accumulation of
> sufficient statistics, not gradient training: weaker than retraining, but exact,
> associative, and auditable.

Two caveats are part of the claim, not exceptions to it: summation is valid **iff
`hash(policy_A) == hash(policy_B)`** (counts under different policies are not
comparable), and it requires **disjoint evidence** (overlapping corpora double-count;
the grid, a set, composes idempotently regardless). Both are demonstrated as negative
controls in the evaluation.

---

## 8. Evaluation & limitations

Full results in [`EVIDENCE.md`](EVIDENCE.md) — all on real captured data. Headlines:
determinism fingerprints reproducible across shuffled orders; a textbook streaming
linker produces 8 different clusterings of the same 1,499 mentions while Enki produces
one; two nodes merge to bit-identical state with the predicted limits demonstrated,
and erasure composes under remove-wins.

On **accuracy** we report an honest negative (§3c of EVIDENCE): identifier resolution
is precise (98%) but covers 0.8% of entities, and name-evidence selective prediction
has **no high-precision operating point** (≤71% precision, no ≥90% knee at any
threshold). This is *why* the architecture is identifier-first and **abstains** —
abstention is a *safety* property (it avoids false-association harm), not a
high-precision-commit claim. The low name-precision is evidence *for* the design, and
we publish the experiment that could have refuted the thesis.

Stated limitations: weights are hand-set (learnable from the decision log); the sense
path is now **measured on a real corpus** (§9 of [EVIDENCE](EVIDENCE.md) — 25/25
reproducible harness, dominant-sense coverage ~25-30%→~50%, cross-domain structuring,
entity-aware co-occurrence graph), though a *gold-labelled* sense-accuracy number and
context-resolution of always-ambiguous pos-collision forms (pos-priors/EM) remain open;
scale is *prototype feasibility*, not a scale result. The contribution is the
substrate's guarantees, not the numbers.

---

# Enki Whitepaper — Empirical Evidence (real captured data)

Raw material for the evaluation section. Every number here is from **real data
captured and processed by the live G1 node**, measured in an isolated read-only
snapshot (`enki_harness`). Methodology and caveats are stated so the paper can be
honest about exactly what each figure does and does not show.

Thesis it serves: *Enki is a deterministic, content-addressed, governed substrate
that makes entity and sense resolution **auditable, composable-by-summation, and
abstaining**.* Accuracy is reported honestly but is **not** the claim: name-evidence
selective prediction has no high-precision operating point (§3c), so abstention is a
*safety* property (avoid false associations), not a high-precision-commit claim.

---

## 1. The corpus (real, captured)

| quantity | value |
|---|---|
| signals (raw captured records — documents/events) | 43,930 |
| entities (resolved; one signal yields *many* mentions → entities) | 64,206 |
| entities with a strong identifier (CIK/QID/LEI) | 542 (**0.8%**) |
| entities promoted / canonical | 8,382 |
| entity types | organization 22k · person 19k · product 3.9k · facility 3k · place 2.4k · gov 2.3k · … |

(A *signal* is a captured record — a document or event — not a single mention; one
signal produces many extracted mentions, which resolve into entities. So entities >
signals is expected. In §6, 1,499 *mentions* yield 190 entities — the mention→entity
ratio.)

**Honest framing:** this is *feasibility at prototype scale*, NOT a scale result.
We do not invite a Spanner/BigTable scale comparison; the contributions are
determinism, auditability, composability — none of which need large numbers.

## 2. Identifier sparsity & contamination (real data quality)

| id key | entities | distinct values | in contaminated group (shared id) |
|---|---|---|---|
| cik | 319 | 307 | 23 (**7.2%**) |
| wikidata_id | 223 | 208 | 22 (**9.9%**) |
| lei | 0 | – | – |

Two facts the paper must own: **strong identifiers are sparse (0.8% of entities)**,
so most resolution leans on name evidence; and **~7–10% of the ids that exist are
contaminated** (same value on distinct entities — e.g. two unrelated orgs sharing
CIK 1472494; three agencies sharing the Pentagon's Q11208). This both *motivates*
the `identifier_conflict` veto and exposes its *limit*: a shared **wrong** id cannot
be vetoed. A verified gold slice (unique id) is therefore required for any honest
accuracy number.

## 3. Held-out resolution (generalisation, leave-one-out)

Protocol: for a multi-variant entity, hold out one real surface form, delete it from
that entity's alias memory, normalise it exactly as the live gateway does, and
resolve against the full real grid. Ground truth = the held-out form's true entity.
`RECALL CEILING` = was the true entity even retrieved (separates retrieval from
scoring). Non-destructive (transaction rolled back).

### 3a. Full set (4k/3k sample, labels contain known contamination)
| evidence | coverage | precision | scoring-on-retrievable |
|---|---|---|---|
| name-only | 20.7% | 45.9% | **78.5%** |
| name+id | 22.2% | 51.9% | **82.7%** |

### 3b. Identifier-path sanity check (NOT an accuracy headline — it is circular)
| evidence | recall | coverage | precision |
|---|---|---|---|
| name-only | 1.8% | 12.7% | 6.9% |
| name+id | 100.0% | 100.0% | 98.2% |

**This is a sanity check, not a resolution result, and we label it so.** The slice is
*defined* by entities with a unique strong id, then "name+id" resolves them *using*
that id — the `name-only 1.8% → name+id 100%` jump shows the id does all the work. It
confirms the id path is clean (the 1.8% residual is contamination); it says nothing
about generalization. The honest accuracy result is the risk-coverage curve (§3c).

### 3c. Risk–coverage — the decisive test (name evidence, id WITHHELD)

Does selective prediction hold — is there an operating point with high precision at
non-trivial coverage when **name** evidence does the work? The id is used only to
*label* verified-distinct entities; it is withheld from the resolver. A commit to the
wrong entity is an error, **including when the true entity was never retrieved** (so
margin-based abstention must catch that or it is not selective prediction).

| slice | recall ceiling | best precision @ coverage | high-precision (≥90%) knee? |
|---|---|---|---|
| unique-**id** entities (228 q) | 1.8% | 8.6% @ 15% | **none** |
| unique-**name** entities (4,568 q) | 33.4% | **71.5% @ 39%** | **none** |

**The negative is fundamental.** Even on unambiguous names, name evidence tops out at
~71% precision with no ≥90% operating point; raising the threshold *lowers* precision
(the θ≥2 cliff to 0% is a lone exact-match to a *wrong* same-named entity committing
with no competitor to trigger the margin). Structural cause: a margin (δ) cannot
detect "the true entity is not in the candidate set."

**Honest reading.** **Identifiers deliver precision (98%) but cover 0.8%; name
evidence reaches ~70% at ~40% coverage with no high-precision knee.** This is *why*
the architecture is identifier-first and abstains — and we report it as a measured
negative, not a hidden one. Abstention is therefore a **safety** property (it avoids
false-association harm) rather than a high-precision-commit claim; making it deliver
even that fully needs an absolute-confidence rule (abstain on lone weak/ambiguous
matches), noted as future work. `scripts/eval_risk_coverage.py`,
`scripts/eval_rc_uniquename.py`.

## 4. A real defect found *and fixed* by the evaluation

`wikidata_id` key mismatch: 223 entities carry their QID under key `wikidata_id`
(enricher spelling) but `IDENTIFIER_PRECEDENCE` + the `canonical_entity_key` only
knew `qid`/`wikidata` — so the **live gateway's** strong-id collapse and id-folding
were silently skipped for them. Fixed in Python + the plpgsql trigger (migration
127), proven byte-identical (`2e8f3891…`).

| gold slice, name+id | before | after |
|---|---|---|
| id-recall | 28.9% | **100.0%** |
| precision | 77.6% | **98.2%** |

This is the value of held-out evaluation on real data — it found a real production
bug, and the fix is part of the "works fully before publish" bar. (Caveat: the fix was
found and confirmed on the same gold slice later reported in §3b — mild test-set
reuse; the before/after lift is a fix demonstration, not an independent accuracy
claim.)

## 5. Determinism (the load-bearing property — reproducible artifact)

Stable fingerprints across shuffled input orders (anyone can re-run to the same hex):

| gate | what | result |
|---|---|---|
| language collapse | library → grid, 6 shuffled orders | identical `165c00aa…` |
| entity collapse | resolver, 8 orders (fwd/rev/6 shuffles) | identical `84434ff0…` |
| full integration | 40 libraries, 120k senses, collapse+expansion, cross-library | identical `a4e10c04…` |

Addresses are content-derived hashes, so determinism holds **by construction**, not
by luck — this is what makes composition-by-summation possible.

---

## 6. Determinism-drift baseline (the differentiator, real mentions)

Same task — cluster 1,499 real surface mentions (190 true entities) into entities —
two methods, 8 shuffled input orders:

VI calibration anchors (so the numbers mean something): all-singletons = **2.47**,
all-one-cluster = **4.84** — these bracket the scale.

| method | unique clusterings / 8 orders | mean pairwise VI (drift) | accuracy-to-truth (VI) |
|---|---|---|---|
| **Enki (deterministic components)** | **1** | **0.0000** | **1.209** (zero drift) |
| greedy (sorted input → deterministic) | 1 | 0.0000 | 1.316 (zero drift) |
| greedy (arrival order — deployed practice) | **8** | 0.5299 | 1.351–1.518 (**drifts 0.167**) |

Two honest readings, not one:
1. **Drift is real and is the deployed-practice failure mode.** Greedy single-link on
   *arrival order* — the standard streaming linker — produces **8 different
   clusterings** of the same 1,499 mentions merely reordered. Enki produces one.
2. **Single-node determinism is *cheap* — and we say so.** Feed the *same* greedy
   linker a canonically *sorted* order and it too gives one partition (VI 1.316). So
   the contribution is **not** "we are deterministic" (sorting achieves that on one
   node). The contribution is that **content-addressing is deterministic AND composes
   across nodes**: every node assigns the same address to the same referent, so two
   nodes' results merge by summation (§7) — a sorted node-local clustering cannot
   (node A and node B sort different data into different clusters). Determinism that
   *composes* is the property; single-node determinism is table stakes.

On quality: Enki's VI **1.209 sits well below both trivial anchors** (2.47 / 4.84) and
edges out even the deterministic sorted-greedy (1.316), so the clustering is genuinely
informative, not merely "less bad." `scripts/eval_determinism_drift.py` (pure,
reproducible). *Remaining:* a learned strong baseline (correlation clustering / HAC)
would tighten the quality comparison further.

## 7. Composability-by-summation + negative controls (the headline)

Two nodes ingest DISJOINT halves of the 1,499 real mentions; each builds its grid
(content-addressed `canonical_entity_key` set) and priors (per-key decision counts).
Uses the REAL identity function — the actual mechanism.

| claim | result |
|---|---|
| union(grid_A, grid_B) == grid_AB | **True** (1,253 keys) |
| sum(priors_A, priors_B) == priors_AB | **True** (1,499 decisions) |
| merged fingerprint == single-node(A∪B) | **True** (`5a2b44b7…`) |

**Negative controls (the predicted limits, demonstrated):**
- **Different policy hash → merge REJECTED.** Counts under different policy versions
  aren't comparable; addition is valid iff `hash(policy_A) == hash(policy_B)`.
- **Overlapping corpora → naive sum DOUBLE-COUNTS** (exactly the 100 shared mentions:
  1599 vs 1499). Counts need disjoint evidence / dedup; the **grid (a set) composes
  idempotently regardless** — only the count channel needs care.

(The 1,253 keys vs §6's 190 true entities is not a contradiction: this experiment
groups by *exact* `canonical_entity_key` to test the *composition mechanism* — union
and sum — not resolution quality. Exact-name grouping over-fragments variants into
singletons by design here; resolution quality is §3/§6's concern, composition
arithmetic is this one's.)

Precise statement: *within a shared content-hashed policy, Enki composes by unioning
the content-addressed grid and summing decision counts at the evidence level —
mathematically clean because counts are sufficient statistics — provided evidence is
deduplicated and veto rules are unioned separately. Accumulation of sufficient
statistics, not gradient training: weaker than retraining, but exact, associative,
auditable.* `scripts/eval_federation_summation.py`.

### 7b. Live cross-node reproduction (real, not synthetic)

Beyond the disjoint-halves arithmetic above, the full sense-collapse relationship
graph was rebuilt **from scratch on an independent database** — fresh schema, the same
corpus (30,826 paragraphs / 18,733 mentions), the same content-hashed lexicon packs —
and produced a **byte-identical fingerprint**: `a4a718bbcf6118c5` (9,216 edges),
reproduced twice (a manual recipe and a packaged one-shot runner). Because every
address is a hash of content and the policy is content-hashed, an independent node
*must* land on the same graph.

**This is now demonstrated across two genuinely different physical machines.** Node A
is the production server (`G1`: dedicated host, RTX 5060 GPU, full Docker stack). Node B
is an ordinary **laptop with no GPU**, running the collapse from a **native Python
environment** (not the app container) against a standalone Postgres rebuilt from the
same image Dockerfile. A portable bundle carried the corpus envelope and the gitignored
WordNet / sense-frequency packs; the four remaining lexicon inputs were verified
byte-identical before the run (collapsepolicy `8ad54ddf`, function_words `25259097`,
general `949f8c87`, legal `4c0f210f`). Node B produced:

```
node B (laptop, no GPU, native) : a4a718bbcf6118c5   (9,216 edges, max raw count 178)
node A (G1, docker stack, GPU)   : a4a718bbcf6118c5   (9,216 edges, max raw count 178)
```

Not only the final fingerprint but every intermediate (edge count, max raw count)
matched. Note that **no GPU is involved on either side** — the collapse is deterministic
arithmetic over content-addressed inputs; a GPU only participates in the upstream
(non-deterministic) extraction, which federation deliberately does *not* re-run. A
no-GPU machine reproducing the server's graph is therefore the strongest form of the
claim: the §7 composability property holds on commodity hardware, with nothing shared
but the corpus + packs + content-hashed policy. `scripts/run_federation.sh`,
`scripts/fp_cooccur.py`.

## 8. Erase × merge convergence (erasure composes too)

Add-only composition (§7) is associative and exact, but erasure is *subtractive* —
the two must be shown to commute or they contradict. Modeling the tombstone set as a
**remove-wins CRDT** (a tombstone dominates any incoming counts for that address):

| claim | result |
|---|---|
| merge(A,B) == merge(B,A) **with** an erasure | **True**, bit-for-bit |
| erased-on-A entity stays erased after merging B (which carries counts for it) | **True** (tombstone dominates) |
| **negative control:** naive add-only merge (no remove-wins) | **resurrects** the erased entity |

So federated erasure and exact composition are reconciled: the count channel is
add-only (sufficient statistics); the tombstone set is remove-wins. Order-independent
either way. `scripts/eval_erase_merge.py`. *Residual (honest):* this proves the
*semantics* converge; a peer that is offline or declines the tombstone is not reached
(federated deletion is best-effort for non-cooperative peers).

---

## 9. Sense layer — measured on a real corpus (the gap §8 named, now closed)

The entity layer (§§1–8) and the sense layer share one equation; until now only the
entity layer carried numbers. The sense layer is now measured on real corpora (Epstein
depositions in `enki_research`; COVID-origins House testimony), the lexicon loaded as
reference data: WordNet oewn-2023 (**120,135** senses / **212,033** form→sense bindings
/ **228,300** relations), a sense-frequency pack (**212,071** per-lemma sense ranks), a
morphology exception pack (**4,416** irregulars), a function-word pack (165), and
curated domain packs (general / legal / biomed).

### 9a. Reproducible validation harness — 25/25 PASS
`scripts/validate_language_layer.py` (read-only over the loaded lexicon) asserts, and
all pass: governed/deterministic policy; **abstain-on-ambiguity** (pos-collision forms
like `use`/`research` park); **frequency dominant-sense** correctness (`fire`→burning,
`deny`→"declare untrue", `admit`→concede, `increase`/`difference`/`evidence`);
**morphology** (`using`→use, `went`→go, `mice`→mouse; `using` no longer mis-collapses
to "exploit"); **pin-overrides-frequency** (`circumvent` bare→"surround"/military vs
+FOIA-pin→"avoid answering"); **curated pins resolve** (verified-bound, not inert);
**determinism** (same form → same collapse).

### 9b. Frequency dominant-sense prior — coverage up, precision up
Adding the dominant-sense prior (1/sense_rank — the most-frequent-sense baseline)
roughly doubled general-noun term coverage on real statements (**~25-30% → ~50%**)
while *raising* precision: forms that previously mis-collapsed to obscure senses now
take the common sense or safely park. The statement-promoter (sentence → agent +
predicate-sense + polarity + resolved terms) structures **8/8** COVID statements
(~52% term coverage) and **8/11** depositions (~35%; the rest are verbless descriptive
lines, correctly unstructured) with one unchanged mechanism across legal *and* medical
text — the cross-domain generality claim, measured.

### 9c. Entity-aware sense resolution (the decisive before/after)
The sense layer and the entity layer **partition** a text — proper nouns are entities
and must not resolve to WordNet common senses. Corpus co-occurrence over the Epstein
depositions makes the point:

| | top co-occurrence edges |
|---|---|
| **blind** (sense layer on entity-laden text) | garbage — `Epstein`→"British sculptor", `Maxwell`→"physicist", `palm`→"hand", `united`→adjective |
| **entity-aware** (mask mention spans + capitalization guard) + prose-only + domain pins | **9,216 clean edges** — `question~answer~ask`, `remember~know~state~think~tell~see`, `objection~foundation`, `give~massage` |

### 9d. Honest negatives (the boundary, measured)
- **Joint-context via WordNet taxonomy does not work.** The `collocation` axis with
  relation-neighbours as context is *inert* when restricted to be safe (0/97 parked
  terms resolved on real corpora) and *harmful* when naive (a low-degree obscure sense
  wins on one coincidental match in a document soup). Reliable joint-context needs
  *learned corpus co-occurrence*, not taxonomy intersection.
- **Always-ambiguous pos-collision forms** park unless pinned; resolving them from
  context needs part-of-speech priors / EM (open — parking is safe by design).

Full reference + reproduction: `docs/LANGUAGE_INTERLINGUA_VALIDATION_2026-06-25.md`.

---

## 10. Text-quality validation — keeping garbage out of the meaning graph

Real corpora carry garbage: OCR errors *and* corrupted source PDF text layers. On
`enki_research`, **8.3%** of paragraphs the classifier had labelled `prose` were
doubled-character OCR garbage (`CCaassee 11::1155…` = "Case 1:15-…" with every char
doubled) — and most came from **pdfplumber** reading bad embedded text, not our OCR
worker, so the fix is **source-agnostic**: judge the TEXT, not the provenance.

The lexicon is the oracle. `core/documents/text_quality.py` scores a chunk
deterministically — no LLM — on four signals: `dict_ratio` (lemmatized fraction of
tokens that are real lexicon words; OCR gibberish → ~0), `func_ratio` (function-word
fraction; tabular records have ~none), `has_predicate` (a real verb resolves), and
structure sanity (doubled-char runs, alpha ratio). It then routes into four lanes:

| lane | meaning | handling |
|---|---|---|
| **pass** | real sentence | into the meaning graph (auto) |
| **record** | tabular / low function-words | to the record extractor (auto) |
| **drop** | garbage / no real words | raw, searchable evidence only (auto) |
| **flag** | real content, incomplete structure | human review queue |

**Validation (component):** OCR garbage 29/30 rejected, flight-log records 30/40
rejected, real deposition prose 58/60 accepted — `dict_ratio` is 0.00 on garbage vs
0.4-0.7 on real sentences.

**Corpus run (the evidence):** scored **25,315** candidate paragraphs —
**pass 72% · record 7% · drop 12% · flag 7%**. So **92% routes deterministically**
and only **7% needs a human eye**. Garbage (court-reporter footers, doubled-char
scans) drops; records (police-report headers, legal citations) route away; the flag
queue holds the genuinely ambiguous (date-heavy report lines, citations, form text).
The collapse worker now gates sense resolution on this verdict, and the verdict is
persisted (`validation_score`, `review_status`; migration 131) so the human reviews
only the flagged middle — a band that **shrinks as the lexicon/frames/co-occurrence
improve.** This is the abstain-rather-than-mis-map discipline applied to text quality:
the non-deterministic part is *flagged*, never guessed, and never silently admitted.

---

## 11. What is human-validated vs. machine-deterministic (the trust question)

A fair reader asks: *which of these numbers did a human check?* Stated plainly:

- **Human-reviewed:** the contradiction findings are surfaced automatically but read
  and confirmed by a human against the two source quotes (e.g. Fauci/Collins "did not
  provide edits" vs. the drafting evidence); the gold-precision slice (§3b) is defined
  by **human-verified** unique-identifier entities; the OCR flag band (§10) is the
  explicit *human-in-the-loop* lane. Every public investigation finding is a
  **verbatim quote copied character-for-character from the cited primary document and
  checked against it** before publishing — see the real examples with sources at
  `/explore` (the Epstein record) and `/covid` (COVID origins), each line linking the
  official document.
- **Machine-deterministic (reproducible by anyone, no human in the loop):** the
  determinism fingerprints (§5–6), composability-by-summation (§7), erase/merge
  convergence (§8), the sense harness (§9, 25/25), and the text-quality band
  distribution (§10, 25,315 paragraphs). These are *not* opinions — they are functions
  of content-hashed inputs and re-run to the same result.

So the trust model is two-layered: **determinism/auditability is proven by
construction** (anyone re-runs and gets the same hashes), and **factual findings are
human-checked against primary sources** (and the uncertain residue is flagged for a
human, never guessed). The reference implementation is in Appendix A (below) — the
actual code, inline, no repository access required.

---

## Caveats (stated, not hidden)
- **Accuracy is not the claim.** Name-evidence selective prediction has no
  high-precision knee (§3c); identifier precision is high but covers 0.8%. The claims
  are determinism, composability, auditability, and abstention-as-safety.
- Weights are **hand-set** (id=10, exact=3, loose=1.5); learnable from the decision
  log (the `prior` channel already accumulates the sufficient statistics).
- **Sense-layer collapse is now measured on a real corpus (§9)** — 25/25 reproducible
  harness checks, dominant-sense coverage ~25-30%→~50%, cross-domain statement
  structuring (legal + medical), and an entity-aware corpus relationship graph (9,216
  clean edges). What remains unquantified is a *gold-labelled* sense-accuracy number
  (no sense-tagged gold set yet) and context-driven resolution of always-ambiguous
  pos-collision forms (needs pos-priors/EM; §9d). The entity+sense unification is
  demonstrated on both layers now.
- Determinism-drift includes a deterministic comparator (sorted-greedy) and VI
  calibration anchors (§6); a *learned* strong baseline (correlation clustering / HAC)
  would tighten quality further. The gold/risk-coverage slices are **small** where
  labels are clean (verified labels cost coverage).
- Erase×merge convergence is proven for the *semantics* (remove-wins); cross-node
  enforcement is **best-effort** (offline/non-cooperative peers).

---

# Figures

Portable text figures for the whitepaper (render anywhere markdown does; promote to
SVG/PNG at typeset time). Every number traces to [EVIDENCE.md](EVIDENCE.md).

---

## Figure 1 — The chain (one signal, end to end)

```
   CAPTURE                signed signals (connectors / paired devices)
      │                   source_domain · timestamp · device_signature
      ▼
 ┌─ LDU ──────────┐       permissive admission — every form gets a home;
 │  "open mouth"  │       nothing rejected; provisional until governed
 └────────────────┘
      ▼
   EXTRACTION            LLM → mapper → typed candidates    (LLM PROPOSES,
      │                  (entities · claims · events · places)   never writes)
      ▼
 ┌─ GDU GATEWAY ──┐      the SINGLE authorized write path;
 │  "locked jaw"  │      content-hashed packs govern; FAIL-CLOSED
 └────────────────┘      (ungoverned → parked)
      ▼
   COLLAPSE              S = Σ wₖ·eₖ , veto , θ/δ  →  COMMIT | PARK
      │                  one policy: entities (id/exact/loose) & senses
      ▼
   BASE-60 GRID          FORM 11.<hash> → CONCEPT 12.<pos>.<hash> → INSTANCE
      │                  content-addressed → identical on every node
      ▼
   GOVERNANCE            reversible decisions · hash-chained log · erasure
      │
      ▼
   FEDERATION            union grids + sum priors (shared policy hash);
                         remove-wins tombstones; Ed25519 deltas

   ── permissive capture ───────────────► fail-closed canonicalization ──
```

---

## Figure 2 — The content-addressed grid (three layers)

```
                       "lead"  (a surface form)
                          │
            ┌─────────────┴─────────────┐  polysemy fan-out
   FORM     │  11.0.<sha("lead")>        │  (form_concept_bindings)
            └─────────────┬─────────────┘
                ┌─────────┴──────────┐
   CONCEPT  12.0.<sha(lib:lead-metal)>   12.0.<sha(lib:lead-role)>
            (domain=chemistry)           (domain=business)
                │           relation edges (concept_relations) → expansion
                │  hyponym/synonym…
   INSTANCE  <entity: "Lead, the element"> ……… canonical_entity_key
            (a concept may promote to a real-world referent)

   address = hash(content), never a node-local counter
   ⇒ same input → same address on EVERY node ⇒ models SUM (Fig. 5)
```

---

## Figure 3 — Risk–coverage: the honest negative (name evidence, id withheld)

```
 precision
  100% ┤·············································  (ideal)
   90% ┤- - - - - - - - - - - - - - - - - - - - -   ← target knee (never reached)
   80% ┤
   70% ┤                         ● unique-name 71.5% @ 39%
   60% ┤
   50% ┤        full-set ~46% ●
   40% ┤
   30% ┤
   20% ┤
   10% ┤   ● unique-id 8.6% @ 15%
    0% ┼────┬────┬────┬────┬────┬────┬────┬────┬──►
        0%  5%  10%  15%  20%  25%  30%  35%  40%   coverage

   No operating point reaches ≥90% precision by NAME evidence on any slice.
   Raising θ LOWERS precision (lone exact-match to a wrong same-named entity).
   ⇒ identifiers carry precision (98%, but 0.8% coverage); the system ABSTAINS.
     Abstention is a SAFETY property, not a high-precision-commit claim.
```

---

## Figure 4 — One worked collapse (synthetic, reproducible)

```
 "The mine increased  lead  output."        w = {id:10 exact:3 loose:1.5
                       └── form                   register:1 domain:1
                                                  collocation:2 prior:1}
                                              θ = 1.0   δ = 0.5
 candidate              id·10 exact·3 domain·1 colloc·2 reg·.5  →  S
 ─────────────────────────────────────────────────────────────────────
 lead-metal 12.0.f205…    0     3.0     1.0      1.0     0.5    = 5.5  ◄ winner
 lead-role  12.0.1c0c…    0     3.0     0.0      0.0     0.5    = 3.5

 θ-test: S* = 5.5 ≥ θ=1.0           ✓ (enough absolute evidence)
 δ-test: 5.5 − 3.5 = 2.0 ≥ δ=0.5    ✓ (a clear winner)
                                     ⇒ COLLAPSE → lead-metal  (reversible decision)

 strip domain context → both = 3.5 → margin 0 < δ → PARK (knows it doesn't know)
 entity case: same CIK ⇒ id=1·10 ⇒ merge;  different CIK ⇒ veto ⇒ never merge
```

---

## Figure 5 — Composability: union grids + sum priors (and erase × merge)

```
   node A                      node B
   grid_A  {a,b,c}             grid_B  {c,d}            (c = shared content-address)
   priors  a:2 b:1 c:2         priors  c:2 d:3

        merge (shared policy hash) ─ commutative, exact:
        ┌───────────────────────────────────────────────┐
        │  grid   = grid_A ∪ grid_B      = {a,b,c,d}      │
        │  priors = priors_A + priors_B  = a:2 b:1 c:4 d:3│  (counts = suff. stats)
        └───────────────────────────────────────────────┘
        == a single node that ingested A∪B           (bit-identical, fp 5a2b44b7)

   negative controls (proven):  policy_hash(A) ≠ policy_hash(B) → REJECT
                                overlapping evidence            → double-counts

   erase × merge (remove-wins):  A erases c   ⇒  T = {c}
        final grid = (grid_A ∪ grid_B) − T ,  priors drop c
        merge(A,B) == merge(B,A)  with erasure   (tombstone dominates B's c:2)
        naive add-only (no remove-wins)          → RESURRECTS c   (why remove-wins)
```

---

# Provenance, Privacy, and Dual-Use

Enki resolves real, named people and organizations into permanent, content-addressed,
federated, summable records. That combination — *identifiable individuals* ×
*permanent addresses* × *cross-node accumulation* — demands an explicit account of
where the data comes from, what the privacy basis is, and what the surveillance
implications are. This section states them plainly, including the tensions the
architecture does **not** fully resolve.

## 1. Data provenance

The corpus measured in this work is drawn from **open sources and public records** —
news, regulatory filings (SEC), public registries (Wikidata), court and FOIA
documents — captured through signed connectors. Every signal carries its source
(`source_domain`), capture time, and a device signature, and every derived record is
traceable to the signals that produced it through the hash-chained admission log.
Provenance is therefore *queryable*: for any entity or claim, the system can list the
exact source documents behind it. This is a precondition for the auditability claim,
not a nice-to-have.

**"Public record" is not "privacy-free."** Court filings and FOIA releases routinely
contain third-party PII — witnesses, victims, bystanders — who are not public figures
and did not consent. Open availability is not a lawful basis for arbitrary processing
of those individuals; minimization and the erasure path (§2) apply to them as much as
to anyone, and a deployment over such sources inherits that obligation.

**For this paper, public-facing examples use synthetic or public-record subjects, not
private individuals.** The worked example in the system document is synthetic; the
quantitative results are reported in aggregate.

## 2. Privacy and legal basis

The legal basis for processing depends on the deployment, not the software: an
open-source-intelligence deployment over public records, a law-enforcement deployment
under warrant/authority, and a corporate-investigations deployment under contract are
different bases with different obligations. Enki's design positions it to *support*
compliance rather than presuppose it:

- **Provenance + audit** make data-subject access requests answerable (what is held,
  and on what source basis).
- **Right-to-be-forgotten is a first-class operation** (`erase_entity`): it deletes
  the entity's PII-bearing row, writes a tombstone, and blocks re-creation of that
  content-address across re-ingestion. The hash-chained audit log stores *references*
  (id, address, type, mutation hashes), not raw PII, so erasure does not break the
  chain (no immutability conflict). Across nodes it composes under remove-wins (a
  tombstone dominates incoming counts), proven order-independent.
  **Two residuals, stated honestly, not glossed:**
  - The tombstone's address is a **hash of the entity's identity**. A hash of personal
    data is *pseudonymized, not anonymized*: a party already holding the name can
    recompute the hash and confirm membership — so the tombstone is a *confirmation
    oracle*, not a clean forget. Mitigation is a salted/keyed address for the tombstone
    set (at a cost to cross-node determinism that must be confronted); absent that, the
    address reveals nothing on its own but does not preclude confirmation. We also
    scrub free-text fields on erasure so a name cannot survive incidentally.
  - **Cross-node erasure is best-effort.** Once a signed delta has propagated, a peer
    that is offline or declines to honor the tombstone keeps the data. Federated
    deletion is enforceable only for cooperative, reachable peers — a hard limit of any
    federated system, which we name rather than imply away.
- **Fail-closed canonicalization** means ungoverned, speculative inferences are
  parked, not asserted — the system does not manufacture claims about people.

What the software cannot do is *establish* a lawful basis or *bound* a deployment's
scope; those are operator obligations. We state this rather than imply the software
discharges them.

## 3. Dual-use and surveillance implications (the honest tension)

Enki is an intelligence-fusion substrate. The same properties that make it
trustworthy for a warranted investigation — permanent addresses, cross-node
accumulation, composability-by-summation — also make it a powerful surveillance
instrument if deployed without authority. We do not claim the architecture neutralizes
this; we claim it makes misuse **visible and constrained** in ways opaque systems do
not:

- **Auditability over opacity.** Every link is an attributable, reversible record. A
  neural dossier cannot say why two records were merged or who caused it; Enki can.
  Accountability requires this.
- **Abstention over fabrication.** The system parks on ambiguous evidence instead of
  asserting a guess, reducing the false-association harm that is most damaging to
  individuals.
- **Governed, content-hashed policy.** What the system is permitted to assert is
  pinned to a versioned, inspectable policy — changes are diffs, not silent drift.
- **Erasure as a primitive**, not an afterthought (§2).

These are mitigations, not guarantees — and one contingency must be explicit, because
it's the uncomfortable core: **auditability constrains an operator only if that
operator is subject to an external auditor.** The operator controls the policy packs
*and* the audit log, so a deployment that removes external oversight is *self-auditing
surveillance* — the architecture cannot, by itself, hold an unaccountable operator
accountable. What it provides is the *substrate* for accountability (attributable,
reversible, inspectable records); the *accountability itself* requires an external
party with authority to inspect. We state this as a precondition, not an implied
property. Likewise, "addresses are permanent" is in genuine tension with deletion —
mitigated by erasure-as-primitive (with the residuals in §2), but deployment
governance (access control, authority, retention limits, independent oversight)
carries the rest, and is a requirement *on deployments*, not a feature *of the
software*.

## 4. Positioning

Enki is offered for **authorized** use — open-source intelligence over public records,
law enforcement under proper authority, and corporate investigations under contract —
invite-only, not a general public surveillance product. We are explicit that
"invite-only" is a **business assurance, not an architectural one** — every dual-use
vendor says it, and it carries no technical weight. The architectural choices (audit,
abstention, governance, erasure) make authorized uses *accountable to an overseer*;
they are deliberately not sufficient, by themselves, to make unauthorized use safe.
That gap is closed only by external governance — and we say so rather than let the
architecture take credit for accountability it can only enable, not enforce.

---

# Appendix A — Reference implementation (the load-bearing code, inline)

The substrate's claims rest on small, deterministic functions, reproduced here so a
reader (or an agent) can verify the algorithm **without repository access** — this is
the readable code, not a link out. Each is the real function, lightly condensed.

## A.1 The one governed scoring equation (collapse)

```python
# core/collapse/equation.py — the single equation reused for entity AND sense collapse.
def score(c, x, policy):
    for name, enabled in policy.vetoes.items():            # hard eliminations first
        if enabled and VETO_FNS.get(name) and VETO_FNS[name](c, x):
            return Scored(c, NEG_INF, {}, vetoed=True, reason=name)
    s = 0.0
    for axis, w in policy.weights.items():                 # weighted evidence, policy-driven
        fn = EVIDENCE_FNS.get(axis)
        if fn is None: continue
        s += w * fn(c, x)
    return Scored(c, s, ...)

def collapse(candidates, x, policy):
    live = sorted((score(c, x, policy) for c in candidates if not vetoed),
                  key=lambda s: (-s.score, s.candidate.address))   # deterministic tiebreak
    top = live[0]; second = live[1].score if len(live) > 1 else -inf
    if top.score >= policy.theta and top.score - second >= policy.delta:
        return CollapseResult("collapse", top, ...)        # confident → commit
    return CollapseResult("park", None, ...)               # ambiguous → abstain
```

Weights, vetoes, and the thresholds θ/δ live in a **content-hashed policy pack**, not
in code — so the equation is identical (and deterministic) on every node, which is
what makes composition-by-summation valid.

## A.2 The two evidence axes that drive sense disambiguation

```python
@evidence_axis("frequency")          # the dominant-sense prior
def _e_frequency(c, x):
    # 1/sense_rank from WordNet's frequency ordering — the most-frequent-sense default
    # absent other evidence (fire→burning, admit→concede). 0 when unranked.
    return max(0.0, min(1.0, float(c.freq)))

@evidence_axis("pin")                # domain-conditional override
def _e_pin(c, x):
    # a domain/curated pack asserts THIS sense for the form in context (circumvent→
    # "avoid answering" in FOIA, not the dominant "surround"). Weighted above all soft
    # evidence, below only the hard identifier.
    return max(0.0, min(1.0, float(x.pins.get(c.address, 0.0))))
```

## A.3 The text-quality gate (keeping OCR/garbage out of the meaning graph)

```python
# core/documents/text_quality.py — deterministic, lexicon-backed; no LLM.
def chunk_quality(cur, text, min_dict_ratio=0.5, min_func_ratio=0.12):
    toks = words(text); distinct = set(toks)
    # dict_ratio: fraction of tokens that are real lexicon words (lemmatized).
    #   OCR gibberish ("ccaassee") → ~0; real sentences → 0.4–0.9.
    known = {f for f in lexicon_forms(distinct | lemmas(distinct))}
    dict_ratio = sum(t in known for t in distinct) / len(distinct)
    # func_ratio: function words (the/was/and/…). Sentences flow through them; records ~0.
    func_ratio = sum(t in FUNCTION_WORDS for t in toks) / len(toks)
    has_predicate = any_token_binds_to_a_verb(distinct)
    structure_ok = not ocr_doubled(text) and alpha_ratio(text) >= 0.55
    is_real = dict_ratio >= min_dict_ratio and func_ratio >= min_func_ratio \
              and has_predicate and structure_ok
    return {...}

def band(v):                          # four lanes; the human sees only "flag"
    if v["is_real_sentence"]: return "pass"      # → meaning graph (auto)
    if v["dict_ratio"] < 0.2 or not v["structure_ok"]: return "drop"   # garbage → raw evidence
    if v["func_ratio"] < 0.08: return "record"   # tabular → record extractor (auto)
    return "flag"                                 # real-but-incomplete → human review
```

## A.4 Reproduce the validation

The validation harness (`validate_language_layer`) asserts 25 properties over the
loaded lexicon and prints `25/25 checks PASSED`. It is deterministic: same lexicon +
same content-hashed policy → same result on any node. The governed policy pack, the
WordNet pack, and the sense-frequency pack are the only inputs.
