
Auditor's Guide: Inspecting Lár Agents

Audience: Compliance Officers, Quality Assurance, Notified Bodies, and external Auditors.

Scope: How to verify a high-risk AI system built with Lár, in accordance with the EU AI Act (2026) and Nannini et al. (2026) — "AI Agents Under EU Law: A Compliance Architecture for AI Providers" (arXiv:2604.04604v1).


1. What You Are Looking At

Lár is a "Glass Box" agent execution engine. Every agent is a deterministic graph of explicit steps. You do not need to understand the underlying LLM to audit a Lár agent — you need to read the three artefacts it produces on every run.

No external tooling required. No LangSmith account. No vendor access. The artefacts are plain JSON files, signed with a key your organisation controls.

For the canonical reference implementation showing all 13 compliance primitives firing in sequence, see: EU AI Act Finance Showcase →


2. The Three Required Artefacts

Request these from the engineering team for any conformity assessment or incident investigation. Every Lár Enterprise run writes all three to enterprise_audit/.

Artefact 1 — Action Inventory (compliance_manifest.json)

Regulatory basis: Step 9 (Nannini et al.), Annex IV

Generated by static graph traversal before the agent runs. This is the regulatory map — what the agent could do, not what it did.

What to verify:

- External actions: Every tool, LLM call, and API endpoint the agent can reach. If an action appears at runtime that is not in the manifest, that is an Art. 3(23) Substantial Modification event.
- Affected parties: Actions flagged THIRD_PARTY must have Art. 50 disclosure evidence in the causal trace.
- Unvaulted tools: tools_without_credential_vault must be zero in production. Any tool not using CredentialVault holds standing credentials — an Art. 15(4) violation.
- Risk flags: HIGH severity flags (e.g., AdaptiveNode present) require explicit provider documentation before CE-marking submission.
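The manifest checks above can be scripted. The sketch below is illustrative, not the shipped tooling: the field names `tools_without_credential_vault` and `risk_flags` follow the checks described here, but the real manifest schema may differ.

```python
import json

def check_manifest(path: str) -> list[str]:
    """Return a list of audit findings for a compliance manifest (hypothetical schema)."""
    with open(path) as f:
        manifest = json.load(f)

    findings = []
    # Art. 15(4): every production tool must route credentials through CredentialVault.
    if manifest.get("tools_without_credential_vault", 0) != 0:
        findings.append("Unvaulted tools present — Art. 15(4) violation")
    # HIGH-severity risk flags require explicit provider documentation.
    for flag in manifest.get("risk_flags", []):
        if flag.get("severity") == "HIGH":
            findings.append(f"HIGH risk flag requires provider documentation: {flag.get('name')}")
    return findings
```

An empty return value means these two checks passed; anything else goes on the audit findings list.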

Artefact 2 — Causal Trace (run_<uuid>.json)

Regulatory basis: Art. 12 — Record-Keeping

The forensic log of what actually happened. Lár records a state diff at every step — not a text dump, but the exact variables added, removed, or modified, the rendered prompt, token usage, and outcome.

What to verify:

1. Sequentiality: Step numbers are consecutive with no gaps.
2. Causality: The variable written in step N is the variable read by the router in step N+1. The decision chain is reconstructible from the log alone.
3. PII redaction: Search the log for known PII values (names, SSNs, IBANs). If present, the PIIRedactionEngine was not applied before signing — a GDPR Art. 17 violation.
4. HMAC signature: The final entry contains a cryptographic signature. Verify it (see Section 3). If verification fails, the log was tampered with after execution.
5. Bias scan: Steps involving BiasFilterNode should show a scan result in state_diff. If a protected characteristic was detected and no HumanJuryNode interrupt followed, flag for review.
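Checks 1 and 3 are mechanical and worth automating. The sketch below assumes the trace is a JSON array of entries, each carrying a `step` number; the actual trace schema may differ.

```python
import json

def check_trace(path: str, known_pii: list[str]) -> list[str]:
    """Sequentiality and PII-redaction checks on a causal trace (hypothetical schema)."""
    with open(path) as f:
        entries = json.load(f)

    findings = []
    # 1. Sequentiality: step numbers must be consecutive with no gaps.
    steps = [e["step"] for e in entries]
    if steps and steps != list(range(steps[0], steps[0] + len(steps))):
        findings.append("Step numbers are not consecutive — possible missing entries")
    # 3. PII redaction: known personal data must not survive into the signed log.
    raw = json.dumps(entries)
    for value in known_pii:
        if value in raw:
            findings.append(f"PII value present in signed log: {value!r}")
    return findings
```

Feed it the personal data values known to be in scope for the run (test IBANs, names from the case file) and expect an empty findings list.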

Artefact 3 — Authority Ledger (authority_ledger.json)

Regulatory basis: Art. 14 — Human Oversight (Fourth Tier)

Records every human oversight exercise. This is the evidentiary chain the paper's footnote 18 requires: action proposal → risk assessment → human determination → execution outcome.

What to verify:

- Stakeholder identity and role: stakeholder_id and stakeholder_role must match the organisation's authorised reviewer registry.
- Rationale: Every record must contain a non-empty rationale. An approval with no rationale is rubber-stamping — it will fail an audit for negligent oversight.
- Timestamp realism: Cross-reference timestamp against the causal trace. If a HumanJuryNode decision appears in under 10 seconds on a complex case, investigate.
- Risk score: The risk_score_key field shows the upstream RiskScorerNode output that triggered the interrupt. Verify the score matches the action's PolicyRegistry classification.
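The identity and rationale checks lend themselves to a first automated pass before manual review. This is a sketch against an assumed schema (`stakeholder_id`, `rationale` fields as named above); adjust to the actual ledger format.

```python
import json

def check_ledger(path: str, authorised_reviewers: set[str]) -> list[str]:
    """Identity and rationale checks on an authority ledger (hypothetical schema)."""
    with open(path) as f:
        records = json.load(f)

    findings = []
    for i, rec in enumerate(records):
        # Identity: every reviewer must appear in the authorised registry.
        if rec.get("stakeholder_id") not in authorised_reviewers:
            findings.append(f"Record {i}: reviewer {rec.get('stakeholder_id')!r} not in registry")
        # Rationale: an approval with no substantive rationale is rubber-stamping.
        if not (rec.get("rationale") or "").strip():
            findings.append(f"Record {i}: empty rationale")
    return findings
```

Timestamp realism still requires cross-referencing the causal trace by hand, since the threshold for "too fast" depends on case complexity.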


3. Cryptographic Verification

A log is only useful if it cannot be silently altered. Lár signs the causal trace and authority ledger using HMAC-SHA256 with an enterprise-controlled secret key.

Verification procedure (no coding required):

python examples/compliance/11_verify_audit_log.py run_<uuid>.json your_enterprise_secret_key

Expected output: [+] VERIFICATION SUCCESSFUL

If output is [-] VERIFICATION FAILED, the log was modified after execution. Treat this as an integrity incident.

Key management: The HMAC secret should be stored in your organisation's secrets manager (AWS KMS, HashiCorp Vault). The engineering team should not have access to the production key — this ensures the audit trail cannot be altered even by the team that built the agent.
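For auditors who want to understand what the bundled script checks, the sketch below shows the general shape of HMAC-SHA256 log verification. It assumes the final log entry holds the signature under a key such as `hmac_signature` and that the signature covers the canonical JSON of all preceding entries; the real script's canonicalisation and field names may differ.

```python
import hashlib
import hmac
import json

def sign_entries(entries: list[dict], secret_key: str) -> str:
    """Compute an HMAC-SHA256 signature over canonical JSON (illustrative scheme)."""
    payload = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret_key.encode(), payload, hashlib.sha256).hexdigest()

def verify_log(path: str, secret_key: str) -> bool:
    """True if the log's final signature entry matches its preceding entries."""
    with open(path) as f:
        log = json.load(f)
    *entries, sig_entry = log
    expected = sign_entries(entries, secret_key)
    # compare_digest avoids leaking signature bytes through timing differences.
    return hmac.compare_digest(expected, sig_entry["hmac_signature"])
```

Changing a single byte in any entry changes the digest, so post-hoc edits are detectable without trusting the team that produced the log — provided the key is held outside their reach, as described above.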


4. The Nannini et al. (2026) 12-Step Coverage Map

Use this table to verify which steps the agent's architecture addresses. Request the compliance manifest and causal trace to confirm runtime coverage.

| Step | Requirement | Lár Primitive | Where to Verify |
|------|-------------|---------------|-----------------|
| 0 | Scope: Art. 3(1) AI system definition | DOMAIN_PRESETS classification record | Provider documentation |
| 1 | GPAI layer: Art. 53 documentation | LiteLLM model config | run_metadata.model in causal trace |
| 2 | Classify: Annex III / high-risk | conformity_id in domain preset | Compliance manifest header |
| 3 | QMS: prEN 18286 artifacts | Manifest + Ledger + Causal Trace | All three artefacts |
| 4 | Risk management: Art. 9 | PolicyRegistry + RiskScorerNode | computed_oversight_level in causal trace |
| 5 | Data governance: prEN 18283/18284 | PIIRedactionEngine + BiasFilterNode | PII absent from log; bias scan in state diff |
| 6 | Trustworthiness: Art. 12–14 | AuditLogger + HumanJuryNode + AuthorityLedger | All three artefacts |
| 7 | Cybersecurity: Art. 15(4) | CredentialVault | jit_token_present in causal trace; unvaulted tools = 0 in manifest |
| 8 | CRA applicability | Secure-by-design architecture | Provider documentation |
| 9 | Adjacent legislation inventory | ComplianceManifestGenerator | Compliance manifest action inventory |
| 10 | Conformity assessment artefacts | Manifest + Ledger + Trace → Annex IV | All three artefacts |
| 11 | Post-market monitoring: Art. 3(23) | RuntimeStateVersioner | drift_report in causal trace |

Full mapping: https://docs.snath.ai/compliance/paper-compliance-mapping/


5. Failure Modes and Red Flags

| Failure | What to Look For | Primitive That Should Have Caught It |
|---------|------------------|--------------------------------------|
| Rubber-stamp approval | HumanJuryNode decisions with empty rationale or sub-10s timestamps | AuthorityLedger rationale field |
| Lethal Trifecta violation | Untrusted input + PII + autonomous action with no jury record | LethalTrifectaGuard |
| Behavioural drift | Tool in causal trace not present in manifest | RuntimeStateVersioner drift report |
| PII in audit log | Known personal data values present in signed log | PIIRedactionEngine |
| Unvaulted credentials | tools_without_credential_vault > 0 in manifest | CredentialVault |
| Unapproved subgraph | AdaptiveNode present but no TopologyValidator referenced | TopologyValidator rejection log |
| Missing Art. 50 disclosure | Third-party-affecting action with no transparency flag | TransparencyEngine |
| Tampered log | HMAC verification fails | HMAC-SHA256 signing |
| Bias in output | Protected characteristic in LLM output with no interrupt | BiasFilterNode + HumanJuryNode |
| Consolidated-only jury context | BatchNode present but no branch_findings_summary in HumanJuryNode context keys | BranchTriageNode |
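The behavioural-drift check is a pure set difference between what the manifest declared and what the causal trace shows. A minimal sketch, assuming you have already extracted the two action-name sets from the artefacts:

```python
def find_drift(declared_actions: set[str], observed_actions: set[str]) -> set[str]:
    """Actions executed at runtime that the static manifest never declared.

    Any non-empty result is a candidate Art. 3(23) Substantial Modification
    event and should trigger the post-market monitoring workflow.
    """
    return observed_actions - declared_actions
```

For example, `find_drift({"send_email", "query_db"}, {"query_db", "transfer_funds"})` returns `{"transfer_funds"}`: an action the agent performed that its regulatory map never mentioned.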

6. Questions to Ask the Engineering Team

  1. What is the conformity_id for this system and where is the conformity assessment record?
  2. Where is the HMAC secret stored and who has access to it?
  3. Has the compliance_manifest.json been reviewed and signed off before production deployment?
  4. Are HumanJuryNode approvers trained on what they are approving, or is rubber-stamping structurally possible?
  5. Is RuntimeStateVersioner configured with a conformity baseline, and what is the drift threshold before a new assessment is triggered?

See Also