/ Whitepaper · 2026
Cited RAG for the SOC
How we built Astute to give analysts answers they can audit — with conflict reconciliation, ATT&CK tagging, and span-level citations.
The default RAG implementation — embed, search, prepend, ask — fails the moment a SOC analyst tries to use it for a real decision. Conflicting sources get averaged, citations are missing or wrong, and the model invents context when retrieval comes up empty. The output is plausible. It is not auditable.
Astute is the retrieval layer of WIT OS. It is MITRE ATT&CK-aware, conflict-reconciled, and emits span-level citations on every claim. This paper is the build log: the architecture choices, the failure modes we hit, and the four metrics that matter when an analyst's career is on the line.
The SOC is not a customer-support chatbot. The cost of a hallucinated answer is not a confused user; it is an analyst escalating the wrong incident, declaring a false positive in a real intrusion, or — worst — citing an authoritative-sounding paragraph that the model fabricated whole.
Default RAG fails analysts in three predictable ways:
- Conflicting sources are averaged into one confident answer instead of being surfaced.
- Citations are missing, or point at whole documents rather than the spans that support the claims.
- When retrieval comes up empty, the model invents context from pretraining instead of saying so.
Astute is built around three commitments: every claim is attributable to a span, every conflict is surfaced, and empty retrievals fail loud.
Every document carries (a) source identity, (b) issuer, (c) issuance date, (d) freshness signal, (e) trust class (canonical, derived, community). An NVD CVE entry is canonical; a Reddit thread is community. The trust class is carried through retrieval and rendered in the citation.
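As a minimal sketch, the record might look like this; TrustClass and SourceDoc are illustrative names, not Astute's actual schema:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class TrustClass(Enum):
    CANONICAL = "canonical"   # e.g. an NVD CVE entry
    DERIVED = "derived"       # e.g. a vendor write-up built on canonical data
    COMMUNITY = "community"   # e.g. a Reddit thread

@dataclass(frozen=True)
class SourceDoc:
    source_id: str       # (a) source identity
    issuer: str          # (b) issuer, e.g. "NVD" or "CISA"
    issued: date         # (c) issuance date
    last_verified: date  # (d) freshness signal
    trust: TrustClass    # (e) trust class, carried through to the citation
    text: str
```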
Documents are embedded into a dense vector index, a sparse BM25 index, and an ATT&CK-technique index. The technique index is what lets an analyst ask “what do we know about T1059.001 in healthcare?” and have the retrieval respect both the technique and the industry.
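One way the three indexes could be fused, under stated assumptions: dense_search, bm25_search, and technique_index are hypothetical callables standing in for the real indexes, and the merge uses reciprocal rank fusion (a reasonable choice, not necessarily Astute's):

```python
from collections import defaultdict
from typing import Callable

def hybrid_retrieve(
    query: str,
    techniques: set[str],                                # e.g. {"T1059.001"}
    dense_search: Callable[[str, int], list[str]],       # doc_ids, best first
    bm25_search: Callable[[str, int], list[str]],        # doc_ids, best first
    technique_index: Callable[[set[str]], set[str]],     # doc_ids with those tags
    k: int = 10,
) -> list[str]:
    """Fuse dense and sparse hits with reciprocal rank fusion; the
    ATT&CK-technique index acts as a hard filter, not a soft boost."""
    allowed = technique_index(techniques)
    scores: dict[str, float] = defaultdict(float)
    for hits in (dense_search(query, 3 * k), bm25_search(query, 3 * k)):
        for rank, doc_id in enumerate(hits):
            if doc_id in allowed:                 # respect both technique and query
                scores[doc_id] += 1.0 / (60 + rank)   # RRF with the conventional k=60
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]
```

An industry facet (“in healthcare”) would constrain the candidate set the same way the technique tags do here.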
Before the model sees the chunks, a reconciliation pass identifies disagreements between sources on numeric or factual claims (CVSS scores, attribution, exploitation status). The disagreement is preserved through to the output: the analyst sees both numbers and knows which source said which.
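A simplified version of that pass, assuming claims have already been extracted and normalized from each chunk (Claim and find_conflicts are illustrative names):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    doc_id: str   # which source made the claim
    field: str    # e.g. "cvss_score", "initial_access_technique"
    value: str    # normalized value, e.g. "9.8" or "T1190"

def find_conflicts(claims: list[Claim]) -> dict[str, list[Claim]]:
    """Group claims by field; any field where sources disagree is
    returned whole, so every value and its source survive to the output."""
    by_field: dict[str, list[Claim]] = {}
    for c in claims:
        by_field.setdefault(c.field, []).append(c)
    return {
        field: cs
        for field, cs in by_field.items()
        if len({c.value for c in cs}) > 1   # more than one distinct value
    }
```

If NVD reports a CVSS of 9.8 and a vendor advisory reports 7.5, the whole cvss_score group comes back with sources attached, which is exactly what gets rendered for the analyst.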
Generation runs against an instruction set that requires every assertion to cite a specific span (not a whole document). The post-generation verifier re-grounds each cited span against its source; uncited assertions are rejected before the answer is returned to the analyst.
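A toy version of the verifier, under loud assumptions: real re-grounding would use an entailment model, and lexical overlap stands in for it here; CitedAssertion and verify are illustrative names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CitedAssertion:
    text: str      # the generated claim
    doc_id: str    # the source it cites
    start: int     # character offsets of the cited span
    end: int

def verify(
    assertions: list[CitedAssertion],
    corpus: dict[str, str],              # doc_id -> full source text
    min_overlap: float = 0.5,            # illustrative threshold
) -> tuple[list[CitedAssertion], list[CitedAssertion]]:
    """Re-ground every cited span against its source document."""
    passed, rejected = [], []
    for a in assertions:
        span = corpus.get(a.doc_id, "")[a.start:a.end]
        claim_tokens = set(a.text.lower().split())
        span_tokens = set(span.lower().split())
        overlap = len(claim_tokens & span_tokens) / max(len(claim_tokens), 1)
        # Empty span or weak support: the assertion never reaches the analyst.
        (passed if span and overlap >= min_overlap else rejected).append(a)
    return passed, rejected
```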
If retrieval scores fall below threshold, Astute does not generate an answer from pretraining. It returns the closest relevant sources and an explicit “the corpus doesn't cover this” signal.
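A sketch of the fail-loud path, with an illustrative threshold and hypothetical retrieve/generate callables:

```python
from dataclasses import dataclass
from typing import Callable

ABSTAIN_THRESHOLD = 0.35   # illustrative value; tuned per corpus in practice

@dataclass(frozen=True)
class Answer:
    covered: bool        # False means "the corpus doesn't cover this"
    text: str
    sources: list[str]   # closest relevant doc_ids, even on abstention

def answer_or_abstain(
    query: str,
    retrieve: Callable[[str], list[tuple[str, float]]],  # (doc_id, score), best first
    generate: Callable[[str, list[tuple[str, float]]], str],
    threshold: float = ABSTAIN_THRESHOLD,
) -> Answer:
    hits = retrieve(query)
    if not hits or hits[0][1] < threshold:
        # Fail loud: no generation from pretraining, just the nearest sources.
        return Answer(False, "The corpus doesn't cover this.",
                      [doc_id for doc_id, _ in hits[:3]])
    return Answer(True, generate(query, hits), [doc_id for doc_id, _ in hits])
```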
Most RAG benchmarks measure retrieval recall and generation BLEU. Neither is a useful proxy for whether an answer is safe to put in front of a SOC analyst. The four we track in production follow from the commitments above: span-level grounding, conflict surfacing, correct abstention, and analyst-facing latency.
Astute's trailing-quarter numbers across customer deployments: 99.4% / 96.1% / 99.7% / 1.9s.
An incident-response team was investigating a credential abuse case with 47 candidate sources — vendor advisories, CISA notes, three social-media posts, and a Twitter thread flagged by a junior analyst. Conventional RAG returned a confident summary that picked the wrong attribution.
Astute returned the same summary with three flagged conflicts: the vendor and CISA disagreed on the initial-access technique (T1078 vs T1190), two community sources cited a retracted advisory, and one source was published after the intrusion had already begun (its age was shorter than the attacker's known dwell time). The analyst escalated with confidence. The post-incident review credited the flagged conflicts with shaving 11 hours off the investigation.
Cited RAG is not a feature you bolt on. It is a discipline that runs through ingestion, retrieval, generation, and verification. Done right, it is the difference between “the AI told me” and “the AI showed me where to verify it.”
The architecture in this paper is the same one we run for every WIT ONE customer. Talk to the team about deploying it inside your environment.