Part of the Syno platform · Literature review
Reviews whose every claim you can verify in seconds — not hours.
Syno is a biomedical literature-review agent designed to be checked. Every claim is grounded in a retrieved passage, judged against the cited paper along independent axes, and stamped with a verdict you can trace back to the paragraph that justifies it.
The aim isn’t to write reviews faster than a human. It’s to write reviews a human can check, claim by claim.
The citation crisis
The harder problem isn’t invented papers. It’s real papers, quietly mis-described.
The field trained itself to watch for fabricated citations. But peer-reviewed audits now show the dominant failure is subtler: the DOI resolves and the authors check out, yet the sentence built around the paper overstates the finding, applies it to the wrong population, or quietly reverses what the paper actually reports.
47–50%
of references from leading generalist deep-research tools carried fabricated authors or titles on medical prompts.
JMIR 2026 — Wong, Ong, Merle & Keane
>50%
of the strongest tool’s cited statements still contained at least one subtle inaccuracy or misrepresentation of the source.
OpenAI Deep Research, same evaluation
75–83%
the per-claim accuracy band of careful human reviewers. This, not perfection, is the baseline Syno is measured against.
Baethge & Jergas meta-analysis and others
Design principles
We treat citation faithfulness as a property to measure — not assert.
Six principles, each a direct response to a documented failure mode, each grounded in established evidence-synthesis methodology.
Closes citation fabrication
Retrieval-first grounding
No claim enters the output without a retrieved passage. Planning, drafting, and validation all draw from one fixed evidence set, fingerprinted at retrieval. The writer cannot name a paper that isn’t in it.
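A minimal sketch of what a fingerprinted evidence set and a citation guard could look like. The hashing scheme, the function names, and the placeholder paper IDs are assumptions for illustration, not Syno's actual implementation.

```python
import hashlib
import json

def fingerprint(evidence: dict) -> str:
    """Hash paper IDs and passages in a stable order (hypothetical scheme)."""
    canonical = json.dumps(sorted(evidence.items()), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def ids_outside_set(cited: list, evidence: dict) -> list:
    """Return cited IDs that are not part of the fixed evidence set."""
    return [paper_id for paper_id in cited if paper_id not in evidence]

# Placeholder IDs and passages, for illustration only.
evidence = {"paper-A": "passage text A", "paper-B": "passage text B"}
fp = fingerprint(evidence)

# A draft citing a paper outside the set is caught before it reaches the output.
rejected = ids_outside_set(["paper-A", "paper-Z"], evidence)
```

Because the items are sorted before hashing, the same evidence set yields the same fingerprint regardless of insertion order, which is what lets planning, drafting, and validation confirm they are reading identical inputs.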
Catches misinterpretation
Per-axis judgment
Each (claim, paper) pair is judged on four independent axes — population, numerics, outcome, recency. A paper can entail a claim’s direction and still fail the population check; the mismatch survives instead of disappearing.
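One way to picture a per-axis judgment is as a small record per (claim, paper) pair. The four axis names come from the text; the class names, fields, and result labels below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

AXES = ("population", "numerics", "outcome", "recency")  # axes named in the text

class AxisResult(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"

@dataclass(frozen=True)
class PairJudgment:
    claim_id: str
    paper_id: str
    entails_direction: bool   # does the paper agree with the claim's direction?
    axes: dict                # axis name -> AxisResult

    def failed_axes(self) -> list:
        """Axes on which the paper does not match the claim."""
        return [a for a in AXES if self.axes[a] is AxisResult.MISMATCH]

    def fully_supported(self) -> bool:
        """Direction is entailed and no axis reports a mismatch."""
        return self.entails_direction and not self.failed_axes()

# Placeholder IDs: the paper agrees in direction but studied a different population.
judgment = PairJudgment(
    claim_id="claim-7",
    paper_id="paper-A",
    entails_direction=True,
    axes={
        "population": AxisResult.MISMATCH,
        "numerics": AxisResult.MATCH,
        "outcome": AxisResult.MATCH,
        "recency": AxisResult.MATCH,
    },
)
```

Keeping the axes as separate fields, rather than folding them into one score, is what allows the population mismatch to stay visible downstream.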
Displaces vote-counting
Quality-weighted aggregation
Meta-analyses and randomised trials carry the most weight; case series and mechanistic work the least. Weights are bounded so that no single study overwhelms a contradicting body of evidence, yet a 50-patient pilot does not count as an equal vote against a 5,000-patient trial.
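A rough sketch of bounded, quality-weighted aggregation. Only the ordering of study designs follows the text; the numeric weights, the "cohort" tier, the logarithmic size term, and the cap are all assumptions chosen to make the idea concrete.

```python
import math

# Hypothetical design weights: the ordering follows the text, the numbers do not.
DESIGN_WEIGHT = {
    "meta_analysis": 1.0,
    "rct": 0.9,
    "cohort": 0.6,
    "case_series": 0.3,
    "mechanistic": 0.2,
}
SIZE_CAP = 4.0  # assumed bound: sample size stops adding weight beyond 10**4

def study_weight(design: str, n: int) -> float:
    """Design weight scaled by a capped log of the sample size."""
    size_term = min(math.log10(max(n, 10)), SIZE_CAP)
    return DESIGN_WEIGHT[design] * size_term

def aggregate(studies) -> float:
    """studies: iterable of (design, n, direction), direction +1 or -1.

    Returns a weighted score in [-1, 1]; 0.0 for an empty input.
    """
    weighted = [(study_weight(d, n), direction) for d, n, direction in studies]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return 0.0
    return sum(w * direction for w, direction in weighted) / total

# A 5,000-patient trial supporting a claim against a 50-patient pilot contradicting it.
score = aggregate([("rct", 5000, +1), ("rct", 50, -1)])
```

Plain vote-counting would score this pair as a tie; under these assumed weights the larger trial tips the result positive, while the cap keeps any one very large study from growing without limit.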
Surfaces suppressed dissent
Cross-paper dissent check
A separate pass targets every grounded claim that cites just one paper, pulls the top uncited candidates from the same retrieval round, and demotes the claim if a supermajority of them contradicts it. Silence is never counted as disagreement.
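The dissent check can be sketched as a small function. The two-thirds threshold, the stance labels, and the choice to compute the supermajority only over candidates that take a stance are assumptions; the text specifies only that silent papers never count as disagreement.

```python
def dissent_check(cited_ids, candidate_stances, threshold=2 / 3):
    """Decide whether a single-citation claim should be demoted.

    candidate_stances maps each top uncited paper ID to
    'supports', 'contradicts', or 'silent' (hypothetical labels).
    """
    if len(cited_ids) != 1:
        return "skipped"          # the pass only targets single-citation claims
    voiced = [s for s in candidate_stances.values() if s != "silent"]
    if not voiced:
        return "kept"             # silence alone never demotes a claim
    share = voiced.count("contradicts") / len(voiced)
    return "demoted" if share >= threshold else "kept"

# Two uncited papers contradict the claim and one says nothing about it.
result = dissent_check(
    ["paper-A"],
    {"paper-B": "contradicts", "paper-C": "contradicts", "paper-D": "silent"},
)
```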
Surfaces contestation
Multi-tier, cross-family escalation
When a first-pass verdict is low-confidence or contested, the same input is re-judged by a deeper model from a different family and vendor. Escalation is recorded, not averaged away — and a single-provider outage can’t silently degrade the output.
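A sketch of tiered escalation with a recorded history. The judge signature, the confidence threshold, the tier names, and the fallback label are hypothetical; the point is that each tier's answer (or failure) is logged rather than averaged.

```python
from typing import Callable

# A judge takes (claim, passage) and returns (verdict, confidence). Assumed shape.
Judge = Callable[[str, str], tuple]

def judge_with_escalation(claim, passage, tiers, min_confidence=0.8):
    """Try each (tier_name, judge) in order; stop at the first confident verdict."""
    history = []
    for tier_name, judge in tiers:
        try:
            verdict, confidence = judge(claim, passage)
        except Exception as exc:
            # A provider outage is recorded and the next tier is tried.
            history.append({"tier": tier_name, "error": type(exc).__name__})
            continue
        history.append(
            {"tier": tier_name, "verdict": verdict, "confidence": confidence}
        )
        if confidence >= min_confidence:
            return verdict, history
    return "escalation-dropped", history

# Stub judges standing in for models from two different families.
def fast_judge(claim, passage):
    return "grounded", 0.55

def deep_judge(claim, passage):
    return "contradicted", 0.92

verdict, history = judge_with_escalation(
    "claim text", "passage text",
    [("fast-family-a", fast_judge), ("deep-family-b", deep_judge)],
)
```

The first-pass answer stays in the history even though the deeper tier overrides it, so a reader can see that the claim was contested.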
Makes reviews reproducible
Auditable, version-stamped verdicts
Every verdict carries cited paper IDs, a passage anchor, per-axis judgments, escalation history, and a policy version. Change the scoring regime and the version bumps — old verdicts are re-judged, never silently reused.
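As an illustration, a verdict record and a staleness check might look like the following. The field names mirror the list in the text, but the types, version strings, and anchor format are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerdictRecord:
    claim_id: str
    verdict: str
    paper_ids: tuple
    passage_anchor: str
    axis_judgments: dict
    escalation_history: tuple
    policy_version: str

def stale_claims(records, active_policy: str) -> list:
    """Claim IDs whose verdicts were produced under a different policy version."""
    return [r.claim_id for r in records if r.policy_version != active_policy]

# Placeholder records: one judged under an older policy, one under the current one.
records = [
    VerdictRecord("claim-1", "grounded", ("paper-A",), "paper-A#para-4",
                  {"population": "match"}, (), "v1"),
    VerdictRecord("claim-2", "weakly grounded", ("paper-B",), "paper-B#para-9",
                  {"population": "mismatch"}, ("fast", "deep"), "v2"),
]
to_rejudge = stale_claims(records, active_policy="v2")
```

Stamping the version on every record makes reuse an explicit decision: anything judged under an older regime is queued for re-judging instead of being carried forward.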
Per-claim trust
A verdict for every claim — not one score for the whole document.
A 7,000-word review holds hundreds of claims of widely varying defensibility. An overall “trust me” score tells a reader nothing about which paragraph to scrutinise. Syno emits a verdict for each claim, with the per-axis judgments that produced it and a passage anchor a reader can open in seconds.
Grounded
Cited papers directly support the claim along all relevant axes; the cross-paper check passes.
Weakly grounded
Support exists but is qualified — a broader population, a surrogate outcome, or a single small cohort.
Not grounded
No retrieved paper substantively supports the claim.
Contradicted
At least one cited paper actively conflicts with the claim along a substantive axis.
Literature disagrees
Multiple credible papers in the evidence set point in opposing directions; dissent is preserved, with its temporal direction.
Abstain (scope-rejected)
The claim falls outside the locked scope or is trivially un-evaluable; not pursued.
Escalation-dropped
The reasoning budget was spent before a verdict resolved — kept distinct from scope rejection in the audit trail.
How a review gets built
Seven phases. Each one produces an artefact you can reproduce from its inputs alone.
Planning
The question is decomposed into a research plan — PICO framing where it applies, inclusion criteria, and the review posture (systematic, scoping, or narrative) — recorded alongside the active policy version.
Retrieval
A curated biomedical corpus (papers identified by PMID, PMCID, or DOI, plus arXiv preprints) is queried with hybrid dense-plus-lexical search. Every run records a cryptographic fingerprint of the corpus, and dedicated slots protect dissenting evidence from being filtered out.
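The text does not say how the dense and lexical rankings are combined. Reciprocal rank fusion is one common way to merge two ranked lists, sketched here as an assumption rather than as Syno's method; the constant k = 60 is the conventional default for that technique.

```python
def reciprocal_rank_fusion(rankings, k: int = 60) -> list:
    """Merge several ranked lists of paper IDs into one.

    Each paper scores 1 / (k + rank) per list it appears in; ties are
    broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

# Placeholder result lists from a dense retriever and a lexical retriever.
dense = ["paper-A", "paper-B", "paper-C"]
lexical = ["paper-B", "paper-D", "paper-A"]
fused = reciprocal_rank_fusion([dense, lexical])
```

Papers that both retrievers surface rise to the top, while a paper found by only one of them still keeps a place in the fused list.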
Study
Each retained paper becomes a structured record — study design, population, sample size — plus a faithful summary that feeds both drafting and validation.
Drafting
A long-context model composes the prose; every substantive claim is tagged with supporting paper IDs from the fixed evidence set. The model cannot reference a paper outside that set.
Per-claim validation
Prose is decomposed into atomic claims, each graded per-axis, weight-aggregated, run through the dissent check, and escalated when contested. Every call carries a wall-clock budget.
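A minimal sketch of a per-call wall-clock budget, assuming a thread-based timeout and assuming that an overrun maps to the escalation-dropped verdict. The helper name and the stub judges are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_budget(judge, claim, budget_s: float):
    """Run judge(claim), giving up once budget_s seconds of wall-clock time pass."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(judge, claim)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        # The worker thread is not interrupted; its late result is simply ignored.
        return "escalation-dropped"
    finally:
        pool.shutdown(wait=False)

# Stub judges: one answers immediately, one takes longer than its budget.
def quick_judge(claim):
    return "grounded"

def slow_judge(claim):
    time.sleep(0.3)
    return "grounded"

on_time = call_with_budget(quick_judge, "claim text", budget_s=1.0)
overrun = call_with_budget(slow_judge, "claim text", budget_s=0.02)
```

Recording the overrun as its own outcome keeps it distinguishable from a scope rejection in the audit trail.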
Revision
The writer sees structured verdicts, reframing hints, and concrete replacement citations from the same evidence set. Already-grounded claims are held stable so verified work isn’t re-judged.
Audit output
The review ships with prose, the per-claim verdict record, passage anchors, the policy version, the corpus fingerprint, a PRISMA 2020 flow diagram, and a provenance-carrying bibliography.
External benchmark · diabetes
One run, one topic, one corroborating data point — fully reproducible.
The primary endpoint
+18.8 pts per-claim numeric fidelity over the strongest peer.
On ten pre-registered diabetes review questions, graded by a three-model panel across two families, Syno led Elicit by a per-question mean of +18.8 percentage points (95% CI [+9.4, +28.2]) — ahead in 9 of 10 questions. The lead survives every judge subset.
Citation existence — the trust floor
Three tools cluster at the top; the generalist agent is the outlier, with only 69.2% of citations resolving to indexed papers.
Honest scope: diabetes only, one capture per tool, graded by an external model panel rather than human experts. Syno was graded on full text; abstract-only tools on abstracts — each tool’s real evidence base. It’s a corroborating data point, not the thesis.
Download the data package — every question, every raw verdict →
Methodological lineage
Built on the evidence-synthesis canon — and we name it.
A literature-review agent that doesn’t engage with these standards isn’t a literature-review agent; it’s a summariser of search results.
Prove your next review — claim by claim.
Run a citation-faithful literature review on real data, or book a demo to see the verdict trail on your own questions.
Every verdict is reproducible from the published evidence set and policy version.