
Part of the Syno platform · Literature review

Reviews whose every claim you can verify in seconds — not hours.

Syno is a biomedical literature-review agent built on a different posture. Every claim is grounded in a retrieved passage, judged against the cited paper along independent axes, and stamped with a verdict you can trace back to the paragraph that justifies it.

The aim isn’t to write reviews faster than a human. It’s to write reviews a human can check, claim by claim.

Read the full whitepaper →

[Screenshot: the Syno literature review workbench showing a completed review, with the phase tracker, the Sources tab listing retrieved papers, a publication-year distribution, and a PRISMA flow.]

The citation crisis

The harder problem isn’t invented papers. It’s real papers, quietly mis-described.

The field trained itself to watch for fabricated citations. But peer-reviewed audits now show the dominant failure is subtler: the DOI resolves and the authors check out, yet the sentence built around the paper overstates its findings, misstates its study population, or quietly reverses what it actually reports.

47–50% of references from leading generalist deep-research tools carried fabricated authors or titles on medical prompts.

JMIR 2026; Wong, Ong, Merle & Keane

Over 50% of the strongest tool’s cited statements still contained at least one subtle inaccuracy or misrepresentation of the source.

OpenAI Deep Research, same evaluation

75–83%: the per-claim accuracy band of careful human reviewers. This is the floor Syno is measured against, not zero.

Baethge & Jergas meta-analysis and others

Design principles

We treat citation faithfulness as a property to measure — not assert.

Six principles, each a direct response to a documented failure mode, each grounded in established evidence-synthesis methodology.

Closes citation fabrication

Retrieval-first grounding

No claim enters the output without a retrieved passage. Planning, drafting, and validation all draw from one fixed evidence set, fingerprinted at retrieval. The writer cannot name a paper that isn’t in it.
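One way to read "fingerprinted at retrieval" and "cannot name a paper that isn't in it" in code: hash the fixed evidence set once, then route every citation through a single membership guard. This is a minimal sketch assuming a SHA-256 digest over the sorted paper IDs and their passage text; `EvidenceSet`, `fingerprint` and `check_citation` are hypothetical names, not Syno's API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceSet:
    """Hypothetical fixed evidence set: paper ID -> retrieved passage text."""
    passages: dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Hash IDs and passages in sorted order so the digest does not depend on
        # retrieval order; adding, removing or editing any passage changes it.
        h = hashlib.sha256()
        for paper_id in sorted(self.passages):
            h.update(paper_id.encode("utf-8"))
            h.update(b"\x00")
            h.update(self.passages[paper_id].encode("utf-8"))
            h.update(b"\x00")
        return h.hexdigest()

    def check_citation(self, paper_id: str) -> None:
        # Planning, drafting and validation would all call this one guard.
        if paper_id not in self.passages:
            raise KeyError(f"{paper_id} is not in the fixed evidence set")
```

Hashing in sorted order means two runs over the same evidence yield the same digest regardless of the order papers came back from search.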

Catches misinterpretation

Per-axis judgment

Each (claim, paper) pair is judged on four independent axes — population, numerics, outcome, recency. A paper can entail a claim’s direction and still fail the population check; the mismatch survives instead of disappearing.
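A hedged sketch of why a per-axis record keeps the mismatch visible: each axis is stored separately and failures are reported by name instead of being folded into one score. The four axis names come from the text; `AxisResult`, `AxisJudgment` and the summary rule are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class AxisResult(Enum):
    PASS = "pass"
    QUALIFIED = "qualified"      # e.g. broader population, surrogate outcome
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"       # axis irrelevant to this claim


AXES = ("population", "numerics", "outcome", "recency")


@dataclass
class AxisJudgment:
    claim: str
    paper_id: str
    axes: dict[str, AxisResult]  # one entry per name in AXES

    def failed_axes(self) -> list[str]:
        return [a for a in AXES if self.axes[a] is AxisResult.FAIL]

    def qualified_axes(self) -> list[str]:
        return [a for a in AXES if self.axes[a] is AxisResult.QUALIFIED]

    def summary(self) -> str:
        # No averaging: a single failed axis is reported even if the other three pass.
        if self.failed_axes():
            return "fails on " + ", ".join(self.failed_axes())
        if self.qualified_axes():
            return "qualified on " + ", ".join(self.qualified_axes())
        return "supports on all relevant axes"
```

For a claim about adolescents backed by an adult-only trial, marking `population` as `FAIL` and the other axes as `PASS` yields "fails on population" rather than an averaged pass.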

Displaces vote-counting

Quality-weighted aggregation

Meta-analyses and randomised trials carry the most weight; case series and mechanistic work the least. A 50-patient pilot is therefore not one vote against a 5,000-patient trial, yet weights are bounded so that no single study, however large, can overwhelm a contradicting body of evidence.
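The bounded weighting can be illustrated with a small scoring function. The numeric design weights, the logarithmic sample-size factor and the cap `MAX_STUDY_WEIGHT` are all invented for this sketch; the text specifies only the ordering of designs and that weights are bounded.

```python
import math

# Illustrative design weights: assumed values, only the ordering comes from the text.
DESIGN_WEIGHT = {
    "meta-analysis": 1.0,
    "rct": 0.9,
    "cohort": 0.6,
    "case-control": 0.5,
    "case-series": 0.2,
    "mechanistic": 0.1,
}

MAX_STUDY_WEIGHT = 2.5  # assumed cap on any single study's contribution


def study_weight(design: str, n: int) -> float:
    # Larger samples count for more, but only logarithmically, and the result is capped.
    raw = DESIGN_WEIGHT[design] * math.log10(max(n, 1) + 1)
    return min(raw, MAX_STUDY_WEIGHT)


def net_support(studies: list[tuple[str, int, int]]) -> float:
    """studies: (design, sample size, direction), direction +1 supports, -1 contradicts.

    Returns a score in [-1, 1]; positive means the weighted evidence favours the claim.
    """
    total = sum(study_weight(d, n) for d, n, _ in studies)
    if total == 0:
        return 0.0
    return sum(study_weight(d, n) * s for d, n, s in studies) / total
```

With these assumed numbers, a 5,000-patient RCT outweighs a contradicting 50-patient pilot, yet a very large meta-analysis on its own cannot overturn three contradicting 500-patient cohorts, because its weight hits the cap.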

Surfaces suppressed dissent

Cross-paper dissent check

A separate pass targets every grounded claim that cites just one paper, pulls top uncited candidates from the same round, and demotes the claim if a supermajority contradict it. Silence is never counted as disagreement.
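The dissent rule can be written as a small predicate. The text does not define "supermajority" or its denominator, so this sketch assumes a two-thirds threshold over all pulled candidates, with silent candidates counted in the denominator but never as contradictions; `should_demote` and the stance labels are hypothetical.

```python
from typing import Literal

Stance = Literal["supports", "contradicts", "silent"]

SUPERMAJORITY = 2 / 3  # assumed threshold; the text says only "supermajority"


def should_demote(cited_ids: list[str], candidate_stances: dict[str, Stance]) -> bool:
    """Hypothetical dissent check for an already-grounded claim.

    candidate_stances maps each top uncited candidate from the same retrieval
    round to its judged stance on the claim.
    """
    if len(cited_ids) != 1:
        return False  # the pass only targets claims resting on a single paper
    if not candidate_stances:
        return False
    contradicting = sum(s == "contradicts" for s in candidate_stances.values())
    # Silent candidates stay in the denominator and never count as contradictions,
    # so a candidate that says nothing can only make demotion harder.
    return contradicting / len(candidate_stances) >= SUPERMAJORITY
```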

Surfaces contestation

Multi-tier, cross-family escalation

When a first-pass verdict is low-confidence or contested, the same input is re-judged by a deeper model from a different family and vendor. Escalation is recorded, not averaged away — and a single-provider outage can’t silently degrade the output.
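A minimal sketch of two-tier, cross-family escalation. Judges are stand-in callables returning a verdict label and a confidence; the 0.7 threshold, the set of labels treated as contested, and the use of `ConnectionError` to signal a provider outage are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical judge: takes a claim, returns (verdict label, confidence in [0, 1]).
Judge = Callable[[str], tuple[str, float]]

CONTESTED = {"contradicted", "literature-disagrees"}  # assumed "contested" labels


@dataclass
class JudgedClaim:
    claim: str
    verdict: str
    confidence: float
    history: list[str] = field(default_factory=list)  # every judgment, in order
    unresolved: bool = False  # set when the escalation tier was unreachable


def judge_with_escalation(claim: str, first: tuple[str, Judge],
                          deeper: tuple[str, Judge],
                          threshold: float = 0.7) -> JudgedClaim:
    """first/deeper are (model family, judge) pairs; the families must differ."""
    (first_family, first_judge), (deep_family, deep_judge) = first, deeper
    if first_family == deep_family:
        raise ValueError("escalation must go to a different model family")

    verdict, conf = first_judge(claim)
    result = JudgedClaim(claim, verdict, conf, [f"{first_family}: {verdict} ({conf:.2f})"])
    if conf >= threshold and verdict not in CONTESTED:
        return result  # confident and uncontested: no escalation needed

    try:
        deep_verdict, deep_conf = deep_judge(claim)
    except ConnectionError:
        # Provider outage: keep the first-pass verdict but flag it, never silently.
        result.history.append(f"{deep_family}: unavailable")
        result.unresolved = True
        return result

    # The deeper verdict replaces the first; both stay in the history, unaveraged.
    result.history.append(f"{deep_family}: {deep_verdict} ({deep_conf:.2f})")
    result.verdict, result.confidence = deep_verdict, deep_conf
    return result
```

Because `history` keeps both judgments, an auditor can see that a claim was escalated and what each tier said, rather than a blended number.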

Makes reviews reproducible

Auditable, version-stamped verdicts

Every verdict carries cited paper IDs, a passage anchor, per-axis judgments, escalation history, and a policy version. Change the scoring regime and the version bumps — old verdicts are re-judged, never silently reused.
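The version-stamping rule amounts to a cache keyed on policy version: a stored verdict is reusable only if it was produced under the policy currently in force. The class and field names below are hypothetical; the fields mirror the list in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerdictRecord:
    claim_id: str
    verdict: str
    cited_paper_ids: tuple[str, ...]
    passage_anchor: str                          # hypothetical format, e.g. "PMID:123#p4"
    axis_judgments: tuple[tuple[str, str], ...]  # (axis, result) pairs
    escalation_history: tuple[str, ...]
    policy_version: str


class VerdictStore:
    """Reuses a stored verdict only if it was produced under the active policy."""

    def __init__(self, policy_version: str) -> None:
        self.policy_version = policy_version
        self._records: dict[str, VerdictRecord] = {}

    def put(self, record: VerdictRecord) -> None:
        self._records[record.claim_id] = record

    def get_reusable(self, claim_id: str) -> Optional[VerdictRecord]:
        record = self._records.get(claim_id)
        if record is None or record.policy_version != self.policy_version:
            return None  # missing or stale: the claim must be re-judged
        return record
```

Bumping `policy_version` on the store makes every older record unreusable at once, which is the "never silently reused" property.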

Per-claim trust

A verdict for every claim — not one score for the whole document.

A 7,000-word review holds hundreds of claims of widely varying defensibility. An overall “trust me” score tells a reader nothing about which paragraph to scrutinise. Syno emits a verdict for each claim, with the per-axis judgments that produced it and a passage anchor a reader can open in seconds.

[Screenshot: the claim verdicts list of a Syno literature review, showing 12 of 16 claims supported, a colour-coded support breakdown, and a verdict for each claim with its cited paper IDs.]

Grounded

Cited papers directly support the claim along all relevant axes; the cross-paper check passes.

Weakly grounded

Support exists but is qualified — a broader population, a surrogate outcome, or a single small cohort.

Not grounded

No retrieved paper substantively supports the claim.

Contradicted

At least one cited paper actively conflicts with the claim along a substantive axis.

Literature disagrees

Multiple credible papers in the evidence set point in opposing directions; the dissent is preserved together with its temporal direction (which side the more recent evidence favours).

Abstain (scope-rejected)

The claim falls outside the locked scope or is trivially un-evaluable; not pursued.

Escalation-dropped

The reasoning budget was spent before a verdict resolved — kept distinct from scope rejection in the audit trail.
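The seven outcomes map naturally onto a closed enumeration, which also makes a per-review breakdown a simple count. A sketch with hypothetical names; the labels are taken from the list above.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    GROUNDED = "grounded"
    WEAKLY_GROUNDED = "weakly grounded"
    NOT_GROUNDED = "not grounded"
    CONTRADICTED = "contradicted"
    LITERATURE_DISAGREES = "literature disagrees"
    ABSTAIN_SCOPE_REJECTED = "abstain (scope-rejected)"
    ESCALATION_DROPPED = "escalation-dropped"


def support_breakdown(verdicts: list[Verdict]) -> dict[Verdict, int]:
    # Count every label, including zeros, so scope rejections and escalation
    # drops stay visibly separate instead of merging into one "unknown" bucket.
    counts = Counter(verdicts)
    return {v: counts.get(v, 0) for v in Verdict}
```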

How a review gets built

Seven phases. Each one produces an artefact you can reproduce from its inputs alone.

01

Planning

The question is decomposed into a research plan — PICO framing where it applies, inclusion criteria, and the review posture (systematic, scoping, or narrative) — recorded alongside the active policy version.
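A research plan of this shape could be captured as an immutable record so later phases can be reproduced from it. The field names and the use of frozen dataclasses are assumptions; PICO stands for population, intervention, comparator and outcome.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Posture = Literal["systematic", "scoping", "narrative"]


@dataclass(frozen=True)
class PICO:
    population: str
    intervention: str
    comparator: Optional[str]  # may be absent, e.g. for single-arm questions
    outcome: str


@dataclass(frozen=True)
class ResearchPlan:
    question: str
    posture: Posture
    policy_version: str          # recorded alongside the plan, as the text describes
    pico: Optional[PICO] = None  # only "where it applies"
    inclusion_criteria: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate the posture explicitly.
        if self.posture not in ("systematic", "scoping", "narrative"):
            raise ValueError(f"unknown review posture: {self.posture}")
```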

02

Retrieval

A curated biomedical corpus of papers identified by PMID, PMCID, or DOI, plus arXiv preprints, is queried with hybrid dense-plus-lexical search. Every run records a cryptographic fingerprint of the corpus, and dedicated slots protect dissenting evidence from being filtered out.
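Hybrid search needs some way to merge the dense and lexical result lists. This sketch uses reciprocal rank fusion (RRF) with the commonly used constant k = 60, a swapped-in technique since the text does not say how Syno fuses them; the reserved-slot rule and the idea that dissenting papers arrive pre-flagged are likewise assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists (e.g. one dense, one lexical): score = sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def select_with_dissent_slots(fused: list[str], dissenting: set[str],
                              top_n: int, dissent_slots: int) -> list[str]:
    """Keep top_n results, reserving up to dissent_slots for flagged dissenting
    papers so the majority view cannot crowd them out."""
    if dissent_slots > top_n:
        raise ValueError("dissent_slots cannot exceed top_n")
    reserved = [d for d in fused if d in dissenting][:dissent_slots]
    rest = [d for d in fused if d not in reserved][: top_n - len(reserved)]
    chosen = set(reserved) | set(rest)
    return [d for d in fused if d in chosen]  # preserve fused order
```

Without a reserved slot, a dissenting paper ranked just below the cut-off would simply disappear; with one, it displaces the weakest majority result instead.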

03

Study

Each retained paper becomes a structured record — study design, population, sample size — plus a faithful summary that feeds both drafting and validation.
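A structured study record might look like the following. The fields follow the paragraph above; the class name and the `context_line` helper (one shared rendering handed to both drafting and validation) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical structured record for one retained paper."""
    paper_id: str
    design: str                 # e.g. "rct", "cohort", "case-series"
    population: str
    sample_size: Optional[int]  # None when the paper does not report it
    summary: str                # the faithful summary shared by drafting and validation

    def __post_init__(self) -> None:
        if self.sample_size is not None and self.sample_size <= 0:
            raise ValueError("sample size must be positive when reported")

    def context_line(self) -> str:
        # One compact rendering, so drafter and validator see identical facts.
        n = "n not reported" if self.sample_size is None else f"n={self.sample_size}"
        return f"[{self.paper_id}] {self.design}, {self.population}, {n}: {self.summary}"
```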

04

Drafting

A long-context model composes the prose; every substantive claim is tagged with supporting paper IDs from the fixed set. It cannot reference a paper outside it.
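The rule that the drafter cannot cite outside the set can also be checked after the fact by parsing its citation tags. The bracketed tag syntax below is a hypothetical format chosen for illustration, not Syno's.

```python
import re

# Hypothetical tag syntax: one or more IDs in square brackets after a claim,
# separated by semicolons, e.g. "Drug X lowers HbA1c [PMID:111; PMID:222]."
TAG = re.compile(r"\[((?:PMID|PMCID|DOI|arXiv):[^\]]+)\]")


def cited_ids(text: str) -> list[str]:
    ids: list[str] = []
    for group in TAG.findall(text):
        ids.extend(part.strip() for part in group.split(";"))
    return ids


def out_of_set_citations(draft: str, evidence_ids: set[str]) -> list[str]:
    """Return every tagged ID in the draft that is not in the fixed evidence set."""
    return [i for i in cited_ids(draft) if i not in evidence_ids]
```

An empty result means every tag resolves to the evidence set; anything else is a drafting error to reject before validation starts.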

05

Per-claim validation

Prose is decomposed into atomic claims, each graded per-axis, weight-aggregated, run through the dissent check, and escalated when contested. Every call carries a wall-clock budget.
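A per-call wall-clock budget can be enforced by waiting on the judge with a timeout and recording an escalation-dropped outcome when it expires. This sketch uses Python's `concurrent.futures`; the judge callable and the pool size are assumptions. A thread cannot be killed, so the budget bounds how long the pipeline waits, not how long the abandoned call keeps computing.

```python
import concurrent.futures as cf
from typing import Callable

# One shared pool; a timed-out call keeps running in its worker thread,
# but its result is ignored once the budget has expired.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def judge_within_budget(judge: Callable[[str], str], claim: str,
                        budget_s: float) -> str:
    """Run a (hypothetical) judge call under a wall-clock budget in seconds."""
    future = _POOL.submit(judge, claim)
    try:
        return future.result(timeout=budget_s)
    except cf.TimeoutError:
        future.cancel()  # only succeeds if the call has not started yet
        return "escalation-dropped"
```

Returning a distinct label, rather than a guessed verdict, keeps budget exhaustion separate from scope rejection in the audit trail.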

06

Revision

The writer sees structured verdicts, reframing hints, and concrete replacement citations from the same evidence set. Already-grounded claims are held stable so verified work isn’t re-judged.
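The revision hand-off can be sketched as a brief in which grounded claims are frozen and every other claim comes back with its verdict, hint and replacement candidates, filtered to the evidence set. `ClaimVerdict` and `revision_brief` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimVerdict:
    claim_id: str
    text: str
    verdict: str                        # taxonomy label, e.g. "grounded"
    hint: str = ""                      # reframing hint from validation
    replacements: tuple[str, ...] = ()  # candidate paper IDs


def revision_brief(verdicts: list[ClaimVerdict], evidence_ids: set[str]) -> list[dict]:
    """Grounded claims are held stable; everything else is returned for revision."""
    brief = []
    for v in verdicts:
        if v.verdict == "grounded":
            brief.append({"claim_id": v.claim_id, "action": "keep"})
            continue
        brief.append({
            "claim_id": v.claim_id,
            "action": "revise",
            "verdict": v.verdict,
            "hint": v.hint,
            # Never suggest a replacement from outside the fixed evidence set.
            "replacements": [r for r in v.replacements if r in evidence_ids],
        })
    return brief
```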

07

Audit output

The review ships with prose, the per-claim verdict record, passage anchors, the policy version, the corpus fingerprint, a PRISMA 2020 four-stage flow, and a provenance-carrying bibliography.
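Deriving each PRISMA stage from the previous one and its exclusions keeps the flow diagram arithmetically consistent by construction. The four stages and field names below are an illustrative assumption, not Syno's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrismaFlow:
    """Hypothetical counts for a four-stage flow: identified, screened, assessed, included."""
    identified: int
    duplicates_removed: int
    excluded_at_screening: int
    excluded_at_eligibility: int

    @property
    def screened(self) -> int:
        return self.identified - self.duplicates_removed

    @property
    def assessed_for_eligibility(self) -> int:
        return self.screened - self.excluded_at_screening

    @property
    def included(self) -> int:
        return self.assessed_for_eligibility - self.excluded_at_eligibility

    def __post_init__(self) -> None:
        if min(self.identified, self.duplicates_removed,
               self.excluded_at_screening, self.excluded_at_eligibility) < 0:
            raise ValueError("counts must be non-negative")
        # `included` is the smallest derived count, so checking it covers every stage.
        if self.included < 0:
            raise ValueError("more records excluded than were identified")
```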

External benchmark · diabetes

One run, one topic, one corroborating data point — fully reproducible.

The primary endpoint

+18.8 pts per-claim numeric fidelity over the strongest peer.

On ten pre-registered diabetes review questions, graded by a three-model panel across two families, Syno led Elicit by a per-question mean of +18.8 percentage points (95% CI [+9.4, +28.2]) — ahead in 9 of 10 questions. The lead survives every judge subset.
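The headline figure is a mean of per-question paired differences with a 95% confidence interval. The text does not say how the interval was computed; because [+9.4, +28.2] is symmetric about +18.8, this sketch assumes a Student's t-interval. It is meant for toy inputs only; the real per-question scores are in the published data package.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t by degrees of freedom (standard table).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_mean_ci(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Mean of per-question differences a[i] - b[i] with a 95% t-interval."""
    if len(a) != len(b):
        raise ValueError("scores must be paired question by question")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n - 1 not in T_975:
        raise ValueError("extend T_975 (or use scipy.stats.t.ppf) for this n")
    mean = statistics.mean(diffs)
    half = T_975[n - 1] * statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean - half, mean + half
```

Under that assumption, ten questions give a critical value of about 2.262, so the reported half-width of 9.4 points corresponds to a standard error of roughly 4.2 points.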

Citation existence — the trust floor

Syno 100.0%
Consensus 99.5%
Elicit 98.4%
Gemini Deep Research 69.2%

Three tools cluster at the top; the generalist agent is the outlier, with only 69.2% of citations resolving to indexed papers.

Honest scope: diabetes only, one capture per tool, graded by an external model panel rather than human experts. Syno was graded on full text; abstract-only tools on abstracts — each tool’s real evidence base. It’s a corroborating data point, not the thesis.

Download the data package — every question, every raw verdict →

Methodological lineage

Built on the evidence-synthesis canon — and we name it.

A literature-review agent that doesn’t engage with these standards isn’t a literature-review agent; it’s a summariser of search results.

PRISMA 2020 · PRISMA-S · Cochrane Handbook · GRADE · SANRA · Cochrane–Campbell–JBI–CEE Joint Position 2025 · PRISMA-trAIce

Prove your next review — claim by claim.

Run a citation-faithful literature review on real data, or book a demo to see the verdict trail on your own questions.

Every verdict is reproducible from the published evidence set and policy version.