
Part of the Syno platform · Literature review

Reviews whose every claim you can verify in seconds rather than hours.

Syno is a biomedical literature-review agent that treats citation faithfulness as a property to measure rather than assert. Every claim is grounded in a retrieved passage and judged against the cited paper on independent axes: population, numerics, outcome, recency. Each verdict carries the exact passage it was scored against, so you can trace any sentence back to the paragraph that justifies it.

Syno is built to produce reviews a human can check claim by claim, not to beat a human reviewer on raw speed.

Read the full whitepaper →

Screenshot: the Syno literature review workbench, showing a completed review with the phase tracker, the Sources tab with retrieved papers, a publication-year distribution, and a PRISMA flow.

The citation crisis

The papers are real. The sentences built around them are wrong.

The field trained itself to watch for fabricated citations. Peer-reviewed audits now show the dominant failure is subtler. The DOI resolves and the authors check out, yet the sentence built around the paper overstates it, misreports its population, or reverses what it found.

47–50%

of references from leading generalist deep-research tools carried fabricated authors or titles on medical prompts.

JMIR 2026 · Wong, Ong, Merle & Keane

>50%

of the strongest tool’s cited statements still contained at least one subtle inaccuracy or misrepresentation of the source.

OpenAI Deep Research, same evaluation

75–83%

the per-claim accuracy that careful human reviewers reach. AI error rates should be judged against that band, not against zero.

Baethge & Jergas meta-analysis and others

Design principles

We measure citation faithfulness rather than assert it.

Six principles, each answering a documented failure mode and grounded in established evidence-synthesis methodology.

Closes citation fabrication

Retrieval-first grounding

No claim enters the output without a retrieved passage. Planning, drafting, and validation all draw from one fixed evidence set, fingerprinted at retrieval. The writer cannot name a paper that isn’t in it.
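A minimal sketch of what "fingerprinted at retrieval" could mean in practice, assuming the evidence set is a collection of paper identifier strings and the fingerprint is a SHA-256 digest over their sorted, deduplicated list. The function name `fingerprint_evidence_set` and the `PMID:`/`DOI:` prefixes are illustrative assumptions, not Syno's actual scheme.

```python
import hashlib

def fingerprint_evidence_set(paper_ids):
    """Return a SHA-256 hex digest identifying an evidence set (hypothetical scheme).

    IDs are deduplicated and sorted so that the same set of papers always
    hashes to the same value, whatever order retrieval returned them in.
    """
    canonical = "\n".join(sorted(set(paper_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two runs that retrieved the same papers in a different order.
run_a = fingerprint_evidence_set(["PMID:111", "DOI:10.1000/xyz", "PMID:222"])
run_b = fingerprint_evidence_set(["PMID:222", "PMID:111", "DOI:10.1000/xyz"])
```

Hashing only identifiers is a simplification: also hashing the retrieved passage text would catch a paper whose content changed under an unchanged ID.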

Catches misinterpretation

Per-axis judgment

Each (claim, paper) pair is judged on four independent axes: population, numerics, outcome, recency. A paper can entail a claim’s direction and still fail the population check. That mismatch is preserved through aggregation and surfaced to the writer.
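The per-axis idea can be sketched as a record holding one judgment per axis, so a pair that agrees on direction but fails the population check is never collapsed into a single pass. The four axis names come from the text; the pass/fail/not-applicable encoding, the `entails_direction` flag, and the `mismatches_by_axis` helper are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Axis(Enum):
    POPULATION = "population"
    NUMERICS = "numerics"
    OUTCOME = "outcome"
    RECENCY = "recency"

class Judgment(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "not_applicable"

@dataclass
class PairJudgment:
    """Judgment of one (claim, paper) pair (hypothetical structure)."""
    claim_id: str
    paper_id: str
    entails_direction: bool
    axes: dict = field(default_factory=dict)  # Axis -> Judgment

    def failed_axes(self):
        return [axis for axis, judgment in self.axes.items() if judgment is Judgment.FAIL]

def mismatches_by_axis(pairs):
    """Collect, per axis, the papers that failed it, so aggregation keeps the reason."""
    report = {}
    for pair in pairs:
        for axis in pair.failed_axes():
            report.setdefault(axis, []).append(pair.paper_id)
    return report

pair = PairJudgment(
    claim_id="c1",
    paper_id="PMID:111",
    entails_direction=True,  # the paper agrees on direction...
    axes={
        Axis.POPULATION: Judgment.FAIL,  # ...but studied a different population
        Axis.NUMERICS: Judgment.PASS,
        Axis.OUTCOME: Judgment.PASS,
        Axis.RECENCY: Judgment.NOT_APPLICABLE,
    },
)
report = mismatches_by_axis([pair])
```

The report names the failing axis and paper rather than a pooled score, which is what lets the writer be told why a citation is weak.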

Displaces vote-counting

Quality-weighted aggregation

Meta-analyses and randomised trials carry the most weight, case series and mechanistic work the least, so a 50-patient pilot does not count as one vote against a 5,000-patient trial. Weights are also bounded, so no single study can overwhelm a contradicting body of evidence.
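A minimal sketch of quality-weighted aggregation with bounded weights. Only the ordering of study designs comes from the text; the numeric design weights, the square-root scaling by sample size, the `N_CAP` bound, and the default used when a paper states no sample size are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical design weights: only their ordering reflects the text.
DESIGN_WEIGHT = {
    "meta_analysis": 1.0,
    "rct": 0.9,
    "cohort": 0.6,
    "case_control": 0.5,
    "case_series": 0.2,
    "mechanistic": 0.1,
}
N_CAP = 10_000  # hypothetical: sample sizes above this add no further weight
DEFAULT_N = 1   # hypothetical: used when a paper does not state its sample size

@dataclass
class Evidence:
    design: str
    n: Optional[int]
    supports: bool  # True if the paper supports the claim, False if it contradicts it

def study_weight(ev):
    """Design weight scaled by sqrt(sample size), with the sample size capped."""
    n = ev.n if ev.n is not None else DEFAULT_N
    return DESIGN_WEIGHT[ev.design] * math.sqrt(min(max(n, 1), N_CAP))

def aggregate(evidence):
    """Weighted support score in [-1, 1]; positive means net support."""
    total = sum(study_weight(ev) for ev in evidence)
    if total == 0:
        return 0.0
    signed = sum(study_weight(ev) if ev.supports else -study_weight(ev) for ev in evidence)
    return signed / total

pilot = Evidence("rct", 50, supports=False)
trial = Evidence("rct", 5_000, supports=True)
score = aggregate([pilot, trial])  # 9/11, about 0.82, where a plain vote count would give 0
```

Because the sample size is capped, no single study can exceed a weight of 100 here, so several mid-sized contradicting trials can still outweigh one very large study.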

Surfaces suppressed dissent

Cross-paper dissent check

A separate pass targets every grounded claim that cites just one paper, pulls the top uncited candidates from the same round, and demotes the claim if a supermajority of those candidates contradict it. Silence is not counted as disagreement.
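The dissent check can be sketched as a single function, assuming a `judge` callable that labels each candidate as supporting, contradicting, or silent on the claim. The two-thirds threshold, the `TOP_K` candidate count, and the choice to keep silent papers in the denominator (so they count against demotion, never for it) are hypothetical readings of the text.

```python
SUPERMAJORITY = 2 / 3  # hypothetical threshold
TOP_K = 5              # hypothetical number of uncited candidates to examine

def dissent_check(claim, cited_ids, candidates, judge):
    """Return True if a single-citation claim should be demoted.

    candidates: papers from the same round, best first.
    judge(claim, paper_id) returns "supports", "contradicts", or "silent".
    """
    if len(cited_ids) != 1:
        return False  # the check targets claims that cite exactly one paper
    pool = [pid for pid in candidates if pid not in cited_ids][:TOP_K]
    if not pool:
        return False
    stances = [judge(claim, pid) for pid in pool]
    contradicting = stances.count("contradicts")
    # Silent papers stay in the denominator but never count as contradiction.
    return contradicting / len(pool) >= SUPERMAJORITY

stances = {"P2": "contradicts", "P3": "contradicts", "P4": "silent"}
demote = dissent_check("claim", ["P1"], ["P1", "P2", "P3", "P4"], lambda c, pid: stances[pid])
```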

Surfaces contestation

Multi-tier, cross-family escalation

When a first-pass verdict is low-confidence or contested, the same input is re-judged by a deeper model from a different family and vendor. The escalation is recorded rather than averaged away. Because the fallback chain crosses vendors, a single-provider outage shifts judging to a different model family instead of degrading the output.
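A minimal sketch of a cross-vendor escalation chain, assuming each judge is a callable returning a dict with `label`, `confidence`, and `contested` keys, and that the chain is ordered from shallow to deep. The `Judge` type, the confidence floor, and the rule for skipping judges are hypothetical; the point is that every attempt, including failures, lands in the history instead of being averaged.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

CONFIDENCE_FLOOR = 0.7  # hypothetical: below this, a verdict counts as low-confidence

@dataclass
class Judge:
    name: str
    vendor: str
    family: str
    call: Callable[[str], dict]  # hypothetical: returns {"label", "confidence", "contested"}

@dataclass
class EscalationResult:
    final: Optional[dict]  # None if no judge produced a verdict
    history: List[dict] = field(default_factory=list)

def judge_with_escalation(payload, chain):
    """Walk a shallow-to-deep chain of judges.

    A judge is skipped if its vendor was already tried, or if its family
    already returned a verdict, so both escalation and outage fallback
    cross vendors. Every attempt is kept in the history.
    """
    history = []
    tried_vendors, judged_families = set(), set()
    final = None
    for judge in chain:
        if judge.vendor in tried_vendors or judge.family in judged_families:
            continue
        tried_vendors.add(judge.vendor)
        try:
            verdict = judge.call(payload)
        except Exception as err:  # e.g. the provider is unreachable
            history.append({"judge": judge.name, "error": repr(err)})
            continue
        judged_families.add(judge.family)
        history.append({"judge": judge.name, "verdict": verdict})
        final = verdict  # the deepest completed verdict is the one reported
        if verdict["confidence"] >= CONFIDENCE_FLOOR and not verdict.get("contested", False):
            break  # confident and uncontested: stop escalating
    return EscalationResult(final=final, history=history)

def first_pass(payload):
    return {"label": "grounded", "confidence": 0.4, "contested": False}

def deeper(payload):
    return {"label": "weakly_grounded", "confidence": 0.9, "contested": False}

chain = [
    Judge("vendor-a-small", "vendor-a", "family-1", first_pass),
    Judge("vendor-a-large", "vendor-a", "family-2", deeper),  # skipped: same vendor
    Judge("vendor-b-large", "vendor-b", "family-3", deeper),
]
result = judge_with_escalation("claim + passage", chain)
```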

Makes reviews reproducible

Auditable, version-stamped verdicts

Every verdict carries cited paper IDs, a passage anchor, per-axis judgments, escalation history, and a policy version. Change the scoring regime and the version bumps; old verdicts are re-judged on the next run, not reused.
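The verdict record and its reuse rule can be sketched as an immutable record plus a one-line check. The field list mirrors the text; the field types, the anchor format, and the version strings are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

POLICY_VERSION = "2025.1"  # hypothetical current policy version

@dataclass(frozen=True)
class VerdictRecord:
    claim_id: str
    label: str                           # e.g. "grounded", "contradicted"
    cited_paper_ids: Tuple[str, ...]
    passage_anchor: str                  # hypothetical format: "<paper id>#<paragraph>"
    axes: Tuple[Tuple[str, str], ...]    # (axis, judgment) pairs
    escalation_history: Tuple[str, ...]
    policy_version: str

def reusable(record: Optional[VerdictRecord], current_version: str) -> bool:
    """A stored verdict is reused only if it was produced under the current policy."""
    return record is not None and record.policy_version == current_version

old = VerdictRecord(
    claim_id="c1",
    label="grounded",
    cited_paper_ids=("PMID:111",),
    passage_anchor="PMID:111#p14",
    axes=(("population", "pass"), ("numerics", "pass")),
    escalation_history=(),
    policy_version="2024.3",
)
needs_rejudge = not reusable(old, POLICY_VERSION)  # version bumped: judge again
current = replace(old, policy_version=POLICY_VERSION)
```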

Per-claim trust

Every claim carries its own verdict.

A 7,000-word review holds hundreds of claims of widely varying defensibility. An overall “trust me” score tells a reader nothing about which paragraph to scrutinise. Syno emits a verdict for each claim, with the per-axis judgments that produced it and a passage anchor a reader can open in seconds.

Screenshot: the Syno claim verdicts list, showing 12 of 16 claims supported, a colour-coded support breakdown, and a verdict for each claim with cited paper IDs.

Grounded

Cited papers directly support the claim along all relevant axes; the cross-paper check passes.

Weakly grounded

Support exists but is qualified: a broader population, a surrogate outcome, or a single small cohort.

Not grounded

No retrieved paper substantively supports the claim.

Contradicted

At least one cited paper actively conflicts with the claim along a substantive axis.

Literature disagrees

Multiple credible papers in the evidence set point in opposing directions; dissent is preserved, with its temporal direction.

Abstain (scope-rejected)

The claim falls outside the locked scope or is trivially un-evaluable; it is not pursued.

Escalation-dropped

The escalation budget ran out before a verdict resolved; this outcome is kept distinct from scope rejection in the audit trail.

How a review gets built

Seven phases, each one producing an artefact you can reproduce from its inputs alone.

01

Planning

The question is decomposed into a research plan: PICO framing where it applies, inclusion criteria, and the review posture (systematic, scoping, or narrative). The plan is recorded alongside the active policy version.
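The plan's contents map naturally onto a small record. This is a hypothetical shape: the field names are ours, PICO is optional because the text applies it only where it fits, and treating the comparator as optional is a further assumption. The example question is illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Posture(Enum):
    SYSTEMATIC = "systematic"
    SCOPING = "scoping"
    NARRATIVE = "narrative"

@dataclass
class PICO:
    population: str
    intervention: str
    comparator: Optional[str]  # assumption: some questions have no comparator
    outcome: str

@dataclass
class ResearchPlan:
    question: str
    posture: Posture
    policy_version: str                  # recorded alongside the plan
    inclusion_criteria: List[str] = field(default_factory=list)
    pico: Optional[PICO] = None          # None when PICO framing does not apply

plan = ResearchPlan(
    question="Does metformin lower HbA1c in adults with type 2 diabetes?",
    posture=Posture.SYSTEMATIC,
    policy_version="2025.1",
    inclusion_criteria=["randomised trials", "adult participants"],
    pico=PICO("adults with type 2 diabetes", "metformin", "placebo", "HbA1c"),
)
```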

02

Retrieval

A curated biomedical corpus (papers identified by PMID, PMCID, or DOI, plus arXiv preprints) is queried with hybrid dense-plus-lexical search. Every run records a cryptographic fingerprint of the corpus, so a reader can confirm that two runs hit the same evidence base, and dedicated slots keep dissenting evidence from being filtered out.
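The text does not say how the dense and lexical rankings are combined. A common choice is reciprocal rank fusion, sketched below as a swapped-in technique, together with one possible reading of "dedicated slots": a fixed number of positions in the final cut reserved for papers flagged as dissenting. How papers get flagged is out of scope here; `dissenting` is simply a hypothetical set of IDs.

```python
from collections import defaultdict

RRF_K = 60  # smoothing constant conventionally used with reciprocal rank fusion

def rrf_fuse(*rankings):
    """Merge ranked lists of paper IDs (best first) with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (RRF_K + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

def select_with_dissent_slots(fused, dissenting, top_n, dissent_slots):
    """Keep top_n papers, reserving up to dissent_slots for dissenting ones."""
    reserved = [pid for pid in fused if pid in dissenting][: min(dissent_slots, top_n)]
    remaining = max(top_n - len(reserved), 0)
    others = [pid for pid in fused if pid not in reserved][:remaining]
    keep = set(reserved) | set(others)
    return [pid for pid in fused if pid in keep]  # preserve fused order

dense = ["A", "B", "C", "D"]    # e.g. from an embedding index
lexical = ["B", "A", "E", "D"]  # e.g. from a BM25 index
fused = rrf_fuse(dense, lexical)
picked = select_with_dissent_slots(fused, dissenting={"E"}, top_n=3, dissent_slots=1)
```

Without the reserved slot, "E" ranks last and would be cut at `top_n=3`; with it, "E" displaces the weakest non-dissenting paper.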

03

Study

Each retained paper becomes a structured record (study design, population, sample size when stated) plus a faithful summary. Both feed drafting and validation.

04

Drafting

A long-context model composes the prose, tagging every substantive claim with supporting paper IDs from the fixed set. The writer cannot cite a paper outside it.
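One way to enforce the fixed citation set is a post-hoc check over the draft, sketched below under an assumed tag syntax (IDs in square brackets, separated by semicolons). The text does not specify the tag format or whether enforcement happens during or after generation, so both `TAG_PATTERN` and `out_of_set_citations` are hypothetical.

```python
import re

# Hypothetical tag syntax, e.g. "... lowers HbA1c [PMID:111; DOI:10.1000/xyz]."
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def cited_ids(sentence):
    """Extract the paper IDs tagged in one sentence."""
    ids = []
    for group in TAG_PATTERN.findall(sentence):
        ids.extend(part.strip() for part in group.split(";") if part.strip())
    return ids

def out_of_set_citations(draft_sentences, evidence_ids):
    """Map sentence index -> cited IDs that are not in the fixed evidence set."""
    violations = {}
    for i, sentence in enumerate(draft_sentences):
        bad = [pid for pid in cited_ids(sentence) if pid not in evidence_ids]
        if bad:
            violations[i] = bad
    return violations

draft = ["Metformin lowers HbA1c [PMID:111].", "Effects persist at one year [PMID:999; PMID:222]."]
violations = out_of_set_citations(draft, {"PMID:111", "PMID:222"})
```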

05

Per-claim validation

Prose is decomposed into atomic claims. Each is graded per-axis, weight-aggregated, run through the dissent check, and escalated when contested, with a wall-clock budget on every call so a stuck escalation can’t consume the whole compute budget.
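A per-call wall-clock budget can be sketched with `asyncio.wait_for`, assuming judging calls are coroutines. The 30-second default and the choice to return `None` on timeout are assumptions; how a timed-out call is recorded in the verdict trail is left to the caller.

```python
import asyncio

CALL_BUDGET_S = 30.0  # hypothetical per-call wall-clock budget, in seconds

async def judge_with_budget(judge_call, claim, budget_s=CALL_BUDGET_S):
    """Run one judging call under a wall-clock budget; None means the budget ran out."""
    try:
        return await asyncio.wait_for(judge_call(claim), timeout=budget_s)
    except asyncio.TimeoutError:
        return None  # wait_for has already cancelled the stuck call

async def fast_judge(claim):
    return "grounded"

async def stuck_judge(claim):
    await asyncio.sleep(10)  # simulates an escalation that never returns in time
    return "grounded"

async def main():
    # Both calls run concurrently; the stuck one is cut off after 0.1 s.
    return await asyncio.gather(
        judge_with_budget(fast_judge, "c1", budget_s=0.1),
        judge_with_budget(stuck_judge, "c2", budget_s=0.1),
    )

results = asyncio.run(main())  # ['grounded', None]
```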

06

Revision

The writer sees structured verdicts, reframing hints, and concrete replacement citations from the same evidence set. Already-grounded claims are held stable, so verified work isn’t re-judged round after round.
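Holding grounded claims stable can be sketched as a filter run before each revision round. Keying on exact claim text (so any edit triggers re-judging) and requiring the verdict to come from the current policy version are assumptions made for this illustration.

```python
def claims_to_rejudge(previous, current_claims, policy_version):
    """Split this round's claims into held-stable and needs-judging.

    previous: claim text -> (label, policy_version) from the last round.
    """
    held, pending = [], []
    for claim in current_claims:
        label, version = previous.get(claim, (None, None))
        if label == "grounded" and version == policy_version:
            held.append(claim)     # verified work stays fixed across rounds
        else:
            pending.append(claim)  # new, edited, not yet grounded, or judged under an old policy
    return held, pending

previous = {
    "Claim A": ("grounded", "v2"),
    "Claim B": ("weakly_grounded", "v2"),
    "Claim C": ("grounded", "v1"),
}
held, pending = claims_to_rejudge(previous, ["Claim A", "Claim B", "Claim C", "Claim D"], "v2")
```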

07

Audit output

The review ships with the prose and everything needed to check it: the per-claim verdict record, passage anchors, the policy version, the corpus fingerprint, a PRISMA 2020 flow diagram, and a provenance-carrying bibliography.

External benchmark · diabetes

One topic, one run per tool, fully reproducible from the published data.

The primary endpoint

+18.8 pts per-claim numeric fidelity over Elicit, the strongest peer.

On ten pre-registered diabetes review questions, graded by a three-model panel across two families, Syno led Elicit by a per-question mean of +18.8 percentage points (95% CI [+9.4, +28.2]), and was ahead in 9 of 10 questions. The lead held across every judge subset we cut the data by.
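For readers re-deriving the headline figure from the data package, a per-question mean difference with a 95% interval can be computed as below. The text does not state how its interval was built; this sketch assumes a paired t-interval over the ten per-question differences, using the t critical value 2.262 for 9 degrees of freedom. It reproduces the published interval only if that assumption matches the authors' method.

```python
import math
import statistics

T_975_DF9 = 2.262  # two-sided 95% t critical value for 9 degrees of freedom (n = 10)

def paired_mean_ci(differences, t_crit=T_975_DF9):
    """Mean per-question difference (percentage points), its t-interval, and wins.

    t_crit must match len(differences) - 1 degrees of freedom.
    """
    n = len(differences)
    mean = statistics.fmean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    wins = sum(d > 0 for d in differences)  # questions where the first tool led
    return mean, (mean - t_crit * se, mean + t_crit * se), wins
```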

Citation existence: the trust floor

Syno 100.0%
Consensus 99.5%
Elicit 98.4%
Gemini Deep Research 69.2%

The three purpose-built tools cluster at the top. The generalist agent is the outlier: only 69.2% of its citations resolve to an indexed scientific paper.

Honest scope: diabetes only, one capture per tool, graded by an external model panel rather than human experts. Syno was graded on full text and the abstract-only tools on abstracts, the evidence base each one actually serves in production. We read this as one corroborating data point, not the thesis.

Download the data package: every question, every raw verdict →

Methodological lineage

We build on the evidence-synthesis canon, and name every standard.

A review agent that ignores these standards is closer to a search-results summariser than to a literature-review instrument.

PRISMA 2020 · PRISMA-S · Cochrane Handbook · GRADE · SANRA · Cochrane–Campbell–JBI–CEE Joint Position 2025 · PRISMA-trAIce

Prove every sentence in your next review.

Run a citation-faithful review on your own questions, or book a demo and watch a claim resolve back to the passage that supports it.

Re-run the same questions over the same evidence set and policy version, and you get back the same conclusions.