ClawBio Workshop

Agentic GWAS

From summary statistics to causal variants

Dr Manuel Corpas · University of Westminster · 2026

Where we are

In the previous session you annotated a single genome and found clinically relevant variants using VEP, ClinVar, gnomAD, and CPIC.

Now we scale up. GWAS looks across thousands of genomes to find variants associated with disease at the population level.

This session: ~10 min slides, then ~20 min hands-on in Google Colab. By the end you will have queried nine GWAS databases, computed a polygenic risk score, and fine-mapped a locus to identify causal variants.

Learning objectives

  1. Explain what a GWAS is and what summary statistics contain
  2. Query nine federated databases for variant associations in seconds
  3. Calculate polygenic risk scores from published PGS Catalog scores
  4. Apply SuSiE fine-mapping to identify credible sets of causal variants
  5. Recognise cross-ancestry gaps in GWAS representation
  6. Run all three analyses in Google Colab with zero infrastructure

Part 1

What is a GWAS?

GWAS in one slide

Test every variant in the genome for association with a trait across thousands of people.

  • 6,000+ published GWAS
  • 500M+ participants (cumulative)
  • 90,000+ trait associations
  • 86% European ancestry

The last number is the problem. Most GWAS findings may not transfer to non-European populations.

What are summary statistics?

Published GWAS release summary statistics: per-variant results without individual genotypes.

| Field | Meaning | Example |
|-------|---------|---------|
| rsid | Variant identifier | rs7903146 |
| beta | Effect size (log-odds or per-allele) | 0.31 |
| se | Standard error of beta | 0.02 |
| p | P-value for association | 5.2 × 10^-38 |
| MAF | Minor allele frequency | 0.28 |

Key insight: Summary statistics are public, free, and enough to run PRS, meta-analysis, and fine-mapping. No HPC. No data access agreements. No individual-level data.
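To make the relationship between the fields concrete, here is a minimal sketch of the standard Wald test that links beta, se, and p. The function name `wald_test` and the input values are illustrative assumptions, not part of ClawBio; the 5 × 10^-8 cut-off is the conventional genome-wide significance threshold.

```python
import math

def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return (z, two-sided p-value) for one variant's effect estimate."""
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Conventional genome-wide significance threshold
GENOME_WIDE_SIGNIFICANCE = 5e-8

# Hypothetical values chosen for illustration only
z, p = wald_test(beta=0.12, se=0.02)
significant = p < GENOME_WIDE_SIGNIFICANCE
print(f"z = {z:.2f}, p = {p:.2e}, genome-wide significant: {significant}")
```

Real summary statistics files may report p-values from a different test (for example a mixed model), so recomputed values will not always match the published column exactly.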

Part 2

Three skills, one pipeline

The GWAS pipeline

GWAS Lookup
PRS Calculator
Fine-Mapping

GWAS Lookup

Query nine databases in parallel for a single variant, including GWAS Catalog, Open Targets, UKB, FinnGen, Biobank Japan, GTEx, and eQTL Catalogue.

PRS Calculator

Compute polygenic risk scores from 23andMe or AncestryDNA data using 3,000+ published scores from the PGS Catalog.

Fine-Mapping

Apply SuSiE to identify credible sets of likely causal variants from summary statistics. No individual-level data needed.

Skill 1: GWAS Lookup

Give it an rsID. It queries nine databases in parallel and returns a unified report.

| Database | What it returns | Ancestry |
|----------|-----------------|----------|
| GWAS Catalog | Published trait associations | Mixed |
| Open Targets | Credible sets, L2G scores | Mixed |
| UKB-TOPMed PheWeb | PheWAS across 4,500 phenotypes | Multi-ancestry |
| FinnGen r12 | Finnish disease endpoints | Finnish |
| Biobank Japan | East Asian PheWAS | Japanese |
| GTEx v8 | eQTL tissue expression | Mostly European |
| EBI eQTL Catalogue | Multi-tissue eQTL associations | Mixed |

One command: python gwas_lookup.py --rsid rs7903146
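The parallel fan-out behind a lookup like this can be sketched with Python's standard thread pool. This is not ClawBio's implementation: the two fetchers below are hypothetical stand-ins that make no network calls, and the report layout is an assumption. The point is that one slow or failing source should not block the others.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real database clients. Each takes an rsID and
# returns a list of association records. No network calls are made here.
def fetch_source_a(rsid: str) -> list[dict]:
    return [{"trait": "example trait", "p": 1e-10}]

def fetch_source_b(rsid: str) -> list[dict]:
    raise TimeoutError("simulated outage")

SOURCES = {"source_a": fetch_source_a, "source_b": fetch_source_b}

def lookup(rsid: str) -> dict:
    """Query every source concurrently and merge results into one report."""
    report = {"rsid": rsid, "results": {}, "errors": {}}
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(fn, rsid) for name, fn in SOURCES.items()}
        for name, future in futures.items():
            try:
                report["results"][name] = future.result(timeout=30)
            except Exception as exc:
                # Record the failure and keep going with the other sources
                report["errors"][name] = str(exc)
    return report

report = lookup("rs7903146")
print(report)
```

Threads suit this workload because the time is spent waiting on remote APIs rather than computing.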

Cross-ancestry: why it matters

The problem

  • 86% of GWAS participants are of European ancestry
  • Effect sizes and allele frequencies differ between populations
  • PRS trained on Europeans perform poorly in African and South Asian populations
  • Risk of widening health disparities

What ClawBio does

  • Queries UKB, FinnGen, and Biobank Japan in one call
  • Flags allele frequency differences across populations
  • MVP (Million Veteran Program) is the most diverse GWAS cohort: 33% non-European
  • Uganda Genome Resource: 6,407 samples, African ancestry GWAS
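Flagging allele frequency differences can be as simple as comparing the highest and lowest frequency reported for the same allele. The sketch below assumes a hypothetical `flag_frequency_gap` helper, an arbitrary 0.10 threshold, and made-up frequencies; it does not reproduce ClawBio's actual rule.

```python
def flag_frequency_gap(freqs: dict[str, float], max_gap: float = 0.10) -> dict:
    """Compare one allele's frequency across cohorts and flag large gaps."""
    lowest = min(freqs, key=freqs.get)
    highest = max(freqs, key=freqs.get)
    gap = freqs[highest] - freqs[lowest]
    return {
        "lowest": lowest,
        "highest": highest,
        "gap": gap,
        "flagged": gap > max_gap,  # threshold is an illustrative assumption
    }

# Made-up frequencies for illustration, not real biobank values
example = {"cohort_a": 0.28, "cohort_b": 0.21, "cohort_c": 0.04}
result = flag_frequency_gap(example)
print(result)
```

A flagged variant is a prompt to check whether an effect size estimated in one population is likely to transfer to another.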

Skill 2: Polygenic Risk Scores

A PRS sums the effects of many variants into a single risk estimate.

Formula: $\mathrm{PRS} = \sum_{i} \mathrm{dosage}_i \times \mathrm{effect\_weight}_i$, summed over all matched variants $i$
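The weighted sum translates directly into a few lines of Python. In this sketch the variant IDs, weights, and dosages are invented for illustration, and `polygenic_risk_score` is a hypothetical helper rather than ClawBio's function. Dosage means the number of copies of the effect allele (0, 1, or 2).

```python
def polygenic_risk_score(
    dosages: dict[str, float], weights: dict[str, float]
) -> tuple[float, int, int]:
    """Sum dosage x weight over variants present in both the genome and the score."""
    matched = [rsid for rsid in weights if rsid in dosages]
    score = sum(dosages[rsid] * weights[rsid] for rsid in matched)
    return score, len(matched), len(weights)

# Invented example data: three scored variants, two of them genotyped
weights = {"rsA": 0.30, "rsB": -0.10, "rsC": 0.05}
dosages = {"rsA": 2, "rsB": 1, "rsD": 0}

score, n_matched, n_total = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f} from {n_matched}/{n_total} matched variants")
```

Reporting the matched fraction alongside the score matters because consumer genotyping arrays rarely cover every variant in a published score.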

ClawBio ships with 6 curated scores for instant demos:

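| Trait | PGS ID | Variants |
|-------|--------|----------|
| Type 2 diabetes | PGS000013 | 8 |
| Coronary artery disease | PGS000004 | 46 |
| Breast cancer | PGS000001 | 77 |
| Prostate cancer | PGS000057 | 147 |
| Atrial fibrillation | PGS000011 | 12 |
| BMI | PGS000039 | 97 |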

Risk categories: Low (<20th) · Average (20-80th) · Elevated (80-95th) · High (>95th)
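The four bands map onto a simple percentile lookup. The slide does not say which band owns the exact boundaries, so this sketch assumes that 20 and 80 belong to the higher band and 95 belongs to Elevated (since High is strictly above 95). The function name `risk_category` is hypothetical.

```python
def risk_category(percentile: float) -> str:
    """Map a PRS percentile (0-100) to a risk band.

    Boundary handling is an assumption: 20 -> Average, 80 -> Elevated,
    95 -> Elevated, anything above 95 -> High.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 20:
        return "Low"
    if percentile < 80:
        return "Average"
    if percentile <= 95:
        return "Elevated"
    return "High"

for pct in (10, 50, 90, 99):
    print(pct, risk_category(pct))
```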

Skill 3: SuSiE Fine-Mapping

GWAS finds associated regions. Fine-mapping finds the causal variants within them.

Without fine-mapping

  • A GWAS hit spans 10-200 correlated SNPs in LD
  • Which one is causal? All look equally significant
  • Manual triage: slow, subjective, error-prone

With SuSiE

  • Credible sets: minimal set of SNPs capturing 95% of causal probability
  • PIPs: posterior inclusion probability per variant
  • Handles multiple causal signals per locus
  • Works from summary statistics alone
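To show what a 95% credible set means, the sketch below builds one for a single causal signal: rank variants by posterior probability and add them until the running total reaches 95%. This is not the SuSiE algorithm, which fits several signals jointly using LD information; the probabilities and the `credible_set` helper are invented for illustration.

```python
def credible_set(posteriors: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    Assumes a single causal signal, so the posteriors sum to about 1.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: list[str] = []
    total = 0.0
    for rsid, prob in ranked:
        chosen.append(rsid)
        total += prob
        if total >= coverage:
            break
    return chosen

# Invented posterior probabilities for five correlated variants
posteriors = {"rs1": 0.62, "rs2": 0.25, "rs3": 0.09, "rs4": 0.03, "rs5": 0.01}
cs = credible_set(posteriors)
print(cs)
```

A small credible set means the data point at a few candidates; a large one means LD prevents the variants from being told apart.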

Democratising GWAS

The infrastructure barrier is gone.

No HPC

Summary statistics are small. Everything runs in Google Colab on a free tier.

No data access agreements

Summary statistics are publicly released. No application, no waiting, no ethics board.

No bioinformatics team

ClawBio wraps the full pipeline. One command per analysis.

No cost

Google Colab is free. ClawBio is MIT-licensed. PGS Catalog is open. All databases are public APIs.

Implication for the Global South: A researcher in Lima, Kampala, or Dhaka can run the same GWAS analyses as one at the Broad Institute. Today.

Part 3

Hands-on Practical

Open Google Colab now

What you will do

| Step | Task | Time |
|------|------|------|
| 1 | Setup: install ClawBio in Colab (same as variant interpretation) | 2 min |
| 2 | GWAS Lookup: query rs7903146 (type 2 diabetes) across 9 databases | 5 min |
| 3 | Compare allele frequencies across UKB, FinnGen, and Biobank Japan | 3 min |
| 4 | PRS: compute polygenic risk scores for 6 traits using the Corpasome | 5 min |
| 5 | Fine-mapping: run SuSiE on a demo locus with 2 causal signals | 5 min |

Requirements: Same as before. A Google account and a web browser. Nothing to install on your own machine; ClawBio is installed inside the Colab session.

Variants we will explore

| rsID | Gene / Locus | Trait | Why it matters |
|------|--------------|-------|----------------|
| rs7903146 | TCF7L2 | Type 2 diabetes | Strongest common T2D signal. OR 1.4 per allele. |
| rs429358 | APOE | Alzheimer's | Already found in variant interpretation. Now see the GWAS context. |
| rs3798220 | LPA | Heart disease | Lipoprotein(a). Risk factor for coronary events. |
| rs1801282 | PPARG | Type 2 diabetes | Drug target for thiazolidinediones. |

Take-home messages

  1. GWAS summary statistics are free and public. You do not need individual-level data to do meaningful research.
  2. Three ClawBio skills cover the full GWAS workflow: lookup, PRS, and fine-mapping.
  3. Cross-ancestry analysis is not optional. Most GWAS are European-biased. Query multiple biobanks to check transferability.
  4. Fine-mapping narrows GWAS hits to likely causal variants. SuSiE credible sets are the state of the art.
  5. Infrastructure is no longer a barrier. Google Colab + ClawBio = publication-quality GWAS analysis for free.

Resources

| Resource | Link |
|----------|------|
| ClawBio GitHub | github.com/ClawBio/ClawBio |
| Variant Interpretation Workshop | Previous workshop |
| GWAS Catalog | ebi.ac.uk/gwas |
| PGS Catalog | pgscatalog.org |
| Open Targets | genetics.opentargets.org |
| Corpasome (Zenodo) | doi:10.5281/zenodo.19297389 |
| Ensembl VEP | ensembl.org/vep |
| gnomAD | gnomad.broadinstitute.org |