ClawBio Workshop: From summary statistics to causal variants
Dr Manuel Corpas · University of Westminster · 2026
In the previous session you annotated a single genome and found clinically relevant variants using VEP, ClinVar, gnomAD, and CPIC.
Now we scale up. GWAS looks across thousands of genomes to find variants associated with disease at the population level.
This session: ~10 min slides, then ~20 min hands-on in Google Colab. By the end you will have queried nine GWAS databases, computed a polygenic risk score, and fine-mapped a locus to identify causal variants.
Part 1
Test every variant in the genome for association with a trait across thousands of people.
The last number is the problem. Most GWAS findings may not transfer to non-European populations.
Published GWAS release summary statistics: per-variant results without individual genotypes.
| Field | Meaning | Example |
|---|---|---|
| rsid | Variant identifier | rs7903146 |
| beta | Effect size (log-odds or per-allele) | 0.31 |
| se | Standard error of beta | 0.02 |
| p | P-value for association | 5.2 × 10⁻³⁸ |
| MAF | Minor allele frequency | 0.28 |
Key insight: Summary statistics are public, free, and enough to run PRS, meta-analysis, and fine-mapping. No HPC. No data access agreements. No individual-level data.
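To make the fields concrete, here is a minimal sketch of what can be derived from a single summary-statistics row using only `beta` and `se`. The function name `summarize_association` is hypothetical, the 5e-8 genome-wide significance threshold is the conventional one, and the p-value assumes a two-sided normal test. The example values in the table are illustrative, so the p-value computed here need not match the table's `p` column.

```python
import math

def summarize_association(beta: float, se: float) -> dict:
    """Derive common quantities from one summary-statistics row."""
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "p": p,
        # For a binary trait, beta is a log-odds ratio.
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "genome_wide_significant": p < 5e-8,
    }

row = summarize_association(beta=0.31, se=0.02)
print(f"z = {row['z']:.1f}, OR = {row['odds_ratio']:.2f} "
      f"(95% CI {row['ci_low']:.2f} to {row['ci_high']:.2f})")
```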
Part 2
Query nine databases in parallel for a single variant, including GWAS Catalog, Open Targets, UKB, FinnGen, Biobank Japan, GTEx, and the eQTL Catalogue.
Compute polygenic risk scores from 23andMe or AncestryDNA data using 3,000+ published scores from the PGS Catalog.
Apply SuSiE to identify credible sets of likely causal variants from summary statistics. No individual-level data needed.
Give it an rsID. It queries nine databases in parallel and returns a unified report.
| Database | What it returns | Ancestry |
|---|---|---|
| GWAS Catalog | Published trait associations | Mixed |
| Open Targets | Credible sets, L2G scores | Mixed |
| UKB-TOPMed PheWeb | PheWAS across 4,500 phenotypes | Multi-ancestry |
| FinnGen r12 | Finnish disease endpoints | Finnish |
| Biobank Japan | East Asian PheWAS | Japanese |
| GTEx v8 | eQTL tissue expression | Mostly European |
| EBI eQTL Catalogue | Multi-tissue eQTL associations | Mixed |
One command: python gwas_lookup.py --rsid rs7903146
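The "query in parallel, return one report" pattern can be sketched as follows. This is not ClawBio's implementation: the fetchers are hypothetical stubs that make no network calls, and `make_stub`, `failing_fetch`, and `lookup` are names invented for illustration. It shows one way to fan out a lookup so that a single failing source does not break the unified report.

```python
from concurrent.futures import ThreadPoolExecutor

def make_stub(source):
    """Return a placeholder fetcher; a real one would call the source's API."""
    def fetch(rsid):
        return {"source": source, "rsid": rsid, "hits": []}
    return fetch

def failing_fetch(rsid):
    """Simulate a source that is temporarily unavailable."""
    raise TimeoutError("source did not respond")

SOURCES = {
    "GWAS Catalog": make_stub("GWAS Catalog"),
    "Open Targets": make_stub("Open Targets"),
    "FinnGen": failing_fetch,
}

def lookup(rsid, sources=SOURCES):
    """Query every source concurrently and merge results into one dict."""
    report = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, rsid) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                report[name] = future.result(timeout=30)
            except Exception as exc:
                # Record the failure but keep the other sources' results.
                report[name] = {"source": name, "error": str(exc)}
    return report

report = lookup("rs7903146")
for name, result in report.items():
    print(name, "->", result.get("error", "ok"))
```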
A PRS sums the effects of many variants into a single risk estimate.
Formula: $\mathrm{PRS} = \sum_i \mathrm{dosage}_i \times \mathrm{effect\_weight}_i$, summed over all matched variants $i$.
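The weighted sum can be sketched in a few lines. The rsIDs and weights below are made-up placeholders, not values from any PGS Catalog score, and `polygenic_score` is a hypothetical helper. Dosage is taken to be the number of effect alleles carried (0, 1, or 2), and variants missing from either input are skipped.

```python
def polygenic_score(dosages, weights):
    """Sum dosage * effect weight over variants present in both inputs."""
    matched = dosages.keys() & weights.keys()
    score = sum(dosages[rsid] * weights[rsid] for rsid in matched)
    return score, len(matched)

# Hypothetical score file: rsID -> effect weight.
weights = {"rs0000001": 0.30, "rs0000002": -0.10, "rs0000003": 0.05}
# Hypothetical genotype: rsID -> effect-allele dosage.
dosages = {"rs0000001": 1, "rs0000002": 2, "rs0000004": 0}

score, n_matched = polygenic_score(dosages, weights)
print(f"PRS = {score:.2f} from {n_matched} of {len(weights)} variants")
```

Reporting the number of matched variants alongside the score is a useful habit, since consumer genotyping arrays often lack some of the variants in a published score.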
ClawBio ships with 6 curated scores for instant demos:
| Trait | PGS ID | Variants |
|---|---|---|
| Type 2 diabetes | PGS000013 | 8 |
| Coronary artery disease | PGS000004 | 46 |
| Breast cancer | PGS000001 | 77 |
| Prostate cancer | PGS000057 | 147 |
| Atrial fibrillation | PGS000011 | 12 |
| BMI | PGS000039 | 97 |
Risk categories: Low (<20th) · Average (20-80th) · Elevated (80-95th) · High (>95th)
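One way to turn a raw score into these categories is to rank it against a reference distribution. The sketch below is an assumption about how that could work, not ClawBio's method: `percentile_of` and `risk_category` are hypothetical names, and the handling of scores falling exactly on the 20th, 80th, or 95th percentile is a choice made here because the thresholds above leave it open.

```python
def percentile_of(score, reference_scores):
    """Percentage of reference scores strictly below the given score."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)

def risk_category(percentile):
    """Map a percentile to a category (boundary handling is an assumption)."""
    if percentile < 20:
        return "Low"
    if percentile < 80:
        return "Average"
    if percentile <= 95:
        return "Elevated"
    return "High"

# Hypothetical reference population of 100 evenly spaced scores.
reference = [i / 100 for i in range(100)]
pct = percentile_of(0.90, reference)
print(f"{pct:.0f}th percentile -> {risk_category(pct)}")
```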
GWAS finds associated regions. Fine-mapping finds the causal variants within them.
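To illustrate what a credible set is, here is a sketch of a simpler technique than SuSiE: Wakefield approximate Bayes factors under the assumption of exactly one causal variant per locus. SuSiE generalises this idea to multiple signals and uses linkage disequilibrium, which this sketch ignores. The variant IDs, effect sizes, and the prior variance of 0.04 are all assumptions chosen for illustration.

```python
import math

def credible_set(variants, prior_var=0.04, coverage=0.95):
    """variants: list of (variant_id, beta, se). Returns (PIPs, credible set)."""
    log_abf = {}
    for variant_id, beta, se in variants:
        v = se ** 2
        z2 = (beta / se) ** 2
        shrink = prior_var / (v + prior_var)
        # Log of the Wakefield approximate Bayes factor for association.
        log_abf[variant_id] = 0.5 * math.log(v / (v + prior_var)) + 0.5 * z2 * shrink
    # Normalise in log space to avoid overflow for large z-scores.
    top = max(log_abf.values())
    weights = {k: math.exp(val - top) for k, val in log_abf.items()}
    total = sum(weights.values())
    pip = {k: w / total for k, w in weights.items()}
    # Add variants in order of posterior probability until coverage is met.
    chosen, cumulative = [], 0.0
    for k, p in sorted(pip.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(k)
        cumulative += p
        if cumulative >= coverage:
            break
    return pip, chosen

# Hypothetical locus with three variants.
locus = [("varA", 0.30, 0.03), ("varB", 0.28, 0.03), ("varC", 0.05, 0.03)]
pip, chosen = credible_set(locus)
print("95% credible set:", chosen)
```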
The infrastructure barrier is gone.
Summary statistics are small. Everything runs in Google Colab on a free tier.
Summary statistics are publicly released. No application, no waiting, no ethics board.
ClawBio wraps the full pipeline. One command per analysis.
Google Colab is free. ClawBio is MIT-licensed. PGS Catalog is open. All databases are public APIs.
Implication for the Global South: A researcher in Lima, Kampala, or Dhaka can run the same GWAS analyses as one at the Broad Institute. Today.
Part 3
Open Google Colab now
| Step | Task | Time |
|---|---|---|
| 1 | Setup: install ClawBio in Colab (same as variant interpretation) | 2 min |
| 2 | GWAS Lookup: query rs7903146 (type 2 diabetes) across 9 databases | 5 min |
| 3 | Compare allele frequencies across UKB, FinnGen, and Biobank Japan | 3 min |
| 4 | PRS: compute polygenic risk scores for 6 traits using the Corpasome | 5 min |
| 5 | Fine-mapping: run SuSiE on a demo locus with 2 causal signals | 5 min |
Requirements: Same as before. A Google account and a web browser. Nothing to install locally.
| rsID | Gene / Locus | Trait | Why it matters |
|---|---|---|---|
| rs7903146 | TCF7L2 | Type 2 diabetes | Strongest common T2D signal. OR 1.4 per allele. |
| rs429358 | APOE | Alzheimer's | Already found in variant interpretation. Now see the GWAS context. |
| rs3798220 | LPA | Heart disease | Lipoprotein(a). Risk factor for coronary events. |
| rs1801282 | PPARG | Type 2 diabetes | Drug target for thiazolidinediones. |
| Resource | Link |
|---|---|
| ClawBio GitHub | github.com/ClawBio/ClawBio |
| Variant Interpretation Workshop | Previous workshop |
| GWAS Catalog | ebi.ac.uk/gwas |
| PGS Catalog | pgscatalog.org |
| Open Targets | genetics.opentargets.org |
| Corpasome (Zenodo) | doi:10.5281/zenodo.19297389 |
| Ensembl VEP | ensembl.org/vep |
| gnomAD | gnomad.broadinstitute.org |