ClawBio Workshop: From a raw genome file to clinical findings
Dr Manuel Corpas · University of Westminster · 2026
You have heard about ClawBio, how skills work, and the databases behind variant annotation. Now you will run the full pipeline yourself on a real genome.
The genome belongs to Dr Manuel Corpas and was released into the public domain under CC0 in 2013. Everything you find is real.
This session: ~10 min slides, then ~20 min hands-on in Google Colab. By the end you will have annotated 21 clinical variants, identified five actionable findings, and generated a pharmacogenomics report.
Part 1
Consumer tests (23andMe, AncestryDNA) produce a text file with four columns:
| rsID | Chromosome | Position | Genotype |
|---|---|---|---|
| rs6025 | 1 | 169519049 | CT |
| rs1800562 | 6 | 26093141 | GA |
| rs113993960 | 7 | 117559590 | DI |
| rs9923231 | 16 | 31107689 | TT |
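Below is a minimal Python sketch of reading such a file. It assumes the 23andMe layout: tab-separated columns and `#`-prefixed header lines. Other vendors may lay the file out differently. `parse_raw_genotypes` and `Genotype` are illustrative names, not part of ClawBio.

```python
import io
from typing import Iterator, NamedTuple


class Genotype(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_raw_genotypes(handle) -> Iterator[Genotype]:
    """Yield one record per data line of a 23andMe-style raw file.

    Assumes tab-separated columns (rsID, chromosome, position, genotype)
    and '#'-prefixed comment/header lines.
    """
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header block
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        yield Genotype(rsid, chrom, int(pos), genotype)


# Usage with an in-memory file holding two rows from the table above
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs6025\t1\t169519049\tCT\n"
    "rs9923231\t16\t31107689\tTT\n"
)
records = list(parse_raw_genotypes(sample))
print(records[0].genotype)  # CT
```

Streaming line by line keeps memory flat even for a 600,000-row file.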
From 600,000 positions, ClawBio selects variants in genes with established clinical evidence:
CYP2C19, CYP2D6, CYP2C9, VKORC1, TPMT, DPYD, MTHFR
Factor V (clotting), CFTR (cystic fibrosis), HFE (iron), APOE (Alzheimer's)
BRCA1, TP53
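The selection step can be sketched as a lookup against a panel of rsIDs. `PANEL` and `select_panel_variants` are hypothetical. The four entries come from the table above, not from ClawBio's actual 21-variant panel. The `--` check assumes the 23andMe convention for no-calls.

```python
# Hypothetical panel: rsID -> gene (a subset taken from the example table,
# not ClawBio's real selection list)
PANEL = {
    "rs6025": "F5",
    "rs1800562": "HFE",
    "rs113993960": "CFTR",
    "rs9923231": "VKORC1",
}


def select_panel_variants(genotypes, panel=PANEL):
    """Keep rows whose rsID is in the panel and whose call is not a no-call."""
    selected = []
    for rsid, chrom, pos, genotype in genotypes:
        if rsid in panel and genotype != "--":  # '--' marks a no-call
            selected.append(
                {"gene": panel[rsid], "rsid": rsid, "chrom": chrom,
                 "pos": pos, "genotype": genotype}
            )
    return selected


rows = [
    ("rs6025", "1", 169519049, "CT"),
    ("rs0000000", "1", 1000, "AA"),       # illustrative row, not in panel: dropped
    ("rs9923231", "16", 31107689, "--"),  # no-call: dropped
]
hits = select_panel_variants(rows)
print(hits[0]["gene"])  # F5
```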
Each variant is converted to VCF format and sent to the Ensembl VEP REST API (free, public, no key required).
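The following sketch shows the conversion and the VEP request. It rests on several assumptions. First, raw-file positions are treated as GRCh37 coordinates, which is why it uses the `grch37.rest.ensembl.org` server. Second, the REF/ALT alleles in `ALLELES` are supplied by hand, because consumer files do not record which allele is the reference; verify them against dbSNP. `to_vcf_line`, `genotype_to_gt` and `build_vep_request` are hypothetical helpers. The request targets VEP's `POST /vep/homo_sapiens/region` endpoint, but the code only builds it and does not send it.

```python
import json
import urllib.request

# Assumed forward-strand REF/ALT alleles (hand-entered; check against dbSNP)
ALLELES = {
    "rs6025": ("C", "T"),
    "rs1800562": ("G", "A"),
    "rs9923231": ("C", "T"),
}


def to_vcf_line(rsid, chrom, pos, genotype, alleles=ALLELES):
    """Minimal VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO."""
    ref, alt = alleles[rsid]
    return f"{chrom} {pos} {rsid} {ref} {alt} . . ."


def genotype_to_gt(genotype, ref, alt):
    """Map a two-letter call such as 'CT' to a VCF GT string such as '0/1'."""
    codes = []
    for allele in genotype:
        if allele == ref:
            codes.append("0")
        elif allele == alt:
            codes.append("1")
        else:
            raise ValueError(f"allele {allele!r} is neither REF nor ALT")
    return "/".join(sorted(codes))


def build_vep_request(vcf_lines, server="https://grch37.rest.ensembl.org"):
    """Build (but do not send) a POST request to VEP's region endpoint."""
    body = json.dumps({"variants": vcf_lines}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/vep/homo_sapiens/region",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


line = to_vcf_line("rs6025", "1", 169519049, "CT")
print(line)                                          # 1 169519049 rs6025 C T . . .
print(genotype_to_gt("CT", *ALLELES["rs6025"]))      # 0/1
req = build_vep_request([line])
# json.load(urllib.request.urlopen(req)) would send it (needs network access)
```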
Part 2
ClawBio assigns every annotated variant to a tier based on clinical evidence and frequency:
| Tier | Definition | Example |
|---|---|---|
| Tier 1 | Pathogenic/likely pathogenic, rare (<0.1%), highest clinical relevance | CFTR deltaF508 (CF carrier) |
| Tier 2 | Drug response or established risk factor, actionable under CPIC | VKORC1 warfarin sensitivity |
| Tier 3 | VUS: insufficient evidence for classification | Rare missense without ClinVar entry |
| Tier 4 | Benign/likely benign, common (>1%) | MTHFR A1298C |
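Here is one way to express the tier rules in code. This is an assumed reading of the table, not ClawBio's implementation. Tier 1 is keyed on ClinVar alone, because the table does not say where a common pathogenic variant would go. A common variant with no ClinVar entry is treated as Tier 4.

```python
PATHOGENIC = {"pathogenic", "likely pathogenic", "pathogenic/likely pathogenic"}
DRUG_OR_RISK = {"drug response", "risk factor"}
BENIGN = {"benign", "likely benign", "benign/likely benign"}
COMMON_AF = 0.01  # >1% = common


def assign_tier(clinvar_significance, gnomad_af):
    """Assumed tiering: ClinVar class first, allele frequency as a fallback."""
    sig = (clinvar_significance or "").strip().lower()
    if sig in PATHOGENIC:
        return 1
    if sig in DRUG_OR_RISK:
        return 2
    if sig in BENIGN:
        return 4
    if not sig and gnomad_af is not None and gnomad_af > COMMON_AF:
        return 4  # assumption: common with no ClinVar entry -> benign tier
    return 3      # VUS, conflicting, or rare with no ClinVar entry


print(assign_tier("Drug response", 0.3))  # 2
```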
The annotation report includes one row per variant with these columns:
| Column | What it tells you |
|---|---|
| gene | Gene harbouring the variant |
| consequence | Predicted effect on the transcript or protein, e.g. missense, synonymous, frameshift |
| impact | VEP severity class: HIGH (e.g. frameshift, stop gained), MODERATE (e.g. missense), LOW (e.g. synonymous), MODIFIER (non-coding) |
| clinvar_significance | ClinVar classification: Pathogenic, Likely pathogenic, VUS, Benign, Drug response |
| gnomad_af | Global allele frequency in gnomAD. Below 0.001 (0.1%) = rare |
| priority_tier | ClawBio tier 1-4 (see the tier table above) |
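Filtering the report for rare, high-impact variants might look like the sketch below. The tab-separated layout and the gene names `GENE_A` to `GENE_C` are placeholders. The values are illustrative, not results from the Corpasome. Treating a blank `gnomad_af` as rare is an assumption, because a missing frequency can also reflect poor coverage.

```python
import csv
import io


def flag_rare_high_impact(report, max_af=0.001):
    """Return genes whose impact is HIGH and whose gnomAD AF is below max_af.

    A blank gnomad_af is treated as rare (assumption: absent from gnomAD).
    """
    flagged = []
    for row in csv.DictReader(report, delimiter="\t"):
        af_text = (row.get("gnomad_af") or "").strip()
        af = float(af_text) if af_text else None
        if row.get("impact") == "HIGH" and (af is None or af < max_af):
            flagged.append(row["gene"])
    return flagged


# Illustrative rows only; column names follow the report table above
REPORT_TEXT = (
    "gene\tconsequence\timpact\tclinvar_significance\tgnomad_af\tpriority_tier\n"
    "GENE_A\tframeshift_variant\tHIGH\tPathogenic\t0.0002\t1\n"
    "GENE_B\tmissense_variant\tMODERATE\tUncertain significance\t\t3\n"
    "GENE_C\tstop_gained\tHIGH\tBenign\t0.2\t4\n"
)
print(flag_rare_high_impact(io.StringIO(REPORT_TEXT)))  # ['GENE_A']
```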
Part 3
Factor V Leiden: rs6025 in gene F5. Heterozygous carrier.
Makes clotting factor V resistant to inactivation by activated protein C, so blood clots more easily than it should.
3-8x increased risk of deep vein thrombosis (DVT). ~5% of Europeans carry this variant.
Clinical actions: Contraceptive counselling (oestrogen increases clot risk further). Perioperative planning. Long-haul flight precautions. Family testing.
Warfarin: two genes, one dangerous interaction:
VKORC1 (rs9923231): TT homozygous. VKORC1 is warfarin's target. TT genotype = high sensitivity: the drug works too well.
CYP2C9: heterozygous carrier of a reduced-function allele. Slows warfarin metabolism, so the drug stays in the body longer.
Combined effect: CPIC recommends a starting dose 50-75% lower than standard. Without genetic testing, a standard dose could cause a haemorrhage. This is the textbook example of pharmacogenomics saving lives.
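The gene-to-drug lookup behind a pharmacogenomics report can be sketched as a dictionary lookup. `CPIC_PAIRS` is an illustrative subset of gene-drug pairs that have CPIC guidelines; it is not ClawBio's table. The function reports which guidelines to consult and does not compute doses.

```python
# Illustrative subset of CPIC gene-drug pairs (not ClawBio's internal table)
CPIC_PAIRS = {
    "CYP2C19": ["clopidogrel"],
    "CYP2D6": ["codeine"],
    "CYP2C9": ["warfarin"],
    "VKORC1": ["warfarin"],
    "TPMT": ["azathioprine", "mercaptopurine", "thioguanine"],
    "DPYD": ["fluorouracil", "capecitabine"],
}


def drugs_to_review(genes_with_variants):
    """Map genes carrying a non-reference call to drugs with a CPIC guideline."""
    drugs = {}
    for gene in genes_with_variants:
        for drug in CPIC_PAIRS.get(gene, []):
            drugs.setdefault(drug, []).append(gene)
    return drugs


print(drugs_to_review(["VKORC1", "CYP2C9", "F5"]))
# {'warfarin': ['VKORC1', 'CYP2C9']}  (F5 has no entry, so it is ignored)
```

Grouping by drug makes gene-gene interactions visible: warfarin is listed once, with both genes that modify it.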
CFTR (rs113993960, deltaF508): heterozygous carrier. Cystic fibrosis requires two disease-causing copies of CFTR. ~1 in 25 Europeans carry one.
Action: partner testing recommended. If both parents are carriers, each pregnancy has a 25% chance of an affected child.
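The 25% figure follows from enumerating the four equally likely allele combinations for two carrier parents, which is a Punnett square. The following small sketch uses `A` for the normal allele and `a` for the disease-causing one. `offspring_risk` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_risk(parent1, parent2):
    """Probability of each child genotype for one autosomal gene.

    Each parent is a two-character genotype; every pairing of one allele
    from each parent is equally likely.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {geno: Fraction(n, total) for geno, n in counts.items()}


risk = offspring_risk("Aa", "Aa")
print(risk["aa"])  # 1/4  (affected)
print(risk["Aa"])  # 1/2  (carrier)
```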
HFE (rs1800562, C282Y): heterozygous carrier. Elevated iron absorption. Homozygotes are at risk of hereditary haemochromatosis.
Action: serum ferritin monitoring. Carriers virtually never develop disease. Homozygotes with iron overload: regular therapeutic venesection (blood removal).
APOE e4, one copy: ~3-fold increased late-onset Alzheimer's risk vs e3/e3. Two copies (e4/e4) = ~12-fold.
A probabilistic risk factor, not a diagnosis: many carriers never develop the disease. Disclosure warrants genetic counselling because of its psychological impact.
MTHFR: heterozygous. ~65% enzyme activity. 30-40% of Europeans carry this.
One of the most over-interpreted variants in consumer genomics. Heterozygotes typically need no intervention. Homozygotes may benefit from methylfolate in pregnancy.
Part 4
Open Google Colab now
| Step | Task | Time |
|---|---|---|
| 0 | Setup: install ClawBio in Colab (two cells, ~15 seconds) | 2 min |
| 1 | Explore the Corpasome: load 600K SNPs, inspect chromosome distribution | 5 min |
| 2 | Select 21 clinically relevant variants and convert to VCF | 3 min |
| 3 | Annotate with VEP, ClinVar, gnomAD: one command, full results table | 5 min |
| 4 | Pharmacogenomics: CPIC drug-gene mapping and dosing recommendations | 5 min |
| 5 | Exercises: demo data, personal upload (optional), variant research | 10 min |
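Step 1's chromosome distribution reduces to a frequency count. This minimal sketch assumes records are `(rsid, chromosome, position, genotype)` tuples. `chromosome_counts` is a hypothetical name, and chromosome labels outside 1-22, X, Y and MT are dropped.

```python
from collections import Counter


def chromosome_counts(records):
    """Count genotyped positions per chromosome in order 1-22, X, Y, MT."""
    counts = Counter(chrom for _, chrom, _, _ in records)
    order = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
    # Counter returns 0 for absent keys, so chromosomes with no rows are skipped
    return [(chrom, counts[chrom]) for chrom in order if counts[chrom]]


rows = [
    ("rs6025", "1", 169519049, "CT"),
    ("rs1800562", "6", 26093141, "GA"),
    ("rs113993960", "7", 117559590, "DI"),
    ("rs9923231", "16", 31107689, "TT"),
]
for chrom, n in chromosome_counts(rows):
    print(chrom, n)
```

Sorting in karyotype order rather than string order keeps chromosome 10 after 9 instead of after 1.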
Requirements: A Google account and a web browser. Nothing to install. No API keys. No payment.
| Exercise | Task | Status |
|---|---|---|
| 5a | Rerun the pipeline on 20 synthetic variants using the --demo flag. Compare your output to the expected results. | Required |
| 5b | Upload your own 23andMe or AncestryDNA file (if you have one) and run the pipeline on your data. | Optional |
| 5c | Pick one variant from the output and research it: what is its ACMG classification? How common is it in gnomAD? Would you report it? | Required |
Consumer genotyping arrays test ~600,000 positions out of ~3 billion base pairs in the genome, about 0.02%.
A "clear" result does not mean the genome is free of pathogenic variants. It may only mean that the array did not test the position where a pathogenic variant lies.
| Resource | Link |
|---|---|
| ClawBio GitHub | github.com/ClawBio/ClawBio |
| Tutorial page | docs.clawbio.ai/tutorials/variant-interpretation-workshop |
| Ensembl VEP | ensembl.org/vep |
| ClinVar | ncbi.nlm.nih.gov/clinvar |
| gnomAD | gnomad.broadinstitute.org |
| CPIC guidelines | cpicpgx.org |
| Corpasome (Zenodo) | doi:10.5281/zenodo.19297389 |
| GWAS session (next) | docs.clawbio.ai/tutorials/gwas-workshop |