ClawBio Workshop
Leverage the power of AI in genomics
Dr Manuel Corpas · University of Westminster · 2026
Part 1
An open-source toolkit of AI agent skills for genomic analysis.
You give it genetic data. It runs the right analysis. You get a structured report.
Hours to days per genome
Minutes per analysis
The contract. Defines inputs, outputs, domain decisions, safety rules. The AI reads this; it never overrides it.
The implementation. Runs the pipeline, writes structured output. All processing is local.
Synthetic test data with every skill. Run a demo instantly without real patient data.
Markdown + JSON. Human-readable and machine-parseable. Includes disclaimers and provenance.
Part 2
From a single genome to clinical findings
| Database | What it tells you |
|---|---|
| Ensembl VEP | What gene is affected? Missense, synonymous, frameshift? |
| ClinVar | Has this variant been seen in patients? Pathogenic or benign? |
| gnomAD | How common is it across 76,000+ genomes? |
| CPIC | Does it affect drug metabolism? Which drugs? |
The American College of Medical Genetics uses five categories:
| Category | What it means | Clinical action |
|---|---|---|
| Pathogenic | Causes or strongly contributes to disease | Report. Genetic counselling. |
| Likely pathogenic | Strong evidence, not conclusive | Report with caveat. |
| VUS | Not enough evidence | Not acted upon. May be reclassified. |
| Likely benign | Probably harmless | Usually not reported. |
| Benign | Definitively harmless | Not reported. |
More than half of all known variants are VUS. "We don't know yet" is often the most honest answer.
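The five categories above can be encoded as a small lookup, which is one way to keep reporting decisions explicit in code. This is a minimal sketch, not ClawBio's implementation: the names `AcmgCategory`, `CLINICAL_ACTION`, and `clinical_action` are hypothetical, and the action strings are copied from the table.

```python
from enum import Enum


class AcmgCategory(Enum):
    """The five ACMG classification categories from the table above."""
    PATHOGENIC = "Pathogenic"
    LIKELY_PATHOGENIC = "Likely pathogenic"
    VUS = "VUS"
    LIKELY_BENIGN = "Likely benign"
    BENIGN = "Benign"


# Clinical actions copied verbatim from the table (hypothetical mapping name).
CLINICAL_ACTION = {
    AcmgCategory.PATHOGENIC: "Report. Genetic counselling.",
    AcmgCategory.LIKELY_PATHOGENIC: "Report with caveat.",
    AcmgCategory.VUS: "Not acted upon. May be reclassified.",
    AcmgCategory.LIKELY_BENIGN: "Usually not reported.",
    AcmgCategory.BENIGN: "Not reported.",
}


def clinical_action(label: str) -> str:
    """Return the clinical action for a category label such as 'VUS'.

    Raises ValueError for labels outside the five categories, so an
    unknown classification is never silently treated as benign.
    """
    return CLINICAL_ACTION[AcmgCategory(label)]


print(clinical_action("Likely pathogenic"))  # Report with caveat.
```

Rejecting unknown labels instead of defaulting keeps "we don't know yet" distinct from "harmless".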
In 2013, Manuel Corpas published his genome (the Corpasome) under a CC0 public-domain dedication. Real findings you will discover:
| Variant | Gene | Clinical meaning |
|---|---|---|
| rs6025 | F5 (Factor V Leiden) | 3-8x increased blood clotting risk |
| rs1800562 | HFE (C282Y) | Carrier for haemochromatosis |
| rs113993960 | CFTR (deltaF508) | Carrier for cystic fibrosis |
| rs9923231 | VKORC1 | Warfarin: AVOID standard dose |
| rs429358 | APOE | Elevated Alzheimer's risk (e3/e4) |
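Looking up the rsIDs in the table above can be sketched as below. This assumes a 23andMe-style raw data file (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). The function name `find_genotypes` is hypothetical, and the sample rows are synthetic placeholders, not genotypes or coordinates from the Corpasome.

```python
import io

# rsIDs and genes taken from the table above.
VARIANTS_OF_INTEREST = {
    "rs6025": "F5",
    "rs1800562": "HFE",
    "rs113993960": "CFTR",
    "rs9923231": "VKORC1",
    "rs429358": "APOE",
}


def find_genotypes(handle, wanted):
    """Scan a 23andMe-style raw file and return {rsid: genotype} for wanted rsIDs."""
    found = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, _chrom, _pos, genotype = fields[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found


# Synthetic rows for illustration only: positions and genotypes are placeholders.
SYNTHETIC_RAW = "\n".join([
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs6025\t1\t2000\tCT",
])

hits = find_genotypes(io.StringIO(SYNTHETIC_RAW), VARIANTS_OF_INTEREST)
for rsid, genotype in hits.items():
    print(rsid, VARIANTS_OF_INTEREST[rsid], genotype)
```

A variant missing from the result means only that the array did not report it, not that the person lacks the variant.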
| Gene | Drugs affected | Risk |
|---|---|---|
| CYP2D6 | Codeine, tamoxifen, antidepressants | Poor metabolisers get little or no pain relief from codeine |
| CYP2C19 | Clopidogrel (blood thinner) | Poor metabolisers do not activate the drug; stroke risk is not reduced |
| CYP2C9 + VKORC1 | Warfarin | Wrong dose causes bleeding or clots |
| DPYD | 5-fluorouracil (chemo) | Standard dose can be fatal |
The Corpasome needs a significantly lower warfarin dose than standard. Without genetic testing, a doctor might prescribe a dose that causes a haemorrhage.
Part 3
From one genome to population-level analysis
Test every variant for association with a trait across thousands of people.
Summary statistics are public, free, and sufficient for PRS, meta-analysis, and fine-mapping. No HPC needed.
Query 9 databases in parallel, including GWAS Catalog, Open Targets, UK Biobank (UKB), FinnGen, Biobank Japan, GTEx, and eQTL Catalogue.
Compute polygenic risk scores from 23andMe/AncestryDNA data using 3,000+ PGS Catalog scores.
SuSiE credible sets to identify causal variants from summary statistics. No individual data needed.
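The polygenic risk score mentioned above is, at its core, a weighted sum: for each variant, multiply the published effect weight by the number of copies of the effect allele, then add the results. The sketch below shows only that arithmetic. The rsIDs and weights are hypothetical placeholders, not real PGS Catalog values, and the code ignores strand flips, missing-variant imputation, and population normalisation, which a real pipeline has to handle.

```python
def effect_allele_dosage(genotype: str, effect_allele: str) -> int:
    """Count copies (0, 1, or 2) of the effect allele in a genotype such as 'CT'."""
    return genotype.count(effect_allele)


def polygenic_score(genotypes, weights):
    """Return (score, variants_used).

    genotypes: {rsid: genotype string}, e.g. from a 23andMe-style file.
    weights:   {rsid: (effect_allele, effect_weight)} from a scoring file.
    Variants absent from the genotype data are skipped and not counted.
    """
    score = 0.0
    used = 0
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue
        score += weight * effect_allele_dosage(genotype, effect_allele)
        used += 1
    return score, used


# Hypothetical weights and genotypes, for illustration only.
example_weights = {
    "rs_example_1": ("T", 0.5),
    "rs_example_2": ("G", 0.25),
    "rs_example_3": ("A", 0.1),   # not genotyped below, so skipped
}
example_genotypes = {"rs_example_1": "CT", "rs_example_2": "GG"}

score, used = polygenic_score(example_genotypes, example_weights)
print(score, used)  # 1.0 2
```

Reporting how many variants were actually used matters: a score built from a small fraction of the intended variants is not comparable to the published distribution.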
Part 4
The same genome at full resolution
| | SNP Array (~600K) | 30x WGS |
|---|---|---|
| SNPs | ~600,000 | 3,716,648 |
| Indels | 0 | 912,009 |
| Structural variants | 0 | 8,925 |
| Copy number variants | 0 | 1,387 |
| Coverage | Pre-selected positions | Every base |
SNP arrays answer pre-defined questions. WGS lets you ask questions you did not know to ask.
| Type | What happens | Count |
|---|---|---|
| DEL | DNA segment missing | 5,854 |
| BND | Breakend: translocation or complex rearrangement | 1,413 |
| DUP | Segment copied (alters gene dosage) | 778 |
| INV | Segment flipped in orientation | 673 |
| INS | New DNA inserted (mobile elements) | 207 |
Structural variants cause an estimated 20% of rare genetic disease but are invisible to SNP arrays.
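The per-type counts in the table above sum to 8,925 structural variants. A tally like this can be reproduced from a structural-variant VCF by counting the `SVTYPE` key in the INFO column. The sketch below assumes uncompressed VCF text lines; `count_sv_types` is a hypothetical helper, and the three records are synthetic.

```python
from collections import Counter

# Counts copied from the table above.
SV_COUNTS = {"DEL": 5854, "BND": 1413, "DUP": 778, "INV": 673, "INS": 207}


def count_sv_types(vcf_lines):
    """Tally SVTYPE values from the INFO column (8th field) of VCF records."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry.startswith("SVTYPE="):
                counts[entry.split("=", 1)[1]] += 1
                break
    return counts


# Synthetic records for illustration only.
SYNTHETIC_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000",
    "2\t5000\tsv2\tN\t<DUP>\t.\tPASS\tEND=9000;SVTYPE=DUP",
    "3\t700\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
]

print(sum(SV_COUNTS.values()))        # 8925
print(count_sv_types(SYNTHETIC_VCF))  # Counter({'DEL': 2, 'DUP': 1})
```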
| Metric | Expected | This genome | What it tells you |
|---|---|---|---|
| Ti/Tv ratio | ~2.0 | 2.03 | Below 2.0 suggests errors or contamination |
| Het/Hom ratio | 1.5-1.7 | 1.63 | Outside range may indicate contamination |
| Total SNPs | 3.5-4.5M | 3,716,648 | Too few = low coverage; too many = errors |
Always check QC before interpreting. These three numbers tell you immediately whether the data is trustworthy.
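The three checks above can be sketched in a few lines. Ti/Tv is the number of transitions (A<->G, C<->T) divided by the number of transversions (every other base change). The thresholds are taken from the table; treating them as hard cutoffs is a simplification, and the function names are hypothetical.

```python
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Compute Ti/Tv from a list of (ref, alt) single-base substitutions."""
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions: Ti/Tv is undefined")
    return ti / tv


def qc_flags(ti_tv, het_hom, total_snps):
    """Return a list of warnings; an empty list means all three checks pass."""
    flags = []
    if ti_tv < 2.0:
        flags.append("Ti/Tv below 2.0: possible errors or contamination")
    if not 1.5 <= het_hom <= 1.7:
        flags.append("Het/Hom outside 1.5-1.7: possible contamination")
    if total_snps < 3_500_000:
        flags.append("Too few SNPs: possible low coverage")
    elif total_snps > 4_500_000:
        flags.append("Too many SNPs: possible errors")
    return flags


# Values for this genome, from the table above.
print(qc_flags(2.03, 1.63, 3_716_648))  # []
```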
Part 5
Whose genomes are studied? Whose are left out?
AI amplifies this imbalance. Models trained on biased data produce biased results. Every analysis should acknowledge which populations the reference data represents.
The infrastructure barrier is gone.
Everything runs in Google Colab on the free tier.
Summary statistics are publicly released. No application, no waiting.
ClawBio wraps the full pipeline. One command per analysis.
Google Colab is free. ClawBio is MIT-licensed. All databases are public.
A researcher in Lima, Kampala, or Dhaka can run the same analyses as one at the Broad Institute. Today.
Part 6
Open Google Colab now
| Session | What you'll do | Time |
|---|---|---|
| Setup | Install ClawBio in Colab (one click) | 2 min |
| Session 1: Variant Interpretation | Annotate 21 clinical variants from a real genome. VEP, ClinVar, gnomAD, CPIC. | 25 min |
| Session 2: GWAS | Query rs7903146 across 9 databases. Compute PRS for 6 traits. Fine-map a locus with SuSiE. | 25 min |
| Session 3: 30x WGS | Explore structural variants, QC metrics, and WGS vs SNP chip comparison. | 25 min |
| Q&A | Discussion and questions | 15 min |
Requirements: A Google account and a web browser. Nothing to install locally. No API keys. No payment.
| Resource | Link |
|---|---|
| ClawBio GitHub | github.com/ClawBio/ClawBio |
| Documentation | docs.clawbio.ai |
| Variant Interpretation tutorial | docs.clawbio.ai/tutorials/variant-interpretation-workshop |
| GWAS tutorial | docs.clawbio.ai/tutorials/gwas-workshop |
| 30x WGS tutorial | docs.clawbio.ai/tutorials/30x-wgs-workshop |
| Corpasome (Zenodo) | doi:10.5281/zenodo.19297389 |