ClawBio Workshop

Agentic Genomics

Leverage the power of AI in genomics

Dr Manuel Corpas · University of Westminster · 2026

Part 1

What is ClawBio?

ClawBio in one sentence

An open-source toolkit of AI agent skills for genomic analysis.

You give it genetic data. It runs the right analysis. You get a structured report.

39+ bioinformatics skills
488+ GitHub stars
13 contributors
0 cost

The problem ClawBio solves

Without ClawBio

  • Download VEP, configure databases, debug dependencies
  • Submit variants to web APIs one by one
  • Parse raw JSON output by hand
  • Cross-reference ClinVar, gnomAD, CPIC separately
  • AI chatbots invent gene-drug associations

Hours to days per genome

With ClawBio

  • One command runs the full pipeline
  • Built-in demo data for every skill
  • Structured reports in markdown and JSON
  • Grounded in CPIC, ClinVar, gnomAD
  • All processing is local: your data never leaves

Minutes per analysis

How a skill works

SKILL.md → Python script → Demo data → Report

SKILL.md

The contract. Defines inputs, outputs, domain decisions, safety rules. The AI reads this; it never overrides it.

Python script

The implementation. Runs the pipeline, writes structured output. All processing is local.

Demo data

Synthetic test data ships with every skill. Run a demo instantly without real patient data.

Report

Markdown + JSON. Human-readable and machine-parseable. Includes disclaimers and provenance.
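
The last step above (one set of findings written as both Markdown and JSON, with a disclaimer and provenance) can be pictured with a minimal sketch. This is not ClawBio's actual code: the function name `write_report`, the file names, the field names, and the disclaimer wording are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_report(findings, out_dir, skill_name="demo-skill"):
    """Write the same findings as report.json and report.md (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    payload = {
        "skill": skill_name,                                      # provenance: which skill ran
        "generated_at": datetime.now(timezone.utc).isoformat(),  # provenance: when it ran
        "disclaimer": "Research use only. Not a clinical diagnosis.",  # placeholder text
        "findings": findings,
    }
    # Machine-parseable copy.
    (out / "report.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")

    # Human-readable copy built from the same payload.
    lines = [f"# {skill_name} report", "", f"> {payload['disclaimer']}", ""]
    for item in findings:
        lines.append(f"- {item['variant']} ({item['gene']}): {item['note']}")
    (out / "report.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return payload
```

Building both files from one dictionary keeps the human-readable and machine-parseable versions from drifting apart.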

Part 2

Variant Interpretation

From a single genome to clinical findings

The annotation pipeline

Your variants (VCF) → Ensembl VEP → ClinVar → gnomAD → Report

Database | What it tells you
Ensembl VEP | What gene is affected? Missense, synonymous, frameshift?
ClinVar | Has this variant been seen in patients? Pathogenic or benign?
gnomAD | How common is it across 76,000+ genomes?
CPIC | Does it affect drug metabolism? Which drugs?
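
The pipeline above amounts to reading each variant from the VCF and attaching whatever each source knows about it. A minimal sketch of that join follows. It makes no network calls: the `sources` dictionaries are in-memory stand-ins for VEP, ClinVar, gnomAD, and CPIC lookups, and `parse_vcf_line` and `annotate` are hypothetical helper names, not ClawBio functions.

```python
def parse_vcf_line(line):
    """Read the first five fixed VCF columns: CHROM, POS, ID, REF, ALT."""
    chrom, pos, rsid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "rsid": rsid, "ref": ref, "alt": alt}


def annotate(variant, sources):
    """Attach each source's entry for this rsID (None when a source has no entry)."""
    record = dict(variant)
    for name, table in sources.items():
        record[name] = table.get(variant["rsid"])
    return record


if __name__ == "__main__":
    # Placeholder variant and placeholder annotation values, for illustration only.
    line = "1\t12345\trsDEMO\tC\tT\t.\tPASS\t.\n"
    sources = {
        "vep_consequence": {"rsDEMO": "missense_variant"},
        "clinvar": {"rsDEMO": "placeholder significance"},
        "gnomad_af": {},  # an empty source simply yields None
    }
    print(annotate(parse_vcf_line(line), sources))
```

Keeping a missing annotation as `None`, rather than dropping the key, makes "this source has no record" visible in the final report.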

ACMG classification

The American College of Medical Genetics and Genomics (ACMG) uses five categories:

Category | What it means | Clinical action
Pathogenic | Causes or strongly contributes to disease | Report. Genetic counselling.
Likely pathogenic | Strong evidence, not conclusive | Report with caveat.
VUS | Not enough evidence | Not acted upon. May be reclassified.
Likely benign | Probably harmless | Usually not reported.
Benign | Definitively harmless | Not reported.

More than half of all known variants are VUS. "We don't know yet" is often the most honest answer.
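
As a minimal sketch, the five categories and their clinical actions can be encoded as a lookup that separates findings to report from findings to hold back. The mapping is copied from the table on this slide; the function name `split_findings` and the variant IDs in the usage line are hypothetical.

```python
ACMG_ACTIONS = {
    "Pathogenic": "Report. Genetic counselling.",
    "Likely pathogenic": "Report with caveat.",
    "VUS": "Not acted upon. May be reclassified.",
    "Likely benign": "Usually not reported.",
    "Benign": "Not reported.",
}
# Categories whose action on this slide starts with "Report".
REPORTED = {"Pathogenic", "Likely pathogenic"}


def split_findings(findings):
    """findings: list of (variant_id, category). Returns (reported, held_back)."""
    reported, held_back = [], []
    for variant_id, category in findings:
        if category not in ACMG_ACTIONS:
            raise ValueError(f"Unknown ACMG category: {category!r}")
        target = reported if category in REPORTED else held_back
        target.append((variant_id, category))
    return reported, held_back
```

For example, `split_findings([("varA", "Pathogenic"), ("varB", "VUS")])` keeps `varA` for the report and holds `varB` back, where it stays available for later reclassification.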

The Corpasome: a real open genome

In 2013, Manuel Corpas published his genome under CC0 public domain. Real findings you will discover:

Variant | Gene | Clinical meaning
rs6025 | F5 (Factor V Leiden) | 3-8x increased blood clotting risk
rs1800562 | HFE (C282Y) | Carrier for haemochromatosis
rs113993960 | CFTR (deltaF508) | Carrier for cystic fibrosis
rs9923231 | VKORC1 | Warfarin: AVOID standard dose
rs429358 | APOE | Elevated Alzheimer's risk (e3/e4)

Pharmacogenomics: drugs meet DNA

Gene | Drugs affected | Risk
CYP2D6 | Codeine, tamoxifen, antidepressants | Poor metabolisers get zero pain relief from codeine
CYP2C19 | Clopidogrel (blood thinner) | Drug not activated; stroke risk unchanged
CYP2C9 + VKORC1 | Warfarin | Wrong dose causes bleeding or clots
DPYD | 5-fluorouracil (chemo) | Standard dose can be fatal

The Corpasome's VKORC1 genotype calls for a significantly lower warfarin dose than standard. Without genetic testing, a doctor might prescribe a dose that causes a haemorrhage.

Part 3

Agentic GWAS

From one genome to population-level analysis

GWAS in one slide

Test every variant for association with a trait across thousands of people.

6,000+ published GWAS
500M+ participants
90,000+ trait associations
86% European ancestry

Summary statistics are public, free, and sufficient for PRS, meta-analysis, and fine-mapping. No HPC needed.
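
To make "test every variant for association" concrete, here is a minimal sketch of one such test for a single variant: a Pearson chi-square test (1 degree of freedom, no continuity correction) on allele counts in cases versus controls. Real GWAS typically fit regression models with covariates such as ancestry components, so this is an illustration of the idea, not the method used by any particular study or by ClawBio. The function name `allelic_chi2` and the counts in the usage note are made up.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square statistic and p-value for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    denominator = row_cases * row_ctrls * col_alt * col_ref
    if denominator == 0:
        raise ValueError("Every row and column of the table must be non-empty")

    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With made-up counts, `allelic_chi2(30, 70, 10, 90)` gives a chi-square of 12.5. A GWAS repeats a test like this across every variant, which is why the conventional genome-wide significance threshold is as strict as p < 5e-8.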

Three GWAS skills, one pipeline

GWAS Lookup → PRS Calculator → Fine-Mapping

GWAS Lookup

Query 9 databases in parallel, including GWAS Catalog, Open Targets, UKB, FinnGen, Biobank Japan, GTEx, and eQTL Catalogue.

PRS Calculator

Compute polygenic risk scores from 23andMe/AncestryDNA data using 3,000+ PGS Catalog scores.
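
At its core, a polygenic risk score is a weighted sum: for each variant in the score, count the copies of the effect allele in the genotype and multiply by the variant's weight. The sketch below assumes a simplified input: `weights` maps rsID to (effect allele, weight), and `genotypes` maps rsID to a two-letter genotype string with no-calls written as `--`. It ignores strand flips and allele harmonisation, which a real calculator must handle. The function name and all values in the usage note are hypothetical.

```python
def polygenic_score(weights, genotypes):
    """Return (score, number of variants used).

    weights:   {rsid: (effect_allele, weight)}
    genotypes: {rsid: two-letter genotype such as "AG"; "--" marks a no-call}
    """
    score = 0.0
    used = 0
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            continue  # variant not on the chip, or no-call
        dosage = genotype.count(effect_allele)  # 0, 1, or 2 copies
        score += weight * dosage
        used += 1
    return score, used
```

For example, with weights `{"rsA": ("T", 0.5), "rsB": ("G", -0.2)}` and genotypes `{"rsA": "TT", "rsB": "AG"}`, the score is 0.5 × 2 + (−0.2) × 1. Reporting how many variants were actually used matters because a score computed from a fraction of its variants is not comparable with the published one.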

Fine-Mapping

SuSiE credible sets to identify causal variants from summary statistics. No individual data needed.
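
A credible set is the smallest group of variants whose posterior probabilities of being causal add up to a chosen coverage, usually 95%. The sketch below shows only that final step for a single causal signal, given probabilities that are assumed to sum to 1. It is not SuSiE itself, which first estimates those probabilities from summary statistics and linkage disequilibrium. The function name `credible_set` and the variant labels in the usage note are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities reach `coverage`.

    posteriors: {variant_id: posterior probability of being the causal variant}
    Returns (variant_ids in descending probability, cumulative probability).
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for variant_id, probability in ranked:
        chosen.append(variant_id)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen, cumulative
```

With made-up probabilities `{"v1": 0.6, "v2": 0.3, "v3": 0.07, "v4": 0.03}`, the 95% credible set is `["v1", "v2", "v3"]`. A small set points to a well-resolved signal; a large one means many variants remain plausible.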

Cross-ancestry: the equity gap in GWAS

The problem

  • 86% of GWAS participants are European
  • PRS lose up to 80% accuracy in non-European populations
  • Risk of widening health disparities

What ClawBio does

  • Queries UKB, FinnGen, and Biobank Japan in one call
  • Flags allele frequency differences across populations
  • The Million Veteran Program (MVP) is the most diverse GWAS cohort: 33% non-European
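
Flagging allele frequency differences across populations can be sketched as a comparison of the highest and lowest known frequency per variant. This is a minimal illustration, not ClawBio's implementation: the function name `flag_frequency_gaps`, the 0.10 default threshold, the population labels, and the frequencies in the usage note are all assumptions.

```python
def flag_frequency_gaps(frequencies, min_gap=0.10):
    """Return {rsid: gap} for variants whose allele frequency differs
    between populations by at least `min_gap`.

    frequencies: {rsid: {population_label: allele frequency or None}}
    """
    flagged = {}
    for rsid, by_population in frequencies.items():
        known = [f for f in by_population.values() if f is not None]
        if len(known) < 2:
            continue  # cannot compare with fewer than two populations
        gap = max(known) - min(known)
        if gap >= min_gap:
            flagged[rsid] = round(gap, 4)
    return flagged
```

With placeholder input `{"rsX": {"EUR": 0.30, "EAS": 0.05, "AFR": None}}`, `rsX` is flagged with a gap of 0.25. Missing frequencies are skipped rather than treated as zero, since "not measured" and "absent" are different findings.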

Part 4

30x Whole-Genome Sequencing

The same genome at full resolution

SNP array vs whole-genome sequencing

Feature | SNP array (~600K) | 30x WGS
SNPs | ~600,000 | 3,716,648
Indels | 0 | 912,009
Structural variants | 0 | 8,925
Copy number variants | 0 | 1,387
Coverage | Pre-selected positions | Every base

SNP arrays answer pre-defined questions. WGS lets you ask questions you did not know to ask.

Structural variants: the hidden layer

Type | What happens | Count
DEL | DNA segment missing | 5,854
BND | Translocation or complex rearrangement | 1,413
DUP | Segment copied (alters gene dosage) | 778
INV | Segment flipped in orientation | 673
INS | New DNA inserted (mobile elements) | 207

Structural variants cause an estimated 20% of rare genetic disease but are invisible to SNP arrays.
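
A per-type tally like the one in the table can be produced by reading the `SVTYPE` key from the INFO column of a structural-variant VCF. The sketch below assumes an uncompressed, tab-separated VCF passed in as an iterable of lines. The function name `count_sv_types` and the file name in the usage note are hypothetical.

```python
from collections import Counter


def count_sv_types(vcf_lines):
    """Tally structural variants by the SVTYPE value in the INFO column."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        # INFO is the 8th column: semicolon-separated KEY=VALUE pairs and flags.
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        if "SVTYPE" in info:
            counts[info["SVTYPE"]] += 1
    return counts
```

Usage might look like `with open("sv_calls.vcf") as handle: print(count_sv_types(handle))`. As a consistency check, the five counts in the table above sum to 8,925, the structural-variant total on the comparison slide.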

QC metrics: is this genome trustworthy?

Metric | Expected | This genome | What it tells you
Ti/Tv ratio | ~2.0 | 2.03 | Below 2.0 suggests errors or contamination
Het/Hom ratio | 1.5-1.7 | 1.63 | Outside range may indicate contamination
Total SNPs | 3.5-4.5M | 3,716,648 | Too few = low coverage; too many = errors

Always check QC before interpreting. These three numbers tell you immediately whether the data is trustworthy.
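
Two of these metrics are simple ratios over the SNP calls. Ti/Tv divides transitions (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) by transversions (everything else). Het/Hom divides heterozygous calls by homozygous-alternate calls. The sketch below assumes biallelic single-base SNPs supplied as `(ref, alt, genotype)` tuples with VCF-style genotypes such as `0/1` or `1|1`. The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T; any other single-base change is a transversion."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def qc_ratios(snps):
    """snps: iterable of (ref, alt, genotype). Returns (ti_tv, het_hom).

    A ratio is None when its denominator is zero.
    """
    ti = tv = het = hom_alt = 0
    for ref, alt, genotype in snps:
        alleles = genotype.replace("|", "/").split("/")
        if set(alleles) <= {"0", "."}:
            continue  # homozygous reference or missing: not a variant call
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
        if alleles[0] == alleles[1]:
            hom_alt += 1
        else:
            het += 1
    ti_tv = ti / tv if tv else None
    het_hom = het / hom_alt if hom_alt else None
    return ti_tv, het_hom
```

Comparing the two returned values against the expected ranges in the table gives a quick pass or fail before any interpretation starts.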

Part 5

Equity in Genomics

Whose genomes are studied? Whose are left out?

The numbers

  • 86% of GWAS participants are of European ancestry
  • Databases have 30x more BRCA variant data for Europeans than other populations
  • Polygenic risk scores lose up to 80% accuracy in non-European populations
  • A variant classified as "benign" may simply be unstudied in the relevant population

AI amplifies this. Models trained on biased data produce biased results. Every analysis should acknowledge which populations the reference data represents.

Democratising genomics

The infrastructure barrier is gone.

No HPC

Everything runs in Google Colab on a free tier.

No data agreements

Summary statistics are publicly released. No application, no waiting.

No bioinformatics team

ClawBio wraps the full pipeline. One command per analysis.

No cost

Google Colab is free. ClawBio is MIT-licensed. All databases are public.

A researcher in Lima, Kampala, or Dhaka can run the same analyses as one at the Broad Institute. Today.

Part 6

Hands-on Practical

Open Google Colab now

Workshop schedule

Session | What you'll do | Time
Setup | Install ClawBio in Colab (one click) | 2 min
Session 1: Variant Interpretation | Annotate 21 clinical variants from a real genome. VEP, ClinVar, gnomAD, CPIC. | 25 min
Session 2: GWAS | Query rs7903146 across 9 databases. Compute PRS for 6 traits. Fine-map a locus with SuSiE. | 25 min
Session 3: 30x WGS | Explore structural variants, QC metrics, and WGS vs SNP chip comparison. | 25 min
Q&A | Discussion and questions | 15 min

Requirements: A Google account and a web browser. Nothing to install on your own machine. No API keys. No payment.

Take-home messages

  1. AI makes the mechanical steps faster, but domain expertise in molecular biology, genetics, and clinical context remains essential.
  2. Pharmacogenomics is saving lives today. Drug-gene interactions like warfarin/CYP2C9/VKORC1 are implemented in hospitals now.
  3. "We don't know yet" is a valid answer. Over half of all variants are VUS. Honest uncertainty is a strength, not a weakness.
  4. Cross-ancestry analysis is not optional. Always check whether findings transfer across populations.
  5. Infrastructure is no longer a barrier. Google Colab + ClawBio = publication-quality analysis, for free, anywhere.

Resources

Resource | Link
ClawBio GitHub | github.com/ClawBio/ClawBio
Documentation | docs.clawbio.ai
Variant Interpretation tutorial | docs.clawbio.ai/tutorials/variant-interpretation-workshop
GWAS tutorial | docs.clawbio.ai/tutorials/gwas-workshop
30x WGS tutorial | docs.clawbio.ai/tutorials/30x-wgs-workshop
Corpasome (Zenodo) | doi:10.5281/zenodo.19297389
