ClawBio Workshop

Variant Interpretation

From a raw genome file to clinical findings

Dr Manuel Corpas · University of Westminster · 2026

Where we are

You have heard about ClawBio, how skills work, and the databases behind variant annotation. Now you will run the full pipeline yourself on a real genome.

The genome belongs to Dr Manuel Corpas and was published under a CC0 public-domain dedication in 2013. Everything you find is real.

This session: ~10 min slides, then ~20 min hands-on in Google Colab. By the end you will have annotated 21 clinical variants, identified five actionable findings, and generated a pharmacogenomics report.

Learning objectives

  1. Load and explore a consumer genotyping file (~600,000 SNPs)
  2. Select clinically relevant variants from a genome-wide dataset
  3. Annotate variants using Ensembl VEP, ClinVar, and gnomAD in one command
  4. Generate a pharmacogenomics report using CPIC guidelines
  5. Classify variants into priority tiers and explain clinical significance
  6. Run all analyses in Google Colab with zero infrastructure

Part 1

The data

What a genotyping file looks like

Consumer tests (23andMe, AncestryDNA) produce a text file with four columns:

| rsID | Chromosome | Position | Genotype |
| --- | --- | --- | --- |
| rs6025 | 1 | 169519049 | CT |
| rs1800562 | 6 | 26093141 | GA |
| rs113993960 | 7 | 117559590 | DI |
| rs9923231 | 16 | 31107689 | TT |

  • ~600K SNPs per file
  • 21 we will annotate
  • 5 actionable findings
  • 0 API keys needed
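
As a rough illustration of what "loading" such a file involves, the sketch below parses the four rows shown above using only the Python standard library. It assumes a tab-separated layout with `#` comment lines, as in 23andMe raw-data exports; the `Genotype` type and the `read_genotypes` helper are names invented for this example and are not part of ClawBio.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Genotype(NamedTuple):
    """One row of a consumer genotyping file (hypothetical helper type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def read_genotypes(handle) -> Iterator[Genotype]:
    """Yield one Genotype per data row, skipping blank lines and '#' comments."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    for row in csv.reader(data_lines, delimiter="\t"):
        rsid, chromosome, position, genotype = row[:4]
        yield Genotype(rsid, chromosome, int(position), genotype)


# The four rows from the slide, laid out as a tab-separated file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs6025\t1\t169519049\tCT\n"
    "rs1800562\t6\t26093141\tGA\n"
    "rs113993960\t7\t117559590\tDI\n"
    "rs9923231\t16\t31107689\tTT\n"
)

records = list(read_genotypes(io.StringIO(SAMPLE)))
for rec in records:
    print(rec.rsid, rec.chromosome, rec.position, rec.genotype)
```

The chromosome is kept as a string because real files also contain X, Y and MT. Genotypes such as `DI` encode a deletion/insertion call rather than two bases, so downstream code should not assume every genotype is a pair of nucleotides.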

The 21 variants we will annotate

From 600,000 positions, ClawBio selects variants in genes with established clinical evidence:

Drug metabolism

CYP2C19, CYP2D6, CYP2C9, VKORC1, TPMT, DPYD, MTHFR

Disease risk

Factor V (clotting), CFTR (cystic fibrosis), HFE (iron), APOE (Alzheimer's)

Cancer predisposition

BRCA1, TP53

Each variant is converted to VCF format and sent to the Ensembl VEP REST API (free, public, no key required).
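
The conversion and request step might look roughly like the sketch below. It uses the public Ensembl REST endpoint `POST /vep/human/region`, which accepts VCF-style strings in a `variants` list. The helper names are invented for this example, the REF/ALT alleles are illustrative values supplied by hand (a genotype file does not record the reference allele), and the server must match the assembly of your coordinates (`rest.ensembl.org` serves GRCh38, `grch37.rest.ensembl.org` serves GRCh37). How ClawBio does this internally is not shown here.

```python
import json
import urllib.request

# Assumption: default to the GRCh38 server; pass the GRCh37 URL for older coordinates.
VEP_URL = "https://rest.ensembl.org/vep/human/region"


def to_vcf_line(chrom, pos, rsid, ref, alt):
    """Format one variant as the space-separated VCF-style string VEP accepts."""
    return f"{chrom} {pos} {rsid} {ref} {alt} . . ."


def build_vep_request(vcf_lines, url=VEP_URL):
    """Build (but do not send) a JSON POST request for the VEP region endpoint."""
    body = json.dumps({"variants": list(vcf_lines)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def annotate(vcf_lines, url=VEP_URL):
    """Send the request and return the parsed JSON (requires network access)."""
    request = build_vep_request(vcf_lines, url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# REF and ALT here are illustrative; a real pipeline would look them up.
line = to_vcf_line("1", 169519049, "rs6025", "C", "T")
request = build_vep_request([line])
print(line)
print(request.get_method(), request.full_url)
```

Calling `annotate([line])` would perform the actual lookup; it is left out of the example so the block runs without a network connection.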

Part 2

What the pipeline returns

Priority tiers

ClawBio assigns every annotated variant to a tier based on clinical evidence and frequency:

| Tier | Definition | Example |
| --- | --- | --- |
| Tier 1 | Pathogenic/likely pathogenic, rare (<0.1%), highest clinical relevance | CFTR deltaF508 (CF carrier) |
| Tier 2 | Drug response or established risk factor, actionable under CPIC | VKORC1 warfarin sensitivity |
| Tier 3 | VUS: insufficient evidence for classification | Rare missense without ClinVar entry |
| Tier 4 | Benign/likely benign, common (>1%) | MTHFR A1298C |
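
To make the table concrete, here is a minimal sketch that turns the four definitions into a function. The thresholds come from the table; the function name, the order in which the rules are checked, and the fallback to Tier 3 are assumptions of this example, and ClawBio's actual logic may differ.

```python
from typing import Optional

RARE_AF = 0.001    # 0.1%, the "rare" cut-off in the table
COMMON_AF = 0.01   # 1%, the "common" cut-off in the table


def assign_tier(clinvar_significance: str, gnomad_af: Optional[float]) -> int:
    """Return a tier from 1 to 4 following the table's definitions (sketch only)."""
    significance = clinvar_significance.strip().lower()
    has_af = gnomad_af is not None

    if significance in {"pathogenic", "likely pathogenic"} and has_af and gnomad_af < RARE_AF:
        return 1
    if significance in {"drug response", "risk factor"}:
        return 2
    if significance in {"benign", "likely benign"} and has_af and gnomad_af > COMMON_AF:
        return 4
    # Everything else, including VUS and missing ClinVar entries, lands here.
    return 3


print(assign_tier("Pathogenic", 0.0005))          # rare and pathogenic
print(assign_tier("Drug response", 0.3))          # pharmacogenomic variant
print(assign_tier("Uncertain significance", None))
print(assign_tier("Benign", 0.25))                # common and benign
```

Combinations the table does not cover, such as a pathogenic variant that is not rare, fall through to Tier 3 in this sketch. That is a choice made for the example, not a statement about how the pipeline handles them.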

Reading the output table

The annotation report includes one row per variant with these columns:

| Column | What it tells you |
| --- | --- |
| gene | Gene harbouring the variant |
| consequence | Protein impact: missense, synonymous, frameshift |
| impact | HIGH (protein disruption), MODERATE, LOW, MODIFIER |
| clinvar_significance | Pathogenic, Likely pathogenic, VUS, Benign, Drug response |
| gnomad_af | Global allele frequency. Below 0.001 (0.1%) = rare |
| priority_tier | ClawBio tier 1-4 (see previous slide) |
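
One way to read such a table is to sort it by tier and pull out the rare variants first. The sketch below does that on three placeholder rows; the column names match the table above, but the gene names and values are made up for illustration and are not results from the workshop genome.

```python
RARE_AF = 0.001  # 0.1%, the "rare" threshold from the gnomad_af row

# Placeholder rows: not real findings, just the column layout described above.
rows = [
    {"gene": "GENE_C", "consequence": "synonymous", "impact": "LOW",
     "clinvar_significance": "Benign", "gnomad_af": 0.21, "priority_tier": 4},
    {"gene": "GENE_A", "consequence": "frameshift", "impact": "HIGH",
     "clinvar_significance": "Pathogenic", "gnomad_af": 0.0004, "priority_tier": 1},
    {"gene": "GENE_B", "consequence": "missense", "impact": "MODERATE",
     "clinvar_significance": "Drug response", "gnomad_af": 0.35, "priority_tier": 2},
]

# Most urgent tier first; within a tier, rarest variant first.
ranked = sorted(rows, key=lambda r: (r["priority_tier"], r["gnomad_af"]))

# Genes whose variant is below the rare threshold.
rare = [r["gene"] for r in rows if r["gnomad_af"] < RARE_AF]

for row in ranked:
    print(row["priority_tier"], row["gene"], row["impact"], row["gnomad_af"])
print("rare:", rare)
```

In a real report some variants may have no gnomAD frequency at all, so production code would need to handle missing values before comparing or sorting.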

Part 3

Six key findings

Factor V Leiden (Tier 1)

rs6025 in gene F5. Heterozygous carrier.

What it does

Makes blood clotting factor V resistant to inactivation. Blood clots more easily than it should.

Risk

3-8x increased risk of deep vein thrombosis (DVT). ~5% of Europeans carry this variant.

Clinical actions: Contraceptive counselling (oestrogen increases clot risk further). Perioperative planning. Long-haul flight precautions. Family testing.

Warfarin sensitivity (Tier 2)

Two genes, one dangerous interaction:

VKORC1 (rs9923231)

TT homozygous. Warfarin's target. TT genotype = high sensitivity. The drug works too well.

CYP2C9*2 (rs1799853)

Heterozygous carrier. Slows warfarin metabolism. The drug stays in the body longer.

Combined effect: CPIC recommends starting at a 50-75% lower dose. Without genetic testing, a standard dose could cause a haemorrhage. This is the textbook example of pharmacogenomics saving lives.
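
The "double hit" idea can be sketched as a simple lookup: map each gene to the drugs it affects, then flag any drug touched by more than one gene carrying a variant. The gene-drug pairs below are a small hand-written subset of well-known CPIC pairs, used only for illustration. They are not ClawBio's mapping, and the function name is invented for this example. Nothing here computes a dose.

```python
# Illustrative subset of gene-drug pairs; hand-written for this example.
GENE_DRUGS = {
    "CYP2C9": ["warfarin"],
    "VKORC1": ["warfarin"],
    "CYP2C19": ["clopidogrel"],
    "CYP2D6": ["codeine"],
    "TPMT": ["azathioprine", "mercaptopurine"],
    "DPYD": ["fluorouracil", "capecitabine"],
}


def drugs_to_review(genes_with_variants):
    """Map each affected drug to the sorted list of variant-carrying genes."""
    hits = {}
    for gene in genes_with_variants:
        for drug in GENE_DRUGS.get(gene, []):
            hits.setdefault(drug, []).append(gene)
    return {drug: sorted(genes) for drug, genes in hits.items()}


# The two warfarin-related findings described above.
flags = drugs_to_review(["VKORC1", "CYP2C9"])
double_hits = [drug for drug, genes in flags.items() if len(genes) > 1]

print(flags)
print("double hits:", double_hits)
```

A flagged drug is a prompt to consult the relevant CPIC guideline, which is where the actual genotype-to-dose recommendations live.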

Carrier findings (Tier 1)

CFTR deltaF508 (rs113993960)

Heterozygous carrier. Cystic fibrosis requires two copies. ~1 in 25 Europeans carry one mutation.

Action: Partner testing recommended. If both parents carry a mutation, 25% chance of affected child.
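
The numbers above can be combined into a short worked calculation. The sketch assumes simple autosomal recessive inheritance and uses the ~1 in 25 carrier frequency quoted on the slide as the prior for an untested, unaffected partner; the function name is invented for this example.

```python
from fractions import Fraction

CARRIER_FREQUENCY = Fraction(1, 25)          # ~1 in 25 Europeans, from the slide
P_AFFECTED_IF_BOTH_CARRIERS = Fraction(1, 4)  # 25% per pregnancy


def risk_affected_child(p_parent_a_carrier, p_parent_b_carrier):
    """Chance of an affected child given each parent's carrier probability."""
    return p_parent_a_carrier * p_parent_b_carrier * P_AFFECTED_IF_BOTH_CARRIERS


# Both parents confirmed carriers.
both_known = risk_affected_child(Fraction(1), Fraction(1))

# One confirmed carrier, partner untested (population carrier frequency).
partner_untested = risk_affected_child(Fraction(1), CARRIER_FREQUENCY)

print(both_known, partner_untested)  # prints: 1/4 1/100
```

This is why partner testing matters: it moves the estimate from 1 in 100 either up to 1 in 4 or down towards a much smaller residual risk.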

HFE C282Y (rs1800562)

Heterozygous carrier. Elevated iron absorption. Homozygotes are at risk of hereditary haemochromatosis.

Action: Serum ferritin monitoring. Carriers virtually never develop disease. Homozygotes: blood donation regimen.

Risk factors and common variants

APOE e3/e4 (rs429358)

~3-fold increased late-onset Alzheimer's risk vs e3/e3. Two copies (e4/e4) = ~12-fold.

Probabilistic risk factor, not a diagnosis. Many carriers remain unaffected. Requires genetic counselling due to psychological impact.

MTHFR C677T (rs1801133) (Tier 2)

Heterozygous. ~65% enzyme activity. 30-40% of Europeans carry this.

One of the most over-interpreted variants in consumer genomics. Heterozygotes typically need no intervention. Homozygotes may benefit from methylfolate in pregnancy.

Part 4

Hands-on Practical

Open Google Colab now

What you will do

| Step | Task | Time |
| --- | --- | --- |
| 0 | Setup: install ClawBio in Colab (two cells, ~15 seconds) | 2 min |
| 1 | Explore the Corpasome: load 600K SNPs, inspect chromosome distribution | 5 min |
| 2 | Select 21 clinically relevant variants and convert to VCF | 3 min |
| 3 | Annotate with VEP, ClinVar, gnomAD: one command, full results table | 5 min |
| 4 | Pharmacogenomics: CPIC drug-gene mapping and dosing recommendations | 5 min |
| 5 | Exercises: demo data, personal upload (optional), variant research | 10 min |
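
Steps 1 and 2 boil down to counting and filtering. As a rough sketch of both, the code below counts records per chromosome and then keeps only the rsIDs of interest. It works on plain `(rsid, chromosome, position, genotype)` tuples; the function names are invented for this example and the notebook's own commands may look different.

```python
from collections import Counter

# The four example rows from Part 1, as plain tuples.
records = [
    ("rs6025", "1", 169519049, "CT"),
    ("rs1800562", "6", 26093141, "GA"),
    ("rs113993960", "7", 117559590, "DI"),
    ("rs9923231", "16", 31107689, "TT"),
]


def chromosome_counts(rows):
    """Count how many records fall on each chromosome."""
    return Counter(chrom for _rsid, chrom, _pos, _genotype in rows)


def chromosome_sort_key(chrom):
    """Sort numeric chromosomes numerically, then X, Y, MT alphabetically."""
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def select_variants(rows, wanted_rsids):
    """Keep only the records whose rsID is in the wanted set."""
    wanted = set(wanted_rsids)
    return [row for row in rows if row[0] in wanted]


counts = chromosome_counts(records)
ordered = sorted(counts, key=chromosome_sort_key)
selected = select_variants(records, {"rs6025", "rs9923231"})

for chrom in ordered:
    print(chrom, counts[chrom])
print([row[0] for row in selected])
```

The custom sort key matters on a full file: sorting chromosome names as plain strings would put "16" before "6".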

Requirements: A Google account and a web browser. Nothing to install locally. No API keys. No payment.

Exercises

| Exercise | Task | Status |
| --- | --- | --- |
| 5a | Rerun the pipeline on 20 synthetic variants using the --demo flag. Compare your output to the expected results. | Required |
| 5b | Upload your own 23andMe or AncestryDNA file (if you have one) and run the pipeline on your data. | Optional |
| 5c | Pick one variant from the output. Research: what is its ACMG classification? How common is it? Would you report it? | Required |
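
For exercise 5a, comparing your output with the expected results can be done by eye, or with a small helper like the sketch below. It assumes both tables have been loaded as lists of dictionaries and that each row carries an `rsid` key; that key, the function name and the placeholder rows are all assumptions of this example, not the notebook's actual format.

```python
def compare_reports(observed, expected, key="rsid",
                    fields=("clinvar_significance", "priority_tier")):
    """Return (missing, unexpected, mismatched) between two annotation tables."""
    obs = {row[key]: row for row in observed}
    exp = {row[key]: row for row in expected}

    missing = sorted(exp.keys() - obs.keys())      # expected but not produced
    unexpected = sorted(obs.keys() - exp.keys())   # produced but not expected

    mismatched = {}
    for variant in obs.keys() & exp.keys():
        diffs = {
            field: (obs[variant].get(field), exp[variant].get(field))
            for field in fields
            if obs[variant].get(field) != exp[variant].get(field)
        }
        if diffs:
            mismatched[variant] = diffs  # field -> (observed, expected)
    return missing, unexpected, mismatched


# Placeholder rows with made-up identifiers, for illustration only.
expected = [
    {"rsid": "rsDEMO1", "clinvar_significance": "Pathogenic", "priority_tier": 1},
    {"rsid": "rsDEMO2", "clinvar_significance": "Benign", "priority_tier": 4},
]
observed = [
    {"rsid": "rsDEMO1", "clinvar_significance": "Pathogenic", "priority_tier": 2},
    {"rsid": "rsDEMO3", "clinvar_significance": "Benign", "priority_tier": 4},
]

missing, unexpected, mismatched = compare_reports(observed, expected)
print(missing, unexpected, mismatched)
```

An empty result for all three means your run matches the expected table on the compared fields.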

What SNP arrays miss

Consumer genotyping arrays test ~600,000 positions out of roughly 3 billion base pairs in the genome, about 0.02%.

Not detected

  • Structural variants (deletions, duplications, inversions)
  • Most rare variants (private to individual or family)
  • Copy number changes
  • Short insertions/deletions in most regions

Detected

  • Common SNPs at pre-selected positions
  • Well-studied pharmacogenomic variants
  • Established disease-risk variants
  • Ancestry-informative markers

A "clear" result does not mean the genome is free of pathogenic variants. It means the array did not test the position where a pathogenic variant might exist.

Take-home messages

  1. 21 variants, five actionable findings. A real genome with real clinical implications, analysed in minutes.
  2. Pharmacogenomics prevents harm today. The warfarin double-hit in this genome would change a prescription right now.
  3. Carrier status matters for family planning. CFTR and HFE findings are not about the individual alone.
  4. "We don't know yet" is honest. Over half of all variants are VUS. Resist the pressure to over-interpret.
  5. SNP arrays have blind spots. A negative result is not a clean bill of health. 30x WGS sees far more.