SomaticWrapper is a fully automated and modular pipeline for detecting somatic variants from paired tumor–normal WGS/WXS data on the LSF compute1 cluster (WashU).
It integrates multiple industry-standard variant callers — Strelka2, VarScan2, Mutect1, and Pindel — and produces comprehensive, annotated mutation calls in MAF format.
- SNV calls: variants supported by at least 2 of 3 callers (Strelka2, Mutect1, VarScan2)
- Indel calls: variants supported by at least 2 of 3 callers (Strelka2, VarScan2, Pindel)
- Reference genome: Human GRCh38 (HG38)
- Scheduler: LSF (supports job dependencies and groups)
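
The 2-of-3 rule above can be illustrated with standard Unix tools. The sketch below is not SomaticWrapper's own merge code. It assumes each caller's calls have already been reduced to one whitespace-free key per line (for example `chr1:12345:A:T`, a format invented here for illustration), and the function name `consensus_2of3` is hypothetical.

```shell
# Print variant keys that appear in at least 2 of the 3 given call sets.
# Assumption: each input file holds one whitespace-free key per line.
consensus_2of3() {
  for f in "$1" "$2" "$3"; do
    sort -u "$f"          # de-duplicate so each caller votes once per key
  done | sort | uniq -c | awk '$1 >= 2 { print $2 }'
}
```

For SNVs the three inputs would be the Strelka2, Mutect1, and VarScan2 key lists; for indels, the Strelka2, VarScan2, and Pindel lists.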
 
Final output files:
- `dnp.annotated.maf` → all variants
- `dnp.annotated.coding.maf` → coding variants only

Recent additions:
- Added Step 0: automatically submits the full pipeline (Steps 1 → 11) with job dependencies (`j2` waits for `j1`, etc.).
- Added Step 22: automatically submits Steps 2 → 11 with job dependencies (`j3` waits for `j2`, etc.).
- Added Step 23: automatically submits Steps 3 → 11 with job dependencies.
- Added Step 24: automatically submits Steps 4 → 11 with job dependencies.
- Added Step 25: automatically submits Steps 5 → 11 with job dependencies.
- Added Step 26: automatically submits Steps 6 → 11 with job dependencies.
- Added Step 27: automatically submits Steps 7 → 11 with job dependencies.
- Added Step 28: automatically submits Steps 8 → 11 with job dependencies.
- Added Step 29: automatically submits Steps 9 → 11 with job dependencies.
- Added Step 30: automatically submits Steps 10 → 11 with job dependencies.
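
The dependency chaining described above maps naturally onto LSF's `bsub -w` option. The following dry-run sketch only prints the commands it would submit. The job names `j1`, `j2`, ... follow the naming mentioned above, while the job group path, the queue choice, and the `./run_step.sh` wrapper are placeholders assumed for illustration, not what `somaticwrapper.pl` actually generates.

```shell
# Print (do not run) a chain of bsub commands for steps $1..$2 in which
# each job waits for the previous one to finish successfully.
# Assumptions: group path, queue, and ./run_step.sh are placeholders.
print_chain() {
  first=$1
  last=$2
  step=$first
  while [ "$step" -le "$last" ]; do
    dep=""
    if [ "$step" -gt "$first" ]; then
      prev=$((step - 1))
      dep=" -w \"done(j${prev})\""   # LSF dependency on the previous job name
    fi
    printf '%s\n' "bsub -J j${step}${dep} -q long -g /scao/example_run_somatic_2025 ./run_step.sh ${step}"
    step=$((step + 1))
  done
}

# Show the commands for a three-step chain.
print_chain 1 3
```

Printing the commands first makes it easy to inspect the chain before anything reaches the scheduler; starting at a later step (as Steps 22 to 30 do) only changes the first argument.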
 
Before running, update your ~/.bashrc to include the necessary environment variables:
export PATH=/storage1/fs1/songcao/Active/Software/anaconda3/bin:$PATH
export STORAGE1=/storage1/fs1/songcao/Active
export STORAGE2=/storage1/fs1/dinglab/Active
export STORAGE3=/storage1/fs1/m.wyczalkowski/Active
export LSF_DOCKER_VOLUMES="$STORAGE1:$STORAGE1 $STORAGE2:$STORAGE2 $STORAGE3:$STORAGE3"

Then activate:
source ~/.bashrc

Clone the repository:
git clone https://github.com/YourGitRepo/somaticwrapper.git
cd somaticwrapper

Example run and log directories:
mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025
mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log

Use `--step 0` to run Steps 1–11 sequentially with built-in job dependencies:
perl somaticwrapper.pl \
  --step 0 \
  --rdir /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025 \
  --log /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log \
  --ref /storage1/fs1/songcao/Active/Database/hg38_database/GRCh38.d1.vd1/GRCh38.d1.vd1.fa \
  --smg /storage1/fs1/songcao/Active/Database/SMG/smg_list.txt \
  --groupname example_run_somatic_2025 \
  --users scao \
  --wgs 0 \
  --srg 1 \
  --sre 0 \
  --exonic 1 \
  --q long \
  --mincovt 14 --mincovn 8 --minvaf 0.05 --maxindsize 100

To run a single step instead (for example, Step 5):
perl somaticwrapper.pl --step 5 --rdir <run_dir> --log <log_dir> ...

| Step | Description |
|---|---|
| 0 | Submit steps (1–11) automatically with dependencies | 
| 1 | Run Strelka2 | 
| 2 | Run VarScan2 | 
| 3 | Run Pindel | 
| 4 | Run Mutect1 | 
| 5 | Parse Mutect1 results |
| 6 | Parse Strelka2 results | 
| 7 | Parse VarScan2 results | 
| 8 | Parse Pindel results | 
| 9 | QC VCF files | 
| 10 | Merge VCF files | 
| 11 | Generate MAF files | 
| 12 | Merge run-level MAF | 
| 13 | DNP annotation | 
| 14 | Clean unnecessary intermediate files | 
| 22 | Submit steps (2–11) automatically with dependencies | 
| 23 | Submit steps (3–11) automatically with dependencies | 
| 24 | Submit steps (4–11) automatically with dependencies | 
| 25 | Submit steps (5–11) automatically with dependencies | 
| 26 | Submit steps (6–11) automatically with dependencies | 
| 27 | Submit steps (7–11) automatically with dependencies | 
| 28 | Submit steps (8–11) automatically with dependencies | 
| 29 | Submit steps (9–11) automatically with dependencies | 
| 30 | Submit steps (10–11) automatically with dependencies | 
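
Reading the table, the resume codes 22 to 30 appear to follow a simple pattern: the first step submitted is the code minus 20, and the last is always Step 11. A small helper (hypothetical, not part of the pipeline) makes that mapping explicit:

```shell
# Map a --step value to the "first last" range of steps it covers,
# following the pattern read off the table above (an inference, not
# taken from the pipeline's source code).
step_range() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "1 11"                      # full automatic submission
  elif [ "$code" -ge 22 ] && [ "$code" -le 30 ]; then
    echo "$((code - 20)) 11"         # resume from a later step
  elif [ "$code" -ge 1 ] && [ "$code" -le 14 ]; then
    echo "$code $code"               # a single step on its own
  else
    echo "unknown step code: $code" >&2
    return 1
  fi
}
```

For instance, `step_range 25` reports that Steps 5 through 11 would be submitted, which matches the table entry for Step 25.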

| Parameter | Description |
|---|---|
| `--rdir` | Full path to run directory containing per-sample folders |
| `--log` | Path for log output (usually parent of rdir) |
| `--srg` | BAM has read groups (1 = yes, 0 = no) |
| `--sre` | Rerun and overwrite results (1 = yes, 0 = no) |
| `--wgs` | 1 = WGS, 0 = WXS |
| `--groupname` | Job group name |
| `--users` | LSF user account (used in job group path) |
| `--ref` | HG38 reference FASTA |
| `--smg` | SMG gene list file |
| `--q` | LSF queue (research-hpc, ding-lab, or long) |
| `--mincovt` | Minimum tumor coverage (≥ 14) |
| `--mincovn` | Minimum normal coverage (≥ 8) |
| `--minvaf` | Minimum variant allele frequency (≥ 0.05) |
| `--maxindsize` | Maximum indel size (≤ 100) |
| `--exonic` | Output exonic region (1 = yes, 0 = no) |

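
To make the threshold parameters concrete, the sketch below checks one candidate variant against the values listed in the table. It assumes the bounds are inclusive, as the ≥ and ≤ signs suggest, and that an SNV is passed with an indel size of 0. The function `passes_thresholds` and its argument order are invented for this illustration and do not reflect the pipeline's internal filtering code.

```shell
# Succeed (exit status 0) when a candidate variant meets all thresholds.
# Arguments: tumor_coverage normal_coverage vaf indel_size
# Assumption: inclusive bounds, using the values from the parameter table.
passes_thresholds() {
  awk -v t="$1" -v n="$2" -v vaf="$3" -v size="$4" \
      -v mincovt=14 -v mincovn=8 -v minvaf=0.05 -v maxindsize=100 \
      'BEGIN { exit !(t >= mincovt && n >= mincovn && vaf >= minvaf && size <= maxindsize) }'
}

# Example: 20x tumor, 10x normal, VAF 0.10, 3 bp indel.
if passes_thresholds 20 10 0.10 3; then
  echo "variant kept"
else
  echo "variant filtered"
fi
```

Using `awk` for the comparison avoids the shell's lack of floating-point arithmetic when testing the VAF.
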
run_dir/
├── <sample_name>/
│   ├── strelka/
│   ├── varscan/
│   ├── pindel/
│   ├── mutect1/
│   ├── merged.withmutect.vcf
│   ├── <sample>.withmutect.maf
│   └── <sample>.dnp.annotated.maf
└── log/
    ├── LSF_DIR_SOMATIC/
    └── tmpsomatic/
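
A quick way to see which samples have finished is to look for the final MAF in each sample folder. This sketch assumes the layout shown above, with `log/` sitting next to the sample folders and the final file named `<sample>.dnp.annotated.maf`; the function name is hypothetical.

```shell
# List sample folders under a run directory that lack the final annotated MAF.
# Assumption: layout is <run_dir>/<sample>/<sample>.dnp.annotated.maf,
# and the log/ folder is not a sample.
list_incomplete_samples() {
  run_dir=$1
  for dir in "$run_dir"/*/; do
    [ -d "$dir" ] || continue                 # glob matched nothing
    sample=$(basename "$dir")
    [ "$sample" = "log" ] && continue         # skip the log folder
    [ -f "${dir}${sample}.dnp.annotated.maf" ] || echo "$sample"
  done
}
```

Calling `list_incomplete_samples <run_dir>` prints one sample name per line; empty output means every sample folder contains its final MAF.
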
Author: Song Cao
Email: scao@wustl.edu
Washington University in St. Louis