
Usage

BASALT vs BASALT-Air: Key CLI Differences

BASALT-Air introduces several quality-of-life improvements over the original BASALT CLI:

| Feature | BASALT (Conda) | BASALT-Air (Pixi) |
| --- | --- | --- |
| Path support | Working directory only | Absolute paths supported |
| Dataset separator | / (slash) | ; (semicolon) |
| Intermediate dir | CWD (fixed) | --workdir |
| Output dir | CWD (fixed) | --outdir |
| Activation | conda activate basalt_env | pixi shell |
| Version check | | BASALT --version |
| Dependency check | | BASALT --check-deps |

Tip: In BASALT-Air, use ; to separate multiple datasets in -s (quote the value, because an unquoted ; ends the command in most shells), and run BASALT from any directory without copying or symlinking files.
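The dataset string for -s can be assembled programmatically for either CLI. The following is a minimal sketch: `join_datasets` is a hypothetical helper that is not part of BASALT, the file names are placeholders, and the final command is only printed, not executed.

```shell
#!/bin/sh
# join_datasets is a hypothetical helper (not part of BASALT).
# Usage: join_datasets <separator> <pair1> <pair2> ...
# The body runs in a subshell "( ... )" so the IFS change does not leak out.
join_datasets() (
    sep=$1
    shift
    IFS=$sep
    # "$*" joins the remaining arguments with the first character of IFS.
    printf '%s\n' "$*"
)

# Each dataset is one comma-separated read pair (placeholder file names).
pair1="s1_R1.fq,s1_R2.fq"
pair2="s2_R1.fq,s2_R2.fq"

basalt_reads=$(join_datasets '/' "$pair1" "$pair2")   # original BASALT
air_reads=$(join_datasets ';' "$pair1" "$pair2")      # BASALT-Air

# Print the command instead of running it. The -s value is quoted so the
# shell does not treat ";" as a command separator.
printf 'BASALT -a as1.fa -s "%s" -t 64 -m 128\n' "$air_reads"
```

Keeping each read pair as one argument means the same list of pairs can be joined with either separator.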


Command-Line Interface

BASALT:

BASALT -a <assemblies> -s <short_reads> -t <threads> -m <RAM_GB> [options]

BASALT-Air:

BASALT -a <assemblies> -s <short_reads> -t <threads> -m <RAM_GB>
       [--workdir <dir>] [--outdir <dir>] [options]

Required Arguments

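| Argument | Type | Description | Example |
| --- | --- | --- | --- |
| -a, --assemblies | str | Comma-separated assembly FASTA files | -a as1.fa,as2.fa,as3.fa |
| -s, --shortreads | str | Paired-end reads. The two files of a read pair are separated by a comma (,); datasets are separated by a slash (/), or by a semicolon (;) in BASALT-Air | -s s1_r1.fq,s1_r2.fq/s2_r1.fq,s2_r2.fq |
| -t, --threads | int | Number of CPU threads | -t 64 |
| -m, --ram | int | Maximum RAM in GB (minimum: 32) | -m 250 |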
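A wrapper script can check the numeric arguments before launching a long run. The sketch below is only an illustration: `is_uint` and `check_resources` are hypothetical helpers, not BASALT features, and the only rule taken from the table above is the 32 GB minimum for -m.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of BASALT).

# is_uint succeeds when its argument consists only of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_resources <threads> <ram_gb>
# Succeeds when threads >= 1 and ram_gb >= 32 (the documented minimum for -m).
check_resources() {
    is_uint "$1" && is_uint "$2" || return 1
    [ "$1" -ge 1 ] && [ "$2" -ge 32 ]
}

# Print the command only when the values pass the check.
if check_resources 64 250; then
    echo "BASALT -a as1.fa -s s1_r1.fq,s1_r2.fq -t 64 -m 250"
else
    echo "refusing to run: need -t >= 1 and -m >= 32" >&2
fi
```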

Supported file formats:

  • Assemblies: .fa, .fna, .fasta
  • Reads: .fq, .fastq
  • Compression: .gz, .tar.gz, .zip
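File names can be screened against this list before a run. The sketch below assumes that a compression suffix is appended to one of the supported base extensions (for example sample.fq.gz); the page does not spell this out, and `is_supported` is a hypothetical helper rather than part of BASALT.

```shell
#!/bin/sh
# is_supported is a hypothetical helper (not part of BASALT).
# Usage: is_supported <assembly|reads> <file name>
# Assumption: a compression suffix follows a supported base extension.
is_supported() {
    kind=$1
    name=$2
    # Strip one trailing compression suffix, if present.
    case "$name" in
        *.tar.gz) base=${name%.tar.gz} ;;
        *.gz)     base=${name%.gz} ;;
        *.zip)    base=${name%.zip} ;;
        *)        base=$name ;;
    esac
    # Match the remaining extension against the list for this kind of file.
    case "$kind:$base" in
        assembly:*.fa|assembly:*.fna|assembly:*.fasta) return 0 ;;
        reads:*.fq|reads:*.fastq) return 0 ;;
        *) return 1 ;;
    esac
}

for f in s1_R1.fq.gz s1_R2.fastq sample.bam; do
    if is_supported reads "$f"; then
        printf '%s: supported read file\n' "$f"
    else
        printf '%s: unsupported read file\n' "$f"
    fi
done
```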

Optional Arguments

| Argument | Type | Default | Choices | Description |
| --- | --- | --- | --- | --- |
| -l, --longreads | str | none | | Comma-separated long-read files (ONT/PacBio, excludes HiFi) |
| -hf, --hifi | str | none | | Comma-separated PacBio HiFi read files |
| -e, --extra_binner | str | none | m, v, l | Additional binners: m=MetaBinner, v=VAMB, l=LorBin |
| -o, --out | str | Final_binset | | Output folder name prefix |
| -q, --quality-check | str | checkm2 | checkm, checkm2 | Quality assessment software |
| --min-cpn | int | 35 | | Minimum completeness (%) for refinement |
| --max-ctn | int | 20 | | Maximum contamination (%) for refinement |
| --mode | str | continue | new, continue | Start fresh or resume from checkpoint |
| --module | str | all | autobinning, refinement, reassembly, all | Pipeline stage to run |
| --sensitive | str | sensitive | quick, sensitive, more-sensitive | Binning sensitivity preset |
| --refinepara | str | quick | quick, deep | Refinement depth |
| -r, --refinement-binset | str | none | | Binset folder for standalone refinement |
| -c, --coverage-list | str | none | | Coverage files for standalone refinement |
| -b, --binsets-list | str | none | | Binset folders for dereplication |
| -d, --data-feeding-folder | str | none | | External binset folders for Data Feeding |
| --binset-index | int | 500 | | Start index for extra binsets in Data Feeding |
| --workdir | str | CWD | | Directory for intermediate files (BASALT-Air only) |
| --outdir | str | Same as workdir | | Directory for final output (BASALT-Air only) |

--workdir and --outdir are BASALT-Air only.
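The defaults for the two BASALT-Air directory options cascade: the work directory falls back to the current directory, and the output directory falls back to the work directory. A wrapper script could mirror that behaviour as sketched below. `resolve_dirs` and the variables it sets are hypothetical and only illustrate the documented defaults; they are not how BASALT-Air implements them.

```shell
#!/bin/sh
# resolve_dirs is a hypothetical helper (not part of BASALT-Air).
# Usage: resolve_dirs [workdir] [outdir]
# Sets $workdir and $outdir, applying the documented defaults:
#   workdir defaults to the current directory, outdir defaults to workdir.
resolve_dirs() {
    workdir=${1:-$PWD}
    outdir=${2:-$workdir}
}

# Only a work directory is given, so the output lands there too.
resolve_dirs /scratch/work
printf 'BASALT ... --workdir "%s" --outdir "%s"\n' "$workdir" "$outdir"

# Both directories are given explicitly.
resolve_dirs /scratch/work /results/output
printf 'BASALT ... --workdir "%s" --outdir "%s"\n' "$workdir" "$outdir"
```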


Sensitivity Presets

The --sensitive flag controls which binners are used and their parameter ranges:

| Preset | Binners | Parameters | Runtime |
| --- | --- | --- | --- |
| quick | MetaBAT2, Semibin2 | MetaBAT2: 200/300/400/500; Semibin2: 100 | Fastest |
| sensitive | MetaBAT2, CONCOCT, Semibin2 | As above + CONCOCT (1–2 settings) | Balanced |
| more-sensitive | Maxbin2, MetaBAT2, CONCOCT, Semibin2 | Maxbin2: 0.3/0.5/0.7/0.9 + full settings | Most thorough |

Refinement Depth

| Option | Description |
| --- | --- |
| quick | Standard contig retrieval. Faster, suitable for most datasets. |
| deep | Extended contig retrieval at the sequence retrieval step. Recovers more contigs but takes longer. |

Extra Binners

Binners enabled via -e run in addition to the default set (MetaBAT2, Maxbin2, CONCOCT, Semibin2):

| Flag | Tool | Method | Reference |
| --- | --- | --- | --- |
| m | MetaBinner | k-mer composition + coverage clustering | |
| v | VAMB | Variational autoencoder | Nat. Biotechnol. (2021) |
| l | LorBin | Long-read multi-scale adaptive clustering | Nat. Commun. (2025) |

Combine multiple extra binners: -e m,l or -e m,v,l.
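A value for -e can be checked before it is passed on. The sketch below accepts only comma-separated combinations of the three documented flags; `valid_extra_binners` is a hypothetical helper, not part of BASALT, and how BASALT itself reacts to an invalid value is not covered here.

```shell
#!/bin/sh
# valid_extra_binners is a hypothetical helper (not part of BASALT).
# It accepts a comma-separated list made only of the flags m, v and l.
# The body runs in a subshell "( ... )", so IFS and "set -f" stay local.
valid_extra_binners() (
    # Reject empty input and stray commas (leading, trailing, doubled).
    case "$1" in
        ''|,*|*,|*,,*) exit 1 ;;
    esac
    IFS=,
    set -f                      # no filename globbing while splitting
    for flag in $1; do
        case "$flag" in
            m|v|l) ;;           # MetaBinner, VAMB, LorBin
            *) exit 1 ;;
        esac
    done
    exit 0
)

for value in "m,l" "m,v,l" "m,x"; do
    if valid_extra_binners "$value"; then
        printf 'extra binners %s: ok\n' "$value"
    else
        printf 'extra binners %s: rejected\n' "$value"
    fi
done
```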


Examples

Basic Short-Read Run

BASALT -a assembly.fasta \
    -s sample_R1.fq,sample_R2.fq \
    -t 32 -m 128

Multi-Assembly with Long Reads

BASALT -a as1.fa,as2.fa,as3.fa \
    -s s1_R1.fq,s1_R2.fq/s2_R1.fq,s2_R2.fq \
    -l long.fastq \
    -t 60 -m 250

Fast Mode

BASALT -a assembly.fasta \
    -s sample_R1.fq,sample_R2.fq \
    -t 32 -m 128 \
    --sensitive quick --refinepara quick

High-Quality Filtering with Custom Thresholds

BASALT -a as1.fa \
    -s sample_R1.fq,sample_R2.fq \
    -t 60 -m 250 \
    --min-cpn 50 --max-ctn 10

Short + Long Reads, More-Sensitive Mode, CheckM2

BASALT -a as1.fa,as2.fa \
    -s s1_R1.fq,s1_R2.fq/s2_R1.fq,s2_R2.fq \
    -l lr1.fastq,lr2.fastq \
    -t 64 -m 256 \
    --sensitive more-sensitive --refinepara deep \
    -q checkm2

Autobinning Only (Skip Refinement + Reassembly)

BASALT -a as1.fa,as2.fa \
    -s s1_R1.fq,s1_R2.fq/s2_R1.fq,s2_R2.fq \
    -t 60 -m 250 \
    --module autobinning

Refinement Only on Existing Bins

BASALT -a assembly.fa \
    -s sample_R1.fq,sample_R2.fq \
    -r My_MAGs_Folder \
    -c Coverage_matrix.txt \
    -t 32 -m 128

Data Feeding: Import External Bins

BASALT -s sample_R1.fq,sample_R2.fq \
    -d vamb_bins,metabat_bins \
    --binset-index 1 \
    -t 32 -m 128

With Extra Binners

BASALT -a as1.fa \
    -s sample_R1.fq,sample_R2.fq \
    -t 32 -m 128 \
    -e m,l \
    --sensitive more-sensitive

BASALT-Air: Absolute Paths with Work/Output Directories

BASALT \
    -a /path/to/data/assembly.fa \
    -s /path/to/data/sample1.R1.fq,/path/to/data/sample1.R2.fq \
    -l /path/to/data/sample1.nanopore.fq \
    -t 64 -m 128 \
    -o my_project \
    --workdir /scratch/work \
    --outdir /results/output

BASALT-Air: Multiple Datasets (Semicolon Separator)

BASALT \
    -a /data/as1.fa,/data/as2.fa \
    -s "/data/s1_R1.fq,/data/s1_R2.fq;/data/s2_R1.fq,/data/s2_R2.fq" \
    -t 64 -m 128