Manual

Modules available:

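nanopolish extract: extract reads in FASTA or FASTQ format from a directory of FAST5 files
nanopolish index: build an index mapping from basecalled reads to the signals measured by the sequencer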
nanopolish call-methylation: predict genomic bases that may be methylated
nanopolish variants: detect SNPs and indels with respect to a reference genome
nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
nanopolish eventalign: align signal-level events to k-mers of a reference genome
nanopolish phase-reads: phase reads using heterozygous SNVs with respect to a reference genome
nanopolish polya: estimate polyadenylated tail lengths on native RNA reads

extract

Overview

This module is used to extract reads in FASTA or FASTQ format from a directory of FAST5 files.

Input

path to a directory of FAST5 files modified to contain basecall information

Output

sequences of reads in FASTA or FASTQ format

Usage example

nanopolish extract [OPTIONS] <fast5|dir>

Argument name(s)	Required	Default value	Description
<fast5|dir>	Y	NA	FAST5 or path to directory of FAST5 files.
`-r`, `--recurse`	N	NA	Recurse into subdirectories
`-q`, `--fastq`	N	fasta format	Use when you want to extract to FASTQ format
`-t`, `--type=TYPE`	N	2d-or-template	The type of read to extract, one of: {template, complement, 2d, 2d-or-template, any}
`-b`, `--basecaller=NAME[:VERSION]`	N	NA	consider only data produced by basecaller NAME, optionally with given exact VERSION
`-o`, `--output=FILE`	N	stdout	Write output to FILE

index

Overview

Build an index mapping from basecalled reads to the signals measured by the sequencer.

Input

path to directory of raw nanopore sequencing data in FAST5 format

basecalled reads

Output

gzipped FASTA file of basecalled reads (.index)

index files (.fai, .gzi, .readdb)

Readdb file format

The readdb file is a tab-separated file with two columns: the first holds the read id and the second holds the path to the FAST5 file containing that read's signal:

read_id_1   /path/to/fast5/containing/read_id_1/signals
read_id_2   /path/to/fast5/containing/read_id_2/signals
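
Because each line holds exactly one read id and one path, a readdb file can be loaded into a lookup table with a few lines of code. The sketch below assumes the two columns are separated by a single tab, as described above. The function name `parse_readdb` and the sample lines are illustrative and are not part of nanopolish.

```python
from typing import Dict, Iterable


def parse_readdb(lines: Iterable[str]) -> Dict[str, str]:
    """Map each read id to its FAST5 path from readdb-style lines."""
    read_to_fast5: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Each record is "<read_id>\t<fast5_path>". Split only once so
        # that any further tabs stay inside the path.
        read_id, fast5_path = line.split("\t", 1)
        read_to_fast5[read_id] = fast5_path
    return read_to_fast5


# Illustrative input, mirroring the two example records above.
example = [
    "read_id_1\t/path/to/fast5/containing/read_id_1/signals\n",
    "read_id_2\t/path/to/fast5/containing/read_id_2/signals\n",
]
readdb = parse_readdb(example)
print(readdb["read_id_2"])
```

For a file on disk, the same function can be given an open file handle, since iterating over a text file yields its lines.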

Usage example

nanopolish index [OPTIONS] -d nanopore_raw_file_directory reads.fastq

Argument name(s)	Required	Default value	Description
`-d`, `--directory`	Y	NA	FAST5 or path to directory of FAST5 files containing ONT sequencing raw signal information.
`-f`, `--fast5-fofn`	N	NA	file containing the paths to each fast5 for the run
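
When the index step is driven from a script, building the command as an argument list avoids shell quoting problems with paths that contain spaces. The following is a minimal sketch, using only the options listed in the table above. The helper name `index_command` is hypothetical, and passing `-f` together with `-d` is an assumption based on the table marking `-d` as required and `-f` as optional.

```python
import shlex
from typing import List, Optional


def index_command(fast5_dir: str, reads: str,
                  fofn: Optional[str] = None) -> List[str]:
    """Build the argument list for `nanopolish index` (hypothetical helper)."""
    argv = ["nanopolish", "index", "-d", fast5_dir]
    if fofn is not None:
        # Optional file listing the path of each FAST5 file in the run.
        argv += ["-f", fofn]
    # The basecalled reads file is the final positional argument.
    argv.append(reads)
    return argv


cmd = index_command("fast5 files", "reads.fastq")
# shlex.join (Python 3.8+) renders the list as a shell-safe string.
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where nanopolish is installed and on the `PATH`.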

call-methylation

Overview

Classify nucleotides as methylated or not.

Input

Basecalled ONT reads in FASTA format

Output

tab-separated file containing per-read log-likelihood ratios

Usage example

nanopolish call-methylation [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)	Required	Default value	Description
`-r`, `--reads=FILE`	Y	NA	the ONT reads are in fasta FILE
`-b`, `--bam=FILE`	Y	NA	the reads aligned to the genome assembly are in bam FILE
`-g`, `--genome=FILE`	Y	NA	the genome we are computing a consensus for is in FILE
`-t`, `--threads=NUM`	N	1	use NUM threads
`--progress`	N	NA	print out a progress message
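
Since the output is a tab-separated file of per-read log-likelihood ratios, a common next step is to bin those ratios into calls. The sketch below is one way to do that, under several assumptions that this manual does not state: that the file has a header row, that the ratio lives in a column named `log_lik_ratio`, that positive values favour methylation, and that a cutoff of 2.0 is reasonable. The function name `count_calls` and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterable


def count_calls(tsv_lines: Iterable[str],
                threshold: float = 2.0,
                column: str = "log_lik_ratio") -> Dict[str, int]:
    """Bin per-read log-likelihood ratios into three illustrative classes."""
    counts = {"methylated": 0, "unmethylated": 0, "ambiguous": 0}
    # DictReader takes the first line as the header row.
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        llr = float(row[column])
        if llr >= threshold:
            counts["methylated"] += 1      # assumed: positive favours methylation
        elif llr <= -threshold:
            counts["unmethylated"] += 1
        else:
            counts["ambiguous"] += 1       # too close to zero to call
    return counts


# Made-up rows for illustration only; real output may have more columns.
sample = (
    "read_name\tlog_lik_ratio\n"
    "read_id_1\t5.1\n"
    "read_id_2\t-3.2\n"
    "read_id_3\t0.4\n"
)
counts = count_calls(io.StringIO(sample))
print(counts)
```

Keeping the column name and threshold as parameters makes it easy to adjust the sketch once the actual header of the output file has been checked.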

variants

Overview

This module is used to call single nucleotide polymorphisms (SNPs) and indels using a signal-level HMM.

Input

basecalled reads

alignment info

genome assembly

Output

VCF file

Usage example

nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)	Required	Default value	Description
`--snps`	N	NA	use flag to only call SNPs
`--consensus=FILE`	N	NA	run in consensus calling mode and write polished sequence to FILE
`--fix-homopolymers`	N	NA	use flag to run the experimental homopolymer caller
`--faster`	N	NA	minimize compute time while slightly reducing consensus accuracy
`-w`, `--window=STR`	N	NA	find variants in window STR (format: <chromosome_name>:<start>-<end>)
`-r`, `--reads=FILE`	Y	NA	the ONT reads are in fasta FILE
`-b`, `--bam=FILE`	Y	NA	the reads aligned to the reference genome are in bam FILE
`-e`, `--event-bam=FILE`	Y	NA	the events aligned to the reference genome are in bam FILE
`-g`, `--genome=FILE`	Y	NA	the reference genome is in FILE
`-o`, `--outfile=FILE`	N	stdout	write result to FILE
`-t`, `--threads=NUM`	N	1	use NUM threads
`-m`, `--min-candidate-frequency=F`	N	0.2	extract candidate variants from the aligned reads when the variant frequency is at least F
`-d`, `--min-candidate-depth=D`	N	20	extract candidate variants from the aligned reads when the depth is at least D
`-x`, `--max-haplotypes=N`	N	1000	consider at most N haplotype combinations
`--max-rounds=N`	N	50	perform N rounds of consensus sequence improvement
`-c`, `--candidates=VCF`	N	NA	read variant candidates from VCF, rather than discovering them from aligned reads
`-a`, `--alternative-basecalls-bam=FILE`	N	NA	if an alternative basecaller was used that does not output event annotations then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam
`--calculate-all-support`	N	NA	when making a call, also calculate the support of the 3 other possible bases
`--models-fofn=FILE`	N	NA	read alternative k-mer models from FILE
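
The `--window` option makes it possible to process a genome in pieces, for example one window per job. The sketch below generates window strings in the `<chromosome_name>:<start>-<end>` format and builds a matching command line from options in the table above. The helper names, the window size, and the overlap are illustrative assumptions. Whether coordinates are 0- or 1-based and whether the end is inclusive is not stated here, so check this before relying on the boundaries.

```python
from typing import Iterator, List


def make_windows(name: str, length: int,
                 size: int, overlap: int = 0) -> Iterator[str]:
    """Yield '<name>:<start>-<end>' strings tiling a contig of `length` bases."""
    if size <= 0 or overlap < 0:
        raise ValueError("size must be positive and overlap non-negative")
    start = 0
    while start < length:
        # Extend each window by `overlap` bases, clipped to the contig end.
        end = min(start + size + overlap, length)
        yield f"{name}:{start}-{end}"
        start += size


def variants_command(window: str, reads: str, bam: str,
                     genome: str, threads: int = 1) -> List[str]:
    """Build the argument list for one `nanopolish variants` run (hypothetical helper)."""
    return [
        "nanopolish", "variants",
        f"--reads={reads}",
        f"--bam={bam}",
        f"--genome={genome}",
        f"--window={window}",
        f"--threads={threads}",
    ]


windows = list(make_windows("chr1", 250, size=100, overlap=10))
print(windows)
print(variants_command(windows[0], "reads.fa", "alignments.bam", "genome.fa"))
```

Overlapping windows are used here so that a variant near a boundary falls fully inside at least one window; an overlap of 0 gives a plain non-overlapping tiling.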

eventalign

Overview

Align nanopore events to reference k-mers

Input

basecalled reads

alignment information

assembled genome

Usage example

nanopolish eventalign [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)	Required	Default value	Description
`--sam`	N	NA	use to write output in SAM format
`-w`, `--window=STR`	N	NA	Compute the consensus for window STR (format: ctg:start_id-end_id)
`-r`, `--reads=FILE`	Y	NA	the ONT reads are in fasta FILE
`-b`, `--bam=FILE`	Y	NA	the reads aligned to the genome assembly are in bam FILE
`-g`, `--genome=FILE`	Y	NA	the genome we are computing a consensus for is in FILE
`-t`, `--threads=NUM`	N	1	use NUM threads
`--scale-events`	N	NA	scale events to the model, rather than vice-versa
`--progress`	N	NA	print out a progress message
`-n`, `--print-read-names`	N	NA	print read names instead of indexes
`--summary=FILE`	N	NA	summarize the alignment of each read/strand in FILE
`--samples`	N	NA	write the raw samples for the event to the tsv output
`--models-fofn=FILE`	N	NA	read alternative k-mer models from FILE

phase-reads - (experimental)

Overview

Phase reads using heterozygous SNVs with respect to a reference genome

Input

basecalled reads

alignment information

assembled genome

variants (from nanopolish variants or from other sources, e.g. an Illumina VCF)

Usage example

nanopolish phase-reads [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa variants.vcf

polya

Overview

Estimate the number of nucleotides in the poly(A) tails of native RNA reads.

Input

basecalled reads

alignment information

reference transcripts

Usage example

nanopolish polya [OPTIONS] --reads=reads.fa --bam=alignments.bam --genome=ref.fa

Argument name(s)	Required	Default value	Description
`-w`, `--window=STR`	N	NA	Compute only for reads aligning to window of reference STR (format: ctg:start_id-end_id)
`-r`, `--reads=FILE`	Y	NA	the FAST(A/Q) file of native RNA reads
`-b`, `--bam=FILE`	Y	NA	the BAM file of alignments between reads and the reference
`-g`, `--genome=FILE`	Y	NA	the reference transcripts
`-t`, `--threads=NUM`	N	1	use NUM threads
`-v`, `-vv`	N	NA	-v returns raw sample log-likelihoods, while -vv returns event durations