Manual¶
Modules available:
nanopolish extract: extract reads in FASTA or FASTQ format from a directory of FAST5 files
nanopolish index: build an index mapping from basecalled reads to the signals measured by the sequencer
nanopolish call-methylation: predict genomic bases that may be methylated
nanopolish variants: detect SNPs and indels with respect to a reference genome
nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
nanopolish eventalign: align signal-level events to k-mers of a reference genome
nanopolish phase-reads: phase reads using heterozygous SNVs with respect to a reference genome
nanopolish polya: estimate polyadenylated tail lengths on native RNA reads
extract¶
Overview¶
This module is used to extract reads in FASTA or FASTQ format from a directory of FAST5 files.
Input¶
- path to a directory of FAST5 files modified to contain basecall information
Output¶
- sequences of reads in FASTA or FASTQ format
Usage example¶
nanopolish extract [OPTIONS] <fast5|dir>
Argument name(s) | Required | Default value | Description |
---|---|---|---|
<fast5|dir> | Y | NA | FAST5 or path to directory of FAST5 files. |
-r, --recurse | N | NA | Recurse into subdirectories |
-q, --fastq | N | fasta format | Use when you want to extract to FASTQ format |
-t, --type=TYPE | N | 2d-or-template | The type of read, one of: {template, complement, 2d, 2d-or-template, any} |
-b, --basecaller=NAME[:VERSION] | N | NA | Consider only data produced by basecaller NAME, optionally with given exact VERSION |
-o, --output=FILE | N | stdout | Write output to FILE |
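When extract is driven from a script, it can help to assemble the command line from the options in the table above. The following is a minimal sketch, not part of nanopolish itself: the helper name `build_extract_command` and the paths `fast5_dir/` and `reads.fastq` are hypothetical, and only flags listed in the table are used.

```python
import shlex

# Read types accepted by -t/--type, as listed in the options table.
VALID_TYPES = {"template", "complement", "2d", "2d-or-template", "any"}


def build_extract_command(path, recurse=False, fastq=False,
                          read_type=None, output=None):
    """Return an argv list for `nanopolish extract` (hypothetical helper)."""
    if read_type is not None and read_type not in VALID_TYPES:
        raise ValueError(f"unknown read type: {read_type!r}")
    cmd = ["nanopolish", "extract"]
    if recurse:
        cmd.append("--recurse")
    if fastq:
        cmd.append("--fastq")
    if read_type is not None:
        cmd.append(f"--type={read_type}")
    if output is not None:
        cmd.append(f"--output={output}")
    cmd.append(path)  # the positional <fast5|dir> argument goes last
    return cmd


# Hypothetical input directory and output file.
cmd = build_extract_command("fast5_dir/", recurse=True, fastq=True,
                            output="reads.fastq")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where nanopolish is installed.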
index¶
Overview¶
Build an index mapping from basecalled reads to the signals measured by the sequencer
Input¶
- path to directory of raw nanopore sequencing data in FAST5 format
- basecalled reads
Output¶
- gzipped FASTA file of basecalled reads (.index)
- index files (.fai, .gzi, .readdb)
Readdb file format¶
The readdb file is a tab-separated file with two columns: the first holds the read ID and the second holds the path to the FAST5 file containing that read's signals:
read_id_1 /path/to/fast5/containing/read_id_1/signals
read_id_2 /path/to/fast5/containing/read_id_2/signals
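Because the readdb file is plain tab-separated text, it can be loaded with a few lines of Python. This is a minimal sketch under the two-column layout described above; the function name `parse_readdb` and the example paths are hypothetical.

```python
import csv
import io


def parse_readdb(handle):
    """Map each read ID to its FAST5 path from a two-column readdb file."""
    read_to_fast5 = {}
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), 1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(row)}")
        read_id, fast5_path = row
        read_to_fast5[read_id] = fast5_path
    return read_to_fast5


# In-memory stand-in for a real .readdb file (hypothetical paths).
example = io.StringIO(
    "read_id_1\t/data/run1/a.fast5\n"
    "read_id_2\t/data/run1/b.fast5\n"
)
readdb = parse_readdb(example)
print(readdb["read_id_2"])
```

For a real file, replace the `StringIO` object with `open(path, newline="")`.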
Usage example¶
nanopolish index [OPTIONS] -d nanopore_raw_file_directory reads.fastq
Argument name(s) | Required | Default value | Description |
---|---|---|---|
-d, --directory | Y | NA | FAST5 or path to directory of FAST5 files containing ONT sequencing raw signal information. |
-f, --fast5-fofn | N | NA | file containing the paths to each fast5 for the run |
call-methylation¶
Overview¶
Classify nucleotides as methylated or not.
Input¶
- Basecalled ONT reads in FASTA format
Output¶
- tab-separated file containing per-read log-likelihood ratios
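As an illustration of how the per-read log-likelihood ratios might be consumed downstream, the sketch below tallies calls from a tab-separated file. Several things here are assumptions rather than facts stated in this manual: that the file has a header row, that the ratio lives in a column named `log_lik_ratio`, that positive values favour methylation, and that a cutoff of 2.0 is a sensible threshold. Adjust all of these to match the actual output.

```python
import csv
import io


def classify_calls(handle, llr_column="log_lik_ratio", threshold=2.0):
    """Count calls as methylated, unmethylated, or ambiguous.

    Assumes a header row and a positive-means-methylated convention;
    the column name and threshold are illustrative defaults.
    """
    counts = {"methylated": 0, "unmethylated": 0, "ambiguous": 0}
    for row in csv.DictReader(handle, delimiter="\t"):
        llr = float(row[llr_column])
        if llr >= threshold:
            counts["methylated"] += 1
        elif llr <= -threshold:
            counts["unmethylated"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Made-up rows for demonstration only; not real nanopolish output.
example = io.StringIO(
    "read_name\tlog_lik_ratio\n"
    "read_1\t3.5\n"
    "read_2\t-4.1\n"
    "read_3\t0.7\n"
)
counts = classify_calls(example)
print(counts)
```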
Usage example¶
nanopolish call-methylation [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
-r, --reads=FILE | Y | NA | the ONT reads are in fasta FILE |
-b, --bam=FILE | Y | NA | the reads aligned to the genome assembly are in bam FILE |
-g, --genome=FILE | Y | NA | the genome we are computing a consensus for is in FILE |
-t, --threads=NUM | N | 1 | use NUM threads |
--progress | N | NA | print out a progress message |
variants¶
Overview¶
This module is used to call single nucleotide polymorphisms (SNPs) and indels with respect to a reference genome using a signal-level HMM. Pass --snps to restrict calling to SNPs.
Input¶
- basecalled reads
- alignment info
- genome assembly
Output¶
- VCF file
Usage example¶
nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
--snps | N | NA | use flag to only call SNPs |
--consensus=FILE | N | NA | run in consensus calling mode and write polished sequence to FILE |
--fix-homopolymers | N | NA | use flag to run the experimental homopolymer caller |
--faster | N | NA | minimize compute time while slightly reducing consensus accuracy |
-w, --window=STR | N | NA | find variants in window STR (format: <chromosome_name>:<start>-<end>) |
-r, --reads=FILE | Y | NA | the ONT reads are in fasta FILE |
-b, --bam=FILE | Y | NA | the reads aligned to the reference genome are in bam FILE |
-e, --event-bam=FILE | Y | NA | the events aligned to the reference genome are in bam FILE |
-g, --genome=FILE | Y | NA | the reference genome is in FILE |
-o, --outfile=FILE | N | stdout | write result to FILE |
-t, --threads=NUM | N | 1 | use NUM threads |
-m, --min-candidate-frequency=F | N | 0.2 | extract candidate variants from the aligned reads when the variant frequency is at least F |
-d, --min-candidate-depth=D | N | 20 | extract candidate variants from the aligned reads when the depth is at least D |
-x, --max-haplotypes=N | N | 1000 | consider at most N haplotype combinations |
--max-rounds=N | N | 50 | perform N rounds of consensus sequence improvement |
-c, --candidates=VCF | N | NA | read variant candidates from VCF, rather than discovering them from aligned reads |
-a, --alternative-basecalls-bam=FILE | N | NA | if an alternative basecaller was used that does not output event annotations then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam |
--calculate-all-support | N | NA | when making a call, also calculate the support of the 3 other possible bases |
--models-fofn=FILE | N | NA | read alternative k-mer models from FILE |
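To make the two candidate thresholds concrete, the sketch below applies the default values from the table (frequency at least 0.2, depth at least 20) to a single site. It only illustrates the arithmetic: the function name `is_candidate` is hypothetical, and treating the two thresholds as a joint requirement is an assumption about how they combine, not a description of nanopolish's internal logic.

```python
def is_candidate(alt_count, depth, min_frequency=0.2, min_depth=20):
    """Return True if a site passes both candidate thresholds.

    alt_count: number of aligned reads supporting the variant
    depth:     total number of aligned reads covering the site
    Assumption: both thresholds must hold at the same time.
    """
    if depth < min_depth or depth <= 0:
        return False
    return alt_count / depth >= min_frequency


print(is_candidate(10, 40))  # frequency 0.25 at depth 40
print(is_candidate(4, 40))   # frequency 0.10 at depth 40
print(is_candidate(10, 15))  # depth below the default minimum
```

Under these assumptions the first site passes, the second fails on frequency, and the third fails on depth.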
eventalign¶
Overview¶
Align nanopore events to reference k-mers
Input¶
- basecalled reads
- alignment information
- assembled genome
Usage example¶
nanopolish eventalign [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
--sam | N | NA | use to write output in SAM format |
-w, --window=STR | N | NA | Compute the consensus for window STR (format: ctg:start_id-end_id) |
-r, --reads=FILE | Y | NA | the ONT reads are in fasta FILE |
-b, --bam=FILE | Y | NA | the reads aligned to the genome assembly are in bam FILE |
-g, --genome=FILE | Y | NA | the genome we are computing a consensus for is in FILE |
-t, --threads=NUM | N | 1 | use NUM threads |
--scale-events | N | NA | scale events to the model, rather than vice-versa |
--progress | N | NA | print out a progress message |
-n, --print-read-names | N | NA | print read names instead of indexes |
--summary=FILE | N | NA | summarize the alignment of each read/strand in FILE |
--samples | N | NA | write the raw samples for the event to the tsv output |
--models-fofn=FILE | N | NA | read alternative k-mer models from FILE |
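The --window option takes a string of the form ctg:start_id-end_id. When generating windows from a script, validating the string before launching a long job can catch mistakes early. The parser below is a minimal sketch: the function name `parse_window` and the example contig name are hypothetical, and it makes no assumption about whether coordinates are 0-based or 1-based, since this manual does not say.

```python
import re

# Greedy contig match so that names containing ':' are still handled;
# the final ':' separates the contig from the coordinate range.
WINDOW_RE = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_window(window):
    """Split 'ctg:start-end' into (contig, start, end) with integer coordinates."""
    match = WINDOW_RE.match(window)
    if match is None:
        raise ValueError(f"window must look like ctg:start-end, got {window!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"window start {start} is after end {end}")
    return match["contig"], start, end


contig, start, end = parse_window("chr20:5000000-5010000")
print(contig, start, end)
```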
phase-reads - (experimental)¶
Overview¶
Phase reads using heterozygous SNVs with respect to a reference genome
Input¶
- basecalled reads
- alignment information
- assembled genome
- variants (from nanopolish variants or from other sources, e.g. an Illumina VCF)
Usage example¶
nanopolish phase-reads [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa variants.vcf
polya¶
Overview¶
Estimate the number of nucleotides in the poly(A) tails of native RNA reads.
Input¶
- basecalled reads
- alignment information
- reference transcripts
Usage example¶
nanopolish polya [OPTIONS] --reads=reads.fa --bam=alignments.bam --genome=ref.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
-w, --window=STR | N | NA | Compute only for reads aligning to window of reference STR (format: ctg:start_id-end_id) |
-r, --reads=FILE | Y | NA | the FAST(A/Q) file of native RNA reads |
-b, --bam=FILE | Y | NA | the BAM file of alignments between reads and the reference |
-g, --genome=FILE | Y | NA | the reference transcripts |
-t, --threads=NUM | N | 1 | use NUM threads |
-v, -vv | N | NA | -v returns raw sample log-likelihoods, while -vv returns event durations |