SPAdes stands for St. Petersburg genome assembler. It is intended for both standard isolates and single-cell MDA bacteria assemblies. This manual will help you to install and run SPAdes. You can find the latest SPAdes release at http://bioinf.spbau.ru/spades. The latest version of this manual can be found here. SPAdes version 2.3.0 was released under GPLv2 on 30 October 2012.
SPAdes requires a 64-bit Linux system. We have two test datasets: a single-cell E. coli dataset and an E. coli isolates dataset. SPAdes requires 24 Gb of RAM to process these datasets. SPAdes also requires Python 2 (version 2.4 or higher) to be installed.
To download SPAdes tar.gz and extract it:
wget http://spades.bioinf.spbau.ru/release2.3.0/spades-2.3.0.tar.gz
tar -xzf spades-2.3.0.tar.gz
cd spades-2.3.0
There are two ways to obtain SPAdes binaries: download static builds from our server or compile SPAdes on your server.
We recommend downloading the binaries with the following script:
You can also compile SPAdes yourself. Compiling its code depends on the following libraries:
If you meet these requirements you can build SPAdes with the following script:
In both cases you should get a bin directory with the files hammer (error correction module) and spades (assembly module).
For testing purposes, SPAdes comes with a toy dataset (first 1000 bp of E. coli). If you run spades.py with its toy-dataset option, it will process this dataset. If the installation is successful, you will see something like this at the end of the log:
 * Corrected reads are in spades_test/corrected/
 * Assembled contigs are spades_test/contigs.fasta
 * Assembled scaffolds are spades_test/scaffolds.fasta
Thank you for using SPAdes!
======= SPAdes pipeline finished
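The check described above can be automated. The following Python 3 sketch scans the tail of a log for the closing marker shown in the sample output. The function name pipeline_finished and the choice to inspect the last ten non-empty lines are assumptions made for illustration; they are not part of SPAdes.

```python
def pipeline_finished(log_text):
    """Return True if the end of the log contains the closing marker."""
    marker = "SPAdes pipeline finished"
    # Look only at the last few non-empty lines, because the marker
    # is expected at the end of the log (window size is an assumption).
    tail = [line for line in log_text.splitlines() if line.strip()][-10:]
    return any(marker in line for line in tail)


example_tail = """\
 * Corrected reads are in spades_test/corrected/
 * Assembled contigs are spades_test/contigs.fasta
 * Assembled scaffolds are spades_test/scaffolds.fasta
Thank you for using SPAdes!
======= SPAdes pipeline finished
"""
print(pipeline_finished(example_tail))  # prints True
```

In practice the text would be read from whichever log file your run produced; the sketch deliberately takes a string so it does not assume a log file name.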
SPAdes accepts single reads as well as forward-reverse paired end reads in FASTA and FASTQ format; however, in order to run error correction, reads should be in FASTQ format. All files may be compressed with gzip. At present, SPAdes can accept only one paired-end library as an input.
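As a rough illustration of the input rules above, the following Python 3 sketch opens a reads file that may or may not be gzip-compressed and guesses whether it holds FASTA or FASTQ records from the first record marker ('>' or '@'). The helper names open_reads and reads_format are hypothetical and not part of SPAdes, and the check is only a heuristic, not a full format validator.

```python
import gzip


def open_reads(path):
    """Open a reads file as text, handling gzip compression transparently."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # two-byte gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")


def reads_format(path):
    """Return 'fastq', 'fasta', or 'unknown' from the first non-empty line."""
    with open_reads(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            return "unknown"
    return "unknown"
```

A result other than 'fastq' would indicate that error correction cannot be run on that file, according to the requirement stated above.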
SPAdes supports paired end reads organized in two separate files or combined in one:
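In the case of two separate files (e.g. s_1_1_sequence.txt and s_1_2_sequence.txt for the forward and reverse reads, respectively, from lane 1), the two files should have corresponding reads in the exact same order.
In the case of one file, the reads should be interlaced, so that each read from s_1_1_sequence.txt is followed by the corresponding read from s_1_2_sequence.txt.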
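To make the combined layout concrete, here is a minimal Python 3 sketch that merges a forward and a reverse FASTQ stream into one interlaced stream, with each forward read followed by its mate. It assumes plain four-line FASTQ records (no wrapped sequence lines) and that both inputs list reads in the same order; the function names fastq_records and interleave are hypothetical and not part of SPAdes.

```python
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as lists of four lines from an iterable of lines."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward_lines, reverse_lines):
    """Yield the lines of an interlaced file: forward read, then its mate."""
    reverse = fastq_records(reverse_lines)
    for fwd in fastq_records(forward_lines):
        rev = next(reverse, None)
        if rev is None:
            raise ValueError("reverse input has fewer reads than forward input")
        yield from fwd
        yield from rev
    if next(reverse, None) is not None:
        raise ValueError("reverse input has more reads than forward input")
```

Both arguments can be open file objects, so the result can be written out with writelines(interleave(f1, f2)). The sketch pairs reads purely by position and does not compare read names.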
SPAdes stores all of its output files in the directory <output_dir>.
Before starting assembly, SPAdes runs BayesHammer to correct errors in reads.
After that, the corrected reads are stored in the directory <output_dir>/corrected/.
After assembly completion, the resulting contigs are stored in the file <output_dir>/contigs.fasta and the resulting scaffolds are stored in the file <output_dir>/scaffolds.fasta. If the --generate-sam-file option was used, a symlink contigs.sam to the SAM file will also be created in the same directory.
To run SPAdes from the command line, type
./spades.py [options] -o <output_dir>
To run SPAdes on the toy dataset (see section 1.3), type
./spades.py -1 test_dataset/ecoli_1K_1.fq.gz -2 test_dataset/ecoli_1K_2.fq.gz -o spades_test
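When SPAdes is launched from another script, the same command line can be assembled as an argument list. The Python 3 sketch below uses only flags that appear in this manual (-1, -2, -o, -t, -m); the function build_spades_command and its parameter names are hypothetical, and the sketch only builds the command rather than running it.

```python
def build_spades_command(left, right, output_dir, threads=None, memory_gb=None):
    """Assemble the argument list for a paired-end SPAdes run."""
    command = ["./spades.py", "-1", left, "-2", right, "-o", output_dir]
    if threads is not None:
        command += ["-t", str(threads)]    # number of threads
    if memory_gb is not None:
        command += ["-m", str(memory_gb)]  # memory limit in Gb
    return command


toy_command = build_spades_command(
    "test_dataset/ecoli_1K_1.fq.gz",
    "test_dataset/ecoli_1K_2.fq.gz",
    "spades_test",
)
print(" ".join(toy_command))
```

Passing the list to subprocess.call from inside the SPAdes directory would start the run; keeping the arguments as a list avoids shell quoting problems with unusual file names.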
Here is the description of options:
Specify the output directory. Required option.
This flag is required for MDA (single-cell) data.
File with merged left and right paired end reads.
File with left paired end reads.
File with right paired end reads.
File with unpaired reads.
Generate a SAM file that contains information about the alignment of the original reads to the resulting contigs.
-t <int> (or
Number of threads. The default value is 16.
-m <int> (or
Sets the memory limit in Gb. SPAdes terminates if it reaches this limit. The default value is 250 Gb. The physical memory actually consumed will be smaller than this limit.
Sets the directory to store temporary files from error correction. The default value is
Comma-separated list of k-mer sizes to be used (all values must be odd, less than 100 and listed in ascending order). The default value is 21,33,55.
-i <int> (or
Number of iterations for error correction. The default value is 1.
--phred-offset <33 or 64>
PHRED quality offset for the input reads; it can be either 33 or 64. It will be auto-detected if it is not specified.
Runs error correction only.
Runs assembly only.
Forces error correction not to compress the corrected reads.
Runs SPAdes on the toy dataset; see section 1.3.
Runs SPAdes in debug mode, keeping intermediate results.
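The constraints on the k-mer list given in the options above (all values odd, less than 100, in ascending order) can be checked before starting a run. This Python 3 sketch is one possible check; the function name parse_kmer_sizes is hypothetical, and treating repeated values as an error is an assumption, since the manual only says "ascending".

```python
def parse_kmer_sizes(text):
    """Parse a comma-separated k-mer list and enforce the documented rules."""
    sizes = [int(item) for item in text.split(",")]
    for k in sizes:
        if k % 2 == 0:
            raise ValueError("k-mer size %d is not odd" % k)
        if k >= 100:
            raise ValueError("k-mer size %d is not less than 100" % k)
    # Assumption: "ascending" is read as strictly ascending (no repeats).
    if any(a >= b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return sizes


print(parse_kmer_sizes("21,33,55"))  # the documented default list
```

A list such as 55,33 or 21,32 raises ValueError with a message naming the violated rule.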
<output_dir> contains the following files:
assembly.log - assembler log
contigs.fasta - resulting contigs
contigs.sam - SAM file, generated only with --generate-sam-file
corrected/ - files from error correction run
    configs/ - configuration files for error correction
    correction.log - error correction log
    dataset.info - internal configuration files
    other error correction output files
dataset.info - internal configuration files
K21/ - files from the run with K=21
K33/ - files from the run with K=33
K55/ - files from the run with K=55
params.txt - information about SPAdes parameters in this run
scaffolds.fasta - resulting scaffolds
SPAdes will overwrite these files and directories if they exist in the specified <output_dir>.
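Because existing results are overwritten, it can be useful to check a directory before reusing it. The Python 3 sketch below reports which of the top-level entries listed above are already present. The helper name existing_outputs is hypothetical, and the K21, K33 and K55 entries assume the default k-mer sizes.

```python
import os

# Top-level entries from the output listing; K21/K33/K55 assume default k-mers.
DOCUMENTED_OUTPUTS = [
    "assembly.log", "contigs.fasta", "contigs.sam", "corrected",
    "dataset.info", "K21", "K33", "K55", "params.txt", "scaffolds.fasta",
]


def existing_outputs(output_dir):
    """Return the documented output entries that already exist in output_dir."""
    return [
        name for name in DOCUMENTED_OUTPUTS
        if os.path.exists(os.path.join(output_dir, name))
    ]
```

An empty result means none of the documented outputs are present, so a new run would not replace earlier results under those names.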
We recommend using QUAST for assembly evaluation.
Your comments, bug reports, and suggestions are very welcome. They will help us to further improve SPAdes.
If you have trouble running SPAdes, please provide us with the following files from the directory <output_dir>:
params.txt
assembly.log
corrected/correction.log
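Collecting these three files can be scripted. The following Python 3 sketch packs whichever of them exist into a gzip-compressed tar archive and returns the names of any that are missing. The function name bundle_report and the archive layout are assumptions for illustration, not something SPAdes provides.

```python
import os
import tarfile

# Paths relative to the output directory, as listed above.
REPORT_FILES = [
    "params.txt",
    "assembly.log",
    os.path.join("corrected", "correction.log"),
]


def bundle_report(output_dir, archive_path):
    """Pack the requested files into archive_path; return the missing ones."""
    missing = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for relative in REPORT_FILES:
            full = os.path.join(output_dir, relative)
            if os.path.isfile(full):
                # Store under the relative name so the layout is preserved.
                archive.add(full, arcname=relative)
            else:
                missing.append(relative)
    return missing
```

Writing the archive outside the output directory keeps it from being overwritten by a later run.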
Address for communications: email@example.com.