SPAdes stands for St. Petersburg genome assembler. It is intended for both standard (multicell) and single-cell MDA bacterial assemblies. This manual will help you install and run SPAdes. You can find the latest SPAdes release at http://bioinf.spbau.ru/spades. The latest version of this manual can be found here.
SPAdes requires a 64-bit Linux system. We provide two test datasets: a single-cell E. coli dataset and a multicell E. coli dataset. For these datasets, error correction requires 85 Gb of RAM, and assembly after error correction requires 30 Gb of RAM. SPAdes also requires Python 2.4 or higher.
To download SPAdes tar.gz and extract it:
wget http://spades.bioinf.spbau.ru/release2.1.0/spades_2.1.0.tar.gz
tar -xzf spades_2.1.0.tar.gz
cd spades-2.1.0
SPAdes depends on the following libraries for compiling its code:
If you are not able to match these requirements please use downloaded binaries as described in the next section.
Precompiled SPAdes binaries for the default parameters may be downloaded with the following scripts:
./spades_download_binary.py 21 33 55
./spades_download_bayeshammer.py
This will download binaries to
SPAdes uses binaries optimized for each value of k. You may download binaries
for the required values of k with the script
./spades_download_binary.py k1 k2
using space-separated values of k as an argument.
For testing purposes, SPAdes comes with a toy dataset (first 1000 bp of E. coli). If you run
spades.py with the parameter
it will process this dataset and if the installation is successful you will see something like this at the end of the log:
 * Corrected reads are in spades_output/ECOLI_1K/corrected/
 * Assembled contigs are spades_output/ECOLI_1K/spades_04.18_17.59.30/ECOLI_1K.fasta
Thank you for using SPAdes!
======= SPAdes pipeline finished
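If you want to pick up the output locations from a script rather than by eye, the two summary lines shown above can be matched with regular expressions. This is only a sketch: the function name parse_spades_summary is hypothetical, and it assumes the log wording is exactly as in the sample above, which may differ between SPAdes versions.

```python
import re

def parse_spades_summary(log_text):
    """Return (corrected_reads_dir, contigs_path) from the log text.

    Assumes the wording of the sample log above; entries that are not
    found are returned as None.
    """
    corrected = re.search(r"\* Corrected reads are in (\S+)", log_text)
    contigs = re.search(r"\* Assembled contigs are (\S+)", log_text)
    return (
        corrected.group(1) if corrected else None,
        contigs.group(1) if contigs else None,
    )

# The sample log lines from this section.
example_log = (
    " * Corrected reads are in spades_output/ECOLI_1K/corrected/\n"
    " * Assembled contigs are "
    "spades_output/ECOLI_1K/spades_04.18_17.59.30/ECOLI_1K.fasta\n"
    "Thank you for using SPAdes!\n"
)
corrected_dir, contigs_path = parse_spades_summary(example_log)
print(corrected_dir)
print(contigs_path)
```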
SPAdes accepts single reads as well as forward-reverse paired-end reads in FASTA and FASTQ format; however, error correction requires reads in FASTQ format. All files may be compressed with gzip. At present, SPAdes can accept only one paired-end library as input.
SPAdes supports paired end reads organized into one or two files.
s_n_1_export.txt first, followed by the read from the Illumina file
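If your paired-end reads are in two files and you would like a single interlaced file, the ordering described above (the left read first, followed by its mate) can be produced with a short script. This is a sketch under stated assumptions, not part of SPAdes: the function names are hypothetical, the input is assumed to be uncompressed FASTQ with exactly four newline-terminated lines per record, and both files are assumed to list mates in the same order.

```python
import io
from itertools import islice

def read_fastq_records(handle):
    """Yield FASTQ records as lists of four lines (unwrapped records only)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record

def interlace(left_handle, right_handle, out_handle):
    """Write each left read followed by its mate; return the number of pairs."""
    right_records = read_fastq_records(right_handle)
    pairs = 0
    for left in read_fastq_records(left_handle):
        right = next(right_records, None)
        if right is None:
            raise ValueError("right file has fewer reads than left file")
        out_handle.writelines(left)
        out_handle.writelines(right)
        pairs += 1
    if next(right_records, None) is not None:
        raise ValueError("right file has more reads than left file")
    return pairs

# Tiny in-memory example; real use would pass open file objects instead.
left = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n")
right = io.StringIO("@r1/2\nCCCC\n+\nIIII\n@r2/2\nGGGG\n+\nIIII\n")
merged = io.StringIO()
print(interlace(left, right, merged))
```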
SPAdes has two stages: error correction by BayesHammer and genome assembly. By default, all the
results are stored in the directory
spades_output. For each dataset, SPAdes creates a separate directory
Corrected reads are stored in the subdirectory
The resulting contigs are stored in the subdirectory
<project_name>.fasta file. If the
--generate-sam-file option was set, a symlink to the
<project_name>.sam file is also created in the same directory.
If SPAdes finds the corrected reads, it asks the user whether error correction should be run again. By default, error correction is not re-run, and the assembly starts after 10 seconds unless the user says otherwise.
There are two ways to run SPAdes for your dataset: you can specify the parameters from the command line or provide the configuration
file as the only
SPAdes is run from the command line as follows:
spades.py [options] -n <project_name>
Here is the description of options:
A required option that sets the name of the project.
Specify the output directory. Default:
This flag is required for MDA (single-cell) data.
File with interlaced left and right paired end reads.
File with left paired end reads.
File with right paired end reads.
File with unpaired reads.
Generate a SAM file. See more details in section 3.
-t <int> (or
Number of threads. The default value is 16.
-m <int> (or
Sets the memory limit in Gb. SPAdes terminates if it reaches this limit. The default value is 250 Gb. Due to technical reasons, the actual physical memory consumed will be smaller than this limit.
Set the directory for error correction's temporary files. The default value is
Comma-separated list of increasing k-mer sizes (all values must be odd). Default is 21,33,55.
-i <int> (or
Number of iterations for error correction. The default value is 2.
--phred-offset <33 or 64>
PHRED quality offset for the input reads, can be either 33 or 64. It will be auto-detected if it is not specified.
With this option we run only error correction, without the assembler.
With this option we run only assembler, without error correction.
Forces SPAdes not to use the gap closer.
Forces error correction not to compress the corrected reads.
Runs SPAdes on the toy dataset, see section 1.3.
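The constraints on the k-mer list described above (comma-separated, increasing, all values odd) are easy to check before starting a long run. The following is a minimal sketch, not SPAdes code: the function name parse_k_list is hypothetical, and "increasing" is assumed to mean strictly increasing.

```python
def parse_k_list(text):
    """Parse a comma-separated list of k-mer sizes and validate it.

    Checks that every value is a positive odd integer and that the
    values are strictly increasing (an assumption about "increasing").
    """
    values = [int(item) for item in text.split(",")]
    for k in values:
        if k <= 0 or k % 2 == 0:
            raise ValueError("k-mer size must be a positive odd number: %d" % k)
    for prev, cur in zip(values, values[1:]):
        if cur <= prev:
            raise ValueError(
                "k-mer sizes must be increasing: %d then %d" % (prev, cur)
            )
    return values

# The default list from the option description.
print(parse_k_list("21,33,55"))
```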
You can find the configuration file
spades_config.info for the toy dataset from section 1.3 in the directory where you extracted the archive (see section 1.1).
You can use this file as the template for your datasets. In this file, on each line, all text after the first semicolon is a comment.
The configuration file starts with the common parameters for the SPAdes run. We recommend setting the
project_name parameter that specifies the directory
and names for output files. After that there is a dataset section that contains the information about input reads. This section is required and you need to specify
single_reads or both.
The next two sections are
assembly. If you want to skip one of these stages, you can
remove, rename, or comment out the corresponding section.
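The comment rule for the configuration file (on each line, all text after the first semicolon is a comment) can be illustrated with a few lines of Python. This is a sketch only: the function names are hypothetical, and the sample line layout ("project_name" followed by a value) is an assumption made for illustration, so please check spades_config.info for the real syntax.

```python
def strip_comment(line):
    """Return the part of a line before the first semicolon, right-stripped."""
    return line.split(";", 1)[0].rstrip()

def meaningful_lines(config_text):
    """Yield non-empty configuration lines with their comments removed."""
    for raw in config_text.splitlines():
        content = strip_comment(raw)
        if content.strip():
            yield content

# Hypothetical sample; the key/value layout is an assumption.
sample = "; common parameters\nproject_name ECOLI_1K ; toy dataset\n\n"
print(list(meaningful_lines(sample)))
```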
SPAdes stores compiled binaries in the directory
build and reuses them. To save disk space, or to force SPAdes to recompile the binaries, just
delete this directory. Most users will not need to do this.
For each SPAdes run, we create the directory
spades_output/<project_name>. It contains internal configs,
logs and intermediate results for different values of k. We also have symbolic links
latest_success to the directories
for the latest and the latest successful runs.
After running SPAdes you can evaluate different quality metrics using our pipeline. Please see quality.html for details.
SPAdes can generate a SAM file for further processing. Because SAM files are
very large, one is generated only if you set the
--generate-sam-file option on the command line.
The SAM file contains information about the alignment of the original reads to the resulting
contigs. We recommend using Tablet to visualize these alignments.
We also recommend using SEQuel after SPAdes to further reduce the number of small errors (single nucleotide substitutions and small indels). SEQuel also requires the SAM file as input.
The current version of SPAdes does not have a scaffolding stage. One can use a separate scaffolder such as Opera.
We will be thankful if you help us make SPAdes better by sending your comments, bug reports, and suggestions to firstname.lastname@example.org.
We kindly ask you to attach file
contigs/assembly.log if you have trouble running SPAdes.
These files are placed in the directory
 A. Bankevich, S. Nurk, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. May 2012, 19(5): 455-477. doi:10.1089/cmb.2012.0021.
 Song Gao, Wing-Kin Sung, and Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology. November 2011, 18(11): 1681-1691. doi:10.1089/cmb.2011.0170.
 I. Milne, M. Bayer, L. Cardle, P. Shaw, G. Stephen, F.Wright, and D. Marshall. Tablet — next generation sequence assembly visualization. Bioinformatics (2010) 26 (3): 401-402. doi: 10.1093/bioinformatics/btp666
 R. Ronen, C. Boucher, H. Chitsaz, and P. Pevzner. SEQuel: Improving the accuracy of genome assemblies. Bioinformatics, 2012. To appear.