SPAdes 2.3.0 Manual

SPAdes stands for St. Petersburg genome assembler. It is intended for assembling both standard isolate and single-cell MDA bacterial datasets. This manual will help you install and run SPAdes. You can find the latest SPAdes release at http://bioinf.spbau.ru/spades. The latest version of this manual is available online. SPAdes version 2.3.0 was released under GPLv2 on 30 October 2012.

1. Installation

SPAdes requires a 64-bit Linux system. We have two test datasets: a single-cell E. coli dataset and an E. coli isolate dataset. Processing these datasets requires 24 GB of RAM. SPAdes also requires Python 2 (version 2.4 or higher).

1.1 SPAdes tar.gz file

To download the SPAdes tar.gz archive and extract it, run:


    wget http://spades.bioinf.spbau.ru/release2.3.0/spades-2.3.0.tar.gz
    tar -xzf spades-2.3.0.tar.gz
    cd spades-2.3.0

1.2 Getting SPAdes binaries

There are two ways to obtain SPAdes binaries: download static builds from our server or compile SPAdes on your server.

We recommend downloading the binaries with the following script:


    ./spades_download_binary.py

Alternatively, you can compile SPAdes yourself. Compiling the code depends on the following libraries:

If you meet these requirements you can build SPAdes with the following script:


    ./spades_compile.sh

In both cases you should get a bin directory containing the files hammer (error correction module) and spades (assembly module).

1.3 Testing your installation

For testing purposes, SPAdes comes with a toy dataset (first 1000 bp of E. coli). If you run spades.py with the parameter --test


    ./spades.py --test

it will process this dataset and, if the installation is successful, you will see something like this at the end of the log:


     * Corrected reads are in spades_test/corrected/
     * Assembled contigs are spades_test/contigs.fasta
     * Assembled scaffolds are spades_test/scaffolds.fasta

    Thank you for using SPAdes!

    ======= SPAdes pipeline finished

2. Running SPAdes

2.1 Input data

SPAdes accepts single reads as well as forward-reverse paired-end reads in FASTA and FASTQ formats; however, error correction can only be run on reads in FASTQ format. All files may be compressed with gzip. At present, SPAdes accepts only one paired-end library as input.

SPAdes supports paired-end reads organized either in two separate files or combined in a single file (see the -1, -2 and --12 options in section 2.3).
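As an illustration of the single-file layout, the sketch below combines two FASTQ files into one. It assumes that "merged" means each left read is immediately followed by its right mate; this manual does not spell out the layout, so treat that as an assumption. The function names are hypothetical and not part of SPAdes, and the code is written for Python 3, independently of the Python 2 interpreter that SPAdes itself requires.

```python
import gzip
from itertools import islice


def open_reads(path, mode="rt"):
    """Open a plain or gzip-compressed reads file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def merge_pairs(left_path, right_path, merged_path):
    """Write left and right reads alternately; return the number of pairs."""
    pairs = 0
    with open_reads(left_path) as left, open_reads(right_path) as right, \
            open_reads(merged_path, "wt") as out:
        right_records = fastq_records(right)
        for left_record in fastq_records(left):
            right_record = next(right_records, None)
            if right_record is None:
                raise ValueError("right file has fewer reads than left file")
            # Assumed layout: left read first, then its right mate.
            out.writelines(left_record)
            out.writelines(right_record)
            pairs += 1
        if next(right_records, None) is not None:
            raise ValueError("left file has fewer reads than right file")
    return pairs
```

The resulting file could then be passed with --12 instead of passing the two original files with -1 and -2.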

2.2 SPAdes pipeline

SPAdes stores all of its output files in the directory <output_dir>.

Before starting assembly, SPAdes runs BayesHammer to correct errors in reads. After that, the corrected reads are stored in the directory <output_dir>/corrected in *.fastq.gz files.

After assembly completion, the resulting contigs are stored in the file <output_dir>/contigs.fasta and the resulting scaffolds are stored in the file <output_dir>/scaffolds.fasta. If the --generate-sam-file option was used, a symlink contigs.sam to the SAM file will also be created in the same directory.
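For a quick look at the resulting contigs.fasta or scaffolds.fasta, a small script can count the sequences and their lengths. The sketch below is not part of SPAdes; it assumes ordinary FASTA input (header lines starting with ">" followed by sequence lines), the function names are hypothetical, and it is written for Python 3.

```python
def fasta_lengths(path):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the sequence being read, None before first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize_fasta(path):
    """Return the sequence count, total length and longest sequence length."""
    lengths = fasta_lengths(path)
    return {
        "sequences": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths) if lengths else 0,
    }
```

For example, summarize_fasta("spades_test/contigs.fasta") would report on the contigs produced by the toy run.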

2.3 SPAdes command line options

To run SPAdes from the command line, type


    ./spades.py [options] -o <output_dir>

To run SPAdes on the toy dataset (see section 1.3), type


    ./spades.py -1 test_dataset/ecoli_1K_1.fq.gz -2 test_dataset/ecoli_1K_2.fq.gz -o spades_test
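When SPAdes is launched from another script, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of SPAdes. It uses only options listed in this manual (-o, --sc, --12, -1, -2, -s, -t, -k). The rule that -1 and -2 must be given together is an assumption, and the check against passing both --12 and -1/-2 follows from the one-paired-end-library limit stated in section 2.1. It is written for Python 3.

```python
def build_spades_command(output_dir, left=None, right=None, merged=None,
                         single=None, single_cell=False, threads=None,
                         k_list=None, script="./spades.py"):
    """Return the spades.py invocation as a list of arguments."""
    if not output_dir:
        raise ValueError("the output directory (-o) is required")
    if (left is None) != (right is None):
        # Assumption: left and right files only make sense as a pair.
        raise ValueError("-1 and -2 must be given together")
    if merged is not None and left is not None:
        raise ValueError("only one paired-end library is accepted")
    if left is None and merged is None and single is None:
        raise ValueError("no input reads given")

    command = [script]
    if single_cell:
        command.append("--sc")
    if merged is not None:
        command += ["--12", merged]
    if left is not None:
        command += ["-1", left, "-2", right]
    if single is not None:
        command += ["-s", single]
    if threads is not None:
        command += ["-t", str(threads)]
    if k_list is not None:
        command += ["-k", ",".join(str(k) for k in k_list)]
    command += ["-o", output_dir]
    return command


# Usage: rebuild the toy-dataset command shown above, then run it with
# subprocess.call(command) from the SPAdes directory.
command = build_spades_command(
    "spades_test",
    left="test_dataset/ecoli_1K_1.fq.gz",
    right="test_dataset/ecoli_1K_2.fq.gz",
)
```

Returning a list rather than a single string avoids shell quoting problems with unusual file names.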

The options are described below:

-o <output_dir>
    Specify the output directory. Required option.

--sc
    This flag is required for MDA (single-cell) data.

--12 <filename>
    File with merged left and right paired-end reads.

-1 <filename>
    File with left paired-end reads.

-2 <filename>
    File with right paired-end reads.

-s <filename>
    File with unpaired reads.

--generate-sam-file
    Generate a SAM file that contains information about the alignment of the original reads to the resulting contigs.

-t <int> (or --threads <int>)
    Number of threads. The default value is 16.

-m <int> (or --memory <int>)
    Sets the memory limit in GB. SPAdes terminates if it reaches this limit. The default value is 250 GB. The physical memory actually consumed will be smaller than this limit.

--tmp-dir <dirname>
    Sets the directory to store temporary files from error correction. The default value is <output_dir>/corrected/tmp.

-k <int,int,...>
    Comma-separated list of k-mer sizes to be used (all values must be odd, less than 100 and listed in ascending order). The default value is 21,33,55.

-i <int> (or --iterations <int>)
    Number of iterations for error correction. The default value is 1.

--phred-offset <33 or 64>
    PHRED quality offset for the input reads (either 33 or 64). It will be auto-detected if it is not specified.

--only-error-correction
    Runs error correction only.

--only-assembler
    Runs assembly only.

--disable-gzip-output
    Forces error correction not to compress the corrected reads.

--test
    Runs SPAdes on the toy dataset; see section 1.3.

--debug
    Runs SPAdes in debug mode, keeping intermediate results.

-h (or --help)
    Prints help.
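Two of the options above constrain their values: -k takes odd sizes below 100 in ascending order, and --phred-offset is either 33 or 64. The sketch below checks a -k string against those rules and guesses a PHRED offset from FASTQ quality strings. Both functions are hypothetical helpers, not part of SPAdes. The offset guess is a common heuristic and is not claimed to be the auto-detection SPAdes performs; reading "ascending" as strictly increasing is also an assumption. The code is written for Python 3.

```python
def validate_k_list(text):
    """Parse a comma-separated -k value and check the documented rules."""
    values = [int(item) for item in text.split(",")]
    for k in values:
        if k <= 0 or k % 2 == 0 or k >= 100:
            raise ValueError("k-mer size %d must be odd and less than 100" % k)
    for previous, current in zip(values, values[1:]):
        # Assumption: "ascending" is taken to mean strictly increasing.
        if current <= previous:
            raise ValueError("k-mer sizes must be listed in ascending order")
    return values


def guess_phred_offset(quality_strings):
    """Guess 33 or 64 from quality strings; return None if undecided."""
    lowest = None
    for quality in quality_strings:
        for char in quality:
            code = ord(char)
            if lowest is None or code < lowest:
                lowest = code
    if lowest is None:
        return None
    if lowest < 59:
        # Characters below ';' do not occur in offset-64 encodings.
        return 33
    if lowest >= 64:
        # Heuristic: no low characters seen, so offset 64 is likely.
        return 64
    return None
```

When the guess is None, or when only a few reads were sampled, passing --phred-offset explicitly is the safer choice.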

2.4 Files and directories

The directory <output_dir> contains the following files and directories:

    assembly.log - assembler log
    contigs.fasta - resulting contigs
    contigs.sam - SAM file, generated only with --generate-sam-file option
    corrected/ - files from error correction run
        configs/ - configuration files for error correction
        correction.log - error correction log
        dataset.info - internal configuration file
        other error correction output files
    dataset.info - internal configuration file
    K21/ - files from the run with K=21
    K33/ - files from the run with K=33
    K55/ - files from the run with K=55
    params.txt - information about SPAdes parameters in this run
    scaffolds.fasta - resulting scaffolds

SPAdes will overwrite these files and directories if they exist in the specified <output_dir>.

2.5 Assembly evaluation

We recommend using QUAST for assembly evaluation.

3. Feedback and bug reports

Your comments, bug reports, and suggestions are very welcome. They will help us to further improve SPAdes.

If you have trouble running SPAdes, please send us the following files from the directory <output_dir>:


    params.txt
    assembly.log
    corrected/correction.log
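To make collecting these three files less error-prone, they can be packed into one archive. The sketch below is a hypothetical helper, not part of SPAdes. It uses the Python standard library tarfile module and the file names listed above, the archive name is up to you, and it is written for Python 3.

```python
import os
import tarfile

# Paths relative to <output_dir>, as listed in this section.
REPORT_FILES = [
    "params.txt",
    "assembly.log",
    os.path.join("corrected", "correction.log"),
]


def bundle_report(output_dir, archive_path):
    """Pack the report files that exist; return (added, missing) lists."""
    added, missing = [], []
    with tarfile.open(archive_path, "w:gz") as archive:
        for relative in REPORT_FILES:
            full = os.path.join(output_dir, relative)
            if os.path.isfile(full):
                # Store paths relative to the output directory.
                archive.add(full, arcname=relative)
                added.append(relative)
            else:
                missing.append(relative)
    return added, missing
```

A run stopped during error correction may not have produced assembly.log, which is why missing files are reported rather than treated as an error.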

Address for communications: spades.support@bioinf.spbau.ru.