rnaSPAdes 0.1.1 Manual

1. About rnaSPAdes
2. Installation
    2.1. Downloading rnaSPAdes Linux binaries
    2.2. Downloading rnaSPAdes binaries for Mac
    2.3. Downloading and compiling rnaSPAdes source code
    2.4. Verifying your installation
3. rnaSPAdes command line options
4. Assembly evaluation
5. Citation
6. Feedback and bug reports

1 About rnaSPAdes

rnaSPAdes is an assembler for RNA-Seq data based on the SPAdes genome assembler. This manual will help you install and run rnaSPAdes. Note that it contains only information specific to rnaSPAdes.

To learn the basic options, we strongly recommend reading the SPAdes manual first.

2 Installation

rnaSPAdes requires a 64-bit Linux system and Python (supported versions are 2.4, 2.5, 2.6, 2.7, 3.2 and 3.3) to be pre-installed on it. To obtain rnaSPAdes you can either download binaries or download source code and compile it yourself.
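
The requirements above can be checked before downloading anything. The following is a minimal sketch, not part of rnaSPAdes: the function name is_supported_python is hypothetical, and it assumes the interpreter is invoked as "python".

```shell
# Succeeds only for the Python versions listed in this manual.
is_supported_python() {
    case "$1" in
        2.4|2.5|2.6|2.7|3.2|3.3) return 0 ;;
        *) return 1 ;;
    esac
}

# rnaSPAdes needs a 64-bit Linux system; print what this machine reports.
echo "System: $(uname -s) $(uname -m)"

# Assumption: the interpreter is invoked as "python"; adjust if yours differs.
PY_VERSION=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) || PY_VERSION="unknown"

if is_supported_python "$PY_VERSION"; then
    echo "Python $PY_VERSION is supported by rnaSPAdes 0.1.1"
else
    echo "Python $PY_VERSION is not in the supported list"
fi
```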

2.1 Downloading rnaSPAdes Linux binaries

To download rnaSPAdes Linux binaries and extract them, go to the directory in which you wish rnaSPAdes to be installed and run:


    wget http://spades.bioinf.spbau.ru/rnaspades0.1.1/rnaSPAdes-0.1.1-Linux.tar.gz
    tar -xzf rnaSPAdes-0.1.1-Linux.tar.gz
    cd rnaSPAdes-0.1.1-Linux/bin/

In this case you do not need to run any installation scripts – rnaSPAdes is ready to use. The following files will be placed in the bin directory:

We also suggest adding the rnaSPAdes installation directory to the PATH variable.
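
As a sketch of the PATH suggestion, the lines below prepend the bin directory for the current shell session. The location under $HOME is an assumption; use the directory where you actually unpacked the archive.

```shell
# Assumption: the archive was unpacked in your home directory; adjust the path otherwise.
RNASPADES_BIN="$HOME/rnaSPAdes-0.1.1-Linux/bin"

# Prepend the directory so that rnaspades.py is found by name.
export PATH="$RNASPADES_BIN:$PATH"
```

To keep the setting across sessions, the export line can be added to your shell startup file (for example ~/.bashrc in bash).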

2.2 Downloading rnaSPAdes binaries for Mac

rnaSPAdes binaries for Mac OS will be available after the first public release.

2.3 Downloading and compiling rnaSPAdes source code

If you wish to compile rnaSPAdes by yourself you will need the following libraries to be pre-installed:

If you meet these requirements, you can download the rnaSPAdes source code:


    wget http://spades.bioinf.spbau.ru/rnaspades0.1.1/rnaSPAdes-0.1.1.tar.gz
    tar -xzf rnaSPAdes-0.1.1.tar.gz
    cd rnaSPAdes-0.1.1

and build it with the following script:


    ./spades_compile.sh

rnaSPAdes will be built in the directory ./bin. If you wish to install rnaSPAdes into another directory, you can specify the full path of the destination folder by running the following command in bash or sh:


    PREFIX=<destination_dir> ./spades_compile.sh

for example:


    PREFIX=/usr/local ./spades_compile.sh

which will install rnaSPAdes into /usr/local/bin.

After installation you will get the same files in the ./bin directory (or <destination_dir>/bin if you specified PREFIX):

We also suggest adding the rnaSPAdes installation directory to the PATH variable.

2.4 Verifying your installation

For testing purposes, rnaSPAdes comes with a toy data set (reads that align to the first 1000 bp of E. coli). To try rnaSPAdes on this data set, run:


    <rnaspades installation dir>/rnaspades.py --test

If you added the rnaSPAdes installation directory to the PATH variable, you can run:


    rnaspades.py --test

For simplicity, we assume from here on that the rnaSPAdes installation directory has been added to the PATH variable.

If the installation is successful, you will find the following information at the end of the log:


    ===== Assembling finished.

     * Corrected reads are in /home/andrey/ablab/algorithmic-biology/assembler/spades_test/corrected/
     * Assembled contigs are in /home/andrey/ablab/algorithmic-biology/assembler/spades_test/contigs.fasta (contigs.fastg)
     * Assembled scaffolds are in /home/andrey/ablab/algorithmic-biology/assembler/spades_test/scaffolds.fasta (scaffolds.fastg)

    ======= rnaSPAdes pipeline finished WITH WARNINGS!

    === Error correction and assembling warnings:
     * 0:00:00.491   16M /    3G   WARN  General                 (graph_simplification.hpp  :1357)   Mean coverage wasn't reliably estimated
    ======= Warnings saved to /home/andrey/ablab/algorithmic-biology/assembler/spades_test/warnings.log

    ========= TEST PASSED CORRECTLY.

    rnaSPAdes log can be found here: /home/andrey/ablab/algorithmic-biology/assembler/spades_test/spades.log

    Thank you for using rnaSPAdes!
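
The check for the final message can be scripted. This is a minimal sketch, assuming the test output has been saved to a file; the function name rnaspades_test_passed and the file name test_run.log are hypothetical.

```shell
# Succeeds only if the given log file reports a passed test run.
rnaspades_test_passed() {
    grep -q "TEST PASSED CORRECTLY" "$1"
}

# Example (assumes the console output was saved to test_run.log, a hypothetical name):
#   rnaspades.py --test > test_run.log 2>&1
#   rnaspades_test_passed test_run.log && echo "installation OK"
```

Warnings such as the one shown in the log above do not affect this check; it looks only for the TEST PASSED CORRECTLY line.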


3 rnaSPAdes command line options

To run rnaSPAdes from the command line, type


    rnaspades.py [options] -o <output_dir>

Note that we assume the rnaSPAdes installation directory has been added to the PATH variable (otherwise, provide the full path to the rnaSPAdes executable: <rnaspades installation dir>/rnaspades.py).

Below we describe only the main options and the options specific to rnaSPAdes. To learn the basic options, we strongly recommend reading the SPAdes manual first.
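
As an illustration of a complete invocation, the sketch below runs rnaSPAdes on one paired-end library. The file and directory names are hypothetical. The -1 and -2 input options are taken from the SPAdes manual, and that rnaSPAdes 0.1.1 accepts them unchanged is an assumption.

```shell
# Hypothetical file and directory names; replace with your own.
LEFT_READS="reads_1.fastq"
RIGHT_READS="reads_2.fastq"
OUT_DIR="rnaspades_out"

# -1 and -2 are the paired-end input options from the SPAdes manual;
# that rnaSPAdes 0.1.1 accepts them unchanged is an assumption here.
if command -v rnaspades.py >/dev/null 2>&1; then
    rnaspades.py -1 "$LEFT_READS" -2 "$RIGHT_READS" -o "$OUT_DIR"
else
    echo "rnaspades.py not found in PATH; use <rnaspades installation dir>/rnaspades.py" >&2
fi
```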

Basic options

-o <output_dir>
    Specify the output directory. Required option.

--iontorrent
    This flag is required when assembling IonTorrent data. Allows BAM files as input. Carefully read section 3.3 of the SPAdes manual before using this option.

--test
    Runs rnaSPAdes on the toy data set; see section 2.4.

-h (or --help)
    Prints help.

Options for transcriptome assembly

--draft-assembly
    Performs a raw draft assembly. Noticeably reduces running time, but may reduce assembly quality.

--min-complete-transcript <int>
    Removes low-covered (coverage less than 4) isolated transcript fragments shorter than the specified value. The default value is 200 bp. Note that this threshold does not remove all transcripts shorter than the specified value, only those that have coverage gaps at both ends.
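
The removal rule described for --min-complete-transcript can be illustrated on a small table. This is only a sketch of the rule as stated above, not the rnaSPAdes implementation. The table layout (id, length in bp, mean coverage, and a flag that is 1 when the fragment has coverage gaps at both ends) and the function name filter_fragments are hypothetical.

```shell
# filter_fragments MIN_LEN
# Reads lines of "id length coverage isolated" and drops a fragment only when
# it is shorter than MIN_LEN, has coverage below 4, and is isolated (flag = 1).
filter_fragments() {
    awk -v min_len="$1" '!($2 < min_len && $3 < 4 && $4 == 1)'
}

# Four hypothetical fragments, filtered with the default threshold of 200 bp.
printf '%s\n' \
    "t1 150 2.0 1" \
    "t2 150 2.0 0" \
    "t3 150 9.5 1" \
    "t4 800 2.0 1" | filter_fragments 200
```

Only t1 is dropped: t2 is short and low-covered but not isolated, t3 is covered well enough, and t4 is long enough. With rnaSPAdes itself, the threshold is set on the command line, for example --min-complete-transcript 300.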

4 Assembly evaluation

rnaQUAST may be used to generate summary statistics on test datasets when a reference genome and a gene database are available.

5 Citation

Paper to be submitted.

6 Feedback and bug reports

Your comments, bug reports, and suggestions are very welcome. They will help us to further improve rnaSPAdes.

If you have any trouble running rnaSPAdes, please send us params.txt and spades.log from the directory <output_dir>.

Address for communications: spades.support@bioinf.spbau.ru.
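
The two files requested above can be bundled into a single archive before sending. This is a minimal sketch; the function name collect_rnaspades_report and the archive name are hypothetical.

```shell
# collect_rnaspades_report OUTPUT_DIR ARCHIVE
# Packs params.txt and spades.log from OUTPUT_DIR into a gzip-compressed tar archive.
collect_rnaspades_report() {
    tar -czf "$2" -C "$1" params.txt spades.log
}

# Example (hypothetical names):
#   collect_rnaspades_report rnaspades_out rnaspades_report.tar.gz
```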