Tuesday, December 08, 2015

Aligning long reads

With long reads becoming more popular and accessible, the major read aligners have responded by adding mapping modes that work with long, high-error reads. Alex Dobin, the author of STAR, has released STARlong, a tool that can map PacBio Circular Consensus Sequencing (CCS) reads. You need to install it alongside your regular STAR binary (instructions here). Once you have it, generate an index for your genome and tune a few parameters to get the alignments:
# assuming STARlong is installed
# generate an index for your genome:
...
# align reads
STARlong --runThreadN 20 --runMode alignReads --seedPerReadNmax 10000 --genomeDir  --readFilesIn 
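The STARlong call above needs two values of your own: the genome index directory and the reads file. As a minimal sketch, a small wrapper can refuse to build the command when either is missing. The flags are the ones already shown; the function name `starlong_cmd` and the example paths are hypothetical, and the wrapper only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper: print a STARlong command line after checking that
# both required values were supplied. The flags are the ones used above;
# the function name and the example paths are illustrative assumptions.
starlong_cmd() {
    genome_dir=${1:-}   # directory holding the STAR index
    reads=${2:-}        # file of long reads
    threads=${3:-20}    # thread count, defaulting to the value used above
    if [ -z "$genome_dir" ] || [ -z "$reads" ]; then
        echo "usage: starlong_cmd GENOME_DIR READS [THREADS]" >&2
        return 1
    fi
    echo "STARlong --runThreadN $threads --runMode alignReads" \
         "--seedPerReadNmax 10000 --genomeDir $genome_dir --readFilesIn $reads"
}

# Dry run with placeholder paths; nothing is executed, the command is only printed.
starlong_cmd /path/to/star_index ccs_reads.fastq
```

Printing the command first makes it easy to inspect the parameters before committing to a long alignment run.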

Heng Li, the author of BWA, has also provided a version of the bwa mapper that can align Oxford Nanopore reads as well as PacBio's long subreads and shorter CCS reads. Here is how he suggests running the alignments:
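# index the reference once
bwa index ref.fa
# PacBio reads
bwa mem -x pacbio ref.fa pacbio.fq > aln.sam
# Oxford Nanopore 2D reads (an alternative to the PacBio command; both write to aln.sam)
bwa mem -x ont2d ref.fa ont-2D.fq > aln.sam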
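Both bwa mem commands write to aln.sam, so running one after the other would overwrite the first result. The sketch below shows one way to pick the preset by platform and derive a separate output name per input. Only the `-x pacbio` and `-x ont2d` presets come from the commands above; the function name `bwa_long_cmd` and the output naming scheme are assumptions made for illustration, and the function only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: print the bwa mem command for a given platform.
# Only the -x presets (pacbio, ont2d) come from the commands above;
# the function name and output naming scheme are illustrative assumptions.
bwa_long_cmd() {
    platform=${1:-}   # "pacbio" or "ont2d"
    ref=${2:-}        # reference FASTA, already indexed with `bwa index`
    reads=${3:-}      # FASTQ file of long reads
    case "$platform" in
        pacbio|ont2d) ;;
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    # Derive a per-input output name so runs do not overwrite each other.
    out="$(basename "$reads" .fq).$platform.sam"
    echo "bwa mem -x $platform $ref $reads > $out"
}

# Dry run: print one command per data set.
bwa_long_cmd pacbio ref.fa pacbio.fq
bwa_long_cmd ont2d ref.fa ont-2D.fq
```

With the file names used above, the two printed commands write to pacbio.pacbio.sam and ont-2D.ont2d.sam respectively.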

Further discussions on aligning long Nanopore and PacBio reads:
[ Biostars ] [ choosing STAR parameters for long reads ] [ STAR parameters to use with IsoSeq reads ]