FASTQ Format Specification

Introduction

FASTQ format stores sequences and their Phred qualities in a single file. It is concise and compact. FASTQ was first widely used at the Sanger Institute, so we usually take the Sanger specification as the standard FASTQ format, or simply FASTQ format. Although a Solexa/Illumina read file looks much like FASTQ, the two differ in that the qualities are scaled differently. If the quality string contains a character with an ASCII code higher than 90, the file is probably in the Solexa/Illumina format.
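
The ASCII-code rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name looks_like_solexa is hypothetical, and the threshold of 90 is taken directly from the paragraph above. The check is a heuristic, so a file with uniformly low qualities may go undetected.

```python
def looks_like_solexa(qual):
    """Return True if any quality character has an ASCII code above 90.

    Heuristic only: the threshold 90 comes from the text above, and a
    Solexa/Illumina file with low qualities throughout would not trigger it.
    """
    return any(ord(c) > 90 for c in qual if c != "\n")
```

Applied to the quality strings in the example below, which contain only characters such as ';', '3' and '8', the function returns False.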

Example

@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+EAS54_6_R1_2_1_443_348
;;;;;;;;;;;9;7;;.7;393333
  

FASTQ Format Specification

Notations

Syntax

<fastq>:=<block>+
<block>:=@<seqname>\n<seq>\n+[<seqname>]\n<qual>\n
<seqname>:=[A-Za-z0-9_.:-]+
<seq>:=[A-Za-z\n\.~]+
<qual>:=[!-~\n]+
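
The grammar above can be read with a small parser. The following Python sketch is one possible reading of it, not a reference implementation. The name parse_fastq is hypothetical. Because <seq> and <qual> may contain newlines, and <qual> may contain '@' and '+', the sketch assumes that the quality string is exactly as long as the sequence and reads quality lines until that length is reached.

```python
import io


def parse_fastq(handle):
    """Yield (name, seq, qual) tuples from a FASTQ text stream."""
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if not line:
            # Tolerate blank lines between blocks.
            line = handle.readline()
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@<seqname>', got: " + line)
        name = line[1:]

        # <seq> may span several lines; it ends at the '+' line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("missing '+' line for " + name)
        seq = "".join(seq_parts)

        # <qual> may also span lines. Read by length rather than looking
        # for the next '@' (assumption: len(qual) == len(seq)).
        qual = ""
        while len(qual) < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("truncated quality string for " + name)
            qual += line.strip()
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence for " + name)

        yield name, seq, qual
        line = handle.readline()
```

For instance, list(parse_fastq(io.StringIO(text))) collects all blocks of a string text into memory, while iterating over parse_fastq(open(path)) processes a file one block at a time.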

Requirements

  • Given a Phred quality $Q, the corresponding quality character $q is chr($Q + 33), where chr() is the Perl function that converts an integer to a character based on the ASCII table.
  • Conversely, given a character $q, the corresponding Phred quality can be calculated with:
      $Q = ord($q) - 33;
      where ord() gives the ASCII code of a character.
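
The offset-33 conversion above translates directly to Python, where ord() and chr() behave like their Perl counterparts. The sketch below is illustrative only. The function names are hypothetical, and clamping at 93 is an assumption made here so that the encoded character stays within '!'..'~' (ASCII 33 to 126), the range permitted by <qual>.

```python
def char_to_phred(c):
    """Decode one FASTQ quality character into its Phred quality."""
    return ord(c) - 33


def phred_to_char(q):
    """Encode a non-negative integer Phred quality as a quality character."""
    if q < 0:
        raise ValueError("Phred quality must be non-negative")
    # Assumption of this sketch: clamp at 93 so the result never exceeds '~'.
    return chr(min(q, 93) + 33)


def decode_qual(qual):
    """Decode a whole quality string, skipping embedded newlines."""
    return [char_to_phred(c) for c in qual if c != "\n"]
```

With these definitions, the character ';' that dominates the example blocks decodes to a Phred quality of 26.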

Solexa/Illumina Read Format

The syntax of the Solexa/Illumina read format is almost identical to the FASTQ format, but the qualities are scaled differently. Given a character $sq, the following Perl code gives the Phred quality $Q:

        $Q = 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);
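
The same conversion can be written in Python, which makes the grouping of the expression explicit: the Solexa quality ord($sq) - 64 is divided by 10 before being used as the exponent. This is a minimal sketch. The function name is hypothetical, and the result is left as a float because the formula does not in general produce an integer.

```python
import math


def solexa_char_to_phred(sq):
    """Convert one Solexa/Illumina quality character to a Phred quality."""
    solexa_q = ord(sq) - 64  # Solexa-scaled quality, offset 64
    # Q = 10 * log10(1 + 10 ** (solexa_q / 10))
    return 10 * math.log10(1 + 10 ** (solexa_q / 10.0))
```

For high qualities the two scales nearly coincide, while for low qualities they diverge: a Solexa quality of 0 (character '@') corresponds to a Phred quality of 10 * log10(2), roughly 3.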