Genome sequences are not simply random collections of letters; they contain complex, ordered patterns. Many of these patterns involve sequences that are repeated multiple times throughout the genome, much as certain phrases recur throughout English literature (for example, "you wouldn't believe how...").
This presents a challenge when we try to match a read’s sequence to its location in the genome. If a sequencing read originates from one of these repeated sequences, it can align equally well to multiple locations across the genome, and we cannot tell which location is its true source. Sequencing reads that align to multiple locations are called multimappers.
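To make the idea of a multimapper concrete, here is a minimal sketch that looks for exact matches of a read in a tiny made-up genome. It assumes exact string matching on the forward strand only; real aligners tolerate mismatches, also check the reverse complement, and use genome indexes rather than scanning. The chromosome sequences and the helper names `find_alignments` and `classify` are hypothetical, invented for illustration.

```python
def find_alignments(read, genome):
    """Return every (chromosome, 0-based start) where `read` matches exactly."""
    hits = []
    for chrom, seq in genome.items():
        start = seq.find(read)
        while start != -1:
            hits.append((chrom, start))
            # Resume one position later so overlapping matches are also found.
            start = seq.find(read, start + 1)
    return hits


def classify(read, genome):
    """Label a read as unmapped, unique, or multimapper based on its hit count."""
    hits = find_alignments(read, genome)
    if not hits:
        return "unmapped", hits
    if len(hits) == 1:
        return "unique", hits
    return "multimapper", hits


# Hypothetical toy genome: the repeat "GATTACAGATTACA" occurs on both chromosomes.
genome = {
    "chr2": "TTGCGATTACAGATTACACCGT",
    "chr10": "AAGATTACAGATTACATGCA",
}

for read in ["GCGATTACA", "ACAGATTAC", "CCCCCC"]:
    label, hits = classify(read, genome)
    print(read, label, hits)
# GCGATTACA unique [('chr2', 2)]
# ACAGATTAC multimapper [('chr2', 8), ('chr10', 6)]
# CCCCCC unmapped []
```

The first read overlaps sequence unique to chr2, so it has a single home. The second read lies entirely inside the shared repeat, so it matches chr2 and chr10 equally well, and sequence alone cannot say which copy it came from.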
Each sequencing read comes from only one location in the genome. Our challenge is to figure out which of a multimapper's alignments reflects its true origin.
Is it Chromosome 2 or Chromosome 10? We will explore solutions to this problem later in the module.