The fundamental information in the genome is contained in the DNA, which is a long molecular string made up of four different nucleotides symbolized by the letters A, C, G, and T.
A given genome has a unique sequence of these four nucleotides, for example, “AAACTTTACTTG…”. The sequence of letters encodes information that controls molecular processes in an organism, including which proteins the organism synthesizes.
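To make the "string of four letters" view concrete, the following is a minimal sketch that treats the example fragment quoted above (without its trailing ellipsis) as an ordinary string and tallies each nucleotide. The function name `count_nucleotides` is a hypothetical helper chosen for illustration, not part of any standard library or established tool.

```python
from collections import Counter

# The four nucleotide letters named in the text.
NUCLEOTIDES = "ACGT"


def count_nucleotides(sequence: str) -> dict[str, int]:
    """Return how often each of A, C, G, T occurs in `sequence`.

    Hypothetical helper: raises ValueError if the string contains
    any symbol outside the four-letter alphabet.
    """
    invalid = set(sequence) - set(NUCLEOTIDES)
    if invalid:
        raise ValueError(f"unexpected symbols: {sorted(invalid)}")
    counts = Counter(sequence)
    # Report all four letters, including those that occur zero times.
    return {base: counts[base] for base in NUCLEOTIDES}


# The example fragment from the text, truncated where the text truncates it.
fragment = "AAACTTTACTTG"
print(count_nucleotides(fragment))
```

Representing a sequence as a plain string is only a convenience for short examples; it says nothing about how the sequence is obtained or interpreted biologically.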
The human genome consists of approximately 3 billion letters, organized into 23 different chromosomes that come in maternal and paternal pairs for a total of 46 chromosomes in each cell.
Changes in the sequence (mutations) can alter proteins or the way gene expression is controlled. Mutations can influence the risk of diseases or directly cause certain diseases (for example, cancer).
Furthermore, while humans are nearly identical in their genome sequence (99.9% identical), differences in the sequence partly determine many of our characteristics such as facial shape, height, color, and so on.
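To give a sense of scale, the rounded figures in the text (about 3 billion letters, 99.9% identity) imply that two human genomes differ at roughly 3 million positions. The sketch below does that arithmetic and also shows how a percent identity could be computed for two short strings. It assumes the two sequences are already aligned and of equal length, which ignores insertions and deletions; the function names and the second example fragment are hypothetical.

```python
# Rounded figures taken from the text; both are approximations.
GENOME_LENGTH = 3_000_000_000
IDENTITY = 0.999


def expected_differences(length: int, identity: float) -> int:
    """Number of differing positions implied by a given identity fraction."""
    return round(length * (1 - identity))


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree.

    Assumption: the sequences are pre-aligned, so position i in `a`
    corresponds to position i in `b`.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)


print(expected_differences(GENOME_LENGTH, IDENTITY))  # 3000000

# Hypothetical pair: the fragment from the text and a copy with one letter changed.
print(percent_identity("AAACTTTACTTG", "AAACTTTACTTA"))
```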
Because the genome sequence contains all this information, uncovering the exact letter sequence for a genome or a part of a genome is one of the central goals of biology. “Reading” the DNA molecule to determine the sequence of the letters is called DNA sequencing.