Genomics Glossary

1000 Genomes Project

the international research effort to create the largest public catalogue of human variation and genotype data by sequencing the genomes of a large number of people

Acrida conica

the scientific name of a species of grasshopper

Adenosine triphosphate (ATP)

a chemical that supplies energy for many biochemical cellular processes


algorithm

a sequence of instructions/actions to solve some problem


allele

a variant form of a gene

Alzheimer's disease

a progressive disorder that causes mental deterioration

amino acid

organic compounds which are the basic building blocks that make up proteins. DNA and RNA encode amino acid information for protein synthesis via codons


asthma

a respiratory disease which usually results from an allergic reaction or other forms of hypersensitivity


autism

a general term for a group of complex disorders of brain development

balanced translocation

a translocation where no genetic material is gained or lost in the cell; a person carrying one is typically unaffected

base pairs (bp)

Paired nucleotides that make up DNA and RNA. Following specific hydrogen-bonding patterns, DNA has adenine(A)-thymine(T) and guanine(G)-cytosine(C) base pairings. Similarly, RNA has adenine-uracil(U) and guanine-cytosine base pairings


beta-globin gene

a gene that encodes a component of hemoglobin. A point mutation within this gene causes Sickle Cell Disease


biodiversity

the variety and variability of living things in the world or in a particular habitat or ecosystem


bioengineering

the application of life sciences, physical sciences, mathematics and engineering principles and techniques to problems in medicine and biology


bioinformaticians

scientists with multidisciplinary training in computer science and biology who are equipped to answer important biological questions by analyzing and interpreting huge biological datasets


bioinformatics

a field in biology where scientists collect, analyze, and interpret biological data to further human understanding of biological systems

Bowl of Spaghetti model

a model in which uncondensed chromosomes (those not going through cell division) are randomly entangled in the nucleus. An experiment in the 1980s by Thomas and Christoph Cremer did not support this model.


a gene that can cause breast cancer if it is mutated


byte

a unit of memory size


C-value

the weight of a genome


cancer

a disease caused by abnormal cells dividing uncontrollably in a part of the body


chromatin

a thread of nucleosomes that is looped and packaged by proteins into tightly packed chromosomes during cell division


chromosome

a single molecule of DNA that is highly organized (by proteins) when cells divide. When cells are not dividing, this single molecule of DNA is less structured

Chromosome Territory model

a model in which each uncondensed chromosome (one not going through cell division) occupies a specific space and overlaps only with its neighbors. An experiment in the 1980s by Thomas and Christoph Cremer supported this model.


codon

a sequence of three DNA or RNA nucleotides (trinucleotide) that corresponds to a specific amino acid or stop signal during protein synthesis
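
As a rough illustration of how codons are read, the sketch below splits a DNA string into trinucleotides and looks each one up in a small table. The table holds only a handful of entries from the standard genetic code, and the names `CODON_TABLE`, `split_codons`, and `translate` are made up for this example.

```python
# Partial codon table (standard genetic code, DNA letters); illustrative only.
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "GAA": "Glu", "GTG": "Val",
    "AAA": "Lys", "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def split_codons(seq):
    """Split a sequence into consecutive trinucleotides, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate codon by codon until a stop codon; unknown codons become '?'."""
    amino_acids = []
    for codon in split_codons(seq.upper()):
        residue = CODON_TABLE.get(codon, "?")
        if residue == "Stop":
            break
        amino_acids.append(residue)
    return amino_acids


print(translate("ATGGAAGTGTAAAAA"))
```

A complete table would have 64 entries; the lookup logic would stay the same.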

complementary strands (of DNA)

two single opposing strands of DNA that bind as a result of base pairing throughout their full length
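
One way to see the relationship between complementary strands is to compute one strand from the other using the A-T and G-C pairing rules. This is a minimal sketch; the function names are chosen for this example only.

```python
# Base-pairing rules for DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return strand.upper().translate(DNA_COMPLEMENT)


def reverse_complement(strand):
    """Return the complementary strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
```

The reversal is needed because the two strands run in opposite directions.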

coverage plot

A plot that displays RNA-seq data graphically. The more sequence reads that fall in a region, the higher the plot is there. More RNA sequence reads means more gene expression.
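
The heights in a coverage plot can be thought of as a count of how many reads overlap each position. The sketch below assumes reads are given as hypothetical (start, end) intervals, 0-based and end-exclusive; real tools read these from alignment files.

```python
def coverage(reads, region_length):
    """Count how many reads overlap each position of a region."""
    depth = [0] * region_length
    for start, end in reads:
        # Clip each read to the region before counting.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


# Two overlapping reads over a 6-base region.
print(coverage([(0, 3), (2, 5)], 6))
```

Plotting the returned list against position gives the coverage plot.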

Cystic Fibrosis

a disease which causes severe lung and digestive system damage and leads to respiratory failure

Cystic Fibrosis Transmembrane conductance Regulator (CFTR)

A protein that functions as a channel for the movement of chloride ions in and out of cells. In other words, this protein regulates the balance between salt and water on epithelial surfaces, such as in the lung or pancreas.


database

an organized collection of information that can be accessed, managed, and updated

deoxyribonucleic acid (DNA)

the molecule that encodes the information for all the characteristics or traits about an organism


diploid

an organism or cell that contains paired chromosomes. Each pair consists of one chromosome from each parent.

DNA barcoding

using a short DNA segment from the genome to identify the species a sample belongs to

DNA fingerprinting

a technique of utilizing differences in DNA (polymorphisms) to identify an individual

DNA polymerase

an enzyme that helps synthesize new strands of DNA by copying an existing strand

DNA Sequencing

“Reading” the DNA molecule to determine the sequence of the letters


dominant

a description of an allele that expresses its phenotype regardless of whether the allele on the homologous chromosome is identical

ELSI (Ethical, Legal and Social Implications Research Program)

A program within the National Institutes of Health's National Human Genome Research Institute whose goal is to foster basic and applied research on the ethical, legal and social implications of genetic and genomic research for individuals, families and communities.

embryonic stem cell (ESC)

a type of stem cell that is pluripotent, meaning it is able to specialize into any cell type that makes up the human body. ESCs are derived from the undifferentiated inner cell mass of a human embryo, and their use has raised complicated ethical questions in science.


eukaryotic cells

cells that contain membrane-bound organelles, especially the nucleus


exome

a collection of all the protein-coding sequences (exons) found in the genome. This is the part of the human genome we understand best, but the exome only makes up about 2% of the entire genome.


exons

the pieces of RNA that are kept in the final messenger RNA molecule to code for proteins

flow cell

a special glass plate that is a little larger than a microscope slide, which is used to attach DNA or cDNA (made from RNA) samples to sequence them in next-generation sequencing machines


gene

a unit of DNA that controls specific traits

gene chippers

Robots that are fed thousands of seeds to report each seed's genetic information for plant breeding purposes.  The robots chip off a part of each seed (a part that will not kill the seed), extract DNA from the plant material, perform DNA analysis, and report the results via a database.

gene expression

the process by which instructions in our DNA are converted into functional products, such as proteins

gene ID

a unique sequence of numbers and/or letters (similar to a social security number) to identify a gene

gene ontology (GO)

a vocabulary to describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

gene quantification

counting the number of reads we have assigned to each gene
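
In its simplest form, gene quantification is a tally. The sketch below assumes each read has already been assigned to a gene name (or to `None` if it was not assigned); the gene names and the function name are invented for illustration.

```python
from collections import Counter


def count_reads_per_gene(assignments):
    """Tally reads per gene, skipping reads that were not assigned to any gene."""
    return Counter(gene for gene in assignments if gene is not None)


# One entry per read: the gene it was assigned to, or None if unassigned.
read_assignments = ["geneA", "geneB", "geneA", None, "geneA"]
counts = count_reads_per_gene(read_assignments)
print(counts["geneA"], counts["geneB"])
```

Real pipelines also have to decide how to treat reads that overlap several genes, which this sketch ignores.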

genetically modified foods

foods in which the DNA has been altered in a way that does not occur naturally or by artificial selection


genetics

the study of genes and genetic variation and how specific traits or characteristics are inherited from parents to offspring


genome

all the DNA in an organism

genome browser (GBrowse)

an interactive interface that displays and offers access to genomic data (similar to Google Maps but for genomes)


genomics

the study of a large number of genes and their interactions and/or of entire genomes (all of the DNA in an organism) to learn what kind of information is coded in these DNA sequences and how the DNA instructions are carried out


genotype

an individual's collection of genes. The term also can refer to the two alleles inherited for a particular gene. The genotype is expressed when the information encoded in the genes' DNA is used to make protein and RNA molecules. The expression of the genotype contributes to the individual's observable traits, called the phenotype.


genus

a principal taxonomic category that ranks above species. It is denoted by a capitalized Latin name and usually includes more than one species.


haploid

a cell or organism that contains only a single set of unpaired chromosomes


hemoglobin

a protein inside red blood cells that carries oxygen. A single point mutation within the beta-globin gene (which encodes a component of hemoglobin) can cause the hemoglobin protein to change shape, which in turn causes Sickle Cell Disease by making the red blood cell take the shape of a sickle.

hip dysplasia

a hip joint structural problem


histones

protein molecules that DNA is wrapped around

HIV (Human Immunodeficiency Virus)

a virus that attacks the immune system


homolog

a gene that is similar in structure and evolutionary origin to a gene in another species


homologous

a term describing the two chromosomes of a chromosome pair. One chromosome of the pair comes from the mother and the other from the father. They carry similar but not identical information.

Huntington's Disease

a disease which causes a breakdown of nerve cells in the brain that affects an individual’s ability to think, speak and move.


hypothesis

an educated guess that can be tested

immune system

the body's natural defense system

intergenic region

region in DNA where no known genes exist


introns

the pieces of RNA that are removed from final messenger RNAs (these segments are not used to code for proteins)


karyotype

a description of the number and appearance of chromosomes in the nucleus of a eukaryotic cell


a visual tool that scientists use to profile and then analyze chromosomes in a sample of cells

kilobase (kb)

a measure of DNA length. One kilobase equals 1000 bases or 1000 nucleotides.


kinase

a protein that phosphorylates other proteins (adds a phosphate group to them)

locus (plural loci)

the specific location and position on a chromosome of a gene's DNA sequence

log base 2 scale

log2(x) is the power to which 2 must be raised to get x (e.g., 2^2 = 4, so log2(4) is 2)
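
A quick check of the definition using Python's standard `math.log2`. The expression values in the second part are invented numbers, included only to show why a log base 2 scale is convenient for comparing gene expression.

```python
import math

# 2 raised to the power 2 is 4, so log2(4) is 2.
print(math.log2(4))

# On a log2 scale, each step of 1 is a doubling (or, going down, a halving).
expression_a = 8.0  # made-up expression value in sample A
expression_b = 2.0  # made-up expression value in sample B
log2_fold_change = math.log2(expression_a / expression_b)
print(log2_fold_change)
```

A log2 fold change of 2 here means sample A is two doublings (four times) higher than sample B.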


macrophage

A type of white blood cell in the immune system. Macrophages are responsible for "eating" and digesting cancer cells, microbes, and foreign materials.


malaria

a mosquito-carried disease that targets blood cells

megabase (Mb)

1 million base pairs


multi-mapped reads

Sequencing reads that align to multiple locations

multiple sclerosis (MS)

a long-lasting and potentially disabling disease of the brain and spinal cord


mutation

alterations in the DNA


neuron

a cell that carries messages between the brain and other parts of the body. It is also called a "nerve cell" and is the basic unit of the nervous system.


nucleosome

a unit that consists of a loop made of DNA and histones


nucleotides

the basic structural units of nucleic acids such as DNA and RNA. They are made up of a nucleoside coupled to a phosphate group


nucleus

a membrane-bound structure in a cell that contains DNA

omics datasets

informally refers to the data of some field of study in biology ending in -omics, such as genomics, proteomics or metabolomics; “omics” implies a large or genome-wide scale

paired-end reads

two reads that were sequenced from opposing ends of the same molecule. See paired-end sequencing.

paired-end sequencing

a type of sequencing that obtains reads starting from opposing ends of the same molecule. The reads are output as paired-end reads.

parallel sequencing

any of several high-throughput and next-generation sequencing approaches to DNA sequencing that are able to process a very large number of sequencing reactions at the same time


paralysis

the loss of the ability to move in part or most of the body


parasite

an organism that exploits a host organism at the host's expense


phenotype

observable traits

picogram (pg)

10^-12 grams

point mutation

a mutation in one nucleotide in a gene sequence

polymerase chain reaction (PCR)

A laboratory method that uses DNA polymerase to make copies of DNA molecules. See DNA polymerase.


polymorphisms

differences in DNA which can be used to identify individuals (e.g., the number of specific repeats within an intergenic region)


primers

short nucleotide sequences that serve as a starting point for DNA synthesis. They are required for DNA replication because DNA polymerase can only add nucleotides to an existing DNA strand.


programming

creating a sequence of instructions/actions from a fixed set of allowed instructions/actions to solve a problem

programming language

a formal syntax to communicate instructions to a machine


promoter

a section of DNA where RNA polymerase can bind to initiate transcription


proteins

large complex molecules that perform many critical roles for an organism to function

read alignment, read mapping

The entire process of taking sequencing reads and assigning them to specific locations in the genome
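
To make the idea concrete, here is a deliberately naive sketch that maps a read by looking for exact matches in a reference string. Real aligners use indexed data structures and tolerate mismatches, so this is only an illustration; `map_read` is a name made up for this example.

```python
def map_read(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further along to catch overlapping matches.
        start = reference.find(read, start + 1)
    return positions


reference = "TTACGTTACG"
print(map_read("ACG", reference))
print(map_read("GGG", reference))
```

A read that returns more than one position aligns to multiple locations, and a read that returns an empty list could not be mapped.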


recessive

a description of a genetic trait that only expresses its phenotype when a dominant allele is lacking or when the two alleles present are both recessive

reference genome

The entire genome sequence of the organism from which we collected our RNA samples


regeneration

the act of replacing something that was lost

reverse transcriptase

An enzyme that makes a cDNA copy of an RNA molecule. This enzyme was originally discovered in viruses.

ribonucleic acid (RNA)

a nucleic acid molecule that is implicated in various biological roles, including coding, decoding, regulation, and expression of genes.

RNA sequencing (RNA-seq)

a technique that reveals the presence and quantity of RNA in a biological sample at a given moment in time.

RNA splicing

The process of editing messenger RNA during transcription through removal of introns. Exons are joined together through a process called ligation.

sequence motif

a particular string of nucleotides whose pattern is repeated at least once in a long string that is related to the function of a gene


sequencing

The process of determining the order of the nucleotides that make up DNA or RNA

sequencing adapters

Short pieces of DNA that attach a sequencing molecule to a sequencing flow cell via complementary base-pairing. These are required for a sequencing machine to recognize and read a molecule of DNA or cDNA.

sequencing reads

a list of short nucleotide sequences (100-200 letters, or base pairs (bp))

Sickle Cell Disease

a genetic disorder that causes red blood cells to take on a deformed sickle shape instead of their normal disk shape, leading to poor circulation, pain, and anemia (low red blood cell count).

single-end reads

reads that were sequenced from only one end of the molecule (as opposed to paired-end reads)

somatic mutation

alterations in your DNA that happen after you are conceived (they are not passed on to you from your mother's or father's DNA).

spatial indexing

a technique used in next-generation sequencing machines that uses the position of strings of DNA on the flow cell to help sequence the strings more quickly


species

a group of living organisms that have many characteristics in common and are capable of exchanging genes or interbreeding. It is the principal natural taxonomic unit below a genus.


spliceforms

different variations of messenger RNA that are produced by the same gene. Which specific spliceforms are produced from a gene, and how much of each, can change what is translated into proteins. Having "alternative spliceforms" is a way a single gene can respond to different biological situations by producing different products from the same string of DNA sequence.


spliceosome

a biological machine made up of RNA and protein complexes that removes introns from a transcribed pre-mRNA and then ligates the exons together

stem cells

unspecialized cells that are capable of dividing and renewing themselves for long periods (self-renewal) as well as giving rise to specialized cells. Different stem cell types have different limitations as to what type of specialized cells they can differentiate into.


string

a sequence of one or more letters

synthetic biology

an area of research that studies ways to design and build novel biological pathways, organisms, systems, or devices, as well as ways to redesign existing natural biological systems for useful purposes.


tachyzoite

the life stage of Toxoplasma when it infects humans


ToxoDB

a functional genomics online database for organisms that cause disease in humans and animals

Toxoplasma gondii

a unicellular eukaryotic parasite


toxoplasmosis

a disease caused by Toxoplasma infection


transcription

the process in which information in a strand of DNA is copied into a messenger RNA molecule
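
At the level of letters, the result of transcription can be sketched very simply: the messenger RNA carries the same sequence as the coding strand of the DNA, with uracil (U) in place of thymine (T). This ignores the actual biochemistry (RNA polymerase reads the opposite, template strand), and the function name is invented for this example.

```python
def transcribe(coding_strand):
    """Return the mRNA letters for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCT"))
```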


translocation

a rearrangement in which part of a chromosome is transferred to a non-homologous chromosome


transposons

"jumping genes", which make up about half of the human genome by weight


trinucleotide

a sequence consisting of three nucleotides

type I diabetes

an autoimmune disease typically diagnosed in children and young adults where the body does not produce enough insulin, a hormone needed to allow sugar to enter cells to produce energy

unbalanced translocation

when a chromosome with extra or missing genetic material is inherited from a parent with a balanced translocation, which can lead to genetic disorders


unicellular

an adjective to describe a single-celled organism


untranslated regions (UTRs)

the 5’ and 3’ regions of genes that are made into RNA but, unlike the protein-coding section of the gene, are not translated