Genomics Glossary
1000 Genomes Project |
the international research effort to create the largest public catalogue of human variation and genotype data by sequencing the genomes of a large number of people |
Acrida conica |
scientific name for a species of grasshopper |
Adenosine triphosphate (ATP) |
a chemical that supplies energy for many biochemical cellular processes |
algorithm |
a sequence of instructions/actions to solve some problem |
allele |
a variant form of a gene |
Alzheimer's disease |
a progressive disorder that causes mental deterioration |
amino acid |
organic compounds that are the basic building blocks of proteins. DNA and RNA encode amino acid information for protein synthesis via codons |
asthma |
a respiratory disease which usually results from an allergic reaction or other forms of hypersensitivity |
autism |
a general term for a group of complex disorders of brain development |
balanced translocation |
a translocation in which no genetic material is gained or lost in the cell. A person with a balanced translocation is typically unaffected |
base pairs (bp) |
Paired nucleotides that make up DNA and RNA. Following specific hydrogen-bonding patterns, DNA has adenine(A)-thymine(T) and guanine(G)-cytosine(C) base pairings. Similarly, RNA has adenine-uracil(U) and guanine-cytosine base pairings |
beta-globin |
a gene that encodes a component of hemoglobin. A point mutation within this gene causes Sickle Cell Disease |
biodiversity |
the variety and variability of living things in the world or in a particular habitat or ecosystem |
bioengineering |
the application of life sciences, physical sciences, mathematics and engineering principles and techniques to problems in medicine and biology |
bioinformaticians |
scientists with multidisciplinary training in computer science and biology who are equipped to answer important biological questions based on analyzing and interpreting huge biological-based datasets |
bioinformatics |
a field in biology where scientists collect, analyze, and interpret biological data to further human understanding of biological systems |
Bowl of Spaghetti model |
model that depicts that uncondensed chromosomes (not going through cell division) are randomly entangled in the nucleus. An experiment in the 1980s by Thomas and Christoph Cremer did not support this model. |
BRCA1 |
a gene that can cause breast cancer if it is mutated |
byte |
a unit of memory size |
C-value |
the amount of DNA in a haploid genome, usually expressed as a weight in picograms |
cancer |
a disease caused by abnormal cells dividing uncontrollably in a part of the body |
chromatin |
a thread of nucleosomes that is looped and packaged by proteins into tightly packed chromosomes during cell division |
chromosome |
a single molecule of DNA that is highly organized (by proteins) when cells divide. When cells are not dividing, this single molecule of DNA is less structured |
Chromosome Territory model |
model that depicts that each uncondensed chromosome (not going through cell division) occupies a specific space and only overlaps with its neighbors. An experiment in the 1980s by Thomas and Christoph Cremer supported this model. |
codon |
a sequence of three DNA or RNA nucleotides (trinucleotide) that corresponds with a specific amino acid or stop signal during protein synthesis |
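As a rough illustration of how codons map to amino acids, the sketch below builds the standard genetic code as a lookup table and reads a DNA string three letters at a time. The names `CODON_TABLE` and `translate` are made up for this example, and the input is assumed to be a coding sequence that starts in frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one amino-acid letter per codon in TCAG order.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

Here ATG codes for methionine (M), GCC for alanine (A), and TAA is a stop signal.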
complementary strands (of DNA) |
two single opposing strands of DNA that bind as a result of base pairing throughout their full length |
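Given one strand, the complementary strand can be worked out from the A-T and G-C pairing rules listed under base pairs (bp). The sketch below is a minimal example; the function name `reverse_complement` is illustrative, and the input is assumed to contain only the letters A, C, G and T.

```python
# DNA pairing rules, as listed under "base pairs (bp)".
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposing strand, written in its own reading direction."""
    return "".join(DNA_PAIRS[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

The strand is reversed because the two strands of DNA run in opposite directions.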
coverage plot |
A plot that displays RNA-seq data graphically. The more sequence reads that cover a region, the higher the plot is in that region. More RNA sequence reads mean more gene expression. |
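The heights in a coverage plot are simply counts of overlapping reads at each position. The sketch below computes them for a small region, assuming each aligned read is given as a hypothetical (start, end) pair that is 0-based with the end excluded.

```python
def coverage(read_intervals, region_length):
    """Count how many reads overlap each position of a region.

    read_intervals holds (start, end) pairs, 0-based with end exclusive.
    """
    depth = [0] * region_length
    for start, end in read_intervals:
        # Clip each read to the region so out-of-range reads are ignored.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


reads = [(0, 4), (2, 6), (3, 5)]
print(coverage(reads, 8))  # [1, 1, 2, 3, 2, 1, 0, 0]
```

Plotting these numbers as bar heights along the region gives the coverage plot.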
Cystic Fibrosis |
a disease which causes severe lung and digestive system damage and leads to respiratory failure |
Cystic Fibrosis Transmembrane conductance Regulator (CFTR) |
A protein that functions as a channel for the movement of chloride ions in and out of cells. In other words, this protein regulates the balance between salt and water on epithelial surfaces, such as in the lung or pancreas. |
database |
an organized collection of information that can be accessed, managed, and updated |
deoxyribonucleic acid (DNA) |
the molecule that encodes the information for all the characteristics or traits of an organism |
diploid |
an organism or cell that contains paired chromosomes. Each pair consists of one chromosome from each parent. |
DNA barcoding |
using a DNA segment of the genome to identify the sequence belonging to a particular species |
DNA fingerprinting |
a technique of utilizing differences in DNA (polymorphisms) to identify an individual |
DNA polymerase |
a protein enzyme that helps synthesize new strings of DNA by copying an existing string |
DNA Sequencing |
“Reading” the DNA molecule to determine the sequence of the letters |
dominant |
description of an allele which can express its phenotype independent of whether the allele on the homologous pair is identical or not |
ELSI (Ethical, Legal and Social Implications Research Program) |
A group within the National Institutes of Health's National Human Genome Research Institute whose goal is to foster basic and applied research on the ethical, legal and social implications of genetic and genomic research for individuals, families and communities. |
embryonic stem cell (ESC) |
a type of stem cell that is pluripotent, meaning it is able to specialize into any cell type that makes up the human body. ESCs are derived from the undifferentiated inner cell mass of a human embryo and have raised complicated ethical questions in science. |
eukaryotic |
cells that contain membrane-bound organelles, especially the nucleus |
exome |
a collection of all the protein-coding sequences (exons) found in the genome. This is the part of the human genome we understand best, but the exome only makes up about 2% of the entire genome. |
exons |
the pieces of RNA that we keep in the final messenger RNA molecule to code for proteins |
flow cell |
a special glass plate that is a little larger than a microscope slide, which is used to attach DNA or cDNA (made from RNA) samples to sequence them in next-generation sequencing machines |
gene |
a unit of DNA that controls specific traits |
gene chippers |
Robots that are fed thousands of seeds to report each seed's genetic information for plant breeding purposes. The robots chip off a part of each seed (a part that will not kill the seed), extract DNA from the plant material, perform DNA analysis, and report the results via a database. |
gene expression |
the process by which instructions in our DNA are converted into functional products, such as proteins |
gene ID |
a unique sequence of numbers and/or letters (similar to a social security number) to identify a gene |
gene ontology (GO) |
a vocabulary to describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. |
gene quantification |
counting the number of reads we have assigned to each gene |
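In practice this is a tallying step. The sketch below assumes each read has already been assigned a gene name, with `None` standing for a read that was not assigned to any gene; the gene names and the function name `quantify` are hypothetical.

```python
from collections import Counter


def quantify(read_assignments):
    """Count reads per gene; reads assigned to no gene (None) are skipped."""
    return Counter(gene for gene in read_assignments if gene is not None)


assignments = ["geneA", "geneB", "geneA", None, "geneA"]
counts = quantify(assignments)
print(counts["geneA"], counts["geneB"])  # 3 1
```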
genetically modified foods |
foods in which the DNA has been altered in a way that does not occur naturally or by artificial selection |
genetics |
the study of genes and genetic variation and how specific traits or characteristics are inherited from parents to offspring |
genome |
all the DNA in an organism |
genome browser (GBrowse) |
an interactive interface that displays and offers access to genomic data (similar to Google Maps, but for genomes) |
genomics |
the study of a large number of genes and their interactions and/or of entire genomes (all of the DNA in an organism) to learn what kind of information is coded in these DNA sequences and how the DNA instructions are carried out |
genotype |
an individual's collection of genes. The term also can refer to the two alleles inherited for a particular gene. The genotype is expressed when the information encoded in the genes' DNA is used to make protein and RNA molecules. The expression of the genotype contributes to the individual's observable traits, called the phenotype. |
genus |
a principal taxonomic category that ranks above species. It is denoted by a capitalized Latin name and usually includes more than one species. |
haploid |
a cell or organism that contains only a single set of unpaired chromosomes |
hemoglobin |
a protein inside red blood cells that carries oxygen. A single point mutation within the beta-globin gene (which encodes a component of hemoglobin) can cause the hemoglobin protein to change shape which, in turn, causes Sickle Cell Disease by causing the red blood cell to take the shape of a sickle. |
hip dysplasia |
a hip joint structural problem |
histone |
protein molecules that DNA is wrapped around |
HIV (Human Immunodeficiency Virus) |
a virus that attacks the immune system |
homolog |
a gene that is similar in structure and evolutionary origin to a gene in another species |
homologous |
term to describe the two chromosomes that make up a chromosome pair. Of the pair, one chromosome comes from the mother and the other from the father. They carry similar but not identical information. |
Huntington's Disease |
a disease which causes a breakdown of nerve cells in the brain that affects an individual’s ability to think, speak and move. |
hypothesis |
an educated guess that can be tested |
immune system |
the body's natural defense system |
intergenic region |
region in DNA where no known genes exist |
introns |
the pieces of RNA that are removed from final messenger RNAs (these segments are not used to code for proteins) |
karyotype |
a description of the number and appearance of chromosomes in the nucleus of a eukaryotic cell |
karyotyping |
a visual tool that scientists use to profile and then analyze chromosomes in a sample of cells |
kilobase (kb) |
a measure of DNA length. One kilobase equals 1000 bases or 1000 nucleotides. |
kinase |
a protein that phosphorylates (adds a phosphate group to) other proteins |
locus (plural loci) |
the specific location and position on a chromosome of a gene's DNA sequence |
log base 2 scale |
log2(x) means the power to which you have to raise 2 to get x (e.g. 2^2 = 4, so log2(4) is 2) |
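A quick way to check this definition is with Python's standard `math.log2` function. The example values below are chosen only for illustration.

```python
import math

print(math.log2(4))    # 2.0, because 2 raised to the power 2 is 4
print(math.log2(8))    # 3.0, because 2 raised to the power 3 is 8
print(math.log2(0.5))  # -1.0, because 2 raised to the power -1 is one half

# On a log base 2 scale, doubling a value always adds exactly 1.
doubling_step = math.log2(16) - math.log2(8)
print(doubling_step)   # 1.0
```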
macrophage |
A type of white blood cell in the immune system. Macrophages are responsible for "eating" and digesting cancer cells, microbes, and foreign materials. |
malaria |
a mosquito-carried disease that targets blood cells |
megabase (Mb) |
1 million base pairs |
multimappers |
Sequencing reads that align to multiple locations |
multiple sclerosis (MS) |
a long-lasting and potentially disabling disease of the brain and spinal cord |
mutation |
alterations in the DNA |
neuron |
a cell that carries messages between the brain and other parts of the body. It is also called a "nerve cell" and is the basic unit of the nervous system. |
nucleosome |
a unit that consists of a loop made of DNA and histones |
nucleotides |
the basic structural units of nucleic acids such as DNA and RNA. They are made up of a nucleoside coupled to a phosphate group |
nucleus |
a membrane-bound structure in a cell that contains DNA |
omics datasets |
informally refers to the data of some field of study in biology ending in -omics, such as genomics, proteomics or metabolomics; “omics” implies a large or genome-wide scale |
paired-end reads |
two reads that were sequenced from opposing ends of the same molecule. See paired-end sequencing. |
paired-end sequencing |
a type of sequencing that obtains reads starting from opposing ends of the same molecule. The reads are output as paired-end reads. |
parallel sequencing |
any of several high-throughput and next-generation sequencing approaches to DNA sequencing that are able to process a very large number of sequencing reactions at the same time |
paralysis |
the loss of the ability to move in part or most of the body |
parasite |
an organism that exploits a host organism at the host's expense |
phenotype |
observable traits |
picogram (pg) |
10^-12 grams |
point mutation |
a mutation in one nucleotide in a gene sequence |
polymerase chain reaction (PCR) |
A laboratory method that uses DNA polymerase (see the DNA polymerase entry) to make copies of DNA molecules. |
polymorphism |
differences in DNA that can be used to identify individuals (e.g., the number of specific repeats within an intergenic region) |
primer |
short nucleotide sequences that serve as a starting point for DNA synthesis. They are required for DNA replication because DNA polymerase can only add nucleotides to an existing DNA strand. |
programming |
creating a sequence of instructions/actions from a fixed set of allowed instructions/actions to solve a problem |
programming language |
a formal syntax to communicate instructions to a machine |
promoters |
a section of DNA where RNA polymerase can bind to initiate transcription |
proteins |
large complex molecules that perform many critical roles for an organism to function |
read alignment, read mapping |
The entire process of taking sequencing reads and assigning them to specific locations in the genome |
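The toy sketch below shows the basic idea by searching for exact matches of each read in a short made-up genome string. It is only an illustration: real alignment software tolerates mismatches and uses indexes to search large genomes quickly. The function name `map_read` and the sequences are assumptions for this example.

```python
def map_read(genome, read):
    """Return every 0-based position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one letter so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


genome = "ACGTACGTTTGACC"
for read in ["TTGA", "ACGT", "GGGG"]:
    hits = map_read(genome, read)
    if not hits:
        label = "unmapped"
    elif len(hits) == 1:
        label = "uniquely mapped"
    else:
        label = "multimapper"
    print(read, hits, label)
```

A read with more than one hit, such as ACGT here, is what the glossary calls a multimapper.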
recessive |
a description of a genetic trait that only expresses its phenotype when a dominant allele is lacking or when the two alleles present are both recessive |
reference genome |
The entire genome sequence of the organism from which we collected our RNA samples |
regeneration |
the act of replacing something that was lost |
reverse transcriptase |
An enzyme that makes a cDNA copy of an RNA molecule. This enzyme was originally discovered in viruses. |
ribonucleic acid (RNA) |
a nucleic acid molecule that is implicated in various biological roles, including coding, decoding, regulation, and expression of genes. |
RNA sequencing (RNA-seq) |
a technique that reveals the presence and quantity of RNA in a biological sample at a given moment in time. |
RNA splicing |
The process of editing messenger RNA during transcription through removal of introns. Exons are joined together through a process called ligation. |
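The effect of splicing on a sequence can be sketched as cutting out the intron and joining what remains. The sequences below are invented for illustration, and the exon positions are assumed to be known in advance as (start, end) pairs that are 0-based with the end excluded.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA in order, dropping everything else."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# A made-up transcript: exon 1, then an intron, then exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCAG", "GAUUAA"
pre_mrna = exon1 + intron + exon2

exon_positions = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(pre_mrna)),
]
mature_mrna = splice(pre_mrna, exon_positions)
print(mature_mrna)  # AUGGCCGAUUAA
```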
sequence motif |
a particular string of nucleotides whose pattern is repeated at least once in a long string that is related to the function of a gene |
sequencing |
The process of determining the order of nucleotides that makes up DNA or RNA |
sequencing adapters |
Short pieces of DNA that attach a molecule being sequenced to a sequencing flow cell via complementary base-pairing. These are required for a sequencing machine to recognize and read a molecule of DNA or cDNA. |
sequencing reads |
a list of short nucleotide sequences (100-200 letters, or base pairs (bp)) |
Sickle Cell Disease |
a genetic disorder that causes red blood cells to take on a deformed sickle shape instead of their normal disk shape, leading to poor circulation, pain, and anemia (low red blood cell count). |
single-end reads |
reads that were sequenced from only one end of the molecule (as opposed to paired-end reads) |
somatic mutation |
alterations in your DNA that happen after you are conceived (they are not passed on to you from your mother or father's DNA). |
spatial indexing |
a technique used in next-generation sequencing machines that uses the position of strings of DNA on the flow cell to help sequence the strings more quickly |
species |
a group of living organisms that have many characteristics in common and are capable of exchanging genes or interbreeding. It is the principal natural taxonomic unit below a genus. |
spliceforms |
different variations of messenger RNA that are produced by the same gene; which specific spliceforms and how much of each type are produced from a gene can change what is translated into proteins; having "alternative spliceforms" is a way a single gene can respond to different biological situations by producing different products using the same string of DNA sequence. |
spliceosome |
a biological machine made up of RNA and protein complexes that removes introns from a transcribed pre-mRNA and then ligates the exons together |
stem cells |
unspecialized cells that are capable of dividing and renewing themselves for long periods (self-renewal) as well as give rise to specialized cells. Different stem cell types have different limitations as to what type of specialized cells they can differentiate into. |
string |
a sequence of one or more letters |
synthetic biology |
an area of research that studies ways to design and build novel biological pathways, organisms, systems, or devices, as well as ways to redesign existing natural biological systems for useful purposes. |
Tachyzoite |
the life stage of Toxoplasma when it infects humans |
ToxoDB |
a functional genomics online database for organisms that cause disease in humans and animals |
Toxoplasma gondii |
a unicellular eukaryotic parasite |
Toxoplasmosis |
a disease caused by Toxoplasma infection |
transcription |
the process in which information in a strand of DNA is copied into a messenger RNA molecule |
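At the level of letters, the result of transcription can be sketched as swapping thymine (T) for uracil (U). This is a simplification that assumes the input is the coding strand of the gene; in the cell, RNA polymerase reads the opposite (template) strand and builds the RNA by base pairing. The function name `transcribe` is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA letters matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCC"))  # AUGGCC
```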
translocation |
part of a chromosome is transferred to a non-homologous chromosome |
transposons |
jumping genes which make up about half of the human genome by weight |
trinucleotide |
a sequence consisting of three nucleotides |
type I diabetes |
an autoimmune disease typically diagnosed in children and young adults where the body does not produce enough insulin, a hormone needed to allow sugar to enter cells to produce energy |
unbalanced translocation |
when a chromosome with extra or missing genetic material is inherited from a parent with a balanced translocation, which can lead to genetic disorders |
unicellular |
an adjective to describe a single-celled organism |
UTRs |
untranslated 5’ and 3’ regions in genes that are made into RNA but do not get translated like the protein coding section of the gene |