About the Course Module

RNA Sequencing: Up Close with the Data

RNA plays an important role in biological systems as the intermediary between DNA and proteins. By measuring the RNA in a cell or tissue, we learn about the cell’s/tissue’s function. RNA sequencing (RNA-seq) is a relatively new technology that allows us to measure RNA in a sample with a high degree of accuracy. In this module, we will discuss types of data generated by RNA-seq experiments.

Learning Objectives:

  • List the different types of information that can be gained from RNA-seq experiments.
  • Develop a general understanding of how RNA sequencing data are analyzed.
  • Explore some of the challenges associated with analyzing RNA-seq data.

Estimated Module Timing:

RNA-seq Data—reading, short video, and discussion: 20 minutes

RNA-seq Analysis—reading and discussion: 10–15 minutes

Read Alignment/Mapping—reading and discussion: 15–20 minutes

Gene Quantification—reading and discussion: 10–15 minutes

RNA-seq in the World—reading and discussion: 10–15 minutes

Activities: 10–20 minutes each

 

RNA-seq in the World

In this section, we listed several questions that RNA-seq data could help answer.

Here, we elaborate on how the RNA-seq data are used:

  • How do spiders make silk? What makes spider silk so strong?

Researchers have extracted RNA from spider silk glands and used RNA-seq to discover new silk genes and identify other genes that are expressed. Combining this information, they can predict which silk proteins are combined to produce different types of silk and which non-silk genes are necessary to produce the silk.

  • What causes jet lag? Can we develop medicines to prevent it?

Scientists can use RNA to track the internal circadian clock in humans and in model organisms such as mice. By extracting RNA from humans and mice before and after they have experienced jet lag, we can see how jet lag affects the normal functioning of the circadian clock. The genes whose expression is affected may be good therapeutic targets for treating jet lag.

  • What happens when HIV infects human cells? How can we prevent infection?

By infecting human cells in a petri dish with HIV and sampling RNA from the cells, we can study how the course of HIV infection affects gene expression in the cells. We can also track the RNA produced by the viral genome and how its splicing patterns fluctuate. These changes in transcription may give useful clues about how to prevent or slow the progression of an HIV infection.

  • What leads to drug side effects? Can we prevent them?

Many pharmaceuticals are designed to target one gene or protein, or a small number of genes/proteins. Using RNA-seq, we can study the expression of all genes affected by a drug treatment. With this information, we can predict which off-target genes lead to side effects.

  • How do our brains develop during childhood and adolescence?

By studying RNA from developing brains in model organisms, scientists can observe how gene expression, and RNA splicing in particular, changes over time.

 

Activity 1 

1. Break it into fragments and make some chemical changes to the fragments so the machine can recognize the fragments.

2. Increase read length, do paired-end sequencing, and use fragment length information.

3. Challenge question 

No. Without paired-end data, we know the location of only one end of the fragment and cannot calculate the length of the entire fragment. We need paired-end data to calculate the length of the particular fragment in question (how many base pairs are between one end of the fragment and its mate at the other end of the fragment).

You could then check whether the distance between the paired ends is consistent with the experiment’s known fragment length range. If these lengths are not consistent, it indicates that the alignment is incorrect (the ends match the genome locations by chance, but this is not where the RNA fragment originated in the genome).

Note: Students may wonder why these modifications are not performed for every RNA-seq experiment. Modifications add cost to the sequencing and complexity to the analysis. Before starting an experiment, you must consider whether the modifications will help you reach your experimental goals. For some experiments, one or more of these modifications may not provide any benefit.
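
For instructors who want to show the fragment-length check from the challenge question in code, the following is a minimal sketch rather than a real alignment tool. The function names, the coordinates, and the 100 to 200 bp range are all hypothetical, and the sketch measures distance along the genome only (it ignores introns).

```python
def fragment_length(read_start, mate_end):
    """Length implied by a read/mate pair, using 1-based inclusive genome positions.

    read_start: leftmost base of the first read (hypothetical coordinate)
    mate_end:   rightmost base of its mate (hypothetical coordinate)
    """
    return mate_end - read_start + 1


def is_consistent(read_start, mate_end, min_len, max_len):
    """True if the implied fragment length falls inside the expected range."""
    return min_len <= fragment_length(read_start, mate_end) <= max_len


# Hypothetical candidate alignments for one read pair: (read start, mate end).
candidates = {"A": (1_000, 1_149), "B": (5_000, 5_899)}

for name, (start, end) in candidates.items():
    length = fragment_length(start, end)
    verdict = "consistent" if is_consistent(start, end, 100, 200) else "suspect"
    print(f"Candidate {name}: {length} bp, {verdict}")
```

With single-end data there is no mate position to pass in, which is why the check cannot be done without paired-end sequencing.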

 

Activity 2

1. With more information, the bases continued to match the genome for Alignment 2, but the extra bases did not match the next bases in Alignment 1.

2. The second end (the mate) only matched the base pairs in one of the alignment positions.

3. It gives you more confidence that Alignment 2 is correct. If the fragment lengths were supposed to be 100–150 bp, you may suspect there is something wrong with Alignment 2 (250 bp fragment).
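
The effect of read length described in answer 1 can be illustrated with a toy example. This is only a sketch: the genome string and the reads are made up, and real aligners use indexed search and tolerate mismatches rather than scanning for exact matches.

```python
def find_matches(genome, read):
    """Return every 0-based position where the read matches the genome exactly."""
    return [
        i
        for i in range(len(genome) - len(read) + 1)
        if genome[i:i + len(read)] == read
    ]


genome = "ACGTTGCAACGTAGGC"  # hypothetical toy genome
short_read = "ACGT"          # matches at two positions
long_read = "ACGTAG"         # same read extended by two bases

print(find_matches(genome, short_read))
print(find_matches(genome, long_read))
```

The short read is ambiguous because it occurs twice in the toy genome; the two extra bases of the longer read match at only one of those positions.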

 

Activity 3

1. Increasing the read length lets you know Alignment 3 was wrong. But the read sequence still matched both Alignments 1 and 2.

2. The paired end (the read’s “mate,” shown in orange) matched the genome on the right side of the original read (in blue). For Alignment 1, the mate aligned to the genome a short distance to the right of the original read. For Alignment 2, the mate aligned to the genome a much greater distance to the right of the original read. Thus, paired-end information did not help decide whether Alignment 1 or 2 was the correct one—both still matched the DNA sequence on Chromosome 19.

3. Yes, this information about fragment length lets you know that Alignment 2 cannot be correct (there are too many nucleotides between the paired-end reads if the fragments are only supposed to be 100–200 bp long).

4. Challenge question

All the modifications we’ve discussed in the activity should still produce good results in this scenario; however, given that multi-mapped reads will not be an issue in this case, these modifications may be overkill and ultimately a waste of resources. In these circumstances, it is probably best to use the basic RNA-seq method without any modification.

 

Activity 4

1. You may have discovered a new gene! A reasonable hypothesis is that this part of the genome contains a gene that has not been identified until now.

2. Cancer, muscular dystrophy

3. Quantify the reads specific to each spliceform by counting them. Spliceform 1 has X reads vs. Spliceform 2 has Y reads.
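
As a rough illustration of answer 3, the counting step might look like the sketch below. The read assignments are invented for the example, and it assumes that an earlier step has already decided which reads are specific to a single spliceform (reads compatible with more than one spliceform are marked `None` and skipped).

```python
from collections import Counter

# Hypothetical input: for each read, the single spliceform it supports,
# or None if the read is compatible with more than one spliceform.
read_assignments = [
    "spliceform_1", "spliceform_2", None, "spliceform_1",
    "spliceform_1", None, "spliceform_2",
]


def count_specific_reads(assignments):
    """Count only the reads that are specific to one spliceform."""
    return Counter(a for a in assignments if a is not None)


counts = count_specific_reads(read_assignments)
for spliceform, n in sorted(counts.items()):
    print(f"{spliceform}: {n} reads")
```

Here the counts play the role of X and Y in the answer above.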

 

Advanced Activity

1. No

2. Spliceform 1: The mate aligns to exon 2.

Spliceform 3: The mate aligns to exon 3.

3. a) The length of the fragments that were sequenced in the experiment (300–400 base pairs) and b) the distance between where the paired-end reads align for each of these particular spliceforms (Spliceform 1 = 400 bp; Spliceform 2 = 700 bp).

Note that because these are spliceforms (where the introns have been spliced out and only the exons remain), the length of the fragment equals the total length of the exons in each spliceform.
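
The note above can be made concrete with a short sketch that counts only exonic bases between the two ends of a fragment. The exon coordinates, the fragment endpoints, and the labels A and B are hypothetical; they were chosen so that the results reproduce the 400 bp and 700 bp values quoted in answer 3 and do not describe the actual gene in the activity.

```python
def spliced_fragment_length(exons, frag_start, frag_end):
    """Count the exonic bases between frag_start and frag_end.

    exons: list of (start, end) genome intervals, 1-based inclusive,
           for the exons retained in one spliceform (hypothetical values).
    """
    total = 0
    for exon_start, exon_end in exons:
        overlap = min(exon_end, frag_end) - max(exon_start, frag_start) + 1
        if overlap > 0:
            total += overlap
    return total


# Hypothetical exon layout: spliceform B keeps a middle exon that A skips.
spliceform_a = [(1, 300), (2001, 2300)]
spliceform_b = [(1, 300), (1001, 1300), (2001, 2300)]

# Hypothetical read pair: read starts at 201, mate ends at 2300.
for name, exons in [("A", spliceform_a), ("B", spliceform_b)]:
    length = spliced_fragment_length(exons, 201, 2300)
    in_range = 300 <= length <= 400
    print(f"Spliceform {name}: {length} bp, within 300-400 bp: {in_range}")
```

Only the spliceform whose implied fragment length falls inside the experiment's range is a plausible source of the read pair.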