Tutorial: Translating Across Introns

When you import genomic sequences, you typically need to account for exons and introns in order to determine the protein sequence. GCK3 makes this straightforward, and this tutorial walks through the process.

  1. Start the application and choose Tools > Deluxe Import > Open Sequences File, then select and open the file called G-gamma globin.gbk. You will see the window shown in Figure 2.73. This file contains GenBank entry X03109, which is the fetal G-gamma globin gene from chimpanzee. You can also open this file in any text editor to see what it contains. For more details on Deluxe Importing see Tutorial 13: Importing GenBank Sequence Files Using Deluxe Importing.
    Figure 2.73: Deluxe importing of G-gamma globin
  2. The middle section of this window contains a list of features in this DNA. Exons, introns and CDS (coding sequences) are of interest to us in this tutorial. Clicking on a feature shows more information about that feature in the right-hand text box. Try exploring to see what is in this file. Click on the CDS feature and note that the first line starts with “join” and then lists three pairs of numbers (108..199, 322..544, and 1438..1566). This means that the coding sequence is composed of nucleotides 108-199, 322-544 and 1438-1566. Note that the exons also have ranges of nucleotides associated with them. Click the button Convert To GCK.
  3. You will see the dialog window shown in Figure 2.74. This shows how each GenBank feature is converted into a GCK feature. You want to have all of the exons and introns “checked” along with the “polyA_site.” Uncheck all the other items because they will just make the display confusing at this point. When everything is set, press the Save Construct button.
    Figure 2.74: G-Gamma globin features
  4. A new window will open up that shows you the construct you just created. It will look similar to Figure 2.75 but may not be identical because the conversion choices you have might be different from the ones used here. Note that the three exons are shown as thick black lines, while the introns show a pattern of black dots. The thin horizontal lines below the construct indicate the presence of comments. You can turn them off by choosing Construct > Display > Hide Comments.
    Figure 2.75: G-Gamma globin imported
  5. Double-click on the first intron (200-321) and assign it a color of green (Format > Color > Green). Do the same thing for the second intron (545-1437). Coloring the introns will make it easier to tell introns from exons when we switch to sequence view. Note that the range of nucleotides currently selected is shown in the very bottom left of the construct window.
  6. Choose Construct > Display > Display Sequence to view the construct as a sequence. Let’s make it a little easier to read by first making the window larger (whatever is comfortable on your screen). Choose Construct > Display > Show Positions to place position numbers at the beginning of each line.
  7. Click somewhere in the sequence and then choose Edit > Select All. Then choose Format > Grouping > Group by Tens. If you can adjust the width of your window to contain 100 characters per line, it will be easier to make selections in subsequent steps.
  8. Now we need to define the introns. Double-click on the first intron (it is green). You should see 200-321 in the lower left corner of the construct window. Choose Construct > Features > Define Intron. This will “invert” the selection so that the letters appear white on a green background. Repeat this process for the second intron.
  9. The next step is to define what we want to translate. From Step 2 (above) we know that the coding region starts at 108 and ends at 1566. Select this range of nucleotides – confirm that you have selected the correct range by looking in the bottom left of the construct window.
  10. Now choose Construct > Features > Make Region. This will allow you to translate the selected sequence. Enter a name in the dialog box and make sure to check the Protein Sequence checkbox. It should look like Figure 2.76. Once you have this window appropriately set, press the OK button. You will now see the translated protein created by reading codons in the exons and skipping over the sequences in the introns. Note that the codon spanning from exon 1 to exon 2 is actually broken into two pieces (ag then g to make up agg).
    Figure 2.76: Making a globin region
  11. Switch back to a graphical view (Construct > Display > Display Graphics). You should see Figure 2.77. Notice that the introns are not displayed as part of the coding sequence.
    Figure 2.77: G-Gamma globin final construct
  12. One final comment. In the sequence view, if you select an intron and then extend the selection (e.g. by shift-clicking) you can actually redefine the intron by choosing Construct > Features > Expand Intron. This will redefine the intron but it will not automatically update the translated protein. You have to do this manually since there might be times when you do not want to lose the protein that already exists (e.g. to illustrate alternative splicing situations).
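
For readers who want to check the coordinates by hand, the logic behind steps 2, 9 and 10 can be sketched in a few lines of Python. This is only an illustration, not how GCK works internally: the function names (parse_join, introns_between, boundary_phases, splice, translate) are made up for this example, and the short toy sequence is invented rather than taken from entry X03109. The sketch parses the “join” location from the CDS feature, derives the intron ranges that lie between the exons, reports how many nucleotides of a codon are left at the end of each exon, and translates a spliced sequence using the standard genetic code.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def parse_join(location):
    """Turn 'join(108..199,322..544,...)' into a list of (start, end) pairs (1-based, inclusive)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def introns_between(exons):
    """Introns are the gaps between consecutive exon ranges."""
    return [(end + 1, nxt - 1) for (_, end), (nxt, _) in zip(exons, exons[1:])]


def boundary_phases(exons):
    """For each exon-exon junction, count the nucleotides of a split codon
    left at the end of the upstream exon (0 means the codon is not split)."""
    phases, total = [], 0
    for start, end in exons[:-1]:
        total += end - start + 1
        phases.append(total % 3)
    return phases


def splice(genomic, exons):
    """Concatenate the exon segments of a genomic sequence, skipping the introns."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate a spliced coding sequence codon by codon ('*' marks a stop)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Coordinates from the CDS feature in step 2.
exons = parse_join("join(108..199, 322..544, 1438..1566)")
print(introns_between(exons))   # [(200, 321), (545, 1437)]
print(boundary_phases(exons))   # [2, 0]

# Invented toy sequence: exon 1 = positions 3-10, intron = 11-18, exon 2 = 19-22.
toy = "CC" + "ATGGCAAG" + "GTAAGTAG" + "GTAA" + "TT"
toy_exons = parse_join("join(3..10,19..22)")
print(translate(splice(toy, toy_exons)))   # MAR*
```

The intron ranges printed here match the ones selected in steps 5 and 8 (200-321 and 545-1437). The phase of 2 at the first junction agrees with the observation in step 10 that the codon spanning exon 1 and exon 2 is split into two nucleotides and one nucleotide (ag then g), and the toy sequence reproduces the same kind of split with the codon AGG.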