You have seen in previous tutorials how to enter comments associated with site markers, segments of DNA, chronographic generations, and regions of interest. Because a construct may have many generations, it can be difficult to remember not only where in a construct you stored a piece of information, but also in which file you stored it. In addition, if you record information such as the storage location of each construct as part of its general information, it is useful to be able to search quickly through all of your files to obtain, for example, a list of all the constructs in the freezer in room 211. This is done using the File Searching capability of GCK.
- Start GCK and open the file pBR322 in the tutorial files folder. This vector contains an origin of replication, but it is not visible in the current view, and you may not remember which generation it is associated with. You can find this feature by using the Search Comments function.
- Choose Construct > Search Comments… to bring up a dialog box. Type the word “origin” into the Key Text field and press Search. The results appear as shown in Figure 2.39. This search examines all comments and feature names in the chronography, segments, regions, and site markers, as well as the general information, for the key word “origin”. You can limit the search to specific features of the construct by using the checkboxes on the left of the dialog box.
- The list on the right of Figure 2.39 contains all the features that have the word “origin” in their names or comments. Click on “origin” in this list and press Show Info to see the context of the match. This brings up Figure 2.40, the standard region “get info” box. The word “origin” is highlighted in the Region Comments field so that you can easily see where it occurs. From this dialog, you can see that the origin of replication spans nucleotides 2484 to 2723 and is stored as a region in generation #1 (as indicated in the bottom left corner of the dialog).
- Searching comments is very useful within a single construct, but what about searching all the files on a disk, in a specific folder, or even in a folder on a shared file server? This is where File Searching comes in: it lets you set up sophisticated search criteria to find files on your disk. Choose Tools > Search Files… to bring up Figure 2.41. In this dialog, you enter your search criteria in the lines at the top. For now, we will do a simple search: type “ampicillin” in the top line. You could just as easily search for something like “room 211-3B” to find all the constructs stored in the freezer in room 211, Box 3B, provided you had entered this in the comments. The goal here is to find all files whose comments contain the word “ampicillin”; this is not the same as searching the DNA sequence. If you enter comments carefully for each of your files, File Searching can serve as a search tool for your database of constructs.
- The search query box displays the text you are searching for. In the Find Match In section of the dialog, you can specify whether to search DNA sequences or comments (and titles). Click the radio button labeled Any Comments. Next, specify the Search Directory: the folder you wish to search for files satisfying the search criteria. If this folder contains other folders and you want to search their contents too, check the Check Subdirectories checkbox at the bottom of the dialog.
- Press the Set Directory button to define the directory (folder) you wish to search. You will see Figure 2.42. Select the directory containing the files you wish to search; in this case, select the GCK3 folder.
- Now you are ready to do the search. Press the Search button shown in Figure 2.41. A progress indicator shows the status of the search, and then Figure 2.43 appears. This window shows the search query and lists all the files that matched the criteria. Clicking on a file in the list allows you to open it. You can also press the Save List button to create a text file listing all the files that match your query. This is useful for generating a list of, for example, all the constructs containing an ampicillin resistance gene (or all those stored in freezer box 3B in room 211).
- Let’s do another search. Choose File > Search Files… again and fill in the query to match Figure 2.44. This search is more complex: we are looking for files whose comments contain ampicillin AND galactosidase but do NOT contain tetracycline. To see how the and/or and contains/doesn’t contain popup menus (Boolean operators) work, try different combinations and watch how the search query box at the bottom of the window changes. Set the query to match the figure and then press Search.
- You will see another list similar to the one shown in Figure 2.43. But what if someone did not enter all the comments that should have been entered? Or what if the files were simply imported sequences that had never been annotated and contain no comments? If the comments are incomplete, the search results are not very meaningful (especially when you are looking for the absence of certain text, such as tetracycline). To address this issue, GCK allows you to search using DNA sequence information instead of comments. Choose File > Search Files… again. Now select the radio button labeled DNA Sequence. We want to find constructs that contain both an ampicillin resistance gene and the β-galactosidase gene. You can use a 15-nucleotide sequence to represent each gene. For ampicillin resistance (the β-lactamase gene), type in CAACATTTCCGTGTC (lower case also works); for β-galactosidase, type in GCGGATAACAATTTC. Your screen should match Figure 2.45. Press Search when you are ready.
- These search results do not depend on anyone having entered comments, because every file has a DNA sequence: any file that contains both query sequences will appear in the list. Your results should be similar to Figure 2.46. In this case, pGFP is selected, so the path to the pGFP construct is shown in the Search Directory area of this window.
- This concludes this tutorial. You should close all open windows at this time (do not save changes to pBR322).
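The two kinds of query used in this tutorial can be sketched in a few lines of Python. This is not GCK's implementation: the `Construct` record, the function names, and the three toy constructs are hypothetical, and it is an assumption that matching is case-insensitive, that a circular sequence is searched across its origin, and that both strands are checked. The comment query mirrors Figure 2.44 (ampicillin AND galactosidase AND NOT tetracycline); the sequence query requires every probe, such as the two 15-nucleotide probes above, to occur in the sequence.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a construct file. GCK's file format is not
# described here, so only the searchable parts are modelled.
@dataclass
class Construct:
    name: str
    comments: str   # all comments, titles and general info, concatenated
    sequence: str   # DNA sequence, assumed circular (plasmid)

def contains(text: str, term: str) -> bool:
    """Case-insensitive substring test (assumed matching rule)."""
    return term.lower() in text.lower()

def comment_query(c: Construct) -> bool:
    """ampicillin AND galactosidase AND NOT tetracycline (as in Figure 2.44)."""
    return (contains(c.comments, "ampicillin")
            and contains(c.comments, "galactosidase")
            and not contains(c.comments, "tetracycline"))

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def has_probe(sequence: str, probe: str, circular: bool = True) -> bool:
    """True if the probe occurs on either strand of the sequence."""
    seq, probe = sequence.upper(), probe.upper()
    if circular and len(seq) >= len(probe):
        # Append the first len(probe) - 1 bases so that a match spanning
        # the origin of a circular sequence is still found.
        seq += seq[:len(probe) - 1]
    return probe in seq or reverse_complement(probe) in seq

def sequence_query(c: Construct, probes: list) -> bool:
    """Every probe must be present (an AND over DNA sequences)."""
    return all(has_probe(c.sequence, p) for p in probes)

AMP_PROBE = "CAACATTTCCGTGTC"   # beta-lactamase probe from the tutorial
LACZ_PROBE = "GCGGATAACAATTTC"  # beta-galactosidase probe from the tutorial

constructs = [
    Construct("toy_amp_lac", "Ampicillin resistance; beta-galactosidase",
              "TTT" + AMP_PROBE + "GGG" + LACZ_PROBE + "AAA"),
    Construct("toy_amp_tet", "ampicillin and tetracycline resistance",
              "CCC" + AMP_PROBE + "CCC"),
    Construct("toy_unannotated", "", AMP_PROBE + "A" + LACZ_PROBE),
]

print([c.name for c in constructs if comment_query(c)])
print([c.name for c in constructs if sequence_query(c, [AMP_PROBE, LACZ_PROBE])])
```

With these toy records, the comment query finds only `toy_amp_lac`, while the sequence query also finds `toy_unannotated`, which has no comments at all; this is the gap between comment searching and sequence searching that the steps above describe.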
File searching is a very powerful feature. You might choose to store all the constructs at your organization on a file server. If users are conscientious about entering comments, anyone at the organization will be able to search the comments of every file and, from those comments, know whom to contact about obtaining a construct.
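The Search Directory, Check Subdirectories, and Save List options in the steps above amount to listing the files in a folder (optionally recursively), keeping those that match, and writing the matches to a text file. The following Python sketch shows that flow under stated assumptions: `search_files`, `save_list`, and the `matches` predicate are hypothetical names, the predicate stands in for whatever reads a construct file and applies a comment or sequence query, and the layout of the saved list is illustrative rather than GCK's actual format.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path

def candidate_files(search_dir: Path, check_subdirectories: bool) -> list[Path]:
    """Files directly in search_dir, or in its whole tree if requested."""
    entries = search_dir.rglob("*") if check_subdirectories else search_dir.glob("*")
    return sorted(p for p in entries if p.is_file())

def search_files(search_dir: str | Path, check_subdirectories: bool,
                 matches: Callable[[Path], bool]) -> list[Path]:
    """Return the files under search_dir for which matches(path) is True."""
    return [p for p in candidate_files(Path(search_dir), check_subdirectories)
            if matches(p)]

def save_list(query: str, hits: Iterable[Path], out_file: Path) -> None:
    """Write the query, then one matching path per line (hypothetical layout)."""
    lines = [f"Search query: {query}", *(str(p) for p in hits)]
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A real `matches` function would need to read GCK's own file format, which this sketch does not attempt; for a quick trial, any predicate over plain-text files works, such as `lambda p: "ampicillin" in p.read_text().lower()`.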