Here is the list of things you need to do with your assigned gene: ...

Question

Here is the list of things you need to do with your assigned gene:

1. Search GenBank for the gene encoding your target protein. Find the GenBank page for JUST the sequence of your gene (NOT the entire genome!!!) and paste it into your document. This CANNOT be the mRNA sequence; you must find the original DNA version of the gene. Also, tell us the common name (if there is one) of your organism. For example, the bacterium Escherichia coli has no ‘common name’ other than an abbreviation (E. coli), whereas the common name of Quercus rubra is red oak.
   Several things to think about:
   a. Databases do contain spelling errors, as well as different (British versus American) spellings, for example haemoglobin versus hemoglobin.
   b. If we specified only the organism and not a tissue, you may find different homologs, or isoforms from different tissues of the same organism, in the database. Choose the one that has the best-resolution structure, as described in the structure section below.
   c. The DNA sequence of the gene might be MUCH larger than the actual coding sequence (why is this?). You should include the entire DNA sequence of your gene. However, do NOT paste in an entire genome!! Nor should you include an entire “CONTIG” if that is where you find the sequence. (What is a contig?) You only want the sequence of your gene.
      i. Underline the DNA sequence encoding the protein (ONLY the coding sequence).
      ii. Highlight the start and stop codons.
   d. Your protein structure MIGHT be a heteromer, that is, made up of two or more different proteins! If so, you need to provide sequences from BOTH genes! But collect the other data on only one of the two, so you aren’t doubling all the work.
2. Find a website that will allow you to paste in a DNA sequence and translate it into protein (there are MANY available).
   Translate the sequence yourself, giving the website of the translator you used, and show that your translation matches the one given in the GenBank file by aligning your translation against the GenBank translation (give the website of the alignment program you used). If you did this correctly, they should be identical. Again, there are MANY websites that will align sequences.
3. Protein sequence alignment. BLAST your protein sequence against the MICROBES database (yes, even if your protein is human, or plant).
   a. List the top three protein homologs. If all of these come from the same organism, choose the top three homologs found in three different organisms. There are a number of different algorithms and parameters; if you get no results on your first try, try changing these. If there are no real homologs to your protein in the microbial database at all, compare against three different eukaryotes instead. For example, if your protein is a human protein, compare against the mouse genome; if you have a rat protein, compare against the human genome. (How do you determine what is a “real” homolog? What criteria are used to decide this?)
   b. Show a multiple protein sequence alignment of your sequence with the top three homologs from OTHER organisms (i.e. not homologs from the same organism, or mutant forms of the protein). You will thus be comparing four different protein sequences in total, yours and the three top homologs, all aligned together. You should also have a fifth line, the consensus sequence; make sure you choose an alignment program that produces one.
4. Find the three-dimensional structure of your protein at the Protein Data Bank (www.rcsb.org) and paste it here with the proper PDB ID. What type of structure is it (e.g. X-ray, solution, or other)? What resolution is it? What cofactors does it have? Be careful not to confuse cofactors with other types of molecules associated with the structure.
   Paste a picture of the structure into your document. You can use the Jmol viewer or other molecule viewers available on the website to visualize the structure. Several things to consider:
   a. Choose the best-resolution structure available. This means you need to UNDERSTAND what ‘resolution’ means! This is the MOST IMPORTANT, KEY part of your project! You need to be sure you’ve found the structure with the best resolution, as all the other data must derive from it.
   b. There may be MANY structures derived from your protein: some with substrates bound, some in complex with other proteins, some mutants. Choose the one that is only your protein, not mutated or bound to anything. Note, however, that in some cases the only structure available might be a mutant, or the protein bound to something. This is because structural biology depends heavily on luck; in some ways it is very much an art. Crystallization conditions may never have been found for the wild-type protein, while crystals may be available for a mutant. Just be sure that you haven’t missed an unbound or native/wild-type version. (What does the word ‘native’ mean when applied to a protein? How about ‘wild type’? What is the difference?)
   c. Is the structure from a native protein or a recombinant protein? If recombinant, how was it produced?
   d. Be aware that there are spelling errors in this database! You may have to do some hunting.
5. Functional domains: use the Conserved Domain Architecture Retrieval Tool (CDART), InterPro, or Motif Scan, and indicate what functional domains are found in the protein. What are these domains known to do, if anything? Clearly indicate which tool(s) you used. You must first be sure you understand what a functional domain is! Then you must look carefully at the output from these online tools and interpret what they are telling you about your protein.
   This will require significant time, first to understand WHAT you are looking at (what the different colored blocks represent) and then what it MEANS.
6. Search for your protein, or its closest human homolog, in the OMIM database. Write a brief two-page summary of this information (NOTE: DO NOT COPY AND PASTE FROM OMIM! Also, DO NOT COPY AND PASTE AND JUST CHANGE A FEW WORDS!) You MUST rewrite it in YOUR OWN WORDS. Explain briefly what is known about your protein. Is it involved in any diseases? Does it have known functions (and if so, what are they)? Is it normally a monomer? A dimer? Found in a large complex with other proteins? Is it a soluble or a membrane-bound protein? In essence, you need to summarize what is known about the structure and function of this protein in two pages (two pages double-spaced, NOT including the information you have pasted in for parts 1-5 above).
7. Your summary should include any and all references you used, formatted using RefWorks. These references do not count toward the two pages of summary, and should be cited as numbers in the text, for example (1).

The final product you produce will be something like what we’ve provided below. The only difference is that we’ve not written the two-page summary.

Gene: Cytochrome c from Homo sapiens

1. DNA and protein sequence, and the GenBank page for JUST this sequence, not the entire genome:

LOCUS       HUMCYCAA                3088 bp    DNA     linear   PRI 27-APR-1993
DEFINITION  Human somatic cytochrome c (HCS) gene, complete cds.
ACCESSION   M22877
VERSION     M22877.1  GI:181241
KEYWORDS    cytochrome c.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 144; 1218 to 1394; 1496 to 3088)
  AUTHORS   Evans,M.J. and Scarpulla,R.C.
  TITLE     The human somatic cytochrome c gene: two classes of processed
            pseudogenes demarcate a period of rapid molecular evolution
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 85 (24), 9625-9629 (1988)
   PUBMED   2849112
REFERENCE   2  (bases 1 to 3088)
  AUTHORS   Evans,M.J. and Scarpulla,R.C.
  JOURNAL   Unpublished
COMMENT     Original source text: Human liver DNA.
            Draft entry and computer-readable sequence for [1],[2] kindly
            provided by M.J.Evans 03-MAR-1989.
FEATURES             Location/Qualifiers
     source          1..3088
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
     prim_transcript 81..2512
                     /note="HCS mRNA and introns"
     intron          145..1217
                     /note="HCS intron A"
     CDS             join(1226..1394,1496..1644)
                     /note="cytochrome c"
                     /codon_start=1
                     /protein_id="AAA35732.1"
                     /db_xref="GI:181242"
                     /translation="MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQA
                     PGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKA
                     TNE"
     exon            <1226..1394
                     /note="cytochrome c, (first expressed exon)"
                     /number=2
     intron          1395..1495
                     /note="HCS intron B"
     exon            1496..>1644
                     /note="cytochrome c"
                     /number=3
ORIGIN
        1 gcacgtcagg gcgcgggagc gcggagcgag tttggttgca cttacaccgg tacttaagcg
       61 cggaccggcg tgtccttgga cttagagagt ggggacgtcc ggcttcggag cgggagtgtt
      121 cgttgtgcca gcgactaaaa agaggtgaga gcgggtcgcg gaggccgcac ctggttagag
      181 gcagagctgt gggaggcgcg cacttgcgag cgagccgaaa cccaagcggg gagcattcga
      241 ggtggagccc gcgctgggtg ggagggcggg gagtgaagac cctggactgt ggtcagaccg
      301 agctgggcga gtaacggctt gaggtgcggc ggagccctaa ctagggacag gtatggtctc
      361 ggtcagggac tggaggcggc ttggatacag atccgaggag gaggcggcct cttccgtagt
      421 ggttgctgaa gggctatgga aatgataggc aagacttccc tcctggaaag ccgaagctta
      481 gagcttcacg ttcttcttca gagggcaaaa gctgttgctc ttctaataag gggccagttc
      541 ttttcgtggg cacatgtttc ttccgtcagt cgttctgaca tcctagaagg agtttcatca
      601 atcaccttga aaccgacctg gacgggtgac ctcgtggtcg ccccaggaga tcacaggtag
      661 gggagttggg atcgcccggg ggaccgtgca gcctgcccct gagctcccat tcacaagttc
      721 gagtgtcaag ctactcctgt gacctgggca gatagaaaca gccaggaccg ctttttaaac
      781 atttgtgtgc tttgcgttat cctcagggag aggtggcttt acattgtagt aagattaaat
      841 ggttaggtct ttttaaaagt tgcggttgtg gtgattttgg cttaatgtgt tcgcccttga
      901 gcttcagatc tgtgacttcg tgaccatgat tgtctcttct gaaactggag tttgaattag
      961 gttccctctt tgcttgggct ttaacgttcc ttcacgtata cacacaaaaa tacgtttttg
     1021 aggaggtact cctaaaaatg tttttggtat taaagaatat ttggtataaa gagtattaaa
     1081 gcaaaacaag attcattctg gtatttaatg acataaatta gcaatggatt ggtaattaag
     1141 tggctagagt ggtcattcat ttacactgta tttgttacct gaggaaaaat ttactaagtt
     1201 gaagctttcg tttttagaat taaatatggg tgatgttgag aaaggcaaga agatttttat
     1261 tatgaagtgt tcccagtgcc acaccgttga aaagggaggc aagcacaaga ctgggccaaa
     1321 tctccatggt ctctttgggc ggaagacagg tcaggcccct ggatactctt acacagccgc
     1381 caataagaac aaaggtaaga gtcacttgtt aaataaaaca acacaaaatg caggaatata
     1441 acatgtggca aactatcagg agtgtgaaat aaccgatgca ttctttcttg tttaggcatc
     1501 atctggggag aggatacact gatggagtat ttggagaatc ccaagaagta catccctgga
     1561 acaaaaatga tctttgtcgg cattaagaag aaggaagaaa gggcagactt aatagcttat
     1621 ctcaaaaaag ctactaatga gtaataattg ggccactgcc ttatttatta caaaacagaa
     1681 atgtctcatg acttttttat gtgtaccatc ctttaataga tctcatacac cagaattcag
     1741 atcatgaatg actgacagaa tattttgttg ggcagtcctg atttaaaact aagactggct
     1801 tgtggttaaa tgaatatgtt cagtttttga attttaatag taactccaat tcagtaaatg
     1861 gtatcactgt ttaccccttt taaagatatg attagacttc gttagtaatg ttcaactttt
     1921 cacaaagatg gtgagtgcca tcttaaaact tactggagat tggttttata tttagattta
     1981 tataactggt tatgtgaata tatttaaata ctggggaaat tgcttcactg tcttagaacc
     2041 aagcaagatt cacctgtgtt ttgtgttcat gttcatttgc ctcttaaagg caagggttga
     2101 agataaataa ggtagcaatg tctatagttt tggccttaac tatgccaatc taattataat
     2161 tccctgtatt taaaatggtt tcttttactt attgaaaggc attttagtgt ggtttatgtg
     2221 taatattaaa gattattcaa cacctctcac atcttacaga tctataaggt cacatgcttt
     2281 taaaatagta gcaagttaaa cttcactctt gaattcttta caatctaagt caaactaagt
     2341 tataatttag gattgtcttt aaacagccat tcagaaacaa aactgtagaa ctgtgtattt
     2401 gattgggaat ggtgcttttg ccaacttaaa aggattaaag taacggagat atacacaaat
     2461 tttaaaatta tgtgtgatca caagactaaa gataattaaa aagaaaacca cagatcatga
     2521 ctttttgact gtgcttgatt tcatgactga tgcacaaatt ttaatgatta aaaagtgcag
     2581 gagccctaaa tgtcagtgca gcagccctaa atgtcagtgc agcagtgtta accagtcatg
     2641 gtgctagatt gtttacttgg ttttctagga ctgcctcaac tagaataaca cttcactaat
     2701 tgactcttag tttctttgct cagattgaga actgcagcat ttatcgcaga catggacaga
     2761 ggaatgcctg tggtcatagt tttgtgatgt gtaacagtgt ataattacat actgaattat
     2821 ttcatgcata gtctgtgcca tacacattta gagtagtcct tggagatttt atggagatgg
     2881 tgagcacaag gtaagtcata aagaataatg agaaaataaa tctatgctgg tgcagctgag
     2941 aactgtatct ttgtgggaca gtgagaagac tgagaagatg tgaatccatg gtctcaaagg
     3001 tgatagggac gattagatag gtgttttaag gcctgaaagc aatttataac atatgagtct
     3061 tatttttatt tatagaaatg tgaagctt
//

Coding sequence:
atgggtgatgttgagaaaggcaagaagatttttattatgaagtgttcccagtgccacaccgttgaaaagggaggcaagcacaagactgggccaaatctccatggtctctttgggcggaagacaggtcaggcccctggatactcttacacagccgccaataagaacaaaggcatcatctggggagaggatacactgatggagtatttggagaatcccaagaagtacatccctggaacaaaaatgatctttgtcggcattaagaagaaggaagaaagggcagacttaatagcttatctcaaaaaagctactaatgagtaa

(NOTA BENE: You need to 1. underline the DNA sequence encoding the protein, and 2. highlight the start and stop codons.)

2. Example: Here’s a translation I made using a translator found at the ExPASy website:
(NOTA BENE: You need to translate the sequence yourself, giving the website of the translator you used, and show that your translation matches the one given in the GenBank file by showing an alignment.)

5'3' Frame 1
Met G D V E K G K K I F I Met K C S Q C H T V E K G G K H K T G P N L H G L F G R K T G Q A P G Y S Y T A A N K N K G I I W G E D T L Met E Y L E N P K K Y I P G T K Met I F V G I K K K E E R A D L I A Y L K K A T N E Stop

Example alignment using BLAST:

>lcl|50657 unnamed protein product
Length=105

 Score = 216 bits (549),  Expect = 3e-78, Method: Compositional matrix adjust.
 Identities = 105/105 (100%), Positives = 105/105 (100%), Gaps = 0/105 (0%)

Query  1    MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW  60
            MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW
Sbjct  1    MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW  60

Query  61   GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE  105
            GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE
Sbjct  61   GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE  105

3. Here are the BLASTp results for comparing the human cytochrome c protein to the microbe database:

Note that I was not able to use my first translation in the BLAST search, because BLAST would not accept that format (Met as a three-letter code but the other amino acids as single-letter codes); the query needed single-letter codes with no spaces between letters. (NOTA BENE: You need to BLAST your sequence against the database and show the top hits. Also indicate which BLAST algorithm you used.)

>ref|YP_003819985.1| cytochrome C class I [Brevundimonas subvibrioides ATCC 15264]
 gb|ADL02362.1| cytochrome c class I [Brevundimonas subvibrioides ATCC 15264]
Length=176

 GENE ID: 9504955 Bresu_3056 | cytochrome C class I [Brevundimonas subvibrioides ATCC 15264]
 Score = 118 bits (296),  Expect = 1e-31, Method: Compositional matrix adjust.
 Identities = 50/98 (51%), Positives = 72/98 (73%), Gaps = 2/98 (2%)

Query  2    GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG  61
            GD   G+++F  +C  CH++E+G  ++ GP+LHG+ GR  GQ  GY+Y+AANK  G +W
Sbjct  77   GDAVAGERVF-AQCRTCHSIEEG-VNRVGPSLHGIIGRTAGQVAGYNYSAANKASGKVWD  134

Query  62   EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYL  99
             +TL  YLENP+ YIPGTKM FVG++  ++RA++IAYL
Sbjct  135  NETLFAYLENPRAYIPGTKMAFVGLRDPQQRANVIAYL  172

>ref|YP_006374439.1| cytochrome c family protein [Tistrella mobilis KA081020-065]
 gb|AFK57457.1| cytochrome c family protein [Tistrella mobilis KA081020-065]
Length=133

 GENE ID: 13002190 TMO_c0847 | cytochrome c family protein [Tistrella mobilis KA081020-065]
 Score = 113 bits (282),  Expect = 4e-30, Method: Compositional matrix adjust.
 Identities = 49/98 (50%), Positives = 67/98 (68%), Gaps = 1/98 (1%)

Query  2    GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG  61
            GD  KG+KIF  +C  CHT+E GG ++ GPNLHG++GR+ G+A G+ Y+ A    G++W
Sbjct  33   GDAAKGEKIF-ARCKACHTIEAGGPNRVGPNLHGVYGREAGKAEGFKYSNAMAESGVVWT  91

Query  62   EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYL  99
             + L  YL  P K+I G +M F G+ K EERAD+IAYL
Sbjct  92   PENLDTYLTAPAKFIKGNRMAFAGLAKPEERADIIAYL  129

>ref|ZP_08629038.1| cytochrome c2 [Bradyrhizobiaceae bacterium SG-6C]
 gb|EGP08397.1| cytochrome c2 [Bradyrhizobiaceae bacterium SG-6C]
Length=130

 Score = 111 bits (277),  Expect = 2e-29, Method: Compositional matrix adjust.
 Identities = 54/99 (55%), Positives = 66/99 (67%), Gaps = 1/99 (1%)

Query  3    DVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWGE  62
            DV  G+K F  KC  CH V +  K+  GP L+GLFGRK+G   GY+Y+ ANKN GI W E
Sbjct  25   DVAAGEKSF-NKCRACHQVGETAKNSVGPELNGLFGRKSGSVAGYNYSDANKNSGITWDE  83

Query  63   DTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKK  101
                EY+++PK  IPGTKM F GIKK EE  DL A+LK+
Sbjct  84   AVFAEYIKDPKAKIPGTKMAFAGIKKDEEIKDLTAFLKQ  122

3b. NOTE: We are NOT showing you an example of the multiple alignment; we want you to figure that one out on your own, but do NOT forget to include it!

4. Here is the 3D structure of the human cytochrome c protein, PDB ID 1J3S:
This is an NMR solution structure of recombinant human cytochrome c. It has a heme c cofactor. The depositors did not report a resolution for this structure (NMR structures are not assigned a crystallographic resolution), but it was the only one available.

5. Functional domains:
From Motif Scan: cAMP- and cGMP-dependent protein kinase phosphorylation site; amidation site; casein kinase II phosphorylation site; N-myristoylation site; bipartite nuclear localization signal profile.
From RCSB:
From InterPro:

6. Two-page summary

7. References
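Step 2 asks you to translate the gene yourself and confirm that the result matches the GenBank /translation. As a minimal offline cross-check (not a substitute for the web translator the assignment requires), the Python sketch below assumes the standard genetic code (NCBI translation table 1) and a coding sequence read in frame 1. It translates the cytochrome c coding sequence from the example and compares it with the /translation qualifier of record M22877. The function name `translate` and its `to_stop` flag are illustrative and do not come from any particular library.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna, to_stop=True):
    """Translate a coding sequence in frame 1 into one-letter amino acid codes.

    With to_stop=True, translation ends at the first stop codon (not included);
    otherwise stop codons are written as '*'. Codons containing anything other
    than A, C, G or T become 'X'.
    """
    dna = "".join(dna.split()).upper()  # drop whitespace, normalise case
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# Coding sequence of human cytochrome c, as given in the example (318 nt).
cds = (
    "atgggtgatgttgagaaaggcaagaagatttttattatgaagtgttcccagtgccacaccgtt"
    "gaaaagggaggcaagcacaagactgggccaaatctccatggtctctttgggcggaagacaggt"
    "caggcccctggatactcttacacagccgccaataagaacaaaggcatcatctggggagaggat"
    "acactgatggagtatttggagaatcccaagaagtacatccctggaacaaaaatgatctttgtc"
    "ggcattaagaagaaggaagaaagggcagacttaatagcttatctcaaaaaagctactaatgag"
    "taa"
)

# /translation qualifier copied from the GenBank record M22877.
genbank_translation = (
    "MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQA"
    "PGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKA"
    "TNE"
)

protein = translate(cds)
print(len(protein), protein == genbank_translation)  # 105 True
```

Unlike the ExPASy output in the example, this sketch writes Met as M, so the result can be pasted straight into BLAST.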
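The Identities and Gaps figures in the BLAST output above can be recomputed from the aligned rows themselves, which is a useful check that you are reading an alignment correctly. The sketch below is a simple illustration, not BLAST's own code. It counts identical columns and gap columns in a gapped pairwise alignment, using the Brevundimonas hit from the example with its two alignment blocks joined. Positives are not computed, because they depend on a substitution matrix such as BLOSUM62.

```python
def alignment_stats(query, subject):
    """Return (identities, gaps, alignment_length) for two equal-length aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query, subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


# Brevundimonas subvibrioides hit from the example BLASTp output (blocks joined).
query = ("GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG"
         "EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYL")
subject = ("GDAVAGERVF-AQCRTCHSIEEG-VNRVGPSLHGIIGRTAGQVAGYNYSAANKASGKVWD"
           "NETLFAYLENPRAYIPGTKMAFVGLRDPQQRANVIAYL")

identities, gaps, length = alignment_stats(query, subject)
print(f"Identities = {identities}/{length} ({round(100 * identities / length)}%), "
      f"Gaps = {gaps}/{length} ({round(100 * gaps / length)}%)")
# Identities = 50/98 (51%), Gaps = 2/98 (2%)
```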
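Step 3a asks for the top three homologs from three different organisms, and asks how a “real” homolog is recognized. One widely used criterion is a low E-value. The sketch below sorts hits by E-value, discards hits above a cutoff, and keeps only the best hit per organism. The cutoff of 1e-3 is an assumption chosen for illustration, not a value given in the assignment. The accessions and E-values are copied from the example BLASTp output; the `Hit` record and the last, weak hit are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    organism: str
    evalue: float


def top_homologs(hits, n=3, max_evalue=1e-3):
    """Best hit per organism, in order of E-value, ignoring hits above max_evalue."""
    chosen, seen = [], set()
    for hit in sorted(hits, key=lambda h: h.evalue):  # stable sort keeps input order on ties
        if hit.evalue > max_evalue:
            break  # the list is sorted, so every remaining hit is weaker
        if hit.organism in seen:
            continue  # another accession for an organism already chosen
        seen.add(hit.organism)
        chosen.append(hit)
        if len(chosen) == n:
            break
    return chosen


hits = [
    Hit("YP_003819985.1", "Brevundimonas subvibrioides ATCC 15264", 1e-31),
    Hit("ADL02362.1", "Brevundimonas subvibrioides ATCC 15264", 1e-31),
    Hit("YP_006374439.1", "Tistrella mobilis KA081020-065", 4e-30),
    Hit("AFK57457.1", "Tistrella mobilis KA081020-065", 4e-30),
    Hit("ZP_08629038.1", "Bradyrhizobiaceae bacterium SG-6C", 2e-29),
    Hit("HYPOTHETICAL_1", "hypothetical weak hit", 0.5),  # invented, shows the cutoff
]

for hit in top_homologs(hits):
    print(hit.accession, hit.organism, hit.evalue)
```

An E-value alone is not proof of homology; alignment length, percent identity, and conserved functional residues (such as the CXXCH heme-binding motif visible in the alignments above) are usually weighed together.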
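Step 3b requires a fifth line holding the consensus sequence. Alignment programs define consensus in different ways; Clustal-style output, for instance, marks columns with *, : and . symbols instead. As a sketch of one simple rule, the function below assumes a majority rule: a column's consensus is its most common residue if that residue occurs in more than half of all rows, and '.' otherwise. The four short aligned rows are hypothetical and stand in for your sequence plus three homologs.

```python
from collections import Counter


def consensus(rows, no_consensus="."):
    """Majority-rule consensus of equal-length aligned rows ('-' marks a gap)."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    line = []
    for column in zip(*rows):
        counts = Counter(residue for residue in column if residue != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            if 2 * count > len(rows):  # present in more than half of ALL rows
                line.append(residue)
                continue
        line.append(no_consensus)
    return "".join(line)


# Hypothetical aligned rows, for illustration only.
aligned = ["MKV-LA", "MKVQLA", "MRVQIA", "MKI-LG"]
for row in aligned:
    print(row)
print(consensus(aligned))  # MKV.LA
```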
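Motif Scan reports matches to PROSITE-style patterns, such as the casein kinase II and cAMP/cGMP-dependent protein kinase phosphorylation sites listed in the example for step 5. To see what such a match means at the sequence level, the sketch below converts a basic PROSITE pattern into a Python regular expression and scans the cytochrome c protein. It handles only x, [..], {..}, repeat counts such as (2) or (2,4), and < or > anchors at the pattern ends. The two patterns used are assumptions based on PROSITE entries PS00006 (casein kinase II) and PS00004 (cAMP/cGMP-dependent kinase); check them against PROSITE itself before relying on them.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression.

    Supported: x (any residue), [ABC] (any of), {ABC} (none of), repeat counts
    such as (2) or (2,4), and < / > anchors on the first / last element.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


def find_sites(pattern, protein):
    """1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(protein)]


cytochrome_c = ("MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQA"
                "PGYSYTAANKNKGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKA"
                "TNE")

print(prosite_to_regex("[ST]-x(2)-[DE]"))          # [ST].{2}[DE]
print(find_sites("[ST]-x(2)-[DE]", cytochrome_c))  # [64]
print(find_sites("[RK](2)-x-[ST]", cytochrome_c))  # [100]
```

Short patterns like these match by chance in many proteins, so a hit is a hint to investigate, not evidence that the site is actually modified.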

Solution Preview

1. Search GenBank for the gene encoding your target protein. Find the GenBank page for JUST the sequence of your gene (NOT the entire genome!!!) and paste it into your document. This CANNOT be the mRNA sequence; you must find the original DNA version of the gene. Also, tell us the common name (if there is one) of your organism. For example, the bacterium Escherichia coli has no ‘common name’ other than an abbreviation (E. coli), whereas the common name of Quercus rubra is red oak.

Answer: The gene assigned to me is Notch1 of Mus musculus. It is a protein-coding gene located on chromosome 2, with NCBI Gene ID 18128. The common name of the species Mus musculus is the house mouse.
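Part 1c of the assignment asks you to mark the coding sequence, and its start and stop codons, inside the genomic record. The CDS feature's location string gives those coordinates. As a minimal sketch, assuming simple forward-strand locations of the form a..b or join(a..b,c..d) (complement() strands are not reverse-complemented here), the Python below parses a location, checks that the spliced length is a multiple of three, and splices a short sequence. The location string is the CDS line from the cytochrome c example (M22877); the toy genomic sequence is hypothetical.

```python
import re


def parse_location(location):
    """Return 1-based, inclusive (start, end) pairs from a simple GenBank location.

    Handles single spans ("a..b") and join(...) lists; the partial-end markers
    "<" and ">" are ignored. Reverse-strand complement() is NOT handled.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def splice(genomic, location):
    """Concatenate the segments named in `location` into one coding sequence."""
    # GenBank coordinates are 1-based and inclusive; Python slices are 0-based
    # and end-exclusive, hence start - 1 and end.
    return "".join(genomic[start - 1:end] for start, end in parse_location(location))


# CDS location copied from the cytochrome c record (M22877).
segments = parse_location("join(1226..1394,1496..1644)")
cds_length = sum(end - start + 1 for start, end in segments)
print(segments, cds_length, cds_length % 3 == 0)
# [(1226, 1394), (1496, 1644)] 318 True  (105 codons plus one stop codon)

# Hypothetical toy gene: exon at 3..8, intron at 9..14, exon at 15..20.
toy_genomic = "tt" + "ATGAAA" + "gtaagt" + "CCCTAA" + "tt"
cds = splice(toy_genomic, "join(3..8,15..20)")
print(cds, "start:", cds[:3], "stop:", cds[-3:])
# ATGAAACCCTAA start: ATG stop: TAA
```

The gap between the two CDS segments (1395..1495, intron B in the example) is one concrete reason the gene's DNA is longer than its coding sequence.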

Several things to think about:
a. Note that databases do contain spelling errors, as well as different (British versus American) spellings. For example haemoglobin versus hemoglobin.

Answer: The difference is one of regional spelling convention: British English writes “haemoglobin”, whereas American English writes “hemoglobin”. I, a Briton, would write haemoglobin. Because a database may contain either form, it is worth searching under both spellings.
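One practical way to follow that advice is to search for both forms at once. The sketch below is a naive illustration: it generates variants by replacing the British digraphs “ae” and “oe” with “e” and joins them with OR, which NCBI's Entrez search accepts as a Boolean operator. The replacement rule is an assumption and will also produce non-words for terms such as “aerobic”; that is harmless inside an OR query, but worth knowing.

```python
def spelling_variants(term):
    """Naive British/American variants: 'ae' -> 'e' and 'oe' -> 'e'."""
    variants = {term}
    for british, american in (("ae", "e"), ("oe", "e")):
        for word in list(variants):
            if british in word:
                variants.add(word.replace(british, american))
    return variants


def or_query(term):
    """Join all variants into a single Boolean OR search string."""
    return " OR ".join(sorted(spelling_variants(term)))


print(or_query("haemoglobin"))  # haemoglobin OR hemoglobin
print(or_query("oestrogen"))    # estrogen OR oestrogen
```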

c. Note that the DNA sequence of the gene might be MUCH larger than the actual coding sequence (why is this?). You should include the entire DNA sequence of your gene. However, do NOT paste in an entire genome!! Nor should you include an entire “CONTIG” if that is where you find the sequence. (What is a contig?) You only want the sequence of your gene.
i. Underline the DNA sequence encoding the protein (ONLY the coding sequence)
ii. Highlight the start and stop codons

Answer: A contig (from “contiguous”) is a continuous stretch of DNA sequence assembled from overlapping sequenced fragments of a chromosome. The term is also used for a set of overlapping clones that together cover a continuous region of the genome. A contig map shows the locations of these overlapping DNA segments along that region of a chromosome. Contig maps are significant because they make it possible to study an entire, and often large, segment of the genome by probing a series of overlapping clones, which together provide a continuous run of information about that region....
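The idea of building a contig from overlapping fragments can be illustrated with a toy greedy rule: repeatedly join the two fragments with the longest suffix-prefix overlap (at least `min_len` bases) until no overlaps remain. The sketch below assumes error-free, same-strand fragments, and the reads are hypothetical. Real assemblers also handle sequencing errors, reverse-complement reads, repeats, and fragments contained entirely within others, none of which this sketch attempts.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, if >= min_len; else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge the pair of fragments with the longest overlap until none remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining fragments are separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical overlapping reads from one short region.
reads = ["ATGCGTAC", "GTACCTTA", "CTTAGG"]
print(greedy_assemble(reads))  # ['ATGCGTACCTTAGG']
```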
