Part 1: In the following questions we will explore similarities between three gene sequences that vary significantly in their evolutionary distance from a common ancestor. The sequences that you will work with are given below:
• Human Gene
• Mouse Gene
• Yeast Gene
Devise a method to find the longest conserved substring shared by all three sequences. Recall that a substring is a contiguous series of bases; here it must be common to all three strings. Explain your approach. (You may write code to solve this part if you wish; if you do, include both your code and your answer.)
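One possible approach to Part 1, given here as a minimal sketch rather than the intended solution: binary-search on the substring length k, and for each k intersect the sets of all length-k substrings of the sequences. This works because whenever a common substring of length k exists, one of length k - 1 exists too. The function name longest_common_substring and the toy inputs are illustrative assumptions; the real gene sequences are not reproduced here.

```python
def longest_common_substring(seqs):
    """Return one longest substring that occurs in every string in seqs."""

    def common_of_length(k):
        # Collect every length-k substring of the first sequence,
        # then keep only those that also occur in all the others.
        first = seqs[0]
        common = {first[i:i + k] for i in range(len(first) - k + 1)}
        for s in seqs[1:]:
            common &= {s[i:i + k] for i in range(len(s) - k + 1)}
            if not common:
                break
        return common

    lo, hi = 0, min(len(s) for s in seqs)
    best = ""
    # Invariant: a common substring of length lo exists (lo = 0 is trivial).
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_of_length(mid)
        if found:
            best = min(found)  # deterministic choice when several tie
            lo = mid
        else:
            hi = mid - 1
    return best


# Toy example (hypothetical data, not the Human/Mouse/Yeast genes).
print(longest_common_substring(["GATTACAGG", "CCATTACAT", "TTTATTACA"]))
```

For very long sequences, building full substring sets at each step can use a lot of memory; a generalized suffix structure would likely scale better, but the set-based version is easier to explain.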
Part 2: Write a function to compute the minimal pairwise edit distance between two gene sequences such as the ones given in Part 1. Assume exact matches are weighted as +1, mismatches as 0, and any introduced gaps as -1.
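A minimal sketch of one way to approach Part 2, assuming the "edit distance" here means the best total score of a global alignment (as the wording of Part 3 suggests) and that each gap position costs -1 independently. The name alignment_score and its keyword parameters are illustrative, not taken from the original solution.

```python
def alignment_score(v, w, match=1, mismatch=0, gap=-1):
    """Best global alignment score of v and w via dynamic programming."""
    rows, cols = len(v) + 1, len(w) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if v[i - 1] == w[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align v[i-1] with w[j-1]
                score[i - 1][j] + gap,       # gap in w
                score[i][j - 1] + gap,       # gap in v
            )
    return score[-1][-1]


print(alignment_score("AC", "AGC"))  # two matches and one gap
```

The full table uses memory proportional to len(v) * len(w); for long genes, keeping only the previous row is enough when just the score is needed.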
Longest Common Subsequence:
• A special case of edit distance where no substitutions are allowed
• A subsequence need not be contiguous, but order must be preserved Ex. If v = ATTGCTA then AGCA and TTTA are subsequences of v, but TGTT and ACGA are not
• The length of the LCS, s, of two strings v and w is related to their edit distance, d, by: d = len(v) + len(w) - 2s
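The relation between s and d can be checked numerically. The sketch below assumes the edit distance counts only insertions and deletions (no substitutions), each at cost 1, so that every base outside the LCS must be deleted from one string or inserted into the other. The names lcs_length and indel_distance are illustrative.

```python
def lcs_length(v, w):
    """Length of the longest common subsequence of v and w."""
    prev = [0] * (len(w) + 1)
    for a in v:
        curr = [0]
        for j, b in enumerate(w, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(v, w):
    """Edit distance where only insertions and deletions are allowed."""
    prev = list(range(len(w) + 1))
    for i, a in enumerate(v, start=1):
        curr = [i]
        for j, b in enumerate(w, start=1):
            best = min(prev[j] + 1, curr[j - 1] + 1)  # delete or insert
            if a == b:
                best = min(best, prev[j - 1])         # free match
            curr.append(best)
        prev = curr
    return prev[-1]


v, w = "ATTGCTA", "AGCA"
s, d = lcs_length(v, w), indel_distance(v, w)
# Assumed relation under indel-only editing: d = len(v) + len(w) - 2s
print(s, d, d == len(v) + len(w) - 2 * s)
```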
Difference between Gap and Mismatch
With two strings: GTAGGCTTA and GTAGATA
Here red letters indicate mismatches, while blue letters indicate gaps. The total score for this pair using scores of (1, 0, -1) for (match, mismatch, gap) would be 6 - 1 + 0 + 0 = 5
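To make the distinction concrete, the following sketch scores an alignment that has already been written out with '-' for gaps: a column holding two different bases is a mismatch, while a column holding a base opposite '-' is a gap. The function name score_alignment and the aligned pair used below are hypothetical illustrations, not the alignment from the original figure.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score a finished alignment column by column.

    Returns (total score, counts of each column type).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")

    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            counts["gap"] += 1        # base aligned against nothing
        elif a == b:
            counts["match"] += 1      # identical bases
        else:
            counts["mismatch"] += 1   # two different bases
    total = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return total, counts


# Hypothetical alignment: 3 matches, 1 mismatch (A/T), 1 gap (C/-).
print(score_alignment("ACGTA", "A-GTT"))
```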
Example: Consider these 4 sequences
       s1:    GATTCA
       s2:    GTCTGA
       s3:    GATATT
       s4:    GTCAGC
• with the scoring matrix: {Match = 1, Mismatch = -1, IntroGap = -1}
• There are C(4, 2) = 6 possible pairwise alignments:
    s2: GTCTGA                         s1: GATTCA--
    s4: GTCAGC (score = 2)             s4: G-T-CAGC (score = 0)

    s1: GAT-TCA                        s2: G-TCTGA
    s2: G-TCTGA (score = 1)            s3: GATAT-T (score = -1)

    s1: GAT-TCA                        s3: GAT-ATT
    s3: GATAT-T (score = 1)            s4: G-TCAGC (score = -1)
• The best pairwise score, 2, is between s2 and s4
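The six scores in this example can be reproduced with a short script. This is a sketch under the stated scoring {Match = 1, Mismatch = -1, IntroGap = -1}, using a two-row dynamic program that returns only the best global score, not the alignment itself. The name global_score is illustrative.

```python
from itertools import combinations


def global_score(v, w, match=1, mismatch=-1, gap=-1):
    """Best global alignment score, keeping only two rows of the table."""
    prev = [j * gap for j in range(len(w) + 1)]
    for i, a in enumerate(v, start=1):
        curr = [i * gap]
        for j, b in enumerate(w, start=1):
            pair = match if a == b else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a with b
                            prev[j] + gap,        # gap in w
                            curr[j - 1] + gap))   # gap in v
        prev = curr
    return prev[-1]


seqs = {"s1": "GATTCA", "s2": "GTCTGA", "s3": "GATATT", "s4": "GTCAGC"}

# One score for each of the C(4, 2) = 6 unordered pairs.
scores = {(a, b): global_score(seqs[a], seqs[b])
          for a, b in combinations(seqs, 2)}
best_pair = max(scores, key=scores.get)

for pair, value in scores.items():
    print(pair, value)
print("best:", best_pair, scores[best_pair])
```

The same loop over combinations applies to Part 3 once the three gene sequences are loaded and the Part 2 weights (+1, 0, -1) are passed in.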

Part 3: Use the function that you wrote for Part 2 to find the edit distances (total scores) between all pairs of gene sequences given in Part 1.
editDistance (Human, Mouse) =
editDistance (Human, Yeast) =
editDistance (Mouse, Yeast) =

Solution Preview


from numpy import *

import sys
# Fix specifically because provided code hits a recursion limit in giant text files

SCORE_MATRIX = (0, 1, 1)

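def findLCS(v, w):
    # score[i,j] = LCS length of v[:i] and w[:j]
    # backt[i,j] = move that produced it: 1 = from above, 2 = from the left, 3 = diagonal (match)
    score = zeros((len(v)+1,len(w)+1), dtype="int32")
    backt = zeros((len(v)+1,len(w)+1), dtype="int32")
    for i in range(1,len(v)+1):
        for j in range(1,len(w)+1):
            # find best score at each vertex
            if (v[i-1] == w[j-1]):
                score[i,j], backt[i,j] = max((score[i-1,j-1] + 1, 3),
                                             (score[i-1,j], 1),
                                             (score[i,j-1], 2))
            else:
                score[i,j], backt[i,j] = max((score[i-1,j], 1),
                                             (score[i,j-1], 2))
    return (score, backt)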

def LCS(b,v...

