 # Part 1: In the following questions we will explore similarities bet...

## Question

Part 1: In the following questions we will explore similarities between three gene sequences that vary significantly in their evolutionary distance from a common ancestor. The sequences that you will work with are given below:
- Human Gene
- Mouse Gene
- Yeast Gene

Devise a method to find the longest conserved substring shared by all three sequences. Recall that a substring is a contiguous series of bases; here it must occur in all three strings. Explain your approach. (You may write code to solve this part if you wish; if you do, include both your code and your answer.)
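One possible approach, sketched here because the original solution's Part 1 code is not shown: binary-search the substring length L, and for each candidate L intersect the sets of length-L substrings of all three sequences. This works because any common substring of length L contains common substrings of every shorter length. The function names are illustrative, and the example strings are toy data, not the Human/Mouse/Yeast genes:

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return every distinct substring of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs: List[str], k: int) -> Set[str]:
    """Substrings of length k that occur in every sequence."""
    shared = kmers(seqs[0], k)
    for s in seqs[1:]:
        shared &= kmers(s, k)
        if not shared:  # nothing left to intersect
            break
    return shared


def longest_common_substring(seqs: List[str]) -> str:
    """Longest substring common to all sequences ('' if there is none).

    Binary search on the length; ties are broken by taking the
    lexicographically smallest candidate so the result is deterministic.
    """
    lo, hi = 0, min(len(s) for s in seqs)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        shared = shared_kmers(seqs, mid)
        if shared:
            lo, best = mid, min(shared)
        else:
            hi = mid - 1
    return best


print(longest_common_substring(["GATTACA", "TTACAG", "CATTACG"]))  # TTAC
```

Each probe hashes up to n substrings of length L and only O(log n) probes are needed, which is usually fast enough for gene-length inputs; a generalized suffix tree can solve the same problem in linear time but takes considerably more code.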
Part 2: Write a function to compute the optimal pairwise edit distance (alignment score) between two gene sequences such as the ones given in Part 1. Assume exact matches are weighted as +1, mismatches as 0, and each introduced gap as -1.
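A minimal sketch of such a function, assuming global (Needleman-Wunsch) alignment with a linear gap penalty, so each gap position costs the gap weight. Because matches score higher than mismatches or gaps, the table keeps the maximum at each cell. `global_alignment` and its keyword parameters are illustrative names, not the original solution's API:

```python
from typing import List, Tuple


def global_alignment(v: str, w: str, match: int = 1, mismatch: int = 0,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (best_score, aligned_v, aligned_w); '-' marks a gap.
    """
    n, m = len(v), len(w)

    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    # score[i][j] = best score for aligning the prefixes v[:i] and w[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # v[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(v[i - 1], w[j - 1]),
                              score[i - 1][j] + gap,   # v[i-1] against a gap
                              score[i][j - 1] + gap)   # w[j-1] against a gap

    # Trace back from the bottom-right cell, re-deriving which move was taken.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(v[i - 1], w[j - 1]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, a, b = global_alignment("GTAGGCTTA", "GTAGATA")
print(best)  # 4
print(a)
print(b)
```

Passing `mismatch=-1` reproduces the scoring used in the four-sequence example below.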
Longest Common Subsequence:
- A special case of edit distance in which no substitutions are allowed (only insertions and deletions)
- A subsequence need not be contiguous, but order must be preserved. For example, if v = ATTGCTA, then AGCA and TTTA are subsequences of v, but TGTT and ACGA are not.
- The length of the LCS, s(v, w), is related to the strings' edit distance, d(v, w), by:

$$d(v,w) = |v| + |w| - 2\,s(v,w)$$
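The LCS notes above can be checked with a short dynamic program. This is a sketch assuming the insertion/deletion-only distance that the formula describes; `is_subsequence`, `lcs_length` and `indel_distance` are illustrative names:

```python
def is_subsequence(u: str, v: str) -> bool:
    """True if u can be obtained from v by deleting characters (order kept)."""
    it = iter(v)
    return all(ch in it for ch in u)  # each 'in' consumes the iterator


def lcs_length(v: str, w: str) -> int:
    """Length of the longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(w) + 1)
    for a in v:
        cur = [0]
        for j, b in enumerate(w, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(v: str, w: str) -> int:
    """Insertion/deletion edit distance: |v| + |w| - 2 * s(v, w)."""
    return len(v) + len(w) - 2 * lcs_length(v, w)


print(is_subsequence("TGTT", "ATTGCTA"))   # False
print(indel_distance("ATTGCTA", "AGCA"))   # 3
```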
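Difference between Gap and Mismatch

Given the two strings GTAGGCTTA and GTAGATA, one alignment is:

    GTAGGCTTA
    GTAGA--TA

Column 5 (G against A) is a mismatch, while columns 6 and 7 (C and T against -) are gaps. With the scoring (match, mismatch, gap) = (1, 0, -1), this alignment has 6 matches, 1 mismatch and 2 gap positions, for a total score of 6 + 0 - 2 = 4.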
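The column-by-column arithmetic can be sketched as a small scorer for an alignment that has already been written out, assuming (as the four-sequence example does) that every gap position is charged separately. `alignment_score` is an illustrative name:

```python
def alignment_score(top: str, bottom: str, match: int = 1,
                    mismatch: int = 0, gap: int = -1) -> int:
    """Score an existing gapped alignment column by column (linear gaps)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("GTAGGCTTA", "GTAGA--TA"))  # 4
```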
Example: consider these four sequences:

    s1:    GATTCA
    s2:    GTCTGA
    s3:    GATATT
    s4:    GTCAGC

With the scoring matrix {Match = 1, Mismatch = -1, IntroGap = -1}, there are $\binom{4}{2} = 6$ possible pairwise alignments:
    s2: GTCTGA
    s4: GTCAGC    (score = 2)

    s1: GATTCA--
    s4: G-T-CAGC  (score = 0)

    s1: GAT-TCA
    s2: G-TCTGA   (score = 1)

    s2: G-TCTGA
    s3: GATAT-T   (score = -1)

    s1: GAT-TCA
    s3: GATAT-T   (score = 1)

    s3: GAT-ATT
    s4: G-TCAGC   (score = -1)

The best pairwise score, 2, is between s2 and s4.
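To double-check the example, the sketch below rescores the six listed alignments with the example's scheme and picks the best pair. The alignments are copied from the example; `ALIGNMENTS` and `alignment_score` are illustrative names, and each gap position is charged -1:

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -1  # the example's scoring matrix


def alignment_score(top: str, bottom: str) -> int:
    """Linear-gap score of an already-aligned pair of rows."""
    total = 0
    for a, b in zip(top, bottom):
        if "-" in (a, b):
            total += GAP
        else:
            total += MATCH if a == b else MISMATCH
    return total


# The six alignments listed in the example, keyed by sequence pair.
ALIGNMENTS = {
    ("s1", "s2"): ("GAT-TCA", "G-TCTGA"),
    ("s1", "s3"): ("GAT-TCA", "GATAT-T"),
    ("s1", "s4"): ("GATTCA--", "G-T-CAGC"),
    ("s2", "s3"): ("G-TCTGA", "GATAT-T"),
    ("s2", "s4"): ("GTCTGA", "GTCAGC"),
    ("s3", "s4"): ("GAT-ATT", "G-TCAGC"),
}

# combinations() yields exactly the 4-choose-2 = 6 unordered pairs.
pairs = list(combinations(["s1", "s2", "s3", "s4"], 2))
assert len(pairs) == 6 and set(pairs) == set(ALIGNMENTS)

scores = {pair: alignment_score(*rows) for pair, rows in ALIGNMENTS.items()}
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])  # ('s2', 's4') 2
```

These are hand-made alignments, so an optimal aligner can only match or beat each listed score.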

Part 3: Use the function that you wrote for Part 2 to find the edit distances (total scores) between all pairs of gene sequences given in Part 1:
- editDistance(Human, Mouse) =
- editDistance(Human, Yeast) =
- editDistance(Mouse, Yeast) =
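A possible driver for Part 3, offered as a sketch. It assumes the sequences live in files named humanGene.seq, mouseGene.seq and yeastGene.seq and that each holds raw bases, optionally with FASTA-style `>` header lines; that format is an assumption, and `read_sequence`, `global_score` and `all_pair_scores` are hypothetical names. The alignment score keeps only two DP rows, which saves memory on long genes when no traceback is needed. If the files are missing, it falls back to the four toy sequences from the example:

```python
from itertools import combinations
from pathlib import Path
from typing import Dict, Tuple


def read_sequence(path: str) -> str:
    """Hypothetical loader: concatenate non-header lines of a sequence file."""
    lines = Path(path).read_text().splitlines()
    return "".join(line.strip().upper() for line in lines
                   if line.strip() and not line.startswith(">"))


def global_score(v: str, w: str, match: int = 1, mismatch: int = 0,
                 gap: int = -1) -> int:
    """Needleman-Wunsch score only, keeping two DP rows (O(len(w)) memory)."""
    prev = [j * gap for j in range(len(w) + 1)]
    for i, a in enumerate(v, start=1):
        cur = [i * gap]
        for j, b in enumerate(w, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def all_pair_scores(seqs: Dict[str, str], **weights: int) -> Dict[Tuple[str, str], int]:
    """Optimal global score for every unordered pair of named sequences."""
    return {(a, b): global_score(seqs[a], seqs[b], **weights)
            for a, b in combinations(seqs, 2)}


if __name__ == "__main__":
    files = {"Human": "humanGene.seq", "Mouse": "mouseGene.seq",
             "Yeast": "yeastGene.seq"}
    if all(Path(p).exists() for p in files.values()):
        seqs = {name: read_sequence(p) for name, p in files.items()}
        for (a, b), s in all_pair_scores(seqs).items():
            print(f"editDistance({a}, {b}) = {s}")
    else:
        toy = {"s1": "GATTCA", "s2": "GTCTGA", "s3": "GATATT", "s4": "GTCAGC"}
        print(all_pair_scores(toy, match=1, mismatch=-1, gap=-1))
```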

## Solution Preview


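```python
from numpy import *

import sys
# Raised because the provided code hits Python's default recursion limit
# on very large sequence files
sys.setrecursionlimit(10000)

# (SCORE_MATCH, SCORE_MISMATCH, SCORE_GAP)
# EDIT THIS TO MODIFY SCORE FORMULA; REST OF CODE WILL USE THIS!
SCORE_MATRIX = (0, 1, 1)


def findLCS(v, w):
    """Fill the LCS dynamic-programming tables for strings v and w.

    score[i, j] is the LCS length of v[:i] and w[:j]; backt[i, j] records
    the move used: 1 = up (skip v[i-1]), 2 = left (skip w[j-1]),
    3 = diagonal (v[i-1] == w[j-1]).
    """
    score = zeros((len(v) + 1, len(w) + 1), dtype="int32")
    backt = zeros((len(v) + 1, len(w) + 1), dtype="int32")
    for i in range(1, len(v) + 1):
        for j in range(1, len(w) + 1):
            # find best score at each vertex; on ties, max() prefers the
            # larger move code (diagonal, then left, then up)
            if v[i - 1] == w[j - 1]:
                score[i, j], backt[i, j] = max((score[i - 1, j - 1] + 1, 3),
                                               (score[i - 1, j], 1),
                                               (score[i, j - 1], 2))
            else:
                score[i, j], backt[i, j] = max((score[i - 1, j], 1),
                                               (score[i, j - 1], 2))
    return (score, backt)


def LCS(b,v...
```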
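The preview raises the recursion limit because part of the provided code (presumably the truncated `LCS` traceback) recurses once per step. An alternative, sketched here under the assumption that the backtrack table uses the same codes as `findLCS` (1 = up, 2 = left, 3 = diagonal), walks the table with a loop so no recursion limit applies. `lcs_tables` and `lcs_string` are hypothetical names, and the tables are rebuilt with plain lists so the sketch runs without NumPy:

```python
def lcs_tables(v, w):
    """Rebuild findLCS's tables with lists (1 = up, 2 = left, 3 = diagonal)."""
    score = [[0] * (len(w) + 1) for _ in range(len(v) + 1)]
    backt = [[0] * (len(w) + 1) for _ in range(len(v) + 1)]
    for i in range(1, len(v) + 1):
        for j in range(1, len(w) + 1):
            options = [(score[i - 1][j], 1), (score[i][j - 1], 2)]
            if v[i - 1] == w[j - 1]:
                options.append((score[i - 1][j - 1] + 1, 3))
            score[i][j], backt[i][j] = max(options)  # same tie-break as findLCS
    return score, backt


def lcs_string(v, w):
    """Recover one LCS by walking the backtrack table with a loop."""
    _, backt = lcs_tables(v, w)
    i, j = len(v), len(w)
    out = []
    while i > 0 and j > 0:
        move = backt[i][j]
        if move == 3:      # diagonal: v[i-1] belongs to the LCS
            out.append(v[i - 1])
            i, j = i - 1, j - 1
        elif move == 1:    # up: skip v[i-1]
            i -= 1
        else:              # left: skip w[j-1]
            j -= 1
    return "".join(reversed(out))


print(lcs_string("ATTGCTA", "AGCA"))  # AGCA
```

The loop depth no longer grows with sequence length, so `sys.setrecursionlimit` is not needed for this traceback.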

The full solution consists of the files code.py, humanGene.seq, mouseGene.seq and yeastGene.seq.
