Question

Transcribed Text

The Chou-Fasman Method for Secondary Structure Prediction

Chou, P.Y. and Fasman, G.D., Prediction of protein conformation, Biochemistry 13(2), 222-45 (1974)

The basic idea is that each amino acid residue is assigned three numbers that describe its propensity to be part of α-helices, β-sheets and tight β-turns. A large number (above 100) for a particular structure, e.g. α-helix, means the amino acid has a high preference for that structure, which is typically coupled to a low propensity for the other structures. These Chou-Fasman parameters may be determined from the occurrence of different amino acids in different types of secondary structure in known protein structures. The prediction algorithm then contains the following steps:

1. Assign propensities to all residues in the sequence.

2. Scan the peptide and identify regions where 4 out of 6 contiguous residues exhibit P(α) > 100, i.e. a strong preference for α-helix. Extend these nucleation regions in both directions until the average propensity for a set of four contiguous residues is P(α) < 100.

3. Scan the peptide and identify regions where 3 out of 5 contiguous residues have P(β) > 100. These regions nucleate β-strands. Extend these in both directions until a set of four contiguous residues has an average P(β) < 100. This ends the β-strand. The procedure is almost exactly the same as for the helices, but there might now be overlapping segments.

4. Any region containing overlapping α and β assignments is taken to be helical or β depending on which of the average P(α) and P(β) for that region is larger. If this reduces an α- or β-region to fewer than 5 residues, the α- or β-assignment for that region is removed.

5. To identify a tight β-turn at residue number i, the product f(turn) = f(i)f(i+1)f(i+2)f(i+3) is calculated, where each factor f(i+k) is the table value in column f(i+k) for the residue at position i+k. Remember that a tight turn uses two amino acids, and there is one residue before the turn and another after it. These numbers are simply proportional to the frequency of each amino acid occurring in each such position of a β-turn.

6. To predict a β-turn, the following three conditions have to be fulfilled simultaneously:
   (a) f(turn) > 0.000075,
   (b) the average turn propensity p(turn) (column p(t) in the table) is > 100 for the four amino acids, and
   (c) the average p(turn) is larger than both the average p(α) and the average p(β), i.e. a turn is more probable than any other structure.

7. The remaining parts of the sequence (without assignment) are considered coils.

Your task for this activity is to write a script in awk to implement the Chou-Fasman algorithm. You can find values for the Chou-Fasman parameters in the table below.

3-letter code  1-letter code  p(α)  p(β)  p(t)  f(i)   f(i+1)  f(i+2)  f(i+3)
ALA            A              142    83    66   0.06   0.076   0.035   0.058
ARG            R               98    93    95   0.070  0.106   0.099   0.085
ASP            D              101    54   146   0.147  0.110   0.179   0.081
ASN            N               67    89   156   0.161  0.083   0.191   0.091
CYS            C               70   119   119   0.149  0.050   0.117   0.128
GLU            E              151    37    74   0.056  0.060   0.077   0.064
GLN            Q              111   110    98   0.074  0.098   0.037   0.098
GLY            G               57    75   156   0.102  0.085   0.190   0.152
HIS            H              100    87    95   0.140  0.047   0.093   0.054
ILE            I              108   160    47   0.043  0.034   0.013   0.056
LEU            L              121   130    59   0.061  0.025   0.036   0.070
LYS            K              114    74   101   0.055  0.115   0.072   0.095
MET            M              145   105    60   0.068  0.082   0.014   0.055
PHE            F              113   138    60   0.059  0.041   0.065   0.065
PRO            P               57    55   152   0.102  0.301   0.034   0.068
SER            S               77    75   143   0.120  0.139   0.125   0.106
THR            T               83   119    96   0.086  0.108   0.065   0.079
TRP            W              108   137    96   0.077  0.013   0.064   0.167
TYR            Y               69   147   114   0.082  0.065   0.114   0.125
VAL            V              106   170    50   0.062  0.048   0.028   0.053

Add the missing part to the script and make it work.
The format is .awk

BEGIN{
  i=0;

  # Map 3-letter to 1-letter code
  amino["ALA"]="A"; amino["ARG"]="R"; amino["ASP"]="D"; amino["ASN"]="N";
  amino["CYS"]="C"; amino["GLU"]="E"; amino["GLN"]="Q"; amino["GLY"]="G";
  amino["HIS"]="H"; amino["ILE"]="I"; amino["LEU"]="L"; amino["LYS"]="K";
  amino["MET"]="M"; amino["PHE"]="F"; amino["PRO"]="P"; amino["SER"]="S";
  amino["THR"]="T"; amino["TRP"]="W"; amino["TYR"]="Y"; amino["VAL"]="V";

  # Define amino acid propensities
  alpha["A"]=142; beta["A"]= 83; turn["A"]= 66; f_i["A"]=0.06 ; f_ip1["A"]=0.076; f_ip2["A"]=0.035; f_ip3["A"]=0.058;
  alpha["R"]= 98; beta["R"]= 93; turn["R"]= 95; f_i["R"]=0.070; f_ip1["R"]=0.106; f_ip2["R"]=0.099; f_ip3["R"]=0.085;
  alpha["D"]=101; beta["D"]= 54; turn["D"]=146; f_i["D"]=0.147; f_ip1["D"]=0.110; f_ip2["D"]=0.179; f_ip3["D"]=0.081;
  alpha["N"]= 67; beta["N"]= 89; turn["N"]=156; f_i["N"]=0.161; f_ip1["N"]=0.083; f_ip2["N"]=0.191; f_ip3["N"]=0.091;
  alpha["C"]= 70; beta["C"]=119; turn["C"]=119; f_i["C"]=0.149; f_ip1["C"]=0.050; f_ip2["C"]=0.117; f_ip3["C"]=0.128;
  alpha["E"]=151; beta["E"]= 37; turn["E"]= 74; f_i["E"]=0.056; f_ip1["E"]=0.060; f_ip2["E"]=0.077; f_ip3["E"]=0.064;
  alpha["Q"]=111; beta["Q"]=110; turn["Q"]= 98; f_i["Q"]=0.074; f_ip1["Q"]=0.098; f_ip2["Q"]=0.037; f_ip3["Q"]=0.098;
  alpha["G"]= 57; beta["G"]= 75; turn["G"]=156; f_i["G"]=0.102; f_ip1["G"]=0.085; f_ip2["G"]=0.190; f_ip3["G"]=0.152;
  alpha["H"]=100; beta["H"]= 87; turn["H"]= 95; f_i["H"]=0.140; f_ip1["H"]=0.047; f_ip2["H"]=0.093; f_ip3["H"]=0.054;
  alpha["I"]=108; beta["I"]=160; turn["I"]= 47; f_i["I"]=0.043; f_ip1["I"]=0.034; f_ip2["I"]=0.013; f_ip3["I"]=0.056;
  alpha["L"]=121; beta["L"]=130; turn["L"]= 59; f_i["L"]=0.061; f_ip1["L"]=0.025; f_ip2["L"]=0.036; f_ip3["L"]=0.070;
  alpha["K"]=114; beta["K"]= 74; turn["K"]=101; f_i["K"]=0.055; f_ip1["K"]=0.115; f_ip2["K"]=0.072; f_ip3["K"]=0.095;
  alpha["M"]=145; beta["M"]=105; turn["M"]= 60; f_i["M"]=0.068; f_ip1["M"]=0.082; f_ip2["M"]=0.014; f_ip3["M"]=0.055;
  alpha["F"]=113; beta["F"]=138; turn["F"]= 60; f_i["F"]=0.059; f_ip1["F"]=0.041; f_ip2["F"]=0.065; f_ip3["F"]=0.065;
  alpha["P"]= 57; beta["P"]= 55; turn["P"]=152; f_i["P"]=0.102; f_ip1["P"]=0.301; f_ip2["P"]=0.034; f_ip3["P"]=0.068;
  alpha["S"]= 77; beta["S"]= 75; turn["S"]=143; f_i["S"]=0.120; f_ip1["S"]=0.139; f_ip2["S"]=0.125; f_ip3["S"]=0.106;
  alpha["T"]= 83; beta["T"]=119; turn["T"]= 96; f_i["T"]=0.086; f_ip1["T"]=0.108; f_ip2["T"]=0.065; f_ip3["T"]=0.079;
  alpha["W"]=108; beta["W"]=137; turn["W"]= 96; f_i["W"]=0.077; f_ip1["W"]=0.013; f_ip2["W"]=0.064; f_ip3["W"]=0.167;
  alpha["Y"]= 69; beta["Y"]=147; turn["Y"]=114; f_i["Y"]=0.082; f_ip1["Y"]=0.065; f_ip2["Y"]=0.114; f_ip3["Y"]=0.125;
  alpha["V"]=106; beta["V"]=170; turn["V"]= 50; f_i["V"]=0.062; f_ip1["V"]=0.048; f_ip2["V"]=0.028; f_ip3["V"]=0.053;
}

{
  # Extract the sequence from the pdb file and put the 1-letter sequence into the array a[i]
  if ($3=="CA") {
    a[i]=amino[$4];
    i++
  }
}

END{

  ################################
  # SECONDARY STRUCTURE PREDICTION
  ################################

  #######################
  #
  # FIND ALPHA-HELICES
  #
  #######################

  alpha_nucleation_window=6;
  alpha_nucleation_threshold=4;
  j=0;

  while (j<i-alpha_nucleation_window) {

    # Find Nucleation region
    alpha_found=0;
    alpha_score=0;
    beta_score=0;
    for (k=0; k<alpha_nucleation_window; k++) {
      if (alpha[a[j+k]]>100) {alpha_found++}
      alpha_score+=alpha[a[j+k]];
      beta_score+=beta[a[j+k]]
    }

    if (alpha_found>=alpha_nucleation_threshold) {

      # Extend left
      extend=j-1;
      stop=0;
      while (extend >=0 && stop==0) {
        alpha_av=0;
        for (l=0; l<4; l++) {
          alpha_av+=alpha[a[extend-l]]*0.25;
        }
        if (alpha_av<100 && extend>=3) {stop=1}
        else {
          alpha_score+=alpha[a[extend]];
          beta_score+=beta[a[extend]];
          extend--
        }
      }
      left=extend+1;

      # Extend right
      extend=j+alpha_nucleation_window;
      stop=0;
      while (extend<i-4 && stop==0) {
        alpha_av=0;
        for (l=0; l<4; l++) {
          alpha_av+=alpha[a[extend+l]]*0.25;
        }
        if (alpha_av<100) {stop=1}
        else {
          alpha_score+=alpha[a[extend]];
          beta_score+=beta[a[extend]];
          extend++
        }
      }
      right=extend-1;

      # check if conditions are met
      # (the region left..right holds right-left+1 residues, so that is the divisor for the average)
      if (alpha_score/(right-left+1)>103 && right-left>5 && alpha_score>beta_score) {
        print "Found alpha-helix at position",j+1,"Region:",left+1,right+1;
        j=right;
      }
    }
    j++;
  }

  #######################
  #
  # FIND BETA-STRANDS
  #
  #######################

  beta_nucleation_window=5;
  beta_nucleation_threshold=3;
  j=0;

  while (j<i-beta_nucleation_window) {

    # Find Nucleation region
    beta_found=0;
    alpha_score=0;
    beta_score=0;
    for (k=0; k<beta_nucleation_window; k++) {
      if (beta[a[j+k]]>100) {beta_found++}
      alpha_score+=alpha[a[j+k]];
      beta_score+=beta[a[j+k]]
    }

    if (beta_found>=beta_nucleation_threshold) {

      # Extend left
      extend=j-1;
      stop=0;
      while (extend >=0 && stop==0) {
        beta_av=0;
        for (l=0; l<4; l++) {
          beta_av+=beta[a[extend-l]]*0.25;
        }
        if (beta_av<100 && extend>=3) {stop=1}
        else {
          alpha_score+=alpha[a[extend]];
          beta_score+=beta[a[extend]];
          extend--
        }
      }
      left=extend+1;

      # Extend right
      extend=j+beta_nucleation_window;
      stop=0;
      while (extend<i-4 && stop==0) {
        beta_av=0;
        for (l=0; l<4; l++) {
          beta_av+=beta[a[extend+l]]*0.25;
        }
        if (beta_av<100) {stop=1}
        else {
          alpha_score+=alpha[a[extend]];
          beta_score+=beta[a[extend]];
          extend++
        }
      }
      right=extend-1;

      # check if conditions are met
      # (the region left..right holds right-left+1 residues, so that is the divisor for the average)
      if (beta_score/(right-left+1)>105 && right-left>5 && beta_score>alpha_score) {
        print "Found beta-strand at position",j+1,"Region:",left+1,right+1;
        j=right;
      }
    }
    j++;
  }

  #######################
  #
  # FIND TURNS
  #
  #######################

  for (j=0; j<i-4; j++) {
    turn_score=0;
    alpha_score=0;
    beta_score=0;
    for (k=0; k<4; k++) {
      alpha_score+=alpha[a[j+k]];
      beta_score+=beta[a[j+k]];
      turn_score+=turn[a[j+k]]
    }
    p_t=f_i[a[j]]*f_ip1[a[j+1]]*f_ip2[a[j+2]]*f_ip3[a[j+3]];

    # check if conditions are met
    # (sums over four residues are compared, so turn_score>400 means an average p(t) above 100)
    if (turn_score>alpha_score && turn_score>beta_score && turn_score>400 && p_t> 0.000075) {
      # the turn window covers the four residues j+1..j+4 in 1-based numbering
      print "Found turn at position",j+1,"Region:",j+1,j+4;
    }
  }
}
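
The helix nucleation rule from step 2 (4 out of 6 contiguous residues with P(α) > 100) can be tried in isolation on a one-letter sequence, without a PDB file. The following is only a minimal sketch under stated assumptions: the shell wrapper helix_nuclei and the test sequence PGEAMKLGSP are hypothetical and not part of the assignment script, positions are reported 1-based, and every qualifying window is listed instead of being extended into a helix.

```shell
# Hypothetical helper (not part of the assignment script): list every
# 6-residue window of a one-letter sequence that meets the nucleation rule.
helix_nuclei() {
  awk -v seq="$1" '
  BEGIN {
    # P(alpha) values copied from the parameter table, as "residue value" pairs
    tbl = "A 142 R 98 D 101 N 67 C 70 E 151 Q 111 G 57 H 100 I 108"
    tbl = tbl " L 121 K 114 M 145 F 113 P 57 S 77 T 83 W 108 Y 69 V 106"
    n = split(tbl, t, " ")
    for (k = 1; k < n; k += 2) alpha[t[k]] = t[k + 1] + 0

    len = length(seq)
    for (j = 1; j + 5 <= len; j++) {        # j is the 1-based window start
      hits = 0
      for (k = 0; k < 6; k++)
        if (alpha[substr(seq, j + k, 1)] > 100) hits++
      if (hits >= 4)                        # 4 out of 6 residues favour a helix
        printf "%d-%d %s %d/6\n", j, j + 5, substr(seq, j, 6), hits
    }
  }'
}

helix_nuclei PGEAMKLGSP
# prints:
# 1-6 PGEAMK 4/6
# 2-7 GEAMKL 5/6
# 3-8 EAMKLG 5/6
# 4-9 AMKLGS 4/6
```

Listing the raw windows this way makes it easy to compare against the "Found alpha-helix" lines of the full script, which additionally extends each nucleus and applies the average-propensity checks.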

Solution Preview

#######################
#
# FIND BETA-STRANDS
#
#######################

# Go over all AAs
beta_nucleation_window=5;
beta_nucleation_threshold=3;
j=0;

while (j<i-beta_nucleation_window) {

# Find Nucleation region
    beta_found=0;
    alpha_score=0;
    beta_score=0;
    for (k=0; k<beta_nucleation_window; k++) {   
      if (beta[a[j+k]]>100) {beta_found++}
      alpha_score+=alpha[a[j+k]];
      beta_score+=beta[a[j+k]]
    }

    if (beta_found>=beta_nucleation_threshold) {

    # Extend left
      extend=j-1;
      stop=0;
      while (extend >=0 && stop==0) {
       beta_av=0;
       for (l=0; l<4; l++) {
          beta_av+=beta[a[extend-l]]*0.25;
       }
       if (beta_av<100 && extend>=3) {stop=1}
       else {
          alpha_score+=alpha[a[extend]];
          beta_score+=beta[a[extend]];
          extend--
       }
      }
      left=extend+1;...
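
The three turn criteria from steps 5 and 6 of the problem statement can also be checked by hand for a single four-residue window. The sketch below is an illustration under stated assumptions, not the tutor's solution: the wrapper turn_check, the awk function load and the two test windows are hypothetical, and only the table rows needed for those windows are loaded. The array names match those in the assignment script.

```shell
# Hypothetical helper: apply the three turn criteria to one 4-residue window.
turn_check() {
  awk -v win="$1" '
  function load(r, pa, pb, pt, f0, f1, f2, f3) {
    alpha[r] = pa; beta[r] = pb; turn[r] = pt
    f_i[r] = f0; f_ip1[r] = f1; f_ip2[r] = f2; f_ip3[r] = f3
  }
  BEGIN {
    # Subset of the parameter table (only residues used in the demo windows)
    load("A", 142,  83,  66, 0.06,  0.076, 0.035, 0.058)
    load("E", 151,  37,  74, 0.056, 0.060, 0.077, 0.064)
    load("G",  57,  75, 156, 0.102, 0.085, 0.190, 0.152)
    load("L", 121, 130,  59, 0.061, 0.025, 0.036, 0.070)
    load("N",  67,  89, 156, 0.161, 0.083, 0.191, 0.091)
    load("P",  57,  55, 152, 0.102, 0.301, 0.034, 0.068)
    load("S",  77,  75, 143, 0.120, 0.139, 0.125, 0.106)

    # Sum the three propensities over the four residues of the window
    for (k = 1; k <= 4; k++) {
      res[k] = substr(win, k, 1)
      pa += alpha[res[k]]; pb += beta[res[k]]; pt += turn[res[k]]
    }
    # Positional product f(turn) = f(i) f(i+1) f(i+2) f(i+3)
    f = f_i[res[1]] * f_ip1[res[2]] * f_ip2[res[3]] * f_ip3[res[4]]

    # (a) f(turn) > 0.000075, (b) average p(t) > 100, (c) p(t) beats p(alpha) and p(beta)
    ok = (f > 0.000075 && pt / 4 > 100 && pt > pa && pt > pb)
    printf "%s f(turn)=%.6f avg_p(t)=%.2f %s\n", win, f, pt / 4, (ok ? "turn" : "no turn")
  }'
}

turn_check NPGS   # prints: NPGS f(turn)=0.000976 avg_p(t)=151.75 turn
turn_check AEAL   # prints: AEAL f(turn)=0.000009 avg_p(t)=66.25 no turn
```

For NPGS the product is 0.161 × 0.301 × 0.190 × 0.106 ≈ 0.000976, well above the 0.000075 threshold, and the average p(t) of 151.75 exceeds both the average p(α) of 64.5 and the average p(β) of 73.5, so all three conditions hold.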