Much of bioinformatics is concerned with the sequences of subunits in nucleic acid and protein molecules. In collaboration, biologists and computer scientists have developed a standard format for storing sequence information in files.
This format is called fasta. A fasta file has the extension .fasta.
The first line of a fasta file is a description line. The first character of a description line must be ">", the greater-than symbol. This is followed without any whitespace by an optional label, a character string terminated by whitespace.
Any characters on the first line after the initial whitespace form an optional description. The lines following the description line are the sequence data, which may contain only alphabetic characters, upper or lower case. Each sequence data line may be of arbitrary length.
The sequence is the concatenation of the sequence data lines. The sequence data are terminated by a newline.
If the file contains more than one sequence, there must be a blank line between the final sequence data line of the previous sequence and the > of the next sequence.
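The rules above can be sketched in Java as follows. This is a minimal illustration, not part of the assignment: the class name FastaRecord and its fields are hypothetical, and the sketch assumes the file has already been read into a list of lines. It tolerates a missing blank line between records, which is more lenient than the format requires.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one parsed fasta record (names are illustrative only). */
final class FastaRecord {
    final String label;        // text between '>' and the first whitespace (may be empty)
    final String description;  // remainder of the description line (may be empty)
    final String sequence;     // concatenation of all sequence data lines

    FastaRecord(String label, String description, String sequence) {
        this.label = label;
        this.description = description;
        this.sequence = sequence;
    }

    /** Groups the lines of a fasta file into records. */
    static List<FastaRecord> parse(List<String> lines) {
        List<FastaRecord> records = new ArrayList<>();
        String header = null;                 // current description line without '>'
        StringBuilder seq = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                // A new description line closes the previous record, if any.
                if (header != null) {
                    records.add(build(header, seq.toString()));
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (!line.isEmpty() && header != null) {
                seq.append(line);             // sequence data lines are concatenated
            }
        }
        if (header != null) {
            records.add(build(header, seq.toString()));
        }
        return records;
    }

    /** Splits a description line into the label and the optional description. */
    private static FastaRecord build(String header, String sequence) {
        String[] parts = header.split("\\s+", 2);
        String description = parts.length > 1 ? parts[1] : "";
        return new FastaRecord(parts[0], description, sequence);
    }
}
```

Keeping the label and description separate makes it easy to reprint the description line unchanged in the output file.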
An example fasta file might appear as:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken

A sequence can represent either a nucleic acid (e.g., DNA or RNA) or a polypeptide chain of amino acids.
The central dogma of biology states that a segment of DNA is transcribed and translated into a protein according to the following code, in which every three letters of DNA become one amino acid letter, except in three cases (the stop codons, marked _ below):
UCA = S //Serine
AUA = I //Isoleucine
UCC = S //Serine
AUC = I //Isoleucine
UCG = S //Serine
AUU = I //Isoleucine
UCU = S //Serine
AUG = M //Methionine
UUC = F //Phenylalanine
ACA = T //Threonine
UUU = F //Phenylalanine
ACC = T //Threonine
UUA = L //Leucine
ACG = T //Threonine
UUG = L //Leucine
ACU = T //Threonine
UAC = Y //Tyrosine
AAC = N //Asparagine
UAU = Y //Tyrosine
AAU = N //Asparagine
UAA = _ //Stop
AAA = K //Lysine
UAG = _ //Stop
AAG = K //Lysine
UGC = C //Cysteine
AGC = S //Serine
UGU = C //Cysteine
AGU = S //Serine
UGA = _ //Stop
AGA = R //Arginine
UGG = W //Tryptophan
AGG = R //Arginine
CUA = L //Leucine
GUA = V //Valine
CUC = L //Leucine
GUC = V //Valine
CUG = L //Leucine
GUG = V //Valine
CUU = L //Leucine
GUU = V //Valine
CCA = P //Proline
GCA = A //Alanine
CCC = P //Proline
GCC = A //Alanine
CCG = P //Proline
GCG = A //Alanine
CCU = P //Proline
GCU = A //Alanine
CAC = H //Histidine
GAC = D //Aspartic Acid
CAU = H //Histidine
GAU = D //Aspartic Acid
CAA = Q //Glutamine
GAA = E //Glutamic Acid
CAG = Q //Glutamine
GAG = E //Glutamic Acid
CGA = R //Arginine
GGA = G //Glycine
CGC = R //Arginine
GGC = G //Glycine
CGG = R //Arginine
GGG = G //Glycine
CGU = R //Arginine
GGU = G //Glycine
For example, the DNA sequence cuugaaauuucu would produce the amino acid sequence leis.
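One way to encode the table above is sketched below. The class name CodonTable is hypothetical. The sketch makes three assumptions that the assignment does not state: input is upper-cased before lookup, any T is treated as U so that DNA written with T also works, and a stop codon is written as _ rather than ending translation. A trailing fragment shorter than three letters is ignored.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that maps three-letter codons to one-letter amino acids. */
final class CodonTable {
    private static final String BASES = "UCAG";
    // Amino acids for all 64 codons, ordered by first, second, then third base over U, C, A, G.
    private static final String AMINO =
            "FFLLSSSSYY__CC_W"    // U..
          + "LLLLPPPPHHQQRRRR"    // C..
          + "IIIMTTTTNNKKSSRR"    // A..
          + "VVVVAAAADDEEGGGG";   // G..
    private static final Map<String, Character> TABLE = new HashMap<>();

    static {
        int i = 0;
        for (char a : BASES.toCharArray()) {
            for (char b : BASES.toCharArray()) {
                for (char c : BASES.toCharArray()) {
                    TABLE.put("" + a + b + c, AMINO.charAt(i++));
                }
            }
        }
    }

    /** Translates a nucleotide string, three letters at a time. */
    static String translate(String nucleotides) {
        // Assumption: accept lower case and DNA-style T by normalising to upper-case RNA letters.
        String rna = nucleotides.toUpperCase(Locale.ROOT).replace('T', 'U');
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            String codon = rna.substring(i, i + 3);
            Character aminoAcid = TABLE.get(codon);
            if (aminoAcid == null) {
                throw new IllegalArgumentException("Unknown codon: " + codon);
            }
            protein.append(aminoAcid);   // stop codons appear as '_'
        }
        return protein.toString();
    }
}
```

With this sketch, CodonTable.translate("cuugaaauuucu") returns "LEIS", matching the example above apart from letter case.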
Write a program that reads a fasta file containing valid DNA sequences and creates a new fasta file that contains the same information translated into amino acid sequences.
The new filename should be the same as the original filename, except with "-aa" added before the .fasta extension. For example, the input file foo.fasta would become the output file foo-aa.fasta.
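A small sketch of the renaming rule follows. The class name OutputName is hypothetical. It inserts "-aa" before the last dot of the filename, so names that contain more than one dot are still handled. As an assumption beyond the assignment, a name with no extension simply gets "-aa" appended.

```java
/** Hypothetical helper that derives the output filename from the input filename. */
final class OutputName {
    static String forInput(String inputName) {
        // Only a dot after the last path separator counts as the start of the extension.
        int separator = Math.max(inputName.lastIndexOf('/'), inputName.lastIndexOf('\\'));
        int dot = inputName.lastIndexOf('.');
        if (dot <= separator) {
            return inputName + "-aa";   // assumption: no extension, so append
        }
        return inputName.substring(0, dot) + "-aa" + inputName.substring(dot);
    }
}
```

Splitting the name on every dot would misplace the suffix for a name such as my.data.fasta, which is why this sketch looks only at the last dot.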
Your output file should have sequence lines exactly 75 characters long, except possibly the last one.
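The line-length requirement can be met by cutting the translated sequence into fixed-width slices. The sketch below is one possible approach; the class name LineWrapper and the width parameter are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that breaks a sequence into fixed-width lines. */
final class LineWrapper {
    static final int WIDTH = 75;   // line length required by the assignment

    /** Returns the sequence cut into lines of the given width; only the last may be shorter. */
    static List<String> wrap(String sequence, int width) {
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < sequence.length(); start += width) {
            int end = Math.min(start + width, sequence.length());
            lines.add(sequence.substring(start, end));
        }
        return lines;
    }
}
```

A sequence of 160 letters wrapped with LineWrapper.WIDTH yields lines of 75, 75, and 10 letters. An empty sequence yields no lines.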
Your program should receive the input filename on the command line.
Your program should print the description line of each sequence processed, together with the count of the total number of sequences processed.
An example run of the program should appear exactly like this:
$ java Student foo.fasta
gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
2 sequences processed
and this would have produced the file foo-aa.fasta.
If the input file does not exist, your program should halt with the single line "0 sequences processed".
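The console output, including the missing-file case, might be handled as in the sketch below. The class name SequenceCounter and the method report are hypothetical. The sketch only covers the printed lines, not the translation, and it assumes that an unreadable file or a missing command-line argument should be treated the same as a missing file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the console output required by the assignment. */
final class SequenceCounter {
    /** Returns one line per description line (without '>'), followed by the count line. */
    static List<String> report(Path input) {
        List<String> output = new ArrayList<>();
        if (Files.isRegularFile(input)) {
            try {
                for (String line : Files.readAllLines(input)) {
                    if (line.startsWith(">")) {
                        output.add(line.substring(1));
                    }
                }
            } catch (IOException e) {
                // Assumption: a file that cannot be read is reported like a missing file.
                output.clear();
            }
        }
        // At this point the list holds exactly one entry per sequence found.
        output.add(output.size() + " sequences processed");
        return output;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("0 sequences processed");
            return;
        }
        for (String line : report(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```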

Solution Preview

import java.util.Scanner;
import java.util.HashMap;

public class Converter {
    /**
     * Read a file in fasta format and convert the RNA or DNA sequences to amino acid sequences.
     * @param args the command-line arguments; args[0] is the input filename
     */
    public static void main(String[] args) {
        String fileName = args[0];
        // Split the name at each dot to build the "-aa" output filename
        String[] splitName = fileName.split("\\.");
        String outputFile = splitName[0] + "-aa." + splitName[1];
        Scanner fileInput;
        String line, description, nucleic;...
