Subject: Computer Science, Java Programming

Question

Much of bioinformatics is concerned with the sequences of subunits in nucleic acid and protein molecules. In collaboration, biologists and computer scientists have developed a standard format for storing sequence information in files.
This format is called fasta. A fasta file has the extension .fasta.
The first line of a fasta file is a description line. The first character of a description line must be ">", the greater-than symbol. This is followed without any whitespace by an optional label, a character string terminated by whitespace.
Any characters on the first line after that first whitespace are an optional description. The lines following the description line are the sequence data, which may contain only alphabetic characters (upper or lower case) and may be of arbitrary length.
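As a sketch of the description-line rule, the hypothetical class `FastaHeader` below drops the leading `>` and splits the rest at the first whitespace character: everything before it is the label, everything after it is the description. Trimming surrounding whitespace from the description is an assumption; the text only says the description follows the whitespace.

```java
/**
 * Hypothetical helper: splits a fasta description line into its optional
 * label and optional description. Not part of any required interface.
 */
final class FastaHeader {
    final String label;       // text right after '>' up to the first whitespace (may be empty)
    final String description; // everything after that whitespace, trimmed (may be empty)

    private FastaHeader(String label, String description) {
        this.label = label;
        this.description = description;
    }

    static FastaHeader parse(String line) {
        if (!line.startsWith(">")) {
            throw new IllegalArgumentException("description line must start with '>': " + line);
        }
        String body = line.substring(1);
        int end = 0;
        // The label runs until the first whitespace character
        while (end < body.length() && !Character.isWhitespace(body.charAt(end))) {
            end++;
        }
        return new FastaHeader(body.substring(0, end), body.substring(end).trim());
    }
}
```

For the first header in the example below, this gives the label `gi|5524211|gb|AAD44166.1|` and the description `cytochrome b [Elephas maximus maximus]`.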
The sequence is the concatenation of the sequence data lines. The sequence data are terminated by a newline.
If the file contains more than one sequence, there must be a blank line between the final sequence data line of the previous sequence and the > of the next sequence.
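A minimal sketch of these record rules, assuming the file has already been read into a list of lines: each `>` line starts a new entry, the data lines that follow are concatenated into one sequence, and blank lines are skipped. `FastaRecords` and `FastaEntry` are hypothetical names. The sketch also accepts a `>` line with no blank line before it, and it ignores data lines that appear before the first `>`; both are assumptions, not part of the stated format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that groups fasta lines into (description line, sequence) entries. */
final class FastaRecords {
    static final class FastaEntry {
        final String header;   // the full description line, including '>'
        final String sequence; // concatenation of the entry's sequence data lines

        FastaEntry(String header, String sequence) {
            this.header = header;
            this.sequence = sequence;
        }
    }

    static List<FastaEntry> parse(List<String> lines) {
        List<FastaEntry> entries = new ArrayList<>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                // A new description line closes the previous entry, blank line or not
                if (header != null) {
                    entries.add(new FastaEntry(header, sequence.toString()));
                }
                header = line;
                sequence.setLength(0);
            } else if (!line.isEmpty() && header != null) {
                sequence.append(line); // the sequence is the concatenation of its data lines
            }
            // Blank lines, and data lines before the first '>', are skipped
        }
        if (header != null) {
            entries.add(new FastaEntry(header, sequence.toString()));
        }
        return entries;
    }
}
```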
An example fasta file might appear as:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
adqlteeqiaefkeafslfdkdgdgtittkelgtvmrslgqnpteaelqdminevdadgngtid
fpefltmmarkmkdtdseeeireafrvfdkdgngyisaaelrhvmtnlgekltdeevdemirea
didgdgqvnyeefvqmmtak

A sequence can represent either a nucleic acid (e.g., DNA or RNA) or a polypeptide, a chain of amino acids.
The central dogma of biology states that a segment of DNA is transcribed into RNA and then translated into a protein. Translation follows the code below, in which every three letters (a codon) become one amino-acid letter, except for the three stop codons UAA, UAG, and UGA, which are written as _. The codons are written with U (uracil), as in RNA; DNA has T (thymine) in its place.
UCA = S //Serine          AUA = I //Isoleucine
UCC = S //Serine          AUC = I //Isoleucine
UCG = S //Serine          AUU = I //Isoleucine
UCU = S //Serine          AUG = M //Methionine
UUC = F //Phenylalanine   ACA = T //Threonine
UUU = F //Phenylalanine   ACC = T //Threonine
UUA = L //Leucine         ACG = T //Threonine
UUG = L //Leucine         ACU = T //Threonine
UAC = Y //Tyrosine        AAC = N //Asparagine
UAU = Y //Tyrosine        AAU = N //Asparagine
UAA = _ //Stop            AAA = K //Lysine
UAG = _ //Stop            AAG = K //Lysine
UGC = C //Cysteine        AGC = S //Serine
UGU = C //Cysteine        AGU = S //Serine
UGA = _ //Stop            AGA = R //Arginine
UGG = W //Tryptophan      AGG = R //Arginine
CUA = L //Leucine         GUA = V //Valine
CUC = L //Leucine         GUC = V //Valine
CUG = L //Leucine         GUG = V //Valine
CUU = L //Leucine         GUU = V //Valine
CCA = P //Proline         GCA = A //Alanine
CCC = P //Proline         GCC = A //Alanine
CCG = P //Proline         GCG = A //Alanine
CCU = P //Proline         GCU = A //Alanine
CAC = H //Histidine       GAC = D //Aspartic Acid
CAU = H //Histidine       GAU = D //Aspartic Acid
CAA = Q //Glutamine       GAA = E //Glutamic Acid
CAG = Q //Glutamine       GAG = E //Glutamic Acid
CGA = R //Arginine        GGA = G //Glycine
CGC = R //Arginine        GGC = G //Glycine
CGG = R //Arginine        GGG = G //Glycine
CGU = R //Arginine        GGU = G //Glycine
For example, the sequence cuugaaauuucu, read as the codons cuu gaa auu ucu, would produce the amino acid sequence leis.
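One way to apply the table is a map from each codon to its letter, then a walk over the sequence three letters at a time. The sketch below fills a `HashMap` from a compact 64-letter string that encodes the same table, ordered by first, second, and third base in U, C, A, G order; that encoding, the class name `Codons`, and the method `translate` are illustrative choices, not part of the assignment. It also assumes that T should be read as U, that a trailing incomplete codon is dropped, and that output letters are upper case (the problem writes "leis" in lower case).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical codon table and translator built from the table above. */
final class Codons {
    private static final String BASES = "UCAG";
    // Letter for codon i, where i = 16 * first + 4 * second + third (bases indexed U, C, A, G);
    // '_' marks the three stop codons, as in the table
    private static final String AMINO =
            "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    static final Map<String, Character> TABLE = new HashMap<>();

    static {
        for (int i = 0; i < 64; i++) {
            String codon = "" + BASES.charAt(i / 16) + BASES.charAt((i / 4) % 4) + BASES.charAt(i % 4);
            TABLE.put(codon, AMINO.charAt(i));
        }
    }

    static String translate(String nucleotides) {
        // Assumption: accept DNA letters by reading T as U, and ignore case
        String s = nucleotides.toUpperCase().replace('T', 'U');
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= s.length(); i += 3) { // a trailing partial codon is ignored
            String codon = s.substring(i, i + 3);
            Character aa = TABLE.get(codon);
            if (aa == null) {
                throw new IllegalArgumentException("not a valid codon: " + codon);
            }
            protein.append(aa);
        }
        return protein.toString();
    }
}
```

With this sketch, `Codons.translate("cuugaaauuucu")` returns `LEIS`, the upper-case form of the example above.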
Write a program that reads a fasta file containing valid DNA sequences and creates a new fasta file that contains the same information translated into amino acid sequences.
The new filename should be the same as the original filename, except with "-aa" added before the .fasta extension. For example, the input file foo.fasta would become the output file foo-aa.fasta.
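A minimal sketch of the naming rule: look for the `.fasta` suffix and put `-aa` in front of it, rather than splitting the name on every `.`, so a name like `my.seqs.fasta` keeps its inner dot. `OutputName.of` is a hypothetical helper, and appending `-aa.fasta` to a name that lacks the extension is an assumption the assignment does not cover.

```java
/** Hypothetical helper: derives the output filename from the input filename. */
final class OutputName {
    private static final String EXT = ".fasta";

    static String of(String inputName) {
        // Insert "-aa" before a trailing ".fasta"
        if (inputName.endsWith(EXT)) {
            return inputName.substring(0, inputName.length() - EXT.length()) + "-aa" + EXT;
        }
        // Assumption: a name without the extension just gets "-aa.fasta" appended
        return inputName + "-aa" + EXT;
    }
}
```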
Your output file should have sequence lines exactly 75 characters long, except possibly the last line of each sequence.
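The fixed-width rule can be met by cutting each translated sequence into consecutive 75-character slices. The helper `SequenceWrapper.wrap` below is hypothetical; it returns the lines instead of writing them, so the caller can send them to whatever writer it uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: breaks a sequence into lines of a fixed width. */
final class SequenceWrapper {
    static List<String> wrap(String sequence, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < sequence.length(); start += width) {
            // Every line is exactly 'width' long except possibly the last one
            lines.add(sequence.substring(start, Math.min(start + width, sequence.length())));
        }
        return lines;
    }
}
```

For a 160-letter sequence and width 75, this yields lines of 75, 75, and 10 characters.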
Your program should receive the input filename on the command line.
Your program should print the description line of each sequence processed (without the leading >), followed by a line giving the total number of sequences processed.
An example run of the program should appear exactly like this:
$ java Student foo.fasta
gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
2 sequences processed
and this would have produced the file foo-aa.fasta.
If the input file does not exist, your program should halt with the single line "0 sequences processed".
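The console behaviour above (one description line per sequence without the `>`, then the count, or only `0 sequences processed` when the file is missing) could be sketched as follows. `FastaReport` is a hypothetical name; the sketch counts sequences by their `>` lines, treats an unreadable file like a missing one (an assumption), and leaves out translation and writing the `-aa` file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver fragment: console output only, no translation or file writing. */
final class FastaReport {
    // Returns the lines the program should print for the given input file
    static List<String> report(Path input) {
        List<String> out = new ArrayList<>();
        int count = 0;
        if (Files.isRegularFile(input)) {
            try {
                for (String line : Files.readAllLines(input)) {
                    if (line.startsWith(">")) {
                        out.add(line.substring(1)); // the description line without '>'
                        count++;
                    }
                }
            } catch (IOException e) {
                // Assumption: an unreadable file is reported like a missing one
                out.clear();
                count = 0;
            }
        }
        out.add(count + " sequences processed");
        return out;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("0 sequences processed");
            return;
        }
        report(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

Returning the lines from `report` rather than printing inside the loop keeps the output easy to compare with the expected run.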

Solution Preview

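import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
import java.util.HashMap;

public class Converter {
    /**
     * Reads a fasta file, converts its RNA or DNA sequences to amino acid
     * sequences, and writes them to a new fasta file.
     * @param args the command-line arguments; args[0] is the input filename
     */
    public static void main(String[] args)
    {
        // Without a filename there is no input file, so report zero sequences
        if (args.length == 0) {
            System.out.println("0 sequences processed");
            return;
        }
        String fileName = args[0];
        // Insert "-aa" before the last '.', so foo.fasta becomes foo-aa.fasta;
        // splitting on every '.' would mangle names such as my.seqs.fasta
        int dot = fileName.lastIndexOf('.');
        String outputFile = (dot > 0)
                ? fileName.substring(0, dot) + "-aa" + fileName.substring(dot)
                : fileName + "-aa.fasta";
        Scanner fileInput;
        String line, description, nucleic;...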
