Bioinformatics is the intersection of various computer science disciplines with biology, particularly focusing on genetic science. A bioinformatician will act as a key supporting resource in one of several subfields, such as genetics, pharmaceuticals (and drug design), or proteomics (the study of proteins and their sequences). They will use their knowledge of data structures and algorithms to create and analyse models of biological processes as part of a research team, using and developing new analytical techniques along the way. At 24HourAnswers, we have tutors well versed in these subjects who are ready to answer your questions or help you with your bioinformatics projects.
Sequencing and sequence alignment
In order to perform any analysis on a DNA structure, the sequence first has to be catalogued. A chain of DNA consists of a string of four nucleotides arranged in some order. The nucleotides, known as the canonical bases, are represented by the letters A, C, G, and T, referring to adenine, cytosine, guanine, and thymine. The order of these molecules determines the structure, form, and ultimately function of a gene, a chromosome, or the entire genome of an organism. This is analogous to binary in a computer system; two files consisting entirely of 1s and 0s can have a completely different function and purpose depending on the order of bits. Unlike a binary file, however, simply discovering the order of the four canonical bases in a sequence can prove a challenge. Some 50 years of developments have helped improve the reliability of sequencing, but issues such as sample contamination or library chimeras (a form of contamination where a single sequence can originate from multiple parent sequences) continue to make sequencing difficult. In addition, this first stage of data collection only produces a raw dataset that needs to go through a process called sequence assembly, which merges the numerous fragments together to recreate the original sequence. This is because sequencing techniques can only read a few tens of thousands of bases at once, depending on the method used.
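To make the idea of merging fragments concrete, here is a minimal Python sketch of a greedy overlap merge. It is only an illustration under strong assumptions: the reads are error-free, all come from the same strand, and no read is contained inside another. The function names (`overlap_length`, `greedy_assemble`) and the `min_overlap` parameter are hypothetical and chosen for this example. Real assemblers typically work with overlap graphs or de Bruijn graphs and have to cope with errors and repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns a list of contigs; a single entry means every fragment was merged.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["GGATTACA", "TTTGAACC", "AACCGGAT"]
    print(greedy_assemble(reads))
```

The greedy choice is easy to follow but is not guaranteed to find the best reconstruction, which is one reason repeated or near-identical regions cause so much trouble in practice.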
Sequence assembly involves processing terabytes of data, with the volume growing with the size of the genome being sequenced. A fruit fly’s genome consists of about 130 million base pairs, but a human’s consists of around 3 billion. Given that we are stitching the genome together from fragments of 30 to 30,000 base pairs, it should be clear where the difficulty arises! In addition, sequences that are identical or nearly identical can massively increase the complexity of the algorithms used, and the sequencing process itself can produce errors, which can make a complete reconstruction of the genome impossible. Broadly speaking, there are two methods of assembly. The first involves assembling a sequence with the use of a template (reference) sequence, mapping the sequence fragments onto the template and adjusting it where variations occur. The second, known as de novo assembly (from the Latin, meaning ‘anew’), works without a template. One sequencing approach suited to this situation makes use of a primer. Primers are short single-stranded nucleic acids used to begin DNA synthesis in living organisms, but they can also be designed and synthesised for specific purposes like sequencing.
This approach, known as primer walking, starts by using a primer to match with the beginning of a DNA sequence. T nucleotides always bond with A nucleotides, and C nucleotides always bond with G nucleotides. Thus, a DNA sequence reading TTTGAACCG would bond with a primer sequence reading AAACTTGGC.
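The pairing rule can be expressed in a few lines of Python. This is a small sketch rather than a production tool; the function names are chosen for this example, and it assumes the input uses only the four canonical bases.

```python
# Translation table implementing the pairing rule: A<->T and C<->G.
PAIRS = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    return seq.translate(PAIRS)


def reverse_complement(seq):
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("TTTGAACCG"))          # AAACTTGGC
    print(reverse_complement("TTTGAACCG"))  # CGGTTCAAA
```

The reverse complement is included because the two strands of DNA run in opposite directions, so sequence tools usually report the partner strand reversed.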
Once the primer has bonded with this initial sequence and produced a complete double-stranded section of DNA (i.e. the famous double helix), the strand is sequenced using a method known as chain termination, and then the end of the sequence is used as the primer for the next part of the sequence, which is why the method is referred to as ‘walking’.
There are many other methods of sequencing and sequence alignment, but any questions you have can be answered by our team of subject matter experts at 24HourAnswers, who are intimately familiar with all of these techniques, from shotgun sequencing and massively parallel signature sequencing to the practical uses of the output of sequencing, like producing phylogenetic trees.
Another key area of focus involves the conversion of DNA sequences to RNA sequences, which are then used to assemble protein structures. When DNA is transcribed in the body, an enzyme called RNA polymerase copies the sequence of A, C, G, and T nucleotides into messenger RNA, in which uracil (U) takes the place of thymine (T). The cell’s ribosomes then read the RNA three bases at a time; each group of three, known as a codon, specifies an amino acid. For instance, a codon of CAC or CAU produces histidine (denoted by H), which is one of the essential amino acids for humans. Other codons produce different amino acids, or act as START and STOP signals for translation. In a way, this is analogous to the process by which high level code is compiled into executable instructions that can run on hardware, only in this case, the hardware is the body.
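The step from DNA to RNA to amino acids can be sketched in Python as follows. The example is a simplification under stated assumptions: the input is treated as the coding strand (so transcription is just replacing T with U), the standard genetic code is used, and reading starts at the first AUG and stops at the first stop codon. The function names are chosen for this example only.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Convert a coding-strand DNA sequence to RNA by swapping T for U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA to one-letter amino acid codes.

    Reading begins at the first AUG (START) and ends at the first
    stop codon, marked '*' in the table.
    """
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGCACCATTAA")
    print(rna)             # AUGCACCAUUAA
    print(translate(rna))  # MHH
```

Here both CAC and CAU come out as H (histidine), matching the example in the text, and the final UAA codon ends translation.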
Once the amino acid sequence has been worked out from the RNA, a 3D model of the resulting protein can be constructed. However, there’s a problem. Each amino acid in the sequence can bond to another at any number of different angles. As we slowly assemble the model using known valid angles between each molecule, we will find ourselves in situations where the structure becomes invalid because of decisions we made earlier in the chain. In addition, only certain angles are valid for any given pairing, due to the biochemical energy involved in maintaining the bond. Given how long a protein chain can be, we need techniques to help us. Originally, the favoured method was something known as a Ramachandran plot (named after G. N. Ramachandran, the lead scientist who developed it), which essentially shows which combinations of backbone angles are valid where amino acids bond. This can then be used by software to validate the stability of a protein. Experimental techniques such as X-ray crystallography produce high-resolution structures of proteins, which can then be fed back into the plot to narrow down the set of known valid angles.
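The angles shown on a Ramachandran plot are torsion (dihedral) angles along the protein backbone, conventionally called phi and psi. As a rough illustration of how such an angle is measured, the Python sketch below computes the dihedral angle defined by four points in space. The coordinates in the example are made up for demonstration and are not taken from a real protein, and the function name is chosen for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the torsion angle in degrees, in the range (-180, 180].

    The angle is measured around the p1-p2 bond, between the plane
    containing p0, p1, p2 and the plane containing p1, p2, p3.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the first plane
    n2 = _cross(b1, b2)  # normal to the second plane
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = b1_len * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the last point is rotated a quarter turn
    # around the central bond relative to the first point.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Software that validates a structure would compute these angles for each residue from the atomic coordinates and compare them with the regions of the plot known to be allowed.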
A Ramachandran plot for proline. Proline has a particularly constrained structure that limits the angles at which it can bond to a narrow subset of the full range (credit: Dcrjsr, under Creative Commons).
So, why is it so valuable to know the shape of a protein?
Drugs work by bonding with a target site to deliver their effects. In order to do this, they must physically match the target site, a little like a jigsaw puzzle piece or a key matching its lock. As such, it is vital both to understand the shape of the protein being targeted and to control the shape of the drug’s active component so that it can form that bond. This process is known as rational drug design, and specifically structure-based design, and means that medicines can be created to specifically target the condition they’re trying to address, rather than using traditional trial-and-error methods. One of the more famous classes of drugs that have been created through this kind of process is selective serotonin reuptake inhibitors (or SSRIs), which are a class of antidepressant. The drug is designed to bond with the transporter proteins that reabsorb the brain’s naturally produced serotonin (a neurotransmitter linked to happiness and well-being), and therefore allows serotonin to remain active for longer between nerve cells, sending synaptic signals that alleviate a patient’s depressive symptoms.
A bioinformatician plays a key role in all aspects of this process and, in so doing, contributes to the creation of life-saving medicines.
Bioinformatics is predominantly a master’s level discipline. Students typically enter the field via an undergraduate degree in computer science or biology, though other subjects such as mathematics can also provide an entry route. You may find that if you’re studying a subject like computer science at undergraduate level, you are able to take introductory classes in bioinformatics; try it out, and see if it’s something you would like to pursue further.
Various online providers (for example, edX) have course offerings in the field of bioinformatics. It would generally be advisable to have some understanding of basic programming principles and algorithms, or, conversely, human genetics and biology before taking such a course. If you’re unsure whether you have the right level of knowledge, you can contact one of our expert tutors for advice. Otherwise, a familiarity with the R programming language (or equivalent) is a great start, and so is any general understanding of cellular level biological processes. If you’re struggling with either of those topics, our expert tutors in biology and computer science are available to help.
As you delve deeper into bioinformatics, we have tutors available to help with any problems you might encounter. Our expert tutors can help with any aspect of bioinformatics, whether you’re just beginning to become familiar with DNA sequencing or are trying to crack a complex image analysis problem. We can offer live tutors or provide writeup deliverables, where a tutor can deliver clearly written, production-quality code with accompanying detailed descriptions and documentation. More advanced students facing larger challenges will also find no shortage of help from our experts, who can help guide the design and development of your bioinformatics algorithm, or simply provide advice and feedback to help you make code changes and revisions as needed. Regardless of the complexity of your requirements, our seasoned computer scientists are ready to help.
24HourAnswers has been helping students as a US-based online tutoring business since 2005, and our tutors have worked tirelessly to provide students with the best support possible. We are proud to be A+ rated by the Better Business Bureau (BBB), a testament to the quality support delivered by our tutors every day. We have the highest quality experts, with tutors from academia and esteemed institutions such as the Massachusetts Institute of Technology (MIT).
Should you be interested in pursuing a career in this exciting field, remember that our homework help centre is standing by 24/7 to assist you in all aspects of computer science, including the important field of bioinformatics.
Alexander Sofras is a technical architect with over 20 years of programming experience and 10 years in industry. He currently works in e-commerce and specialises in product discovery and recommendations.