CS 5263 Final project page
New: Tentative project topics and presentation schedule
New: Sample presentation evaluation form
Projects can be done individually, or in small groups (of at most 3). Groups combining people from different backgrounds are particularly encouraged. Feel free to use the class email list to brainstorm project ideas or to find partners. Choices for the project include, but are not limited to:
- Literature review: Read 3-4 papers on a coherent topic, and report on them.
- Implementation: Read 2-3 background papers, implement the algorithms (or find existing software that implements the algorithms), find some test data, and report on the results.
- Algorithm development: Read 2-3 background papers, propose an improvement of an existing algorithm, implement a prototype and apply it to some test data, and report on the results.
I'd suggest that each group send me a paragraph or drop by to tell me who is in the group and to describe its topic, initial papers, and implementations and test data (if applicable). I may be able to give you some pointers.
Before the beginning of the final week, each group will need to hand in a paper (approximately 5-10 pages) describing the project. If you are doing a group project, your paper must also clearly describe the contribution of each member of the group.
Each group will also need to give a 20-30 minute presentation, scheduled during the final week: 12/8 and 12/10. If you wish, you can turn in your project and give the presentation before the final week.
Some example projects of last year (all individual projects):
- One student implemented an interface for testing different algorithms for Hidden Markov Models. Project description | report | software
- Another student developed a novel population-based optimization algorithm for motif finding. This work later resulted in a published paper.
- Other students have done surveys on topics including gene networks, new microarray analysis algorithms, etc.
Possible topics for this year:
You are encouraged to choose your own topics and impress me with your creative project ideas. Here are a few of mine to get you started.
- This link leads to a list of papers on interesting topics. I will add more later.
- Design an efficient algorithm for the (15,4)-motif challenge problem: given 20 sequences of length 600, find an (unknown) pattern w of length 15, such that each sequence contains one occurrence of a string x which differs from w by exactly 4 letters. I will provide some references and some test data.
- Design an efficient algorithm for the following discriminative motif finding problem: given N promoter sequences and a partition of these sequences into k mutually exclusive groups, find the motifs that are over-represented in each group (you may or may not be given some candidate motifs to start with). Enumerative motif-finding methods are relatively more useful when dealing with large-scale motif finding. However, most enumerative algorithms only deal with exact k-mers, or k-mers allowing a fixed number of mismatches. The first option may miss real motifs because it does not allow any mismatches. The second option may make a real motif statistically insignificant: even though a real motif may have a relatively large number of mismatches from its instances, those mismatches may tend to occur at some fixed positions or with some fixed substituting bases. Allowing arbitrary mismatches may cause the real motif to have many spurious matches in the background sequences, and therefore reduce its statistical significance in the foreground sequences. Design an efficient algorithm to solve this problem. I do not have a good idea of how to do this efficiently, but I believe it is possible to do better than exhaustive search. Some heuristics may be needed. I can provide you with a real data set.
- To be continued...
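For the (15,4)-motif problem, a minimal Python sketch of one simple exact baseline may help as a starting point; it is an illustration, not the expected solution. Since w must differ from some l-mer of the first sequence in exactly d positions, it suffices to enumerate the exact-d neighborhoods of the l-mers in the first sequence and keep the candidates that also occur, with exactly d mismatches, in every other sequence. All function names here are hypothetical, and `plant_instance` only produces synthetic planted instances; it is not the test data that will be provided.

```python
# Sketch of one simple exact baseline for the (l, d)-motif problem (an illustration only).
import random
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_d_neighbors(x, d):
    """All strings over ALPHABET that differ from x in exactly d positions."""
    out = []
    for positions in combinations(range(len(x)), d):
        # At each chosen position, try each of the 3 bases that differ from x there.
        choices = [[b for b in ALPHABET if b != x[i]] for i in positions]
        for subs in product(*choices):
            y = list(x)
            for i, b in zip(positions, subs):
                y[i] = b
            out.append("".join(y))
    return out


def has_exact_d_occurrence(w, seq, d):
    """True if some length-len(w) window of seq differs from w in exactly d positions."""
    l = len(w)
    return any(hamming(w, seq[i:i + l]) == d for i in range(len(seq) - l + 1))


def find_motifs(seqs, l, d):
    """Return every (l, d)-motif: candidates come from seqs[0], then are checked in the rest."""
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates.update(exact_d_neighbors(first[i:i + l], d))
    # By construction, every candidate already has an exact-d occurrence in seqs[0].
    return {w for w in candidates if all(has_exact_d_occurrence(w, s, d) for s in seqs[1:])}


def plant_instance(t, n, l, d, seed=0):
    """Hypothetical generator: t random sequences of length n, each with one planted exact-d variant of w."""
    rng = random.Random(seed)
    w = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs = []
    for _ in range(t):
        s = [rng.choice(ALPHABET) for _ in range(n)]
        variant = list(w)
        for i in rng.sample(range(l), d):
            variant[i] = rng.choice([b for b in ALPHABET if b != w[i]])
        start = rng.randrange(n - l + 1)
        s[start:start + l] = variant
        seqs.append("".join(s))
    return w, seqs


if __name__ == "__main__":
    # A small (8, 1) instance runs quickly; the real (15, 4) instance does not in pure Python.
    w, seqs = plant_instance(t=8, n=80, l=8, d=1, seed=1)
    motifs = find_motifs(seqs, 8, 1)
    print(w in motifs, len(motifs))
```

For (15,4) this candidate set is already large: each 15-mer has C(15,4)·3^4 = 110,565 neighbors at distance exactly 4, and a length-600 sequence has 586 15-mers, giving up to about 64.8 million candidates (versus 4^15, about 1.07 billion, for naive enumeration of all patterns). In pure Python that is far too slow, which is where a more efficient algorithm is needed.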
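For the discriminative motif finding problem, one possible way to represent mismatches that occur only at fixed positions or with fixed substituting bases is a degenerate pattern written in IUPAC nucleotide codes (for example, R means A or G). The minimal Python sketch below scores a single such pattern against a grouping of sequences. It assumes sequence-level presence/absence, the forward strand only, and a one-sided hypergeometric test for over-representation in each group; the representation, the test, and all function names are assumptions chosen for illustration, and the hard part of the project (searching the space of patterns efficiently) is deliberately left open.

```python
# Sketch: score one degenerate (IUPAC) pattern for per-group over-representation.
# Assumptions: presence/absence per sequence, forward strand only, hypergeometric test.
import re
from math import comb

# IUPAC nucleotide codes: each symbol stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(pattern):
    """Compile a degenerate IUPAC pattern into a regex, one character class per position."""
    return re.compile("".join(f"[{IUPAC[c]}]" for c in pattern))


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N sequences, K containing the motif, n in the group)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def group_pvalues(pattern, seqs, labels):
    """One-sided enrichment p-value of `pattern` in each group of sequences."""
    rx = to_regex(pattern)
    hits = [rx.search(s) is not None for s in seqs]
    N, K = len(seqs), sum(hits)
    pvals = {}
    for g in sorted(set(labels)):
        group_hits = [h for h, lab in zip(hits, labels) if lab == g]
        pvals[g] = hypergeom_sf(sum(group_hits), N, K, len(group_hits))
    return pvals


if __name__ == "__main__":
    # Toy data made up for illustration: group "a" carries ACGAT or ACGGT.
    seqs = ["TTACGATT", "GGACGGTC", "CACGATAA", "TTTTTTTT", "GGGGCCCC", "ATATATAT"]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(group_pvalues("ACGRT", seqs, labels))  # degenerate pattern
    print(group_pvalues("ACGAT", seqs, labels))  # exact 5-mer
```

In this toy data, the degenerate pattern ACGRT occurs in all three group "a" sequences (p = 1/20 = 0.05), whereas the exact 5-mer ACGAT occurs in only two of them (p = 0.2). This illustrates how a substitution at a fixed position can weaken the signal of an exact k-mer.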
Advice on how to read and present a paper (adapted from this web page)
When you present a paper in this course (or elsewhere), your goal is to get your audience to appreciate the contribution that the paper makes to scientific knowledge. To do that, you generally need to explain the following three things about the paper. It often makes sense to present each point in order, but it is more important to focus on the essence of the contribution than it is to follow any particular format.
- What is the problem the paper is trying to address? You should both define the problem and explain its broader significance. In addressing this question, you want to consider things like: What is the biological nature of the problem? Is it reconstructing evolutionary history, or identifying genes relevant to the prognosis or treatment of a disease? Why is that important? What is the contribution of the paper to furthering our understanding of the biology? Then you may want to talk about the computational nature of the problem. How was the biological problem reformulated into a computational problem? Is that the main contribution (it often is)? Are there aspects of the computational problem that are particularly interesting? Is a previous (or obvious) computational formulation too slow or not accurate enough? If so, what kind of improvement in the computational approach would be important, and why? Or is this a comparison of alternative approaches? If so, why were those approaches selected and not others? How are they to be compared?
- What were the methods used in the paper? Often, this is where you have to spend the most time in your presentation, since new methods are the essence of most bioinformatics publications. You want to carefully explain exactly what was done. It may require a very close reading of the paper to figure this out; often important facts are buried in seeming asides. When you are working on this part of your presentation, imagine you were trying to replicate the work. What would you need to know?
- What were the results reported? Ideally, it would be straightforward to compare the results presented with the problem statement, but it is not always that easy. Discuss the evaluation method(s) as well as the results. It is often interesting to consider how the authors chose to evaluate their contribution: Was it fair? Was it indicative of "real world" performance?
Try to identify where the main contribution of the paper lies. For example, some papers define interesting new problems, but apply relatively straightforward methods to address them. For a paper like that, focus on work on related problems and how the new problem statement differs from them. Are there better approaches developed for related problems that could be applied to the new problem?
Some papers present a new approach to a well-studied problem. For those papers, carefully compare the new method to other approaches people have taken to the problem. In that situation, the choice of evaluation method (used to compare the new approach to existing methods) is also an important place to focus.
Look for unstated assumptions made in the paper, and try to make them explicit. For example, does a paper on finding cis-regulatory elements from sequence and gene expression data assume that the elements are independent of each other? That the position of the element with respect to the start of transcription is unimportant? Reading about alternative approaches to the same problem will make it easier for you to identify these assumptions.
After you have communicated these facts about the paper, you can discuss the aspects you thought were most important or interesting. Is this a method that belongs in your "bioinformatics toolkit"? Can it be applied to related problems straightforwardly, or is it highly specialized? Was there something particularly impressive about the method, the evaluation, the translation of the problem into computational terms, etc.?
In general, bioinformatics papers have an "engineering" flavor that fits well into this problem / method / results paradigm. However, some papers have more of a "basic science" flavor, where a particular claim is being made, and evidence is presented to support that claim. Providing evidence for a claim is closely related to testing a particular hypothesis. If you feel that this better fits the paper you are presenting, then rather than using the problem / method / results paradigm, you can explain it in terms of claims and evidence.