Fall 2008 Data and Vision Weekly Seminar
Organizers: Qi Tian, Kay Robbins, Weining Zhang, Tom Bylander, Carola Wenk, Yufei Huang (EE),
Jianhua Ruan.
Time: 10:00-11:00 am, Every Friday
Place: SB 4.01.20, CS Conference Room
Schedule:
09/12, Atlas Cook (Wenk lab)
09/19, Chengwei Lei (Ruan lab)
09/26, Jie Xiao (Tian lab)
10/3, Xia Li (Tian lab)
10/10, Angela Dean (Ruan lab)
10/17, Hongwei Tian (Zhang lab)
10/24, Dragana Veljkovic (Robbins lab)
10/31, Jia Meng (Huang lab)
11/7, Mark Doderer (Robbins lab)
11/14, Lijie Zhang (Zhang lab)
11/21, Hui Liu(Huang lab)
12/5
Schedule for Fall 2008
-
09/12/08 Shortest Paths in Three Dimensions
Speaker: Atlas F. Cook IV
Presentation slides in [PPT ]Abstract :
In general, the computation of shortest paths in three dimensions is NP-Hard. However, if shortest paths are required to travel along a polyhedral surface (such as a terrain), then the problem becomes quite feasible. We review three techniques from the literature for computing shortest paths on a three dimensional polyhedral surface.
References :-
P. K. Agarwal, B. Aronov, J. O. & Schevon., C. A. Star Unfolding of a Polytope with Applications. SIAM Journal on Computing, Society for Industrial and Applied Mathematics, 1997 , 26 , 1689-1713 [ PDF ]
-
V. Chandru, R. Hariharan, and N. M. Krishnakumar. Short-cuts on star, source and planar unfoldings. Foundations of Software Technology and Theoretical Computer Science (FSTTCS) , 2004, 174-185 [ PDF ]
-
Chen, J. & Han, Y. Shortest paths on a polyhedron. International Journal of Computational Geometry and Applications, 1996 , 6 , 127-144 [ PDF ]
-
Mitchell, J. S. B.; Mount, D. M. & Papadimitriou, C. H. The discrete geodesic problem. SIAM Journal on Computing, Society for Industrial and Applied Mathematics, 1987 , 16 , 647-668 [ PDF ]
-
Schreiber, Y. & Sharir, M. An optimal-time algorithm for shortest paths on a convex polytope in three dimensions. SoCG: 22nd Symposium on Computational Geometry, 2006 , 30-39 [ PDF ]
-
-
09/19/08 Algorithms for finding spaced DNA motifs
Speaker: Chengwei Lei
Presentation slides in [PPT ]Abstract :
Discovering short DNA sequence motifs from the promoters of co-regulated genes is an important step towards deciphering the complex gene regulatory networks and understanding gene functions. Although the general motif finding problem has been extensively studied, many of them do not consider gaps inside a motif. Discovering gapped motifs allows highly degenerate positions in a motif to be explicitly modeled and therefore greatly enlarges the search space and complicates the discovery process. In this presentation we compare several motif-finding algorithms for finding gapped motifs and discuss some possible improvement.
References :
- Chen, C.-Y., Tsai, H.-K., Hsu, C.-M., May Chen, M.-J., Hung, H.-G., Huang, G. T.-W., Li, W.-H. Discovering gapped binding sites of yeast transcription factors. Proc. Natl. Acad. Sci. USA 105: 2527-2532 (2008) [link]
- Edward Wijaya, Kanagasabai Rajaraman, Siu-Ming Yiu, Wing-Kin Sung. Detection of generic spaced motifs using submotif pattern mining. Bioinformatics 23(12): 1476-1485 (2007) [link]
- C. Lei and J. Ruan, A Particle Swarm Optimization Algorithm for Finding DNA Sequence Motifs, 1st Workshop on Data Mining in Functional Genomics, in Conjunction with IEEE International Conference on Bioinformatics and Biomedicine Philadelphia, PA, USA, Nov. 5-7, (2008) [link]
-
09/26/08 Discovering Object Categories from Google's Image Search
Speaker: Jie Xiao
Presentation slides in [PDF]Abstract :
Current research on image object categorization is involved in different level of manually prepared data with varying degree of supervision. The authors propose a novel method to train the model with the retrieved result from commercial search engine, a noisy training set. They expand pLSA, probabilistic Latent Semantic Analysis, with spatial information in image categorization. And their experiment result shows it outperforms the existing methods trained by human labeled datasets.
References :
- Fergus, R. and Fei-Fei, L. and Perona, P. and Zisserman, A. Learning Object Categories from Google's Image Search. In Proc . ICCV, 2005 [link]
- T.Hofmann. Probabilistic latent semantic indexing. In SIGIR, 1999 [link]
- J. Sivic and A.Zisserman. Video Google: A text retrieval approach to object matching in video. In Proc . ICCV, Oct 2003 [link]
-
10/03/08 VideoSense-Towards Effective Online Video Advertising
Speaker: Xia Li
Presentation slides in [ PDF ]Abstract :
With Internet delivery of video content surging to an unprecedented level, online video advertising is becoming increasingly pervasive. Most existing video-oriented sites match the ads with online videos only based on textual information and display ads at the beginning or the end of a video. In this paper, they proposed a novel online advertising system called VideoSense, which automatically associates the most relevant video ads with online videos and seamlessly inserts the ads at the most appropriate positions from the viewers' perspective with less intrusiveness. Their experiments demonstrate the effectiveness of VideoSense for online video advertising.
References :
- T. Mei, X.-S. Hua, L. Yang, and S. Li. VideoSense: Towards effective online video advertising. In Proceedings of ACM Multimedia, pages 1075-1084, Augsburg, Germany, 2007. [ link ]
- Y. Yang and X. Liu. A re-examination of text categorization methods. In Proceedings of SIGIR, 1999. [ link ]
- Y.-F. Ma, L. Lu, H.-J. Zhang, and M. Li. A user attention model for video summarization. In Proceedings of ACM Multimedia, pages 533-542, 2002. [ link ]
-
10/10/08 A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes
Speaker: Angela Dean
Presentation slides in [ ppt ]Abstract :
Osteocytes are a specific type of cell related to bone formation and remodeling. Identifying the regulatory networks in osteocytes may lead to new therapies for osteoporosis and loss of bone, and we have produced a putative model of this regulatory network for future experimental validation. We constructed the network by using microarray data to identify over-expressed genes of interest in osteocytes, then detecting evolutionarily conserved and enriched TF binding sites in the upstream promoter regions of these genes. After constructing the regulatory network, we then used the graph partitioning algorithm "Q-Cut" to identify clusters with strong community structure that represent cis-regulatory modules. Our results include highly conserved clusterings of muscle-related cis-regulatory modules, supporting the concept that a muscle-related gene network is important in osteocyte biology.
References :
-
A.K. Dean, S.E. Harris, J. Ruan. A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes, submitted. [ link ]
- G. Loots et. al, rVista for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites, Genome. Res. 12:832-839, 2002. [ link]
- J. Ruan and W Zhang , Identifying network community structures with a high resolution, Physical Review E, 77:016104, 2008. [ link]
-
-
10/17/08 Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
Speaker: Hongwei Tian
Presentation slides in [ ppt ]Abstract :
Proximity breach is a privacy threat specific to numerical sensitive attributes in anonymized data publication. Such breach occurs when an adversary concludes with high confidence that the sensitive value of a victim individual must fall in a short interval-even though the adversary may have low confidence about the victim's actual value. None of the existing anonymization principles (e.g., k-anonymity, l-diversity, etc.) can effectively prevent proximity breach. In this paper, the problem is solved by introducing a novel principle called (e,m)-anonymity. Intuitively, the principle demands that, given a QI-group G, for every sensitive value x in G, at most 1/m of the tuples in G can have sensitive values "similar" to x, where the similarity is controlled by e. The paper includes a careful analytical study of the theoretical characteristics of (e,m)-anonymity, and the corresponding generalization algorithm. And the findings are verified by experiments with real data.
References :
-
J. Li, Y. Tao, and X. Xiao. Preservation of Proximity Privacy in Publishing Numerical Sensitive Data. In Proc. of ACM Management of Data (SIGMOD), pages 473-486, 2008.
-
A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In Proc. of International Conference on Data Engineering (ICDE), page 24, 2006.
-
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In Proc. of International Conference on Data Engineering (ICDE), pages 277-286, 2006.
-
R. Agrawal, R. Srikant, and D. Thomas. Privacy preserving olap. In Proc. of ACM Management of Data (SIGMOD), pages 251-262, 2005.
-
-
10/24/08 DD-HDS: A Method for Visualization and Exploration of High-Dimensional Data
Speaker: Dragana Veljkovic
Presentation slides in [ ppt ]Abstract :
Mapping high-dimensional data in a low-dimensional space, for example, for visualization, is a problem of increasingly major concern in data analysis. This talk presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method that follows the line of multidimensional scaling (MDS) approach, based on the preservation of distances between pairs of data. DD-HDS improves the performance of existing competitors with respect to the representation of high-dimensional data, in two ways. It introduces (1) a specific weighting of distances between data taking into account the concentration of measure phenomenon and (2) a symmetric handling of short distances in the original and output spaces, avoiding false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the effective distribution of distances in the data set, with the exception of a single user-defined parameter setting the tradeoff between local neighborhood preservation and global mapping. The optimization of the stress criterion designed for the mapping is realized by "force-directed placement" (FDP). The mappings of low- and high-dimensional data sets are presented as illustrations of the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can be easily incorporated in most distance preservation-based nonlinear dimensionality reduction.
References :
-
S. Lespinats, M. Verleysen, A. Giron, G. Fertil "DD-HDS: A Method for Visualization and Exploration of High-Dimensional Data," Neural Networks, IEEE Transactions on , vol.18, no.5, pp.1265-1279, 2007. link
-
P. Eades "A heuristic for graph drawing," Proc. 13th Manitoba Conf. Numer. Math. Comput. (Congressus Numerantium), D. S. Meek and G. H. J. V. Rees Ed. Winnipeg, MB, Canada, vol. 42, p. 149, 1984.
-
T. Fruchterman and E. Reingold "Graph drawing by force-directed placement," Software-Practice Exp., vol. 21, pp. 1129, 1991. link
-
- 10/31/08 Enrichment Constrained Time-Dependent Clustering Analysis for Finding Meaningful Temporal Transcription Modules
Speaker: Jia Meng
Presentation slides in [ ppt ]Abstract
Motivation: Clustering is a popular data exploration technique widely used in microarray data analysis. When dealing with time series data, most conventional clustering algorithms, however, either use one-way clustering methods, which fail to consider the heterogeneity of temporary domain, or use two-way clustering methods that do not take into account the time dependency between samples, thus producing less informative results. Furthermore, enrichment analysis is often performed independent of and after clustering and such practice, though capable of revealing biological significant clusters, cannot guide the clustering to produce biologically significant result.
Result: We present a new enrichment constrained framework (ECF) coupled with a time-dependent iterative signature algorithm (TDISA), which, by applying a sliding time window to incorporate the time dependency of samples and imposing an enrichment constraint to parameters of clustering, allows supervised identification of temporal transcription modules (TTMs) that are biologically meaningful. Rigorous mathematical definitions of TTM as well as the enrichment constraint framework are also provided that serve as objective functions for retrieving biologically significant modules. We applied the Enrichment Constrained Time-Dependent Iterative Signature Algorithm (ECTDISA) to human gene expression time series data of Kaposi's sarcoma-associated herpesvirus (KSHV) infection of human primary endothelial cells; the result not only confirms known biological facts, but also reveals new insight into the molecular mechanism of KSHV infection.
References :
-
Bergmann, S., Ihmels, J. and Barkai, N. (2003) Iterative signature algorithm for the analysis of large-scale gene expression data, Phys ical Review E, 67, 031902. link
-
- 11/07/08 Physical protein-protein interactions predicted from microarrays.
Speaker: Mark Doderer
Presentation slides in [ ppt ]Abstract
Motivation: Microarray expression data reveal functionally associated proteins. However, most proteins that are associated are not actually in direct physical contact. Predicting physical interactions directly from microarrays is both a challenging and important task that we addressed by developing a novel machine learning method optimized for this task.
Results: We validated our support vector machine-based method on several independent datasets. At the same levels of accuracy, our method recovered more experimentally observed physical interactions than a conventional correlation-based approach. Pairs predicted by our method to very likely interact were close in the overall network of interaction, suggesting our method as an aid for functional annotation. We applied the method to predict interactions in yeast (Saccharomyces cerevisiae). A Gene Ontology function annotation analysis and literature search revealed several probable and novel predictions worthy of future experimental validation. We therefore hope our new method will improve the annotation of interactions as one component of multi-source integrated systems.
References :
- T. Soong , K.O. Wrzeszczynski and B. Rost, (2008) Physical protein-protein interactions predicted from microarrays. Bioinformatics, 24(22):2608-2614 link
- 11/14/08 Preserving Privacy in Social Networks Against Neighborhood Attacks.
Speaker: Lijie Zhang
Presentation slides in [ ppt ]Abstract
Recently, as more and more social network data has been published in one way or another, preserving privacy in publishing social network data becomes an important concern. With some local knowledge about individuals in a social network, an adversary may attack the privacy of some victims easily. Unfortunately, most of the previous studies on privacy preservation can deal with relational data only, and cannot be applied to social network data. In this paper, we take an initiative towards preserving privacy in social network data. We identify an essential type of privacy attacks: neighborhood attacks. If an adversary has some knowledge about the neighbors of a target victim and the relationship among the neighbors, the victim may be re-identified from a social network even if the victim's identity is preserved using the conventional anonymization techniques. We show that the problem is challenging, and present a practical solution to battle neighborhood attacks. The empirical study indicates that anonymized social networks generated by our method can still be used to answer aggregate network queries with high accuracy.
References :
- Bin Zhou and Jian Pei, (2008) Preserving Privacy in Social Networks Against Neighborhood Attacks. In Proceedings of the 24 th International Conference on Data Engineering (ICDE'08). link
-
11/21/08 A new systems biology approach for miRNA target prediction
Speaker: Hui Liu
Presentation slides in [ ppt ]Abstract :
miRNA/MicroRNAs are a class of single-stranded non-coding RNAs with about 19 to 25 nucleotides in length that are mostly known to inhibit the translation of mRNAs into proteins or to promote repression of mRNA expression. miRNAs are believed inevitably play important regulatory roles in many biological processes and diseases. Because miRNA recognize mechanism is still not clear and high throughput experiment to identify miRNA targets haven't been developed, computational methods to predicted target is the best way to recognize target in genome scale. In many cases, the expression levels of targets do not change even the target was regulated by miRAN. So most published target prediction algorithms, more than 10, are based on sequence level. But the accuracy, especially the specificity are still low. In this presentation, I will present a new systems biology approach for miRNA target prediction that we worked out recently. The concept of this method is that if a gene is targeted by a miRAN, then in the related pathway, the downstream transcription factor of this gene will be affected. And finally, the target set of this transcription factor will be affected in expression level. Base on this idea, we propose a method that combines pathway data, transcription factor binding data, sequence level data, and microarray data for miRNA overexpression together for target prediction.
References :
-
Pathway Interaction Database. Link: http://pid.nci.nih.gov/
-
Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43): p. 15545-50.
Link: http://mootha.med.harvard.edu/PubPDFs/Subramanian2005.pdf -
Transcription factor.
Link: http://en.wikipedia.org/wiki/Transcription_factor.
-
Questions and Comments?
Please send emails to jruan@cs.utsa.edu, or seminar co-organizers: Kay Robbins, Weining Zhang, Yufei Huang, Carola Wenk, and Qi Tian.