Dr. Jianhua Ruan, Department of Computer Science, UTSA

Fall 2009 Data and Vision Weekly Seminar

Organizers: Qi Tian, Kay Robbins, Weining Zhang, Tom Bylander, Carola Wenk, Yufei Huang (EE), Jianhua Ruan.
Time: 10:00-11:00 am, every Friday
Place: SB 4.01.20, CS Conference Room

Previous Seminars

Schedule for Fall 2009

9/18 Angela Dean (Ruan lab)
9/25 Chengwei Lei (Ruan lab)
10/2 Xia Li (Tian lab)
10/9 Jie Xiao (Tian lab)
10/16 Hongwei Tian (Zhang lab)
10/23 Chifeng Ma (Huang lab)
10/30 Lijie Zhang (Zhang lab)
11/6 Mark Doderer (Robbins lab)
11/13 Jian Cui (Huang lab)
11/20 Jessica Sherette (Wenk lab)
11/27 (Holiday)
12/4 Jia Meng (Huang lab)

Seminar information

09/18/09 A novel meta-analysis method exploiting consistency of high-throughput experiments
Speaker: Angela Dean

Abstract:

Motivation: Large-scale biological experiments provide snapshots into the huge number of processes running in parallel within the organism. These processes depend on a large number of (hidden) (epi)genetic, social, environmental and other factors that are out of experimentalists' control. This makes it extremely difficult to identify the dominant processes and the elements involved in them based on a single experiment. It is therefore desirable to use multiple sets of experiments targeting the same phenomena while differing in some experimental parameters (hidden or controllable). Although such datasets are becoming increasingly common, their analysis is complicated by the fact that the various biological elements could be influenced by different sets of factors.

Results: The central hypothesis of this article is that biologically related elements and processes are affected by changes in similar ways while unrelated ones are affected differently. Thus, the relations between related elements are more consistent across experiments. The method outlined here looks for groups of elements with robust intra-group relationships in the expectation that they are related. The major groups of elements may be identified in this way. The strengths of relationships per se are not valued, just their consistency. This represents a completely novel and unutilized source of information. In the analysis of time course microarray experiments, I found cell cycle- and ribosome-related genes to be the major groups. Despite not looking for these groups in particular, the identification of these genes rivals that of methods designed specifically for this purpose.
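
A hedged sketch of the consistency idea (toy data; function names are mine, not the paper's implementation): score each gene pair by how much its correlation varies across experiments, so that low variability, rather than high correlation, marks a related pair.

    import numpy as np

    def consistency_scores(experiments):
        # experiments: list of (genes x samples) arrays over the same genes.
        # Returns a genes x genes matrix of correlation variability across
        # experiments; low values mean a consistently related gene pair.
        corrs = np.array([np.corrcoef(x) for x in experiments])
        return corrs.std(axis=0)  # variability, not strength, of the relation

    rng = np.random.default_rng(0)
    expts = [rng.normal(size=(50, 20)) for _ in range(4)]
    print(consistency_scores(expts).shape)  # (50, 50)

Groups of genes whose mutual scores are low would then be candidates for the "major groups" described above.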


References:


09/25/09 Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation
Speaker: Chengwei Lei

Abstract:

In this presentation, I will discuss a de novo motif finding tool called Trawler, the fastest computational pipeline to date for discovering over-represented motifs in chromatin immunoprecipitation (ChIP) experiments and predicting their functional instances. When applied to data from yeast and mammals, Trawler accurately discovered 83% of the known binding sites, often along with additional binding sites that provide hints of combinatorial input. Newly discovered motifs and their features (identity, conservation, position in the sequence) are displayed on a web interface.
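
Trawler's actual pipeline is more involved, but the core statistic it builds on is k-mer over-representation. A minimal sketch under my own assumptions (naive z-score, add-one smoothing; not Trawler's code):

    from collections import Counter
    from math import sqrt

    def kmer_counts(seqs, k):
        counts = Counter()
        for s in seqs:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
        return counts

    def enrichment_zscores(chip_seqs, background_seqs, k=6):
        # z-score of each k-mer's ChIP count against its expectation
        # under the smoothed background frequency.
        fg = kmer_counts(chip_seqs, k)
        bg = kmer_counts(background_seqs, k)
        n_fg, n_bg = sum(fg.values()), sum(bg.values())
        z = {}
        for kmer, observed in fg.items():
            p = (bg[kmer] + 1) / (n_bg + 2)  # add-one smoothing
            mu, sd = n_fg * p, sqrt(n_fg * p * (1 - p))
            z[kmer] = (observed - mu) / sd
        return z

Over-represented k-mers are then clustered into motifs and refined, which is where the real pipeline does most of its work.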

References:


10/02/09 Label to Region by Bi-Layer Sparsity Priors
Speaker: Xia Li

Abstract:

To achieve reliable content-based image retrieval, it is critical to obtain the correspondence between image labels and the precise regions they describe within an image. In practice, however, manually annotating labels at the region level is tedious, and a more feasible alternative is to annotate at the image level. This paper therefore investigates how to automatically reassign labels annotated at the image level to the contextually derived semantic regions. The authors first propose a bi-layer sparse coding formulation for reconstructing an image or semantic region from the over-segmented patches of an image set. Each layer of sparse coding assigns image labels to the selected atomic patches and to candidate regions merged from patches that share labels; these assignments are then fused into the final label-to-region assignment. Extensive experiments on three public image datasets demonstrate the effectiveness of the proposed framework in both label-to-region assignment and image annotation tasks.
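
The building block of the formulation is a standard sparse coding problem; in generic notation (not necessarily the paper's), an image or region with feature vector b is reconstructed from a dictionary D whose columns are features of over-segmented patches drawn from the image set:

    \min_{a} \; \| b - D a \|_2^2 + \lambda \| a \|_1

The l1 penalty makes the code a sparse, so only a few patches are selected, and the labels of the images contributing those patches can be propagated to b. The bi-layer version stacks two such problems, first over atomic patches and then over merged candidate regions.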

References:


10/09/09 Recognition using Regions
Speaker: Jie Xiao

Abstract:

Region features naturally encode the shape and scale information of objects. In this work, a bag of overlaid regions is produced, and each region is represented by shape, color and texture cues. A max-margin method is used to learn the region weights. A generalized Hough voting scheme then generates hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. Experimental results show that the approach outperforms the state of the art on the ETHZ shape database.
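
A schematic of the generalized Hough voting step (grid resolution and names here are illustrative assumptions, not the paper's code):

    import numpy as np

    def hough_vote(matches, img_shape, bins=20):
        # matches: (region_center, offset_to_object_center, weight) triples.
        # Each matched region casts a weighted vote for an object-center
        # hypothesis; accumulator peaks become hypotheses to verify.
        acc = np.zeros((bins, bins))
        h, w = img_shape
        for (cx, cy), (dx, dy), weight in matches:
            x, y = cx + dx, cy + dy  # predicted object center
            if 0 <= x < w and 0 <= y < h:
                acc[int(y / h * bins), int(x / w * bins)] += weight
        return acc

    acc = hough_vote([((10, 10), (5, 0), 1.0)], img_shape=(100, 100))
    print(np.unravel_index(acc.argmax(), acc.shape))

In the full system each vote also carries scale, and every accumulator peak is re-scored by the verification classifier.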

References:


10/16/09 Extending l-Diversity for Better Data Anonymization

Speaker: Hongwei Tian

Abstract:

The notion of l-diversity provides a strong privacy guarantee for generalization. However, existing l-diversity algorithms may force users to choose between publishing no data and sacrificing privacy when the data have a skewed distribution of sensitive attribute (SA) values. In this paper, we solve this problem by extending l-diversity in two ways: first, we allow the generalization of SA values, and second, we use a simple function to constrain the frequencies of SA values. The resulting (tau, l)-diversity is more flexible and elaborate. We present an efficient heuristic algorithm that uses a novel ordering of quasi-identifier values to achieve (tau, l)-diversity. We compare our algorithm with two state-of-the-art algorithms based on existing l-diversity measures. Our preliminary experimental results indicate that our algorithm not only effectively handles data with skewed SA distributions but also yields better utility of the anonymized data in general.
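
For orientation, the baseline property being extended can be checked in a few lines. This sketch tests plain distinct l-diversity only; it does not implement the frequency constraint that (tau, l)-diversity adds:

    from collections import defaultdict

    def is_l_diverse(records, l):
        # records: (quasi_identifier_tuple, sensitive_value) pairs.
        # Every equivalence class (same QI tuple) must contain at
        # least l distinct sensitive attribute values.
        classes = defaultdict(set)
        for qi, sa in records:
            classes[qi].add(sa)
        return all(len(values) >= l for values in classes.values())

    data = [(("30-39", "782xx"), "flu"), (("30-39", "782xx"), "cancer")]
    print(is_l_diverse(data, 2))  # True

With a skewed SA distribution, forcing every class to meet this bound can require very coarse generalization, which is exactly the utility problem (tau, l)-diversity targets.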

References:


10/23/09 A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery
Speaker: Chifeng Ma

Abstract:

The U.S. National Cancer Institute has used a panel of 60 diverse human cancer cell lines (the NCI-60) to screen >100,000 chemical compounds for anticancer activity. However, not all important cancer types are included in the panel, nor are drug responses of the panel predictive of clinical efficacy in patients. We asked, therefore, whether it would be possible to extrapolate from that rich database (or analogous ones from other drug screens) to predict activity in cell types not included or, for that matter, clinical responses in patients with tumors. We address that challenge by developing and applying an algorithm we term "coexpression extrapolation" (COXEN). COXEN uses expression microarray data as a Rosetta Stone for translating from drug activities in the NCI-60 to drug activities in any other cell panel or set of clinical tumors. Here, we show that COXEN can accurately predict the drug sensitivity of bladder cancer cell lines and the clinical responses of breast cancer patients treated with commonly used chemotherapeutic drugs. Furthermore, we used COXEN for in silico screening of 45,545 compounds and identified an agent with activity against human bladder cancer.
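
One reading of the coexpression-extrapolation idea, as a hedged sketch (my simplification, not the published COXEN code): a candidate predictor gene translates between panels if its coexpression pattern with the other candidates is concordant in both.

    import numpy as np

    def concordance_scores(panel_a, panel_b):
        # panel_a, panel_b: (genes x samples) expression arrays over the
        # same candidate genes in two cell panels. A gene's score is the
        # correlation between its coexpression profile in panel A and in B.
        ca, cb = np.corrcoef(panel_a), np.corrcoef(panel_b)
        return np.array([np.corrcoef(ca[i], cb[i])[0, 1]
                         for i in range(ca.shape[0])])

    rng = np.random.default_rng(1)
    scores = concordance_scores(rng.normal(size=(30, 60)),
                                rng.normal(size=(30, 25)))

High-scoring genes would be kept as the "Rosetta Stone" features on which the drug-sensitivity predictor is trained.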

References:


10/30/09 K-Automorphism: A General Framework for Privacy Preserving Network Publication

Speaker: Lijie Zhang

Abstract:

The growing popularity of social networks has generated interesting data management and data mining problems. An important concern in releasing such data for study is privacy, since social networks usually contain personal information. Simply removing all identifiable personal information (such as names and social security numbers) before releasing the data is insufficient: an attacker can easily identify a target by issuing different structural queries. In this paper, the authors propose k-automorphism to protect against multiple structural attacks and develop an algorithm, called KM, that ensures k-automorphism. They also discuss an extension of KM to handle dynamic releases of the data. Extensive experiments show that the algorithm performs well in terms of the protection it provides.
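
Paraphrasing the paper's central definition (notation mine): a released graph G is k-automorphic if there exist k-1 non-trivial automorphisms F_1, ..., F_{k-1} of G such that, writing F_0 for the identity,

    F_i(v) \neq F_j(v) \quad \text{for every vertex } v \text{ and all } 0 \le i < j \le k-1.

Every vertex is then structurally indistinguishable from at least k-1 others, so no structural query, however elaborate, can narrow a target down to fewer than k candidates.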

Reference:

Lei Zou, Lei Chen, and M. Tamer Özsu. "K-automorphism: A general framework for privacy preserving network publication," In Proc. 35th Int. Conf. on Very Large Data Bases, August 2009, pages 946-957.


11/06/09 Data Integration in Genetics and Genomics: Methods, Challenges and a Case Study

Speaker: Mark Doderer

Abstract:

Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interaction data. Each of these distinct data types provides a different, partly independent and complementary view of the whole genome. However, understanding the functions of genes, proteins, and other aspects of the genome requires more information than any single dataset provides. Integrating data from different sources is therefore an important part of current research in genomics and proteomics. Several approaches to data integration will be reviewed in general, and a case study will be presented in depth.

References:

Jemila S. Hamid, Pingzhao Hu, Nicole M. Roslin, Vicki Ling, Celia M. T. Greenwood, and Joseph Beyene. "Data Integration in Genetics and Genomics: Methods and Challenges," Human Genomics and Proteomics, 2009:869093, doi:10.4061/2009/869093.

Yong Wang, Xiang-Sun Zhang, and Yu Xia. "Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data," Nucleic Acids Research, 2009, 37(18):5943-5958, doi:10.1093/nar/gkp625.


11/13/09 Alignment of LC-MS images, with applications to biomarker discovery and protein identification
Speaker: Jian Cui

Abstract: LC-MS-based approaches have gained considerable interest for the analysis of complex peptide or protein mixtures, due to their potential for full automation and high sampling rates. Advances in the resolution and accuracy of modern mass spectrometers enable new analytical LC-MS-based applications, such as biomarker discovery and cross-sample protein identification. Many of these applications compare multiple LC-MS experiments, each of which can be represented as a 2-D image. In this article, we survey current approaches to LC-MS image alignment. LC-MS image alignment corrects for experimental variations in the chromatography and represents a key computational technology for the comparison of LC-MS experiments. It is a required processing step for its two major applications: biomarker discovery and protein identification. Along with descriptions of the computational analysis approaches, we discuss their relative merits and potential pitfalls.
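
The surveyed methods vary widely, but the simplest relative of the problem is a textbook dynamic-time-warping alignment of two 1-D retention-time traces, which illustrates the kind of monotone warp being estimated (a toy example, not an LC-MS tool):

    import numpy as np

    def dtw(a, b):
        # Minimal cumulative cost of monotonically warping trace a onto b.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    print(dtw([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # 0.0: a pure time shift

Real LC-MS alignment operates on the full 2-D (retention time x m/z) image and must remain robust to missing and spurious peaks.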

References:


11/20/09 Computing the Fréchet Distance Between Surfaces
Speaker: Jessica Sherette

Abstract: Similarity metrics between surfaces are used in graphics and computer-aided manufacturing; in computer-aided manufacturing, for example, they help ensure quality control. The Fréchet distance in particular is a useful similarity metric because it takes the continuity of the given surfaces into account. Unfortunately, computing the Fréchet distance between arbitrary surfaces has been shown to be NP-hard [1]. However, an algorithm exists to compute the Fréchet distance between simple polygons in polynomial time [2]. Our work extends this algorithm to one that works for a more general class of surfaces. Specifically, we developed a fixed-parameter tractable algorithm to compute the Fréchet distance between two triangulated surfaces with acyclic dual graphs [3].
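
For reference, the Fréchet distance between two surfaces P, Q : [0,1]^2 -> R^3 is defined (up to details of the allowed reparameterization class) as

    \delta_F(P, Q) = \inf_{\sigma} \, \sup_{x \in [0,1]^2} \| P(x) - Q(\sigma(x)) \|

where \sigma ranges over homeomorphisms of [0,1]^2. The infimum over all homeomorphisms is what makes the general surface case intractable.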

References:


12/04/09 Bayesian Sparse Correlated Factor Analysis
Speaker: Jia Meng

Abstract: In this paper, we propose a new sparse correlated factor model under a Bayesian framework, intended to model transcription factor regulation in a cell. Unlike conventional factor models, the factors are assumed to be non-negative and correlated; the correlation is due to prior knowledge of the structure of the factors. To model the factors, a rectified function and a Dirichlet process mixture (DPM) prior are introduced. The loading matrix is sparse, and since prior knowledge of its non-zero elements is assumed available, the sparsity pattern of the loading matrix is significantly constrained, resulting in an unambiguous factor order. A Gibbs sampler is proposed to recover the unknown non-negative factors and the loading matrix from data. The model and the Gibbs sampler are validated on simulated systems.
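
In generic notation (mine, not necessarily the paper's), the model family being described is

    x_n = \Lambda f_n + \epsilon_n, \qquad f_n \ge 0, \qquad \epsilon_n \sim \mathcal{N}(0, \sigma^2 I)

with a Dirichlet process mixture prior coupling the non-negative factors f_n and a sparse loading matrix \Lambda whose support is constrained by the prior knowledge of non-zero entries; the Gibbs sampler then alternates draws of \Lambda, the factors, and the noise variance.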

References:



Questions and Comments?

Please send emails to jruan@cs.utsa.edu, or seminar co-organizers: Kay Robbins, Weining Zhang, Yufei Huang, Carola Wenk, and Qi Tian.



©2008 Jianhua Ruan