Department of Computer Science
University of Texas at San Antonio
Schedule:
Abstract:
Wikipedia, as an open editable resource, provides reliable knowledge and taxonomy. In contrast to its rich textual information, Wikipedia lacks visual illustrations such as images and animations. Can we visually annotate Wikipedia concepts and provide representative images according to the Wikipedia taxonomy? The huge amount of online social media, such as the tagged images on Flickr, is a good visual resource. Nevertheless, the noisy nature of the tags limits its usefulness. Based on the observation that images are often collected by groups with a common interest or topic, we propose a framework to visually annotate Wikipedia via social communities. The contribution of our work is two-fold: (i) we enrich Wikipedia with diverse images based on its taxonomy; (ii) we introduce community effort to overcome the noisy nature of tags when harvesting images. This work presents the concept and the community data collection of the proposed system.
References:
Due to the inherent measurement noise in microarray experiments, heterogeneity across samples, and limited sample size, it is often hard to find reliable gene markers for classification. For this reason, several studies have proposed analyzing the expression data at the level of groups of functionally related genes, such as pathways. One practical problem of these pathway-based approaches is the limited coverage of genes by known pathways. To overcome this problem, we propose a new method for identifying effective subnetwork markers by overlaying the gene expression data with a genome-scale protein-protein interaction network. Experimental results on two independent breast cancer datasets show that the subnetwork markers lead to more accurate classification of breast cancer metastasis and are more reproducible than both gene and pathway markers.
References:
Abstract:
Recently, the Bag of Visual Words (BoW) model has shown its success in many computer vision tasks, such as object recognition
and visual retrieval. However, how to build a good visual codebook is still a fundamental problem. A traditional codebook
is constructed by unsupervised clustering techniques, e.g., K-Means, and all the local descriptors are matched to the generated
visual words with hard quantization. This kind of codebook may yield high quantization errors and give rise to significant
degradation in the discriminative power of descriptors. Hence, it is not optimal for image classification. In this paper,
we propose a multi-layer orthogonal codebook generation approach. The additional layer codebook, generated from residues in
orthogonal directions, plays a complementary role to the traditional K-Means codebook and effectively reduces quantization
errors. We also introduce two simple schemes to make use of the new codebook with the spatial pyramid matching (SPM) kernel.
Experimental results show that our approach achieves performance comparable to the state of the art on image categorization.
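The two-layer idea can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the orthogonal residues are computed, so this sketch assumes a plain residual (descriptor minus its nearest first-layer word) and clusters those residuals with a small hand-written K-Means. The descriptors are random toy data, and all function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, codebook):
    """Index of the codeword closest to point (hard quantization)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(point, codebook[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(g) for c in zip(*g)]
    return centers

def residuals(points, codebook):
    """Assumed residue: descriptor minus its nearest first-layer word."""
    out = []
    for p in points:
        word = codebook[nearest(p, codebook)]
        out.append([x - y for x, y in zip(p, word)])
    return out

def mean_error(points, codebook):
    """Mean squared quantization error under hard assignment."""
    return sum(sq_dist(p, codebook[nearest(p, codebook)]) for p in points) / len(points)

# Toy 4-D "descriptors" (random data, for illustration only).
rng = random.Random(1)
descriptors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]

layer1 = kmeans(descriptors, k=8)        # traditional codebook
res = residuals(descriptors, layer1)     # what layer 1 fails to capture
layer2 = kmeans(res, k=8)                # additional-layer codebook

err1 = mean_error(descriptors, layer1)   # error with layer 1 only
err2 = mean_error(res, layer2)           # error left after both layers
print(err1, err2)
```

In this sketch each descriptor is encoded by a pair of indices, one per layer, and the error remaining after the second layer is lower than the error of the first layer alone.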
References:
Abstract:
A bottleneck in drug discovery is the identification of the molecular
targets of a compound (mode of action, MoA) and of its off-target
effects. Previous approaches to elucidate drug MoA include analysis
of chemical structures, transcriptional responses following
treatment, and text mining. Methods based on transcriptional
responses require the least amount of information and can be
quickly applied to new compounds. Available methods are inefficient
and are not able to support network pharmacology. We developed
an automatic and robust approach that exploits similarity
in gene expression profiles following drug treatment, across multiple
cell lines and dosages, to predict similarities in drug effect and
MoA. We constructed a drug network of 1,302 nodes (drugs) and
41,047 edges (indicating similarities between pairs of drugs). We
applied network theory, partitioning drugs into groups of densely
interconnected nodes (i.e., communities). These communities are
significantly enriched for compounds with similar MoA, or acting
on the same pathway, and can be used to identify the compound-targeted
biological pathways. New compounds can be integrated
into the network to predict their therapeutic and off-target effects.
Using this network, we correctly predicted the MoA for nine anticancer
compounds, and we were able to discover an unreported effect
for a well-known drug. We verified an unexpected similarity
between cyclin-dependent kinase 2 inhibitors and Topoisomerase
inhibitors. We discovered that Fasudil (a Rho-kinase inhibitor)
might be repositioned as an enhancer of cellular autophagy, potentially
applicable to several neurodegenerative disorders. Our
approach was implemented in a tool (Mode of Action by NeTwoRk
Analysis, MANTRA, http://mantra.tigem.it).
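The network construction described above can be sketched on toy data. This is a simplified stand-in, not the MANTRA implementation: it assumes cosine similarity between expression profiles, a fixed similarity threshold for drawing edges, and connected components as a crude substitute for community detection. The drug names, profile values, and threshold are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two expression profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def build_network(profiles, threshold):
    """Connect two drugs when their profiles are similar enough."""
    names = sorted(profiles)
    edges = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if cosine(profiles[u], profiles[v]) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return edges

def communities(edges):
    """Connected components, used here as a simple community stand-in."""
    seen, groups = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical expression responses of four drugs over four genes.
profiles = {
    "drug_A": [2.0, 1.8, -1.0, -1.2],
    "drug_B": [1.9, 2.1, -0.8, -1.0],
    "drug_C": [-1.5, -1.7, 2.2, 1.9],
    "drug_D": [-1.4, -1.6, 2.0, 2.1],
}
network = build_network(profiles, threshold=0.9)
groups = communities(network)
print(groups)
```

Under these assumptions, a new compound would be placed in the network by computing its similarity to every existing drug, and its likely mode of action would be read off from the community it joins.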
References:
Abstract:
This paper addresses the problem of publishing a Naive Bayesian Classifier (NBC) or, equivalently, publishing the
necessary views for building an NBC, while protecting the privacy of the individuals who provided the training data. The approach
completely preserves the accuracy of the original classifier, and thus significantly improves on current approaches, such as
randomization or anonymization, which typically degrade accuracy to preserve privacy. Current query-view security checkers
address the question of 'Is the view safe to publish?' and are computationally expensive (often Π₂ᵖ-complete). Here instead,
this paper tackles the question of 'How to make a view safe to publish?' and proposes a linear-time algorithm to publish safe
NBC-enabling views.
This paper first shows that a simple measure, which restricts the ratios between the published NBC statistics,
is sufficient to prevent any breach of privacy. Then, a linear-time algorithm is proposed to enforce this measure by producing
perturbed statistics that assure both (i) individuals' privacy, and (ii) a classifier that behaves in the same way as the NBC
trained on the original data. By carefully expressing the derived statistics using rational numbers, synthetic (sanitized)
datasets can be easily produced. Thus, for any given dataset, the work in this paper produces another dataset that is secure
to publish (w.r.t. a uniform prior) and achieves the same classification accuracy. Finally, the results are extended by providing
sufficient conditions to cope with arbitrary (non-uniform prior) distributions and their effectiveness is validated in practice
through experiments on real-world data.
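The step from rational statistics to a synthetic dataset can be sketched as follows. This illustrates only that one step and is not the paper's perturbation algorithm: it assumes the (already sanitized) NBC statistics are given as exact fractions, and it derives integer record counts that reproduce them exactly. The class labels, the attribute name, and the numbers are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Hypothetical NBC statistics, expressed as exact rational numbers.
prior = {"yes": Fraction(2, 5), "no": Fraction(3, 5)}
cond = {
    "yes": {"smoker": {"y": Fraction(3, 4), "n": Fraction(1, 4)}},
    "no": {"smoker": {"y": Fraction(1, 3), "n": Fraction(2, 3)}},
}

def predict(prior, cond, record):
    """Naive Bayes decision: argmax of prior times product of conditionals."""
    def score(c):
        s = prior[c]
        for attr, value in record.items():
            s *= cond[c][attr][value]
        return s
    return max(sorted(prior), key=score)

def to_counts(prior, cond):
    """Smallest dataset size n whose integer counts match the statistics."""
    joint = [prior[c] * p
             for c in prior
             for table in cond[c].values()
             for p in table.values()]
    denominators = [f.denominator for f in joint]
    denominators += [p.denominator for p in prior.values()]
    n = reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)
    class_counts = {c: int(prior[c] * n) for c in prior}
    value_counts = {
        c: {a: {v: int(prior[c] * p * n) for v, p in t.items()}
            for a, t in cond[c].items()}
        for c in prior
    }
    return n, class_counts, value_counts

n, class_counts, value_counts = to_counts(prior, cond)

# Re-derive the statistics from the counts; they match the originals exactly.
prior2 = {c: Fraction(class_counts[c], n) for c in prior}
cond2 = {
    c: {a: {v: Fraction(k, class_counts[c]) for v, k in t.items()}
        for a, t in value_counts[c].items()}
    for c in prior
}
print(n, class_counts, value_counts)
```

Because the re-derived statistics equal the originals, a classifier trained on the synthetic counts makes the same decisions as one built directly from the published fractions.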
References:
Abstract:
Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers,
application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. The authors present a
framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs.
To demonstrate its effectiveness on real-world networks, they show that a third of the users who can be verified to have accounts on both Twitter, a popular
microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Their
de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy 'sybil' nodes, is robust to noise
and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.
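Topology-only re-identification can be illustrated with a toy matching heuristic. This is not the authors' algorithm: the sketch assumes a few already known "seed" correspondences between the auxiliary graph and the anonymized target graph, and extends the mapping greedily by counting shared mapped neighbors. The graphs, the seeds, and the function name are hypothetical.

```python
def propagate(target, aux, seeds):
    """Greedily extend a seed mapping (aux node -> target node) by topology."""
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for u in sorted(aux):
            if u in mapping:
                continue
            # Score each unmapped target node by how many of u's mapped
            # neighbors point to one of its neighbors.
            scores = {}
            used = set(mapping.values())
            for nb in aux[u]:
                if nb in mapping:
                    for cand in target[mapping[nb]]:
                        if cand not in used:
                            scores[cand] = scores.get(cand, 0) + 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            # Accept only an unambiguous best candidate.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                mapping[u] = ranked[0][0]
                changed = True
    return mapping

# Auxiliary graph with known identities (adjacency sets).
aux = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "e"},
    "d": {"b"},
    "e": {"c"},
}
# Anonymized target graph with the same structure but numeric labels.
target = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 5},
    4: {2},
    5: {3},
}
mapping = propagate(target, aux, seeds={"a": 1, "b": 2})
print(mapping)
```

Starting from two seeds, the sketch recovers the remaining three correspondences using nothing but edge structure, which is the property the abstract emphasizes.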
References:
Abstract:
This communication proposes a simple algorithm with high specificity and sensitivity for determining promoter regions in human genomic sequences.
This method relies upon non-redundant and experimentally verified promoter data sets from the Eukaryotic Promoter Database (EPD) as training parameters.
This technique predicts and computationally satisfies the promoter regions in the NCBI annotated database around gene sequences.
References:
Please send email to qitian@cs.utsa.edu or to the seminar co-organizers: Kay Robbins, Weining Zhang, Yufei Huang, Carola Wenk, Jianhua Ruan, and Qi Tian.