Matko Bošnjak's research while affiliated with Ruđer Bošković Institute and other places

A large-scale evaluation of computational protein function prediction

[...]

January 2013

863 Reads

709 Citations

Nature Methods

[...]

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Supplementary information: The online version of this article (doi:10.1038/nmeth.2340) contains supplementary material, which is available to authorized users.

Phyletic Profiling with Cliques of Orthologs Is Enhanced by Signatures of Paralogy Relationships

January 2013

223 Reads

40 Citations

PLOS Computational Biology

[...]

New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400000 specific annotations with the estimated Precision of 90%, ∼19000 of which are highly specific (e.g. "penicillin binding," "tRNA aminoacylation for protein translation," or "pathogenesis"), and are freely available at http://gorbi.irb.hr/.
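The validation figures in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it compares the observed fraction of confirmed predictions (25 of 38) with the reported Precision of 60%, and it assumes, purely for illustration, that the 38 predictions behave like independent trials so that a binomial model applies. All variable and function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Figures quoted in the abstract.
n_predictions = 38
n_confirmed = 25
reported_precision = 0.60

# Observed fraction of confirmed predictions (about 0.658).
observed_precision = n_confirmed / n_predictions

# Number of confirmations expected if Precision were exactly 60% (22.8).
expected_confirmed = reported_precision * n_predictions

# Assumption: independent predictions. Under that assumption, this is the
# chance of confirming 25 or more of 38 predictions at 60% Precision.
p_at_least_observed = sum(
    binomial_pmf(k, n_predictions, reported_precision)
    for k in range(n_confirmed, n_predictions + 1)
)

print(f"observed precision: {observed_precision:.3f}")
print(f"expected confirmations at 60%: {expected_confirmed:.1f}")
print(f"P(X >= {n_confirmed}) under the binomial assumption: {p_at_least_observed:.3f}")
```

The observed fraction sits slightly above the reported Precision, which is consistent with the abstract's statement that the assays agreed with the model's accuracy estimates.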

Dataset S3

January 2013

6 Reads

[...]

The settings files as given to the Clus-HMC-Ens algorithm. (ZIP)

Table S1

January 2013

8 Reads

[...]

Results of the experimental assays. (XLSX)

Text S1

January 2013

37 Reads

[...]

Supplementary figures. (PDF)

Memory biased random walk approach to synthetic clickstream generation

January 2012

748 Reads

3 Citations

Matko Bosnjak

Vinko Zlatic

[...]

Personalized recommender systems rely on the personal usage data of each user in the system. However, privacy policies protecting users' rights prevent this data from being made publicly available to the wider research community. In this work, we propose a memory biased random walk model (MBRW) based on real clickstream graphs as a generator of synthetic clickstreams that conform to the statistical properties of the real clickstream data while adhering to privacy protection policies. We show that synthetic clickstreams can be used to learn recommender system models that achieve high recommender performance on real data while ensuring that strong de-minimization guarantees are provided.

REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms

July 2011

8,707 Reads

5,302 Citations

PLOS ONE

Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
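The idea of picking a representative, non-redundant subset of terms from pairwise semantic similarities can be sketched as follows. This is a simplified greedy filter written for illustration, not REVIGO's actual procedure: the function name, the 0.7 threshold, the placeholder term names, and all p-values and similarity scores are made-up assumptions.

```python
from typing import Dict, FrozenSet, List


def select_representatives(
    p_values: Dict[str, float],
    similarity: Dict[FrozenSet[str], float],
    threshold: float = 0.7,
) -> List[str]:
    """Greedily keep terms that are not too similar to any term already kept.

    Hypothetical simplification: terms are visited from most to least
    significant, and missing similarity entries are treated as 0.0.
    """
    kept: List[str] = []
    for term in sorted(p_values, key=p_values.get):
        redundant = any(
            similarity.get(frozenset((term, other)), 0.0) >= threshold
            for other in kept
        )
        if not redundant:
            kept.append(term)
    return kept


# Toy input: placeholder terms with invented p-values and similarities.
p_values = {"term_A": 1e-6, "term_B": 1e-4, "term_C": 1e-3, "term_D": 5e-2}
similarity = {
    frozenset(("term_A", "term_B")): 0.9,
    frozenset(("term_A", "term_C")): 0.2,
    frozenset(("term_B", "term_C")): 0.3,
    frozenset(("term_C", "term_D")): 0.8,
}

representatives = select_representatives(p_values, similarity)
print(representatives)  # ['term_A', 'term_C']
```

In this toy run, term_B is dropped as redundant with term_A and term_D as redundant with term_C. Lowering the threshold would give a shorter, more aggressive summary.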

ECML-PKDD 2011 Discovery Challenge overview

Article

January 2011

28 Reads

4 Citations

M. Bošnjak

Monika Žnidaršič

[...]

This year's Discovery Challenge was dedicated to solving video lecture recommendation problems, based on data collected at the VideoLectures.Net site. The challenge had two tasks: task 1, in which a new-user/new-item recommendation problem was simulated, and task 2, which simulated clickstream-based recommendation. In this overview we present the challenge datasets, tasks, and evaluation measure, and we analyze the solutions and results.

RSCTC’2010 Discovery Challenge: Mining DNA Microarray Data for Medical Diagnosis and Treatment

June 2010

203 Reads

29 Citations

[...]

RSCTC’2010 Discovery Challenge was a special event of the Rough Sets and Current Trends in Computing conference. The challenge was organized as an interactive online competition on the TunedIT.org platform, held between Dec 1, 2009 and Feb 28, 2010. The task was related to feature selection in the analysis of DNA microarray data and the classification of samples for the purpose of medical diagnosis or treatment. Prizes were awarded to the best solutions. This paper describes the organization of the competition and the winning solutions.

ECML-PKDD 2011 Discovery Challenge overview

... Our novel features provide a much larger coverage than existing methods while maintaining a high accuracy. 2) Preliminary experiments on standard WEBSPAM-UK2007 [5], ClueWeb-2009 [6], and ECML-PKDD-2011 [7] benchmark datasets demonstrate the effectiveness of the novel features on learning the classifier for detecting web spam. The rest of the paper is formed as follows: We review the previous research work in Section 2. In section 3, we describe the proposed groups of novel web spam features. ...
Reference:
Novel Features for Web Spam Detection

Citing Article
January 2011

M. Bošnjak

Monika Žnidaršič

[...]

Constructing recommender systems workflow templates in RapidMiner

... RapidMiner is often successfully used in the application of classification algorithms [7]. Furthermore, it provides a support for Meta learning for classification [8] and constructing of recommender system workflow templates [9]. In this paper, we focus on building recommender system for higher education students. ...
Reference:
Recommender System for Selection of the Right Study Program for Higher Education Students

Citing Article

Matko Bošnjak