Article

The BigChaos Solution to the Netflix Grand Prize

Authors:
  • Commendo research (part of Opera Solutions)
... Matrix factorization. Inspired by matrix factorization models [19,23] in recommendation systems to capture the low-rank structure of user-item interactions, we leverage this approach for training on preference data. The key is to uncover a hidden scoring function s : M × Q → R. The score s(M_w, q) should represent the quality of the model M_w's answer to the query q, i.e. if a model M_w is better than M_l on a query q, then s(M_w, q) > s(M_l, q). ...
Preprint
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely recognized benchmarks shows that our approach significantly reduces costs, by over 2 times in certain cases, without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
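As a rough illustration of the scoring idea in the excerpt above, the sketch below fits a bilinear score s(M, q) = u_M · v_q to pairwise preference data with a Bradley-Terry-style loss. This is a minimal assumption-laden reconstruction, not the authors' code; the embedding dimension, learning rates, and toy preference data are all invented.

```python
# Hedged sketch (not the paper's method): a bilinear score s(m, q) = u_m . v_q
# trained on pairwise preferences with a Bradley-Terry loss. All sizes and the
# toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_queries, dim, lr, lam = 2, 100, 8, 0.05, 1e-3
U = 0.1 * rng.standard_normal((n_models, dim))   # model embeddings
V = 0.1 * rng.standard_normal((n_queries, dim))  # query embeddings

# Preference triples (winner_model, loser_model, query): model 0 "wins" here.
prefs = [(0, 1, q) for q in range(n_queries)]

for _ in range(200):
    for w, l, q in prefs:
        margin = U[w] @ V[q] - U[l] @ V[q]
        p = 1.0 / (1.0 + np.exp(-margin))        # P(winner beats loser | q)
        g = p - 1.0                              # grad of -log p wrt margin
        gU = g * V[q]
        U[w] -= lr * (gU + lam * U[w])
        U[l] -= lr * (-gU + lam * U[l])
        V[q] -= lr * (g * (U[w] - U[l]) + lam * V[q])

# Route a query to the higher-scoring model.
q = 3
print("route to model", int(np.argmax(U @ V[q])))
```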
... Ensemble Blending is a variation of the stacking ensemble originally introduced in the Netflix competition (Töscher, Jahrer, and Bell 2009). The flowchart of the blending ensemble model is shown in Figure 9. ...
Article
Full-text available
This study compares the performance of the ensemble machine learning methods stacking, blending, and soft voting for landslide susceptibility mapping (LSM) in a highly affected Northern Italy region, Lombardy. We first created a spatial database based on open data, ensuring accessibility of relevant information for landslide-influencing factors, historical landslide records, and areas with a very low probability of landslide occurrence called 'No Landslide Zone', an innovative concept introduced in this study. Then, open-source software was employed for developing five machine learning classifiers (Bagging, Random Forests, AdaBoost, Gradient Tree Boosting, and Neural Networks), which were tested at a basin scale by implementing different combinations of training and testing schemes using three use cases. The three classifiers with the highest generalization performance (Random Forests, AdaBoost, and Neural Networks) were selected and combined by ensemble methods. Soft voting showed the highest performance among them. The best model to generate the LSM for the Lombardy region was a Neural Network model trained using data from three basins, achieving an accuracy of 0.93 in Lombardy. The LSM indicates that 37% of Lombardy is in the highest landslide susceptibility categories. Our findings highlight the importance of openness in advancing LSM, not only by enhancing the reproducibility and transparency of our methodology but also by promoting knowledge-sharing within the scientific community.
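The abstract names stacking and soft voting as the combination schemes. A minimal sketch of both, using scikit-learn with synthetic data standing in for the landslide database and the study's three selected classifiers, might look as follows; it illustrates the general technique, not the study's pipeline.

```python
# Hedged sketch of the two ensemble strategies named above, with synthetic
# data in place of the landslide database.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("rf", RandomForestClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("nn", MLPClassifier(max_iter=500, random_state=0))]

# Soft voting averages predicted class probabilities across the base models.
soft_vote = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)
# Stacking fits a second-level learner on the base models' predictions.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression()).fit(X_tr, y_tr)

print("soft voting:", soft_vote.score(X_te, y_te))
print("stacking:   ", stack.score(X_te, y_te))
```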
... The specific framework is shown in Fig. 7 (stacking ensemble learning model for TSSM). Blending is another form of ensemble learning technique derived from stacking [43]; the only difference between the two models is that the blending model uses a held-out (validation) set carved from the training set to make predictions. Simply put, predictions are made only for the held-out data set, and the held-out data set and those predictions are used to build the second-level model. ...
Article
Full-text available
Thaw slump susceptibility mapping (TSSM) of the Qinghai-Tibet Railway corridor (QTRC) is the prerequisite and basis for disaster assessment and prevention in permafrost engineering. The objective of this study is to construct ensemble learning models based on single-classifier models to generate the TSSM of the QTRC, to compare and verify the performance of the models, and to further explore the relationship between high-susceptibility areas and environmental factors of the QTRC. A collinearity analysis was carried out on 14 selected thaw slump conditioning factors (TSCFs). We used the balanced bagging method for sample optimization, and the data set was divided into a 70% training set and a 30% validation set. Convolutional neural network (CNN), multilayer perceptron (MLP), support vector regression (SVR), and random forest (RF) single classifiers were selected to construct blending and stacking ensemble learning models for the TSSM. The results showed no collinearity among the 14 TSCFs. The comparison of model performance revealed that all models performed well, but the constructed stacking and blending ensemble learning models had stable performance and high prediction accuracy for TSSM. The stacking ensemble learning model performed best, with an area under the curve (AUC) of the receiver operating characteristic (ROC) curve reaching 0.9607, indicating that the TSSM of the QTRC generated by the stacking ensemble learning model had the highest reliability. The QTRC has local areas with high thaw slump susceptibility, mainly concentrated in permafrost areas with high altitude, steep slopes, nearby faults, sparse vegetation, ice and snow cover, and higher cumulative precipitation.
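For contrast with library stacking, a hand-rolled blending sketch in the sense of the excerpt above, with the second-level model fit only on base-model predictions for a held-out split, could look like this. Data and model choices are illustrative stand-ins, not the study's configuration.

```python
# Hedged sketch of blending: base models fit on a training split, the
# second-level model fit only on their predictions for a held-out split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
X_hold, X_te, y_hold, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                              random_state=1)

bases = [RandomForestClassifier(random_state=1).fit(X_tr, y_tr),
         MLPClassifier(max_iter=500, random_state=1).fit(X_tr, y_tr),
         SVC(probability=True, random_state=1).fit(X_tr, y_tr)]

def meta_features(models, X):
    # Stack each base model's positive-class probability as one column.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# The blender never sees the base models' training split.
blender = LogisticRegression().fit(meta_features(bases, X_hold), y_hold)
print("blended accuracy:", blender.score(meta_features(bases, X_te), y_te))
```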
... The winning team and other top-performing teams gained valuable insights into developing high-performance machine learning models. According to descriptions provided by the two top teams (Koren, 2009; Töscher et al., 2009), these breakthroughs were mainly attributed to three significant aspects: discovering new features underlying the data; employing different types of base learners; and blending predictions from different algorithms through ensemble methods such as gradient-boosted decision trees. Following the Netflix Prize, numerous competitions have been conducted over the past decade on platforms such as Kaggle (Bojer & Meldgaard, 2021) for various data sets and tasks. ...
Article
Full-text available
The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File data set to explore building a high-performance freight mode choice model, considering three main improvements: (1) constructing local models for each separate commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin/destination zones; and (3) applying additional ensemble learning methods such as stacking or voting to combine results from local and global models for improved performance. The proposed method achieved over 92% accuracy without incorporating external information, outperforming most previously proposed models by a margin of 10%. Furthermore, SHAP (SHapley Additive exPlanations) values were computed to explain the outputs and major patterns obtained from the proposed model. The model framework could enhance the performance and interpretability of existing freight mode choice models.
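Improvements (1) and (3) from the abstract, local models per category combined with a global model, admit a compact sketch. The data, the fake commodity ids, and the simple probability-averaging combiner below are all assumptions for illustration, not the study's setup.

```python
# Hedged sketch: one local model per commodity category plus a global model,
# combined by averaging predicted class probabilities (a simple soft vote).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=4000, n_features=8, n_classes=3,
                           n_informative=6, random_state=2)
category = np.random.default_rng(2).integers(0, 5, size=len(y))  # fake ids

global_model = GradientBoostingClassifier(random_state=2).fit(X, y)
local_models = {c: GradientBoostingClassifier(random_state=2)
                     .fit(X[category == c], y[category == c])
                for c in np.unique(category)}

def predict(x_row, c):
    # Average local and global class probabilities, then take the argmax.
    proba = (local_models[c].predict_proba([x_row])[0] +
             global_model.predict_proba([x_row])[0]) / 2.0
    return int(np.argmax(proba))

print("mode choice:", predict(X[0], category[0]))
```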
... Therefore, the transformation from the six SVM outputs to the final classification result cannot be adjusted using the training set directly. Inspired by blending, a variant of stacking in ensemble learning [59], we introduced a validation set for fitting the transformation from the six SVM outputs to the final classification result. ...
Article
Full-text available
Emotion recognition is crucial in understanding human affective states and has various applications. Electroencephalography (EEG), a non-invasive neuroimaging technique that captures brain activity, has gained attention in emotion recognition. However, existing EEG-based emotion recognition systems are limited to specific sensory modalities, hindering their applicability. Our study advances EEG-based emotion recognition, offering a comprehensive framework that overcomes sensory-focused limitations and cross-sensory challenges. We collected cross-sensory emotion EEG data using multimodal emotion simulations (three sensory modalities, audio/visual/audio-visual, each with two emotion states, pleasure or unpleasure). The proposed framework, the filter bank adversarial domain adaptation Riemann method (FBADR), leverages filter bank techniques and Riemannian tangent space methods for feature extraction from cross-sensory EEG data. Compared with Riemannian methods alone, the filter bank and adversarial domain adaptation components improved average accuracy by 13.68% and 8.36%, respectively. Comparative analysis of classification results showed that the proposed FBADR framework achieved state-of-the-art cross-sensory emotion recognition performance, reaching an average accuracy of 89.01% ± 5.06%. Moreover, the proposed methods proved robust, maintaining high cross-sensory recognition performance at signal-to-noise ratios (SNR) ≥ 1 dB. Overall, our study contributes to the EEG-based emotion recognition field by providing a comprehensive framework that overcomes the limitations of sensory-oriented approaches and successfully tackles the difficulties of cross-sensory situations.
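The excerpt above describes fitting the combiner on a validation set rather than on the training set. A generic sketch of that pattern, with six pairwise SVMs over four synthetic classes and a logistic-regression combiner (all choices assumed, not taken from the paper), follows.

```python
# Hedged sketch: continuous outputs of several binary SVMs are mapped to a
# final label by a combiner fit on a separate validation set. The data, the
# number of SVMs, and the combiner are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1800, n_features=20, n_classes=4,
                           n_informative=8, random_state=3)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=3)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=3)

pairs = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # C(4, 2) = 6 SVMs
svms = []
for a, b in pairs:
    mask = np.isin(y_tr, [a, b])
    svms.append(SVC(kernel="rbf").fit(X_tr[mask], y_tr[mask]))

def svm_outputs(X):
    # One continuous decision value per pairwise SVM.
    return np.column_stack([m.decision_function(X) for m in svms])

# The combiner is fit on the validation set, never on the SVMs' training set.
combiner = LogisticRegression(max_iter=1000).fit(svm_outputs(X_val), y_val)
print("test accuracy:", combiner.score(svm_outputs(X_te), y_te))
```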
... Second, we add to a small but rapidly growing literature on machine learning for causal inference. This is the first study to show how ensemble methods can be used to detect data misreporting from observed statistics; prior work has focused on the use of similar Super Learning methods for demand estimation (Bajari et al., 2015), detection of cyber attacks (Rabbani et al., 2021), and prediction of users' movie ratings on Netflix (Toescher et al., 2009). As a by-product, we also make a theoretical contribution by proposing a novel alternative to the bootstrap for computing confidence intervals for causal forest estimates. ...
Article
Full-text available
We propose a new approach to detect and quantify informal employment resulting from irregular migration shocks. Focusing on a largely informal sector, agriculture, and on the exogenous variation from the Arab Spring wave on southern Italian coasts, we use machine-learning techniques to document abnormal increases in reported (vs. predicted) labor productivity on vineyards hit by the shock. Misreporting is largely heterogeneous across farms, depending e.g. on size and grape quality. The shock resulted in a 6% increase in informal employment, equivalent to one undeclared worker for every three farms on average and 23,000 workers in total over 2011-2012. Misreporting causes significant increases in farm profits through lower labor costs, while having no impact on grape sales, prices, or wages of formal workers. JEL: F22, J61, J43, J46, C53
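A minimal sketch of the detection logic, ensemble-predicted "normal" productivity compared against reported productivity, with synthetic pre/post-shock data and an assumed 2-sigma flagging rule (none of which come from the paper):

```python
# Hedged sketch: an ensemble learns "normal" labor productivity from farm
# covariates on pre-shock data; abnormally large gaps between reported and
# predicted values on post-shock farms flag potential misreporting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(4)
beta = rng.standard_normal(6)

X_pre = rng.standard_normal((1000, 6))             # pre-shock farm covariates
y_pre = X_pre @ beta + rng.normal(0, 0.5, 1000)    # pre-shock productivity

X_post = rng.standard_normal((300, 6))
reported = X_post @ beta + rng.normal(0, 0.5, 300)
reported[:30] += 2.0                               # 30 farms over-report output

# Small ensemble averaged in the spirit of Super Learning.
models = [RandomForestRegressor(random_state=4).fit(X_pre, y_pre),
          GradientBoostingRegressor(random_state=4).fit(X_pre, y_pre)]
predicted = np.mean([m.predict(X_post) for m in models], axis=0)

gap = reported - predicted
flagged = np.where(gap > gap.mean() + 2 * gap.std())[0]   # assumed threshold
print("flagged farms:", flagged)
```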
... Open datasets, code, and models have been essential in advancing machine learning (ML) over the past decade [34,46,19]. Though the benefits of open code and data are well known [40,27], there is currently a dearth of publicly available datasets and pretrained models for electronic health records (EHRs), which makes conducting reproducible research challenging [38,23]. ...
Preprint
Full-text available
While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models for benefits such as sample efficiency and task adaptation. The code to reproduce our results, as well as the model and dataset (via a research data use agreement), are available at our GitHub repo: https://github.com/som-shahlab/ehrshot-benchmark
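A few-shot evaluation loop of the kind the benchmark enables might be sketched as below; the random stand-in features, the logistic-regression probe, and the k grid are assumptions, not the repo's actual harness.

```python
# Hedged sketch of a few-shot evaluation loop: for each k, fit a lightweight
# head on k examples per class over fixed (here: random stand-in) foundation-
# model features and score AUROC on a held-out set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
feats = rng.standard_normal((2000, 64))            # stand-in patient embeddings
labels = (feats[:, 0] + rng.normal(0, 1, 2000) > 0).astype(int)
test_idx = np.arange(1000, 2000)                   # fixed held-out half

for k in (4, 16, 64, 256):
    pos = rng.choice(np.where(labels[:1000] == 1)[0], k, replace=False)
    neg = rng.choice(np.where(labels[:1000] == 0)[0], k, replace=False)
    idx = np.concatenate([pos, neg])
    clf = LogisticRegression(max_iter=1000).fit(feats[idx], labels[idx])
    auc = roc_auc_score(labels[test_idx], clf.predict_proba(feats[test_idx])[:, 1])
    print(f"k={k:3d}  AUROC={auc:.3f}")
```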
... In this contest, the goal was to predict users' ratings of movies. In the winning model, one of the main explanatory variables was 'if there is a number in the title of the movie' (Töscher et al., 2009). NLP applications selected the best-performing predictive models based on their predictive accuracy: the focus was on the model's performance; how the model worked did not matter. ...
Article
Full-text available
Natural language processing (NLP) methods are designed to automatically process and analyze large amounts of textual data. The integration of this new-generation toolbox into sociology faces many challenges. NLP was institutionalized outside of sociology, while the expertise of sociology has been based on its own methods of research. Another challenge is epistemological: it is related to the validity of digital data and the different viewpoints associated with predictive and causal approaches. In our paper, we discuss the challenges and opportunities of the use of NLP in sociology, offer some potential solutions to the concerns and provide meaningful and diverse examples of its sociological application, most of which are related to research on Eastern European societies. The focus will be on the use of NLP in quantitative text analysis. Solutions are provided concerning how sociological knowledge can be incorporated into the new methods and how the new analytical tools can be evaluated against the principles of traditional quantitative methodology.
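The movie-title feature quoted in the excerpt above ('if there is a number in the title of the movie') is a good example of how simple such engineered features can be; over a handful of invented titles it is essentially one line of Python.

```python
# Hedged sketch of the quoted feature; the titles here are invented examples.
import re

titles = ["Ocean's 11", "The Godfather", "2001: A Space Odyssey", "Up"]
has_number = [int(bool(re.search(r"\d", t))) for t in titles]
print(dict(zip(titles, has_number)))
```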
Article
Full-text available
Neighborhood-based algorithms are frequently used modules of recommender systems. Usually, the choice of the similarity measure used for evaluation of neighborhood relationships is crucial for the success of such approaches. In this article we propose a way to calculate similarities by formulating a regression problem which enables us to extract the similarities from the data in a problem-specific way. Another popular approach for recommender systems is regularized matrix factorization (RMF). We present an algorithm, neighborhood-aware matrix factorization, which efficiently includes neighborhood information in an RMF model. This leads to increased prediction accuracy. The proposed methods are tested on the Netflix dataset.
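A minimal sketch of the neighborhood-aware idea, an RMF prediction p_u · q_i plus a learned, support-normalized item-item correction, might look as follows. The toy data, the crude 3.0 rating offset, and all step sizes are assumptions, not the paper's model.

```python
# Hedged sketch (not the authors' code): regularized matrix factorization
# with an additive neighborhood term over the user's other ratings.
import numpy as np

rng = np.random.default_rng(6)
n_users, n_items, k, lr, lam = 50, 40, 5, 0.01, 0.05
ratings = [(u, i, float(rng.integers(1, 6)))
           for u in range(n_users) for i in rng.choice(n_items, 8, replace=False)]

P = 0.1 * rng.standard_normal((n_users, k))   # user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
W = np.zeros((n_items, n_items))              # item-item neighborhood weights

by_user = {}
for u, i, r in ratings:
    by_user.setdefault(u, []).append((i, r))

def predict(u, i):
    mf = P[u] @ Q[i]
    # Neighborhood term: weighted, centered ratings (3.0 as a crude mean).
    nbr = sum(W[i, j] * (r - 3.0) for j, r in by_user[u] if j != i)
    return mf + nbr / np.sqrt(len(by_user[u]))

for _ in range(20):
    for u, i, r in ratings:
        e = r - predict(u, i)
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))
        for j, rj in by_user[u]:
            if j != i:
                W[i, j] += lr * (e * (rj - 3.0) / np.sqrt(len(by_user[u]))
                                 - lam * W[i, j])

rmse = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings]))
print("train RMSE:", round(rmse, 3))
```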
Article
Full-text available
Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current “pseudo”-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
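The abstract's procedure reduces to a small loop: draw a without-replacement subsample each round, fit a shallow tree to the (least-squares) pseudo-residuals on that subsample only, and apply the shrunken update to the full sample. The sketch below is illustrative; sizes and rates are assumptions.

```python
# Hedged sketch of stochastic gradient boosting with least-squares loss, where
# pseudo-residuals are plain residuals and each tree sees only a subsample.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

nu, n_iter, frac = 0.1, 100, 0.5          # shrinkage, rounds, subsample fraction
F = np.full(500, y.mean())                # current model values
trees = []                                # kept so new points can be scored later
for _ in range(n_iter):
    idx = rng.choice(500, int(frac * 500), replace=False)  # random subsample
    resid = y[idx] - F[idx]                                # pseudo-residuals
    t = DecisionTreeRegressor(max_depth=3).fit(X[idx], resid)
    F += nu * t.predict(X)                # update model values on the full sample
    trees.append(t)

print("train RMSE:", np.sqrt(np.mean((y - F) ** 2)))
```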
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
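To make the pseudo-residual idea concrete for a non-squared loss: for least absolute deviation, the negative gradient of |y - F| with respect to F is sign(y - F), so each base tree is fit to those signs. The sketch below is a bare-bones illustration (a full TreeBoost variant would also re-fit each leaf value as a median).

```python
# Hedged sketch of functional gradient descent for L1 loss: trees are fit to
# sign(y - F), the negative gradient of |y - F| with respect to F.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + rng.standard_t(df=2, size=500) * 0.2  # heavy-tailed noise

nu = 0.1
F = np.full(500, np.median(y))            # L1-optimal constant start
for _ in range(200):
    pseudo = np.sign(y - F)               # negative gradient of the L1 loss
    t = DecisionTreeRegressor(max_depth=2).fit(X, pseudo)
    F += nu * t.predict(X)

print("mean absolute error:", np.mean(np.abs(y - F)))
```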
Article
Most of the published approaches to collaborative filtering and recommender systems concentrate on mathematical techniques for identifying user/item preferences. This paper demonstrates that by considering the psychological decision making processes that are being undertaken by the users of the system it is possible to achieve a significant improvement in results. This approach is applied to the Netflix dataset and it is demonstrated that it is possible to achieve a score better than the Cinematch score set at the beginning of the Netflix competition without even considering individual preferences for individual movies. The result has important implications for both the design and the analysis of the data from collaborative filtering systems.
Article
We present a Matrix Factorization (MF) based approach for the Netflix Prize competition. Currently, MF based algorithms are popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum margin MF, and non-negative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameters. With this method, we achieve a performance that is more than 6% better than Netflix's own system.
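The parameter-combination strategy from the abstract, averaging predictions across several hyperparameter settings instead of selecting one, can be sketched with a tiny regularized-MF trainer. The data, ranks, and regularization values below are invented for illustration.

```python
# Hedged sketch: train one regularized MF model per hyperparameter setting
# and average their predictions rather than picking a single "best" setting.
import numpy as np

rng = np.random.default_rng(9)
n_users, n_items = 30, 20
obs = [(u, i, float(rng.integers(1, 6))) for u in range(n_users)
       for i in rng.choice(n_items, 6, replace=False)]

def train_rmf(k, lam, lr=0.02, epochs=40):
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in obs:
            e = r - P[u] @ Q[i]
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                          Q[i] + lr * (e * P[u] - lam * Q[i]))
    return P, Q

settings = [(5, 0.02), (10, 0.05), (20, 0.1)]   # (rank, regularization)
models = [train_rmf(k, lam) for k, lam in settings]

def predict(u, i):
    return np.mean([P[u] @ Q[i] for P, Q in models])  # average over settings

print("blended prediction for (user 0, item 0):", round(predict(0, 0), 3))
```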
Article
A key part of a recommender system is a collaborative filtering algorithm predicting users' preferences for items. In this paper we describe different efficient collaborative filtering techniques and a framework for combining them to obtain a good prediction. The methods described in this paper are the most important parts of a solution predicting users' preferences for movies with an error rate 7.04% better on the Netflix Prize dataset than the reference algorithm, Netflix Cinematch. The set of predictors used includes algorithms suggested by Netflix Prize contestants: regularized singular value decomposition of data with missing values, K-means, and postprocessing SVD with KNN. We propose extending the set of predictors with the following methods: addition of biases to the regularized SVD, postprocessing SVD with kernel ridge regression, using a separate linear model for each movie, and using methods similar to the regularized SVD but with fewer parameters. All predictors and selected 2-way interactions between them are combined using linear regression on a holdout set.
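One of the listed predictor extensions, biases added to the regularized SVD, corresponds to the prediction rule r̂ = μ + b_u + b_i + p_u · q_i with all parameters trained jointly by SGD. A minimal sketch follows; the toy data, rank, and step sizes are assumptions, not the paper's settings.

```python
# Hedged sketch: regularized SVD with user/item biases, trained by SGD.
import numpy as np

rng = np.random.default_rng(10)
n_users, n_items, k, lr, lam = 40, 30, 8, 0.01, 0.05
obs = [(u, i, float(rng.integers(1, 6))) for u in range(n_users)
       for i in rng.choice(n_items, 5, replace=False)]
mu = np.mean([r for _, _, r in obs])              # global mean rating

bu, bi = np.zeros(n_users), np.zeros(n_items)     # user/item biases
P = 0.1 * rng.standard_normal((n_users, k))
Q = 0.1 * rng.standard_normal((n_items, k))

for _ in range(50):
    for u, i, r in obs:
        e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
        bu[u] += lr * (e - lam * bu[u])
        bi[i] += lr * (e - lam * bi[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))

rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                        for u, i, r in obs]))
print("train RMSE:", round(rmse, 3))
```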
Conference Paper
Recommender systems provide users with personalized suggestions for products or services. These systems often rely on Collaborative Filtering (CF), where past transactions are analyzed in order to establish connections between users and products. The two most successful approaches to CF are latent factor models, which directly profile both users and products, and neighborhood models, which analyze similarities between products or users. In this work we introduce some innovations to both approaches. The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model. Further accuracy improvements are achieved by extending the models to exploit both explicit and implicit feedback by the users. The methods are tested on the Netflix data. Results are better than those previously published on that dataset. In addition, we suggest a new evaluation metric, which highlights the differences among methods, based on their performance at a top-K recommendation task.
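The flavor of the merged model can be conveyed by an SVD++-style prediction rule, in which the user factor is augmented with a normalized sum of implicit-feedback item factors. The sketch below uses random placeholders for what would be learned parameters.

```python
# Hedged sketch of an SVD++-style prediction rule in the spirit of the merged
# factor/neighborhood model; all arrays are random stand-ins for learned
# parameters.
import numpy as np

rng = np.random.default_rng(11)
n_users, n_items, k = 20, 15, 6
mu = 3.6                                           # global mean rating
bu, bi = rng.normal(0, 0.1, n_users), rng.normal(0, 0.1, n_items)
P = 0.1 * rng.standard_normal((n_users, k))        # explicit user factors
Q = 0.1 * rng.standard_normal((n_items, k))        # item factors
Y = 0.1 * rng.standard_normal((n_items, k))        # implicit item factors

def predict(u, i, implicit_items):
    # implicit_items: items user u has interacted with (rated, rented, ...).
    p_eff = P[u] + Y[implicit_items].sum(axis=0) / np.sqrt(len(implicit_items))
    return mu + bu[u] + bi[i] + Q[i] @ p_eff

print(round(predict(0, 3, [1, 4, 7]), 3))
```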
Conference Paper
The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. In this work, we propose novel algorithms for predicting user ratings of items by integrating complementary models that focus on patterns at different scales. At a local scale, we use a neighborhood-based technique that infers ratings from observed ratings by similar users or of similar items. Unlike previous local approaches, our method is based on a formal model that accounts for interactions within the neighborhood, leading to improved estimation quality. At a higher, regional, scale, we use SVD-like matrix factorization for recovering the major structural patterns in the user-item rating matrix. Unlike previous approaches that require imputations in order to fill in the unknown matrix entries, our new iterative algorithm avoids imputation. Because the models involve estimation of millions, or even billions, of parameters, shrinkage of estimated values to account for sampling variability proves crucial to prevent overfitting. Both the local and the regional approaches, and in particular their combination through a unifying model, compare favorably with other approaches and deliver substantially better results than the commercial Netflix Cinematch recommender system on a large publicly available data set.
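The shrinkage the abstract calls crucial admits a one-formula sketch: an empirical item-item correlation ρ_ij supported by n_ij common raters is shrunk to n_ij/(n_ij + λ) · ρ_ij, so sparsely supported similarities carry less weight. The raw values below are invented.

```python
# Hedged sketch of support-based shrinkage: correlations estimated from few
# common raters are pulled toward zero.
raw_corr = {("A", "B"): 0.9, ("A", "C"): 0.4}      # empirical correlations
support = {("A", "B"): 3, ("A", "C"): 500}         # common raters n_ij
lam = 50.0                                         # shrinkage strength

shrunk = {pair: support[pair] / (support[pair] + lam) * raw_corr[pair]
          for pair in raw_corr}
print(shrunk)  # ("A","B") shrinks hard (n=3); ("A","C") barely moves (n=500)
```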