Article

The BigChaos Solution to the Netflix Grand Prize

Authors:
  • Commendo research (part of Opera Solutions)
... Matrix factorization. Inspired by matrix factorization models [19,23] in recommendation systems to capture the low-rank structure of user-item interactions, we leverage this approach for training on preference data. The key is to uncover a hidden scoring function s : M × Q → R. The score s(M_w, q) should represent the quality of the model M_w's answer to the query q, i.e. if a model M_w is better than M_l on a query q, then s(M_w, q) > s(M_l, q). ...
Preprint
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely recognized benchmarks shows that our approach significantly reduces costs, by over 2 times in certain cases, without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
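As a rough illustration of the scoring idea in the excerpt above, the sketch below fits a bilinear score s(M, q) = u_M · v_q to pairwise preference data with a Bradley-Terry-style loss. This is a minimal assumption-laden reconstruction, not the authors' code; the embedding dimension, learning rates, and toy preference data are all invented.

```python
# Hedged sketch (not the paper's method): a bilinear score s(m, q) = u_m . v_q
# trained on pairwise preferences with a Bradley-Terry loss. All sizes and the
# toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_queries, dim, lr, lam = 2, 100, 8, 0.05, 1e-3
U = 0.1 * rng.standard_normal((n_models, dim))   # model embeddings
V = 0.1 * rng.standard_normal((n_queries, dim))  # query embeddings

# Preference triples (winner_model, loser_model, query): model 0 "wins" here.
prefs = [(0, 1, q) for q in range(n_queries)]

for _ in range(200):
    for w, l, q in prefs:
        margin = U[w] @ V[q] - U[l] @ V[q]
        p = 1.0 / (1.0 + np.exp(-margin))        # P(winner beats loser | q)
        g = p - 1.0                              # grad of -log p wrt margin
        gU = g * V[q]
        U[w] -= lr * (gU + lam * U[w])
        U[l] -= lr * (-gU + lam * U[l])
        V[q] -= lr * (g * (U[w] - U[l]) + lam * V[q])

# Route a query to the higher-scoring model.
q = 3
print("route to model", int(np.argmax(U @ V[q])))
```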
... Ensemble Blending is a variation of the stacking ensemble originally introduced in the Netflix competition (Töscher, Jahrer, and Bell 2009). The flowchart of the blending ensemble model is shown in Figure 9. ...
Article
Full-text available
This study compares the performance of the ensemble machine learning methods stacking, blending, and soft voting for landslide susceptibility mapping (LSM) in a highly affected Northern Italy region, Lombardy. We first created a spatial database based on open data, ensuring accessibility of relevant information for landslide-influencing factors, historical landslide records, and areas with a very low probability of landslide occurrence called 'No Landslide Zone', an innovative concept introduced in this study. Then, open-source software was employed for developing five machine learning classifiers (Bagging, Random Forests, AdaBoost, Gradient Tree Boosting, and Neural Networks), which were tested at a basin scale by implementing different combinations of training and testing schemes using three use cases. The three classifiers with the highest generalization performance (Random Forests, AdaBoost, and Neural Networks) were selected and combined by ensemble methods. Soft voting showed the highest performance among them. The best model to generate the LSM for the Lombardy region was a Neural Network model trained using data from three basins, achieving an accuracy of 0.93 in Lombardy. The LSM indicates that 37% of Lombardy is in the highest landslide susceptibility categories. Our findings highlight the importance of openness in advancing LSM, not only by enhancing the reproducibility and transparency of our methodology but also by promoting knowledge-sharing within the scientific community.
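The abstract names stacking and soft voting as the combination schemes. A minimal sketch of both, using scikit-learn with synthetic data standing in for the landslide database and the study's three selected classifiers, might look as follows; it illustrates the general technique, not the study's pipeline.

```python
# Hedged sketch of the two ensemble strategies named above, with synthetic
# data in place of the landslide database.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("rf", RandomForestClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("nn", MLPClassifier(max_iter=500, random_state=0))]

# Soft voting averages predicted class probabilities across the base models.
soft_vote = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)
# Stacking fits a second-level learner on the base models' predictions.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression()).fit(X_tr, y_tr)

print("soft voting:", soft_vote.score(X_te, y_te))
print("stacking:   ", stack.score(X_te, y_te))
```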
... The specific framework is shown in Fig. 7 (stacking ensemble learning model for TSSM). Blending is another form of ensemble learning technique derived from stacking [43]; the only difference between the two models is that the blending model uses a held-out (validation) set carved from the training set to make predictions. Simply put, predictions are made only for the held-out data set, and the held-out data set and those predictions are used to build the second-level model. ...
Article
Full-text available
Thaw slump susceptibility mapping (TSSM) of the Qinghai-Tibet Railway corridor (QTRC) is the prerequisite and basis for disaster assessment and prevention in permafrost engineering. The objective of this study is to construct ensemble learning models based on single-classifier models to generate the TSSM of the QTRC, to compare and verify the performance of the models, and to further explore the relationship between high-susceptibility areas and environmental factors of the QTRC. A collinearity analysis was carried out on 14 selected thaw slump conditioning factors (TSCFs). We used the balanced bagging method for sample optimization, and the data set was divided into a 70% training set and a 30% validation set. Convolutional neural network (CNN), multilayer perceptron (MLP), support vector regression (SVR), and random forest (RF) single classifiers were selected to construct blending and stacking ensemble learning models for the TSSM. The results showed no collinearity among the 14 TSCFs. The comparison of model performance revealed that all models performed well, but the constructed stacking and blending ensemble learning models had stable performance and high prediction accuracy for TSSM. The stacking ensemble learning model performed best, with an area under the curve (AUC) of the receiver operating characteristic (ROC) curve reaching 0.9607, indicating that the TSSM of the QTRC generated by the stacking ensemble learning model had the highest reliability. The QTRC has local areas with high thaw slump susceptibility, mainly concentrated in permafrost areas with high altitude, steep slopes, nearby faults, sparse vegetation, ice and snow cover, and higher cumulative precipitation.
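For contrast with library stacking, a hand-rolled blending sketch in the sense of the excerpt above, with the second-level model fit only on base-model predictions for a held-out split, could look like this. Data and model choices are illustrative stand-ins, not the study's configuration.

```python
# Hedged sketch of blending: base models fit on a training split, the
# second-level model fit only on their predictions for a held-out split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
X_hold, X_te, y_hold, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                              random_state=1)

bases = [RandomForestClassifier(random_state=1).fit(X_tr, y_tr),
         MLPClassifier(max_iter=500, random_state=1).fit(X_tr, y_tr),
         SVC(probability=True, random_state=1).fit(X_tr, y_tr)]

def meta_features(models, X):
    # Stack each base model's positive-class probability as one column.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# The blender never sees the base models' training split.
blender = LogisticRegression().fit(meta_features(bases, X_hold), y_hold)
print("blended accuracy:", blender.score(meta_features(bases, X_te), y_te))
```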
... The winning team and other top-performing teams gained valuable insights into developing high-performance machine learning models. According to descriptions provided by the two top teams (Koren, 2009; Töscher et al., 2009), these breakthroughs were mainly attributed to three significant aspects: discovering new features underlying the data; employing different types of base learners; and blending predictions from different algorithms through ensemble methods such as gradient-boosted decision trees. Following the Netflix Prize, numerous competitions have been conducted over the past decade on platforms such as Kaggle (Bojer & Meldgaard, 2021) for various data sets and tasks. ...
Article
Full-text available
The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File data set to explore building a high-performance freight mode choice model, considering three main improvements: (1) constructing local models for each separate commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin/destination zones; and (3) applying additional ensemble learning methods such as stacking or voting to combine results from local and global models for improved performance. The proposed method achieved over 92% accuracy without incorporating external information, outperforming most previously proposed models by a margin of 10%. Furthermore, SHAP (SHapley Additive exPlanations) values were computed to explain the outputs and major patterns obtained from the proposed model. The model framework could enhance the performance and interpretability of existing freight mode choice models.
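Improvements (1) and (3) from the abstract, local models per category combined with a global model, admit a compact sketch. The data, the fake commodity ids, and the simple probability-averaging combiner below are all assumptions for illustration, not the study's setup.

```python
# Hedged sketch: one local model per commodity category plus a global model,
# combined by averaging predicted class probabilities (a simple soft vote).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=4000, n_features=8, n_classes=3,
                           n_informative=6, random_state=2)
category = np.random.default_rng(2).integers(0, 5, size=len(y))  # fake ids

global_model = GradientBoostingClassifier(random_state=2).fit(X, y)
local_models = {c: GradientBoostingClassifier(random_state=2)
                     .fit(X[category == c], y[category == c])
                for c in np.unique(category)}

def predict(x_row, c):
    # Average local and global class probabilities, then take the argmax.
    proba = (local_models[c].predict_proba([x_row])[0] +
             global_model.predict_proba([x_row])[0]) / 2.0
    return int(np.argmax(proba))

print("mode choice:", predict(X[0], category[0]))
```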
... Therefore, the transformation from the six SVM outputs to the final classification result cannot be adjusted using the training set directly. Inspired by blending, a variant of stacking in ensemble learning [59], we introduced a validation set for fitting the transformation from the six SVM outputs to the final classification result. ...
Article
Full-text available
Emotion recognition is crucial in understanding human affective states and has various applications. Electroencephalography (EEG), a non-invasive neuroimaging technique that captures brain activity, has gained attention in emotion recognition. However, existing EEG-based emotion recognition systems are limited to specific sensory modalities, hindering their applicability. Our study advances EEG-based emotion recognition, offering a comprehensive framework that overcomes sensory-focused limitations and cross-sensory challenges. We collected cross-sensory emotion EEG data using multimodal emotion simulations (three sensory modalities, audio/visual/audio-visual, each with two emotion states, pleasure or unpleasure). The proposed framework, the filter bank adversarial domain adaptation Riemann method (FBADR), leverages filter bank techniques and Riemannian tangent space methods for feature extraction from cross-sensory EEG data. Compared with Riemannian methods alone, the filter bank and adversarial domain adaptation components improved average accuracy by 13.68% and 8.36%, respectively. Comparative analysis of classification results showed that the proposed FBADR framework achieved state-of-the-art cross-sensory emotion recognition performance, reaching an average accuracy of 89.01% ± 5.06%. Moreover, the proposed methods proved robust, maintaining high cross-sensory recognition performance at signal-to-noise ratios (SNR) ≥ 1 dB. Overall, our study contributes to the EEG-based emotion recognition field by providing a comprehensive framework that overcomes the limitations of sensory-oriented approaches and successfully tackles the difficulties of cross-sensory situations.
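The excerpt above describes fitting the combiner on a validation set rather than on the training set. A generic sketch of that pattern, with six pairwise SVMs over four synthetic classes and a logistic-regression combiner (all choices assumed, not taken from the paper), follows.

```python
# Hedged sketch: continuous outputs of several binary SVMs are mapped to a
# final label by a combiner fit on a separate validation set. The data, the
# number of SVMs, and the combiner are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1800, n_features=20, n_classes=4,
                           n_informative=8, random_state=3)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=3)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=3)

pairs = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # C(4, 2) = 6 SVMs
svms = []
for a, b in pairs:
    mask = np.isin(y_tr, [a, b])
    svms.append(SVC(kernel="rbf").fit(X_tr[mask], y_tr[mask]))

def svm_outputs(X):
    # One continuous decision value per pairwise SVM.
    return np.column_stack([m.decision_function(X) for m in svms])

# The combiner is fit on the validation set, never on the SVMs' training set.
combiner = LogisticRegression(max_iter=1000).fit(svm_outputs(X_val), y_val)
print("test accuracy:", combiner.score(svm_outputs(X_te), y_te))
```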
... Second, we add to a small but rapidly growing literature on machine learning for causal inference. This is the first study to show how ensemble methods can be used to detect data misreporting from observed statistics; prior work has focused on the use of similar Super Learning methods for demand estimation (Bajari et al., 2015), detection of cyber attacks (Rabbani et al., 2021), and prediction of users' movie ratings on Netflix (Toescher et al., 2009). As a by-product, we also make a theoretical contribution by proposing a novel alternative to the bootstrap for computing confidence intervals for causal forest estimates. ...
Article
Full-text available
We propose a new approach to detect and quantify informal employment resulting from irregular migration shocks. Focusing on a largely informal sector, agriculture, and on the exogenous variation from the Arab Spring wave on southern Italian coasts, we use machine-learning techniques to document abnormal increases in reported (vs. predicted) labor productivity on vineyards hit by the shock. Misreporting is largely heterogeneous across farms, depending e.g. on size and grape quality. The shock resulted in a 6% increase in informal employment, equivalent to one undeclared worker for every three farms on average and 23,000 workers in total over 2011-2012. Misreporting causes significant increases in farm profits through lower labor costs, while having no impact on grape sales, prices, or wages of formal workers. JEL: F22, J61, J43, J46, C53
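A minimal sketch of the detection logic, ensemble-predicted "normal" productivity compared against reported productivity, with synthetic pre/post-shock data and an assumed 2-sigma flagging rule (none of which come from the paper):

```python
# Hedged sketch: an ensemble learns "normal" labor productivity from farm
# covariates on pre-shock data; abnormally large gaps between reported and
# predicted values on post-shock farms flag potential misreporting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(4)
beta = rng.standard_normal(6)

X_pre = rng.standard_normal((1000, 6))             # pre-shock farm covariates
y_pre = X_pre @ beta + rng.normal(0, 0.5, 1000)    # pre-shock productivity

X_post = rng.standard_normal((300, 6))
reported = X_post @ beta + rng.normal(0, 0.5, 300)
reported[:30] += 2.0                               # 30 farms over-report output

# Small ensemble averaged in the spirit of Super Learning.
models = [RandomForestRegressor(random_state=4).fit(X_pre, y_pre),
          GradientBoostingRegressor(random_state=4).fit(X_pre, y_pre)]
predicted = np.mean([m.predict(X_post) for m in models], axis=0)

gap = reported - predicted
flagged = np.where(gap > gap.mean() + 2 * gap.std())[0]   # assumed threshold
print("flagged farms:", flagged)
```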
... Open datasets, code, and models have been essential in advancing machine learning (ML) over the past decade [34,46,19]. Though the benefits of open code and data are well known [40,27], there is currently a dearth of publicly available datasets and pretrained models for electronic health records (EHRs), which makes conducting reproducible research challenging [38,23]. ...
Preprint
Full-text available
While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models for benefits such as sample efficiency and task adaptation. The code to reproduce our results, as well as the model and dataset (via a research data use agreement), are available at our GitHub repo: https://github.com/som-shahlab/ehrshot-benchmark
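A few-shot evaluation loop of the kind the benchmark enables might be sketched as below; the random stand-in features, the logistic-regression probe, and the k grid are assumptions, not the repo's actual harness.

```python
# Hedged sketch of a few-shot evaluation loop: for each k, fit a lightweight
# head on k examples per class over fixed (here: random stand-in) foundation-
# model features and score AUROC on a held-out set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
feats = rng.standard_normal((2000, 64))            # stand-in patient embeddings
labels = (feats[:, 0] + rng.normal(0, 1, 2000) > 0).astype(int)
test_idx = np.arange(1000, 2000)                   # fixed held-out half

for k in (4, 16, 64, 256):
    pos = rng.choice(np.where(labels[:1000] == 1)[0], k, replace=False)
    neg = rng.choice(np.where(labels[:1000] == 0)[0], k, replace=False)
    idx = np.concatenate([pos, neg])
    clf = LogisticRegression(max_iter=1000).fit(feats[idx], labels[idx])
    auc = roc_auc_score(labels[test_idx], clf.predict_proba(feats[test_idx])[:, 1])
    print(f"k={k:3d}  AUROC={auc:.3f}")
```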
... In this contest, the goal was to predict users' ratings of movies. In the winning model, one of the main explanatory variables was 'if there is a number in the title of the movie' (Töscher et al., 2009). NLP applications selected the best-performing predictive models based on their predictive accuracy: the focus was on the model's performance; how the model worked did not matter. ...
Article
Full-text available
Natural language processing (NLP) methods are designed to automatically process and analyze large amounts of textual data. The integration of this new-generation toolbox into sociology faces many challenges. NLP was institutionalized outside of sociology, while the expertise of sociology has been based on its own methods of research. Another challenge is epistemological: it is related to the validity of digital data and the different viewpoints associated with predictive and causal approaches. In our paper, we discuss the challenges and opportunities of the use of NLP in sociology, offer some potential solutions to the concerns and provide meaningful and diverse examples of its sociological application, most of which are related to research on Eastern European societies. The focus will be on the use of NLP in quantitative text analysis. Solutions are provided concerning how sociological knowledge can be incorporated into the new methods and how the new analytical tools can be evaluated against the principles of traditional quantitative methodology.
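The movie-title feature quoted in the excerpt above ('if there is a number in the title of the movie') is a good example of how simple such engineered features can be; over a handful of invented titles it is essentially one line of Python.

```python
# Hedged sketch of the quoted feature; the titles here are invented examples.
import re

titles = ["Ocean's 11", "The Godfather", "2001: A Space Odyssey", "Up"]
has_number = [int(bool(re.search(r"\d", t))) for t in titles]
print(dict(zip(titles, has_number)))
```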
Article
Full-text available
Neighborhood-based algorithms are frequently used modules of recommender systems. Usually, the choice of the similarity measure used for evaluation of neighborhood relationships is crucial for the success of such approaches. In this article we propose a way to calculate similarities by formulating a regression problem which enables us to extract the similarities from the data in a problem-specific way. Another popular approach for recommender systems is regularized matrix factorization (RMF). We present an algorithm, neighborhood-aware matrix factorization, which efficiently includes neighborhood information in an RMF model. This leads to increased prediction accuracy. The proposed methods are tested on the Netflix dataset.
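A minimal sketch of the neighborhood-aware idea, an RMF prediction p_u · q_i plus a learned, support-normalized item-item correction, might look as follows. The toy data, the crude 3.0 rating offset, and all step sizes are assumptions, not the paper's model.

```python
# Hedged sketch (not the authors' code): regularized matrix factorization
# with an additive neighborhood term over the user's other ratings.
import numpy as np

rng = np.random.default_rng(6)
n_users, n_items, k, lr, lam = 50, 40, 5, 0.01, 0.05
ratings = [(u, i, float(rng.integers(1, 6)))
           for u in range(n_users) for i in rng.choice(n_items, 8, replace=False)]

P = 0.1 * rng.standard_normal((n_users, k))   # user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
W = np.zeros((n_items, n_items))              # item-item neighborhood weights

by_user = {}
for u, i, r in ratings:
    by_user.setdefault(u, []).append((i, r))

def predict(u, i):
    mf = P[u] @ Q[i]
    # Neighborhood term: weighted, centered ratings (3.0 as a crude mean).
    nbr = sum(W[i, j] * (r - 3.0) for j, r in by_user[u] if j != i)
    return mf + nbr / np.sqrt(len(by_user[u]))

for _ in range(20):
    for u, i, r in ratings:
        e = r - predict(u, i)
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))
        for j, rj in by_user[u]:
            if j != i:
                W[i, j] += lr * (e * (rj - 3.0) / np.sqrt(len(by_user[u]))
                                 - lam * W[i, j])

rmse = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings]))
print("train RMSE:", round(rmse, 3))
```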
Article
Full-text available
Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current “pseudo”-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
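The abstract's procedure reduces to a small loop: draw a without-replacement subsample each round, fit a shallow tree to the (least-squares) pseudo-residuals on that subsample only, and apply the shrunken update to the full sample. The sketch below is illustrative; sizes and rates are assumptions.

```python
# Hedged sketch of stochastic gradient boosting with least-squares loss, where
# pseudo-residuals are plain residuals and each tree sees only a subsample.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

nu, n_iter, frac = 0.1, 100, 0.5          # shrinkage, rounds, subsample fraction
F = np.full(500, y.mean())                # current model values
trees = []                                # kept so new points can be scored later
for _ in range(n_iter):
    idx = rng.choice(500, int(frac * 500), replace=False)  # random subsample
    resid = y[idx] - F[idx]                                # pseudo-residuals
    t = DecisionTreeRegressor(max_depth=3).fit(X[idx], resid)
    F += nu * t.predict(X)                # update model values on the full sample
    trees.append(t)

print("train RMSE:", np.sqrt(np.mean((y - F) ** 2)))
```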
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
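To make the pseudo-residual idea concrete for a non-squared loss: for least absolute deviation, the negative gradient of |y - F| with respect to F is sign(y - F), so each base tree is fit to those signs. The sketch below is a bare-bones illustration (a full TreeBoost variant would also re-fit each leaf value as a median).

```python
# Hedged sketch of functional gradient descent for L1 loss: trees are fit to
# sign(y - F), the negative gradient of |y - F| with respect to F.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + rng.standard_t(df=2, size=500) * 0.2  # heavy-tailed noise

nu = 0.1
F = np.full(500, np.median(y))            # L1-optimal constant start
for _ in range(200):
    pseudo = np.sign(y - F)               # negative gradient of the L1 loss
    t = DecisionTreeRegressor(max_depth=2).fit(X, pseudo)
    F += nu * t.predict(X)

print("mean absolute error:", np.mean(np.abs(y - F)))
```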
Article
Most of the published approaches to collaborative filtering and recommender systems concentrate on mathematical techniques for identifying user/item preferences. This paper demonstrates that by considering the psychological decision making processes that are being undertaken by the users of the system it is possible to achieve a significant improvement in results. This approach is applied to the Netflix dataset and it is demonstrated that it is possible to achieve a score better than the Cinematch score set at the beginning of the Netflix competition without even considering individual preferences for individual movies. The result has important implications for both the design and the analysis of the data from collaborative filtering systems.
Article
We present a Matrix Factorization (MF) based approach for the Netflix Prize competition. Currently, MF based algorithms are popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum margin MF, and non-negative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameters. With this method, we achieve a performance that is more than 6% better than Netflix's own system.
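The parameter-combination strategy from the abstract, averaging predictions across several hyperparameter settings instead of selecting one, can be sketched with a tiny regularized-MF trainer. The data, ranks, and regularization values below are invented for illustration.

```python
# Hedged sketch: train one regularized MF model per hyperparameter setting
# and average their predictions rather than picking a single "best" setting.
import numpy as np

rng = np.random.default_rng(9)
n_users, n_items = 30, 20
obs = [(u, i, float(rng.integers(1, 6))) for u in range(n_users)
       for i in rng.choice(n_items, 6, replace=False)]

def train_rmf(k, lam, lr=0.02, epochs=40):
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in obs:
            e = r - P[u] @ Q[i]
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                          Q[i] + lr * (e * P[u] - lam * Q[i]))
    return P, Q

settings = [(5, 0.02), (10, 0.05), (20, 0.1)]   # (rank, regularization)
models = [train_rmf(k, lam) for k, lam in settings]

def predict(u, i):
    return np.mean([P[u] @ Q[i] for P, Q in models])  # average over settings

print("blended prediction for (user 0, item 0):", round(predict(0, 0), 3))
```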
Article
A key part of a recommender system is a collaborative filtering algorithm predicting users' preferences for items. In this paper we describe different efficient collaborative filtering techniques and a framework for combining them to obtain a good prediction. The methods described in this paper are the most important parts of a solution predicting users' preferences for movies with an error rate 7.04% better on the Netflix Prize dataset than the reference algorithm, Netflix Cinematch. The set of predictors used includes algorithms suggested by Netflix Prize contestants: regularized singular value decomposition of data with missing values, K-means, and postprocessing SVD with KNN. We propose extending the set of predictors with the following methods: addition of biases to the regularized SVD, postprocessing SVD with kernel ridge regression, using a separate linear model for each movie, and using methods similar to the regularized SVD but with fewer parameters. All predictors and selected 2-way interactions between them are combined using linear regression on a holdout set.
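One of the listed predictor extensions, biases added to the regularized SVD, corresponds to the prediction rule r̂ = μ + b_u + b_i + p_u · q_i with all parameters trained jointly by SGD. A minimal sketch follows; the toy data, rank, and step sizes are assumptions, not the paper's settings.

```python
# Hedged sketch: regularized SVD with user/item biases, trained by SGD.
import numpy as np

rng = np.random.default_rng(10)
n_users, n_items, k, lr, lam = 40, 30, 8, 0.01, 0.05
obs = [(u, i, float(rng.integers(1, 6))) for u in range(n_users)
       for i in rng.choice(n_items, 5, replace=False)]
mu = np.mean([r for _, _, r in obs])              # global mean rating

bu, bi = np.zeros(n_users), np.zeros(n_items)     # user/item biases
P = 0.1 * rng.standard_normal((n_users, k))
Q = 0.1 * rng.standard_normal((n_items, k))

for _ in range(50):
    for u, i, r in obs:
        e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
        bu[u] += lr * (e - lam * bu[u])
        bi[i] += lr * (e - lam * bi[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))

rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                        for u, i, r in obs]))
print("train RMSE:", round(rmse, 3))
```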
Conference Paper
Recommender systems provide users with personalized suggestions for products or services. These systems often rely on Collaborative Filtering (CF), where past transactions are analyzed in order to establish connections between users and products. The two most successful approaches to CF are latent factor models, which directly profile both users and products, and neighborhood models, which analyze similarities between products or users. In this work we introduce some innovations to both approaches. The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model. Further accuracy improvements are achieved by extending the models to exploit both explicit and implicit feedback by the users. The methods are tested on the Netflix data. Results are better than those previously published on that dataset. In addition, we suggest a new evaluation metric, which highlights the differences among methods, based on their performance at a top-K recommendation task.
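The flavor of the merged model can be conveyed by an SVD++-style prediction rule, in which the user factor is augmented with a normalized sum of implicit-feedback item factors. The sketch below uses random placeholders for what would be learned parameters.

```python
# Hedged sketch of an SVD++-style prediction rule in the spirit of the merged
# factor/neighborhood model; all arrays are random stand-ins for learned
# parameters.
import numpy as np

rng = np.random.default_rng(11)
n_users, n_items, k = 20, 15, 6
mu = 3.6                                           # global mean rating
bu, bi = rng.normal(0, 0.1, n_users), rng.normal(0, 0.1, n_items)
P = 0.1 * rng.standard_normal((n_users, k))        # explicit user factors
Q = 0.1 * rng.standard_normal((n_items, k))        # item factors
Y = 0.1 * rng.standard_normal((n_items, k))        # implicit item factors

def predict(u, i, implicit_items):
    # implicit_items: items user u has interacted with (rated, rented, ...).
    p_eff = P[u] + Y[implicit_items].sum(axis=0) / np.sqrt(len(implicit_items))
    return mu + bu[u] + bi[i] + Q[i] @ p_eff

print(round(predict(0, 3, [1, 4, 7]), 3))
```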
Conference Paper
The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. In this work, we propose novel algorithms for predicting user ratings of items by integrating complementary models that focus on patterns at different scales. At a local scale, we use a neighborhood-based technique that infers ratings from observed ratings by similar users or of similar items. Unlike previous local approaches, our method is based on a formal model that accounts for interactions within the neighborhood, leading to improved estimation quality. At a higher, regional, scale, we use SVD-like matrix factorization for recovering the major structural patterns in the user-item rating matrix. Unlike previous approaches that require imputations in order to fill in the unknown matrix entries, our new iterative algorithm avoids imputation. Because the models involve estimation of millions, or even billions, of parameters, shrinkage of estimated values to account for sampling variability proves crucial to prevent overfitting. Both the local and the regional approaches, and in particular their combination through a unifying model, compare favorably with other approaches and deliver substantially better results than the commercial Netflix Cinematch recommender system on a large publicly available data set.
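The shrinkage the abstract calls crucial admits a one-formula sketch: an empirical item-item correlation ρ_ij supported by n_ij common raters is shrunk to n_ij/(n_ij + λ) · ρ_ij, so sparsely supported similarities carry less weight. The raw values below are invented.

```python
# Hedged sketch of support-based shrinkage: correlations estimated from few
# common raters are pulled toward zero.
raw_corr = {("A", "B"): 0.9, ("A", "C"): 0.4}      # empirical correlations
support = {("A", "B"): 3, ("A", "C"): 500}         # common raters n_ij
lam = 50.0                                         # shrinkage strength

shrunk = {pair: support[pair] / (support[pair] + lam) * raw_corr[pair]
          for pair in raw_corr}
print(shrunk)  # ("A","B") shrinks hard (n=3); ("A","C") barely moves (n=500)
```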