Hussein Hazimeh
Google Inc. | Google

Doctor of Philosophy

About

Publications

2,801

Reads

422

Citations

Skills and Expertise

Combinatorial Optimization

Optimization Algorithms

Machine Learning

May 2019 - August 2019

Google

Google Research
United States

Position

Research Intern

Description

Developed the tree ensemble layer: https://arxiv.org/abs/2002.07772

May 2016 - August 2016

Amazon

Core Machine Learning Team
Seattle, United States

Position

Research Intern

August 2016 - June 2021

Massachusetts Institute of Technology

Field of study

Operations Research

Publications

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

Article

Full-text available

Mar 2018

We consider the canonical $L_0$-regularized least squares problem (aka best subsets) which is generally perceived as a `gold-standard' for many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this p...

Sparse regression at scale: branch-and-bound rooted in first-order optimization

Article

Oct 2021

We consider the least squares regression problem, penalized with a combination of the ℓ0 and squared ℓ2 penalty functions (a.k.a. ℓ0ℓ2 regularization). Recent work shows that the resulting estimators enjoy appealing statistical properties in many high-dimensional settings. However, exact computation of these estimators remains a major challenge. In...

Figure 1: Scatter plots of the Frechet Inception Distance (FID) versus...

Figure 3: Sensitivity plots for the scheduler applied to NSGAN, LSGAN,...

Figure 4: Domain adaptation (MNIST → MNIST-M) using DANN. [Left] Test...

Figure A.1: Plots of FID vs. optimality gap for NSGAN and LSGAN with...

Figure C.2: Change in leaning rate multiplier as the training...

Mind the (optimality) Gap: A Gap-Aware Learning Rate Scheduler for Adversarial Nets

Preprint

Full-text available

Jan 2023

Adversarial nets have proved to be powerful in various domains including generative modeling (GANs), transfer learning, and fairness. However, successfully training adversarial nets using first-order methods remains a major challenge. Typically, careful choices of the learning rates are needed to maintain the delicate balance between the competing...

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

Preprint

Full-text available

Feb 2023

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In th...

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

Article

Full-text available

Apr 2022

The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable "sparse gate" to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such...

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

Article

Jul 2023

Machine Learning (ML) models are ubiquitous in real-world applications and are a constant focus of research. Modern ML models have become more complex, deeper, and harder to reason about. At the same time, the community has started to realize the importance of protecting the privacy of the training data that goes into these models. Differential Pri...

Figure 4: COMET for 5 (non-powers of 2) experts.

Tess Loss (×10 −2 , the smaller the better) and number of experts per...

#FLOPs per-sample at inference time for COMET against benchmarks...

COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search

Preprint

Full-text available

Jun 2023

The sparse Mixture-of-Experts (Sparse-MoE) framework efficiently scales up model capacity in various domains, such as natural language processing and vision. Sparse-MoEs select a subset of the "experts" (thus, only a portion of the overall network) for each input sample using a sparse, trainable gate. Existing sparse gates are prone to convergence...

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

Preprint

Full-text available

Mar 2023

ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP...

Grouped variable selection with discrete optimization: Computational and statistical perspectives

Article

Feb 2023

Figure 1. Example images for obfuscations. From top left to bottom...

Figure A.1. Examples for the training obfuscations.

Figure A.2. Examples for the hold-out obfuscations.

Figure D.3. Heatmap of top 1 accuracy for all the obfuscations for...

Figure D.6. Heatmap of the top 1 accuracy for all the obfuscations for...

Benchmarking Robustness to Adversarial Image Obfuscations

Preprint

Full-text available

Jan 2023

Automated content filtering and moderation is an important tool that allows online platforms to build striving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violate platform policies and codes of conduct. To reach this goal, these malic...

Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles

Conference Paper

Aug 2022

Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles

Article

Aug 2022

Figure 1: Timing comparison on CPU training with tensor-based...

Test mean squared error performance of a single axis aligned and...

Tree ensemble sizes for soft trees, RF, and GRF.

Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles

Preprint

Full-text available

May 2022

Decision tree ensembles are widely used and competitive learning models. Despite their success, popular toolkits for learning tree ensembles have limited modeling capabilities. For instance, these toolkits support a limited number of loss functions and are restricted to single task learning. We propose a flexible framework for learning tree ensembl...

L0Learn: A Scalable Package for Sparse Learning using L0 Regularization

Preprint

Full-text available

Feb 2022

We introduce L0Learn: an open-source package for sparse regression and classification using L0 regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has a user-friendly R interface. Our experiments indicate that L0Learn can scale to p...

Sparse Learning using Discrete Optimization: Scalable Algorithms and Statistical Insights

Thesis

Full-text available

Jun 2021

Hussein Hazimeh

Sparsity is a central concept in interpretable machine learning and high-dimensional statistics. While sparse learning problems can be naturally modeled using discrete optimization, computational challenges have historically shifted the focus towards alternatives based on continuous optimization and heuristics. Recently, growing evidence suggests t...

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

Preprint

Full-text available

Jun 2021

The Mixture-of-experts (MoE) architecture is showing promising results in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The la...

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

Article

Full-text available

Jun 2021

We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) 0-regularized regression problems at scales much larger than what was conventionally consi...

Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives

Preprint

Full-text available

Apr 2021

We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the $\ell_0$-regularized formulation, a problem that is relatively unexplored due to computa...

The Tree Ensemble Layer: Differentiability meets Conditional Computation

Conference Paper

Full-text available

Jul 2020

Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in...

Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

Article

Full-text available

Jun 2020

In many learning settings, it is beneficial toaugment the main features with pairwise in-teractions. Such interaction models can beoften enhanced by performing variable selec-tion under the so-calledstrong hierarchycon-straint: an interaction is non-zero only if itsassociated main features are non-zero. Ex-isting convex optimization-based algorithm...

Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

Preprint

Full-text available

Apr 2020

We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and $\ell_{2}$ norms (a.k.a. $\ell_0 \ell_2$ regularization). Recent work presents strong evidence that the resulting $\ell_0$-based estimators can outperform popular sparse learning methods, under many important high-dimensional settings. However, exac...

The Tree Ensemble Layer: Differentiability meets Conditional Computation

Preprint

Full-text available

Feb 2020

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

Preprint

Full-text available

Jan 2020

We consider a discrete optimization based approach for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized problems at scales much larger than what was conventionally consid...

The Tree Ensemble Layer: Differentiability meets Conditional Computation

Article

Jan 2020

Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

Preprint

Full-text available

Feb 2019

In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can be often enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization based algorith...

Fast Algorithms for Best Subset Selection

Poster

Full-text available

Dec 2018

Fast Algorithms for Best Subset Selection

Axiomatic Analysis of Smoothing Methods in Language Models for Pseudo-Relevance Feedback

Conference Paper

Full-text available

Sep 2015

Pseudo-Relevance Feedback (PRF) is an important general technique for improving retrieval effectiveness without requiring any user effort. Several state-of-the-art PRF models are based on the language modeling approach where a query language model is learned based on feedback documents. In all these models, feedback documents are represented with u...

Network

Sara van de Geer
ETH Zurich
Runze Li
Pennsylvania State University
Rob Tibshirani
Stanford University
Pietro Belotti
Politecnico di Milano
Y. Bengio
Université de Montréal

Tomoyuki Hiroyasu
Doshisha University
Emilio Carrizosa
Universidad de Sevilla
Dolores Romero Morales
Copenhagen Business School
Lev Utkin
Peter the Great Saint-Petersburg Polytechnic University
Yong-Sheng Chen
National Chiao Tung University

Hussein HazimehGoogle Inc. | Google

About

Publications

Network

Hussein Hazimeh
Google Inc. | Google