D. Sculley's research while affiliated with Google Inc. and other places

What is this page?


This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (46)


Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
  • Preprint
  • File available

May 2023 · 53 Reads

Alicia Parrish · Jessica Quaye · [...]
The generative AI revolution of recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their growing ability to generate realistic and creative content, T2I models such as DALL-E, MidJourney, Imagen, and Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors inherited from pre-training on uncurated, internet-scraped datasets therefore have the potential to cause wide-reaching harm, for example through generated images that are violent, sexually explicit, or contain biased and derogatory stereotypes. Despite this risk of harm, we lack systematic and structured evaluation datasets for scrutinizing model behavior, especially for adversarial attacks that bypass existing safety filters. A typical bottleneck in safety evaluation is achieving wide coverage of different types of challenging examples in the evaluation set, i.e., identifying 'unknown unknowns' or long-tail problems. To address this need, we introduce the Adversarial Nibbler challenge. Its goal is to crowdsource a diverse set of failure modes and to reward participants for successfully finding safety vulnerabilities in current state-of-the-art T2I models. Ultimately, we aim to raise awareness of these issues and to help developers improve the future safety and reliability of generative AI models. Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.


Figure 2: Top row: Examples of Plex's capabilities in vision: (left) Label uncertainty in ImageNet ReaL-H, demonstrating the ability to capture the inherent ambiguity of image labels assigned by humans. (middle) Active learning on ImageNet1K, displaying that Plex achieves significantly higher accuracy compared to a non-pretrained baseline and with fewer labels. (right) Zero-shot open set recognition on ImageNet1K vs Places365, showing that Plex can distinguish visually similar images without finetuning. Bottom row: Examples of Plex's capabilities in language: (left) Plex enables human+AI collaboration with selective prediction, where the model is able to defer a fraction of test examples to humans. Plex is able to better identify cases where it is likely to be wrong than the baseline, and thus achieves higher Collaborative AUC. (middle) Plex is robust while a baseline latches onto spurious features such as "destination" and "around the world". (right) Plex enables structured open set recognition. This provides nuanced clarifications, where Plex can distinguish cases where only part of a request is not supported.
Figure 4: The model and task pipeline. We experiment with approaches for pretraining; given the pretrained model's checkpoint, we experiment with finetuning and adaptation; finally, given the finetuned model's checkpoint, we evaluate downstream metrics.
Figure 11: Trading off model size with deep ensemble size. Overall, scaling the single model from Small to Large has a greater influence on the overall performance.
Figure 11 shows that naive ensembling consistently improves performance as the number of ensemble members increases. However, this comes at significant computational cost. Scaling the model from S to B or from B to L incurs a ∼4x increase in compute, and larger single models tend to outperform ensembles of 4 smaller models. This motivates the efficient ensembles used in Plex: BatchEnsemble adds minimal extra compute yet consistently outperforms the ensemble-free baseline in the ranking across tasks (Figure 7).
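As a concrete illustration of the naive ensembling baseline discussed above, here is a minimal NumPy sketch (not the paper's implementation): each member produces logits, the per-member softmax probabilities are averaged, and the ensemble predicts the argmax class.

```python
import numpy as np

def ensemble_predict(member_logits):
    """Naive deep ensembling: average the per-member softmax
    probabilities, then predict the argmax class."""
    member_logits = np.asarray(member_logits)              # (M, N, C)
    z = member_logits - member_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    mean_probs = probs.mean(axis=0)                        # (N, C)
    return mean_probs, mean_probs.argmax(axis=-1)

# Toy example: 4 ensemble members, 2 examples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 3))
probs, preds = ensemble_predict(logits)
```

The M-fold forward-pass cost of this scheme is exactly the overhead that efficient-ensemble methods such as BatchEnsemble are designed to avoid.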
Figure 22: Accuracy over subpopulations for CIFAR-10 and CIFAR-100. For each setting, we display the 5th, 25th, 50th, 75th, and 95th percentile accuracy for ViT-Plex L among the subpopulations. See Figure 3 for comparison with no pretraining, which performs much worse.


Plex: Towards Reliability using Pretrained Large Model Extensions

July 2022 · 150 Reads · 8 Citations

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
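One of the reliability tasks named above, selective prediction, can be sketched in a few lines: the model answers only when its top-class probability clears a confidence threshold and defers the remaining examples to humans. This is an illustrative sketch, not the Plex implementation, and the 0.8 threshold is an arbitrary choice for the example.

```python
import numpy as np

def selective_predict(probs, threshold=0.8):
    """Selective prediction: answer only when the model's top-class
    probability clears a confidence threshold; defer the rest."""
    probs = np.asarray(probs)
    confidence = probs.max(axis=-1)
    answered = confidence >= threshold      # False means "defer to human"
    predictions = probs.argmax(axis=-1)
    return predictions, answered

probs = np.array([[0.90, 0.05, 0.05],   # confident -> answered
                  [0.40, 0.35, 0.25]])  # uncertain -> deferred
preds, answered = selective_predict(probs, threshold=0.8)
```

Sweeping the threshold trades off coverage (fraction answered) against accuracy on the answered subset, which is what the Collaborative AUC in Figure 2 summarizes.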


Model performance on Pfam-seed
Held-out test error rate as a function of max percent sequence identity of each test sequence with training sequences in the same Pfam family (Methods). Data are binned by percent sequence identity; the x axis labels describe the bin ranges. a, For the random split, ProtCNN makes significantly fewer errors than alignment-based methods for sequence identities in the range of 30–90% (two-sided McNemar test, 10–20%: 292 sequences, P = 0.000070; 20–30%: 3,628 sequences, P < 10 × 10⁻⁶; 30–40%: 9,537 sequences, P < 10 × 10⁻⁶; 40–50%: 16,798 sequences, P < 10 × 10⁻⁶; 50–60%: 22,662 sequences, P < 10 × 10⁻⁶; 60–70%: 28,277 sequences, P < 10 × 10⁻⁶; 70–80%: 40,221 sequences, P < 10 × 10⁻⁶; 80–90%: 4,429 sequences, P = 0.000244), while ProtENN is significantly better than alignment-based methods for sequence identities less than 90% (two-sided McNemar test, 10–20%: 292 sequences, P = 0.000070; 20–30%: 3,628 sequences, P < 10 × 10⁻⁶; 30–40%: 9,537 sequences, P < 10 × 10⁻⁶; 40–50%: 16,798 sequences, P < 10 × 10⁻⁶; 50–60%: 22,662 sequences, P < 10 × 10⁻⁶; 60–70%: 28,277 sequences, P < 10 × 10⁻⁶; 70–80%: 40,221 sequences, P < 10 × 10⁻⁶; 80–90%: 4,429 sequences, P = 0.000244). b, For the clustered split, where all sequence identities are ≤25%, ProtENN is significantly more accurate for all bins (two-sided McNemar test, 10–12%: 62 sequences, P = 0.006348; 12–14%: 426 sequences, P < 10 × 10⁻⁶; 14–16%: 1,058 sequences, P < 10 × 10⁻⁶; 16–18%: 2,516 sequences, P < 10 × 10⁻⁶; 18–20%: 4,419 sequences, P < 10 × 10⁻⁶; 20–22%: 6,013 sequences, P = 0.011417; 22–24%: 4,892 sequences, P = 0.000120; 24–25%: 1,902 sequences, P = 0.005172).
ProtCNN architecture
The central graph illustrates the input (red), embedding (yellow) and prediction (green) networks together with the residual network (ResNet) architecture (left), while the right illustrates the representation of sequence space learned by ProtCNN and exploited by ProtREP through simple nearest neighbor approaches. In this representation, each sequence corresponds to a single point, and sequences from the same family are typically closer to each other than to sequences from other families; GPCR, G-protein-coupled receptor; PCA, principal-component analysis.
Clustered split performance of ProtCNN, ProtENN, TPHMM and ProtREP, which uses the learned representation of sequence space
a, Error rates at classifying held-out test sequences are stratified by the number of training sequences per family. ProtREP has the same computational complexity as ProtCNN but is more accurate for small families. This suggests that the speed–accuracy tradeoff between ProtCNN and ProtENN can be ameliorated to yield classifiers that are both faster than ensembles and as accurate. b, The amino acid embedding learned by ProtCNN from unaligned sequence data reflects the overall structure of the BLOSUM62 matrix²⁹.
A combination of ProtENN and TPHMM improves performance on the remote homology task
a, Model performance comparison for held-out test sequences from the clustered split of Pfam-seed. A simple combination of the TPHMM and ProtENN models reduces the error rate by 38.6%, increasing accuracy from the ProtENN figure of 89.0% to 93.3%. By contrast, combining the BLAST and TPHMM models did not improve performance over the TPHMM performance of 88.1% accuracy. b, Illustration of the simple combination of the TPHMM and ProtENN models used to propose new additions to Pfam. Following these steps for the test sequences used in a that clear the confidence score thresholds results in an accuracy of 100%. c, The ProtENN-proposed increase in Pfam sequence regions in the context of recent Pfam releases.
Using deep learning to annotate the protein universe

June 2022 · 1,105 Reads · 173 Citations

Nature Biotechnology

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools. A deep learning model predicts protein functional annotations for unaligned amino acid sequences.
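The clustering behavior described above, in which learned representations place sequences from the same family close together, can be illustrated with a toy nearest-centroid classifier over embeddings. The embeddings, centroids, and family accessions below are made up for illustration; this is not the ProtREP implementation.

```python
import numpy as np

def nearest_family(query_emb, family_embs, family_labels):
    """Nearest-neighbor family assignment in a learned embedding space:
    each family is represented by a centroid embedding, and a query is
    assigned to the closest family by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    f = family_embs / np.linalg.norm(family_embs, axis=1, keepdims=True)
    sims = f @ q
    return family_labels[int(np.argmax(sims))]

# Toy example with three hypothetical 2-D family centroids.
family_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
labels = ["PF00001", "PF00002", "PF00003"]
pred = nearest_family(np.array([0.9, 0.1]), family_embs, labels)  # -> "PF00001"
```

Because the classifier only needs family centroids, it can assign sequences to families unseen during training, which is the property exploited for small families and remote homology.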


Underspecification Presents Challenges for Credibility in Modern Machine Learning

January 2022 · 189 Reads · 306 Citations

Journal of Machine Learning Research

Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification in ML pipelines as a key reason for these failures. An ML pipeline is the full procedure followed to train and validate a predictor. Such a pipeline is underspecified when it can return many distinct predictors with equivalently strong test performance. Underspecification is common in modern ML pipelines that primarily validate predictors on held-out data that follow the same distribution as the training data. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We provide evidence that underspecification has substantive implications for practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
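The core phenomenon can be demonstrated with a toy simulation (not from the paper): two predictors that are indistinguishable on held-out i.i.d. data, one relying on a causal feature and one on a spuriously correlated feature, diverge sharply once the spurious correlation is broken in the deployment domain. All numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Training/i.i.d.-test domain: x2 happens to be correlated with the label.
y = rng.integers(0, 2, size=n)
x1 = y + 0.1 * rng.normal(size=n)   # causal feature
x2 = y + 0.1 * rng.normal(size=n)   # spurious feature (correlated here)

pred_a = (x1 > 0.5).astype(int)     # predictor relying on the causal feature
pred_b = (x2 > 0.5).astype(int)     # predictor relying on the spurious feature

iid_acc_a = (pred_a == y).mean()    # both near-perfect on held-out
iid_acc_b = (pred_b == y).mean()    # i.i.d. data: the pipeline cannot tell them apart

# Deployment domain: the spurious correlation is broken.
y_s = rng.integers(0, 2, size=n)
x1_s = y_s + 0.1 * rng.normal(size=n)
x2_s = rng.normal(size=n)           # x2 no longer tracks the label

shift_acc_a = ((x1_s > 0.5).astype(int) == y_s).mean()  # still near-perfect
shift_acc_b = ((x2_s > 0.5).astype(int) == y_s).mean()  # near chance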



Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

June 2021 · 32 Reads

High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code available at https://github.com/google/uncertainty-baselines.


Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology

June 2021

·

137 Reads

·

31 Citations

The American Journal of Human Genetics

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci are near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.
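For readers unfamiliar with polygenic scores, the computation underlying both the P+T and elastic net PRS models is, at its core, a weighted sum of per-variant allele dosages, weighted by the effect sizes estimated in the GWAS. The dosages and effect sizes below are made up for illustration.

```python
import numpy as np

def polygenic_score(dosages, betas):
    """A polygenic score is a weighted sum of per-variant allele dosages
    (0, 1, or 2 copies of the effect allele), weighted by the per-variant
    effect sizes estimated in the GWAS."""
    return np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float)

# Toy example: 3 individuals x 4 variants, with made-up effect sizes.
dosages = np.array([[0, 1, 2, 0],
                    [1, 1, 0, 2],
                    [2, 0, 1, 1]])
betas = np.array([0.02, -0.01, 0.05, 0.03])
scores = polygenic_score(dosages, betas)
```

P+T and elastic net differ only in how the variant set and the betas are chosen (LD pruning plus a p-value threshold vs. penalized regression), not in this scoring step.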


Fig 1. ML-based phenotyping concept and its application to VCDR. a, "Model training" phase in which a phenotype prediction model is trained using expert-labelled data. b, "Model application" phase in which the validated phenotype prediction model is applied to new, unlabelled data followed by genomic discovery. c, Definition of vertical cup-to-disc ratio (VCDR) in a real fundus image. d, Schematic of the multi-task ensemble model used in phenotype prediction. e-h, Scatter plots of the ML-based VCDR vs expert-labelled VCDR values for the train (e), tune (f), test (g), and UK Biobank (h) datasets. Number of grades per image is shown in parentheses.
Fig 2. ML-based VCDR GWAS results and comparison to known associations. a, Manhattan plot depicting ML-based VCDR-associated P values from the BOLT-LMM analysis. There are 156 GWS (genome-wide significant) loci, representing 299 independent (R²=0.1) GWS hits. For each locus, the closest gene is shown. Blue gene names and dots indicate loci also identified in Craig et al.'s study; red dots and black gene names indicate novel loci. The dashed red line denotes the GWS P value, 5×10⁻⁸. b, Venn diagram of loci overlap for three VCDR GWAS. The ML-based GWAS replicates all 22 loci of the IGGC VCDR meta-analysis (Springelkamp et al.) and 62 of 65 loci identified by Craig et al., while discovering 92 novel loci. c, Effect sizes for the 73 GWS hits shared by the Craig et al. and ML-based VCDR GWAS. The three Craig et al. hits not included failed the ML-based GWAS QC (rs61952219 for low imputation quality; rs7039467 and rs146055611 for violating Hardy-Weinberg equilibrium). Blue and red dots denote the SNP being more significant in the ML-based and Craig et al. GWAS, respectively. The banding in Craig et al. effect sizes is due to large effect sizes being reported in multiples of 0.01. The blue line is the best-fit line and the shaded area shows the 95% confidence interval.
Fig 3. VCDR polygenic risk score performance metrics. Pearson's correlations between measured VCDR values and predictions of the pruning and thresholding (P+T) (a) and elastic net (b) models are shown for the PRS learned from ML-based and Craig et al. hits. Error bars depict 95% confidence intervals. Numbers above bars are the observed Pearson's correlations. P value ranges: * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001. The Craig et al. P+T model uses 58 out of 76 hits. Measured VCDR values were obtained from adjudicated expert labeling of fundus images (UKB, n=2,076) and scanning laser ophthalmoscopy (HRT) (EPIC-Norfolk, n=5,868).
Fig 4. Relationship between glaucoma and VCDR. a, Glaucoma odds ratios for each ML-based VCDR bin vs. the bottom bin. The fraction of individuals in each bin is shown (n=65,193). b, Glaucoma odds ratios for different VCDR elastic net PRS bins vs. the bottom bin for individuals with a glaucoma phenotype not used in the GWAS or in developing the PRS (n=98,151). The fractions are selected to match those from a. c, A histogram of ML-based glaucoma liability vs. ML-based VCDR (Pearson's correlation R=0.91, n=65,680, P<1×10⁻³⁰⁰). d, LocusZoom plot for the strongest associated variant (rs12913832, P=2.2×10⁻⁶⁶) in the ML-based glaucoma liability GWAS conditioned on the ML-based VCDR.
Fig 5. Primary open-angle glaucoma (POAG) prediction in the EPIC-Norfolk cohort. Odds ratios for POAG prevalence by decile of VCDR PRS; reference is decile 1. Results are from logistic regression models adjusted for age and sex for a, primary open-angle glaucoma (175 cases, 5,693 controls), b, high-tension glaucoma (HTG; 98 cases, 5,693 controls), and c, normal-tension glaucoma (NTG; 77 cases, 5,693 controls). Results are presented for the ML-based elastic net VCDR PRS (blue) and the Craig et al. elastic net VCDR PRS (yellow). Note the y-axis log scale.
Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology

November 2020 · 151 Reads

Genome-wide association studies (GWAS) require accurate cohort phenotyping, but expert labeling can be costly, time-intensive, and variable. Here we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; P ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 92 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR, with select loci near genes involved in neuronal and synaptic biology or known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.


Underspecification Presents Challenges for Credibility in Modern Machine Learning

November 2020 · 731 Reads

ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.


Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

June 2020 · 117 Reads

Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models. This is worrying, because covariate shift is prevalent in a wide range of real-world deployment settings. However, in this paper we note that there is frequently the potential to access small unlabeled batches of the shifted data just before prediction time. This observation enables a simple but surprisingly effective method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift. Using this one-line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness (e.g. deep ensembles), and that combining the two further improves performance. Our findings are supported by detailed measurements of the effect of this strategy on model behavior across rigorous ablations on various dataset modalities. However, the method has mixed results when used alongside pre-training and does not seem to perform as well under more natural types of dataset shift, so it merits additional study. We include links to the data in our figures to improve reproducibility, including a Python notebook that can be run to easily modify our analysis at https://colab.research.google.com/drive/11N0wDZnMQQuLrRwRoumDCrhSaIhkqjof.
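The "one line code change" described above amounts to normalizing each batch-norm layer with the mean and variance of the unlabeled test batch itself, rather than the running averages accumulated during training. A minimal NumPy sketch of a single layer operating in this mode (illustrative only, not the paper's code):

```python
import numpy as np

def prediction_time_bn(x_batch, gamma, beta, eps=1e-5):
    """Prediction-time batch normalization: normalize with the mean and
    variance of the *test* batch, instead of the running statistics
    accumulated during training, adapting activations to the shifted
    input distribution."""
    mu = x_batch.mean(axis=0)
    var = x_batch.var(axis=0)
    return gamma * (x_batch - mu) / np.sqrt(var + eps) + beta

# Toy example: activations whose mean and scale have drifted under shift.
rng = np.random.default_rng(0)
shifted = 3.0 + 2.0 * rng.normal(size=(64, 8))   # shifted test batch
out = prediction_time_bn(shifted, gamma=np.ones(8), beta=np.zeros(8))
```

After the recomputation, the layer's outputs are re-centered and re-scaled regardless of how far the inputs drifted, which is the mechanism behind the robustness gains reported above.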


Citations (33)


... In the results above, we use conditioning information that perfectly specifies concepts underlying the data-generating process, i.e., h = z. In practice, however, instructions are underspecified and one can thus expect correlations between concepts in the conditioning information extracted from those instructions [96][97][98][99][100]. For example, images of a strawberry are often correlated with the color red (see Fig. 7(a)). ...

Reference:

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Underspecification Presents Challenges for Credibility in Modern Machine Learning

Journal of Machine Learning Research

... To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. For example, Tran et al. [56] implemented a selective prediction task. This involved utilizing the number of decodes matching a given answer from self-consistency as a measure of uncertainty. ...

Plex: Towards Reliability using Pretrained Large Model Extensions

... In recent years, there has been increasing interest in using machine learning (ML) for a broad range of applications related to the functional prediction and design of enzymes [11,14,15]. In particular, ML models have emerged to classify protein sequences based on their function, which are reviewed here [16] with some more recent examples listed here [17,7,18,19,20,21,22]. Despite these advancements, there is no standard benchmark or dataset for evaluating computational models for enzyme function prediction. ...

Using deep learning to annotate the protein universe

Nature Biotechnology

... Nevertheless, as speech may be distorted, some researchers have studied alternatives. For instance, one vowel sound, /a/, was used in a keyword-spotting model, which triggers wake-up words [27]. When interacting with humans, the use of sentences and continuous speech is closer to normal speech. However, non-verbal voice inputs also have advantages; because dysarthric speech has a slower rate of expression, non-verbal voice interactions may allow users to interact more quickly with the system. ...

A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks
  • Citing Conference Paper
  • August 2021

... Missingness often arises due to the difficulty, expense or invasiveness of ascertaining the target phenotype 7 . Examples of partially missing phenotypes from the UKB 8 include body composition phenotypes obtained from dual-energy X-ray absorptiometry (DEXA) scans 9 , neurological 10 and cardiac 11 structural features extracted from functional magnetic resonance imaging, optic morphology parameters extracted from retinal fundus images 12 and sleep-wake patterns extracted from accelerometry trackers 13 . Each of these phenotypes was ascertained, at least initially, in only a subset of the cohort. ...

Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology

The American Journal of Human Genetics

... With this information, trustworthy metrics that can be instrumented as AI sensors are reviewed in current state-of-the-art. For instance, a sensor for fairness can be instrumented to analyze raw input data as well as to characterize fairness in decision making after model deployment [15]. Notice that currently, there is a misalignment between regulatory (legal) and technical trustworthy requirements. ...

Fairness is not static: deeper understanding of long term fairness via simulation studies
  • Citing Conference Paper
  • January 2020

... While algorithms for bridging geographic domain gaps have been proposed in [18,41,86], they are restricted to road scenes with a limited number of classes. A major hindrance has been the lack of suitable benchmark datasets for geographic adaptation, so several datasets have been recently proposed to address this issue [24,58,61,72]. Datasets based on dollar street images [61] highlight the geographic differences induced by income disparities between various countries, Ego4D [30] contains egocentric videos with actions from various geographies, while researchers in [58] design an adaptation dataset with images from YFCC-100M [26] to analyze geographic shift. ...

The Inclusive Images Competition
  • Citing Chapter
  • January 2020

... For modeling, we chose the residual neural network architecture, 33 which has been successfully applied on protein function annotation. 34 After optimization (Section 4, Figures S1-S3), the resulting model contained only 1 residual block with 512 filters (kernels) (Figure 1c). For model training, one-hot encoded enzyme sequences were used as input and OGT values as output, after which the model could explain ~60% of the variance in the hold-out data set (Pearson's r = 0.77, p value < 1e−16, Figure 1d). ...

Using Deep Learning to Annotate the Protein Universe

... To quantitatively evaluate our model's generalization performance, we measure average cosine similarity on different types of held-out spectrum data. MassFormer is compared with two deep learning methods, a fingerprint (FP) neural network model (adapted from ref. 27) and a Weisfeiler-Lehman (WLN) graph neural network model 40 (adapted from ref. 28), as well as with CFM 24 , a widely used probabilistic method for spectrum prediction that does not use deep neural networks (see the section 'Baseline models' for more details). The three deep learning models are trained on a portion of the National Institute of Standards and Technology 2020 Tandem Mass Spectrometry dataset 21 (NIST). ...

Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks

ACS Central Science

... Acquisition functions such as upper confidence bound optimization [40], Thompson sampling [32], and expected improvement [20] have been developed and analyzed in the preceding decades. These methods have been applied to optimize functions observed in real systems including in the feedforward control of robots [42], hyperparameter tuning of machine learning models [21], and even the design of recipes for chocolate chip cookies [38]. This work uses approximate Bayesian optimization to find a feedforward rampdown trajectory that avoids disruptions. ...

Bayesian Optimization for a Better Dessert