Brian Davison

Lehigh University · Department of Computer Science and Engineering

Ph.D.

About

237
Publications
62,071
Reads
7,752
Citations
Introduction
Dr. Brian D. Davison is Professor and Chair of the Department of Computer Science and Engineering, Professor and Co-founder of the interdisciplinary Master's Program in Data Science, Co-founder of Lehigh's Center for Catastrophe Modeling and the Master's Program in Catastrophe Modeling and Resilience, member (former associate director) of the Institute for Data, Intelligent Systems and Computation (I-DISC), and was Founding Director of the undergraduate minor in data science.
Additional affiliations
June 2009 - present
Lehigh University
Position
  • Professor (Associate)
August 2001 - May 2009
Lehigh University
Position
  • Professor (Assistant)
September 1991 - August 2001
Rutgers, The State University of New Jersey
Position
  • Research Assistant
Education
June 1995 - October 2002
Rutgers, The State University of New Jersey
Field of study
  • Computer Science
September 1991 - May 1995
Rutgers, The State University of New Jersey
Field of study
  • Computer Science
September 1987 - May 1991
Bucknell University
Field of study
  • Computer Engineering

Publications (237)
Conference Paper
Full-text available
While unsupervised domain adaptation has been explored to leverage the knowledge from a labeled source domain to an unlabeled target domain, existing methods focus on the distribution alignment between two domains. However, how to better align source and target features is not well addressed. In this paper, we propose a deep feature registration (D...
Preprint
Full-text available
In this paper, we introduce a unified and generalist Biomedical Generative Pre-trained Transformer (BiomedGPT) model, which leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks. Our experiments demonstrate that BiomedGPT delivers expansive and inclusive representations of biom...
Preprint
Full-text available
The goal of unbiased learning to rank (ULTR) is to leverage implicit user feedback for optimizing learning-to-rank systems. Among existing solutions, automatic ULTR algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost i...
Article
The increasing availability of data search tools brings opportunities for non-expert users. Among these users, interdisciplinary researchers and data journalists represent a growing population whose work can lead to societal benefit. Through in-depth interviews, we examine what strategies and approaches researchers and journalists adopt to search o...
Preprint
Full-text available
Unbiased Learning to Rank (ULTR) that learns to rank documents with biased user feedback data is a well-known challenge in information retrieval. Existing methods in unbiased learning to rank typically rely on click modeling or inverse propensity weighting (IPW). Unfortunately, the search engines are faced with severe long-tail query distribution,...
Chapter
Full-text available
Critical infrastructure systems provide essential services for economic prosperity and a good quality of life. Over time, they have become increasingly interdependent on each other at multiple levels. Under ordinary conditions, these interconnections enhance the overall performance of the infrastructure and can accelerate the transition to “smart c...
Article
IP geolocation databases map IP addresses to their physical locations. They are used to determine the location of online users when their precise location is unavailable. These databases are vital for a number of online services, including search engine personalization, content delivery, local ads, and fraud detection. However, IP geolocation datab...
Preprint
Full-text available
A large amount of information is stored in data tables. Users can search for data tables using a keyword-based query. A table is composed primarily of data values that are organized in rows and columns providing implicit structural information. A table is usually accompanied by secondary information such as the caption, page title, etc., that form...
Article
IP Geolocation databases are widely used in online services to map end-user IP addresses to their geographical location. However, they use proprietary geolocation methods, and in some cases they have poor accuracy. We propose a systematic approach to use reverse DNS hostnames for geolocating IP addresses, with a focus on end-user IP addresses as op...
Preprint
Full-text available
Many deep neural networks are susceptible to minute perturbations of images that have been carefully crafted to cause misclassification. Ideally, a robust classifier would be immune to small variations in input images, and a number of defensive approaches have been created as a result. One method would be to discern a latent representation which co...
Article
Critical infrastructure systems are interdependent to ensure normal operations for supporting a national economy and social well-being. In the wake of a disaster, such interdependencies may introduce additional vulnerability and cause cascading failures. Therefore, understanding interdependencies and assessing their impact are essential to mitigate...
Chapter
Transmission electron microscopy (TEM) is one of the primary tools to show microstructural characterization of materials as well as film thickness. However, manual determination of film thickness from TEM images is time-consuming as well as subjective, especially when the films in question are very thin and the need for measurement precision is ver...
Article
Full-text available
Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw dat...
Conference Paper
Full-text available
Unsupervised domain adaptation leverages rich information from a labeled source domain to model an unlabeled target domain. Existing methods attempt to align the cross-domain distributions. However, the statistical representations of the alignment of the two domains are not well addressed. In this paper, we propose deep least squares alignment (DLS...
Preprint
Full-text available
Unsupervised domain adaptation leverages rich information from a labeled source domain to model an unlabeled target domain. Existing methods attempt to align the cross-domain distributions. However, the statistical representations of the alignment of the two domains are not well addressed. In this paper, we propose deep least squares alignment (DLS...
Conference Paper
Full-text available
The COVID-19 pandemic has been one of the biggest health crises in recent memory. According to leading scientists, face masks and maintaining six feet of social distancing are the most substantial protections to limit the virus's spread. Experimental data on face mask usage in the US is limited and has not been studied in scale. Thus, an understand...
Article
Manufacturers of TV sets have recently started adding social media features to their products. Some of these products display microblogging messages relevant to the TV show which the user is currently watching. However, such systems suffer from low precision and recall when they use the title of the show to search for relevant messages. Titles of s...
Conference Paper
Full-text available
Unsupervised Domain adaptation is an effective method in addressing the domain shift issue when transferring knowledge from an existing richly labeled domain to a new domain. Existing manifold-based methods either are based on traditional models or largely rely on Grassmannian manifold via minimizing differences of single covariance matrices of two...
Conference Paper
Full-text available
Domain adaptation aims to mitigate the domain gap when transferring knowledge from an existing labeled domain to a new domain. However, existing disentanglement-based methods do not fully consider separation between domain-invariant and domain-specific features, which means the domain-invariant features are not discriminative. The reconstructed fea...
Conference Paper
Full-text available
Unsupervised domain adaptation (UDA) focuses on transferring knowledge from a labeled source domain to an unlabeled target domain. However, existing domain adaptation methods try to handle various DA scenarios that are subject to imbalanced labels or large domain discrepancy datasets. In this paper, we propose a weighted pseudo labeling refinement...
Preprint
Full-text available
Transmission electron microscopy (TEM) is one of the primary tools to show microstructural characterization of materials as well as film thickness. However, manual determination of film thickness from TEM images is time-consuming as well as subjective, especially when the films in question are very thin and the need for measurement precision is ver...
Preprint
Full-text available
Domain adaptation aims to mitigate the domain gap when transferring knowledge from an existing labeled domain to a new domain. However, existing disentanglement-based methods do not fully consider separation between domain-invariant and domain-specific features, which means the domain-invariant features are not discriminative. The reconstructed fea...
Conference Paper
Full-text available
Domain adaptation (DA) mitigates the domain shift problem when transferring knowledge from one annotated domain to another similar but different unlabeled domain. However, existing models often utilize one of the ImageNet models as the backbone without exploring others, and fine-tuning or retraining the backbone ImageNet model is also time-consumin...
Chapter
Estimation of bone age from hand radiographs is essential to determine skeletal age in diagnosing endocrine disorders and depicting the growth status of children. However, existing automatic methods only apply their models to test images without considering the discrepancy between training samples and test samples, which will lead to a lower genera...
Article
Full-text available
Manually labeling data for training machine learning models is time-consuming and expensive. Therefore, it is often necessary to apply models built in one domain to a new domain. However, existing approaches do not evaluate the quality of intermediate features that are learned in the process of transferring from the source domain to the target doma...
Conference Paper
Full-text available
Domain adaptation aims to mitigate the domain shift problem when transferring knowledge from one domain into another similar but different domain. However, most existing works rely on extracting marginal features without considering class labels. Moreover, some methods name their model as so-called unsupervised domain adaptation while tuning the pa...
Preprint
Full-text available
Domain adaptation aims to mitigate the domain shift problem when transferring knowledge from one domain into another similar but different domain. However, most existing works rely on extracting marginal features without considering class labels. Moreover, some methods name their model as so-called unsupervised domain adaptation while tuning the pa...
Conference Paper
Full-text available
We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query...
Preprint
Full-text available
Unsupervised Domain adaptation is an effective method in addressing the domain shift issue when transferring knowledge from an existing richly labeled domain to a new domain. Existing manifold-based methods either are based on traditional models or largely rely on Grassmannian manifold via minimizing differences of single covariance matrices of two...
Preprint
We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query...
Preprint
Full-text available
Domain adaptation (DA) mitigates the domain shift problem when transferring knowledge from one annotated domain to another similar but different unlabeled domain. However, existing models often utilize one of the ImageNet models as the backbone without exploring others, and fine-tuning or retraining the backbone ImageNet model is also time-consumin...
Article
Full-text available
While scientific collaboration is critical for a scholar, some collaborators can be more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators are more influential on a scholar’s academic performance. However, little research has been done on investigating predicting such special relationships in acade...
Conference Paper
Full-text available
Estimation of bone age from hand radiographs is essential to determine skeletal age in diagnosing endocrine disorders and depicting the growth status of children. However, existing automatic methods only apply their models to test images without considering the discrepancy between training samples and test samples, which will lead to a lower genera...
Preprint
Full-text available
Estimation of bone age from hand radiographs is essential to determine skeletal age in diagnosing endocrine disorders and depicting the growth status of children. However, existing automatic methods only apply their models to test images without considering the discrepancy between training samples and test samples, which will lead to a lower genera...
Chapter
Domain adaptation has emerged as a crucial technique to address the problem of domain shift, which exists when applying an existing model to a new population of data. Adversarial learning has made impressive progress in learning a domain invariant representation via building bridges between two domains. However, existing adversarial learning method...
Preprint
Full-text available
Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw dat...
Article
Due to continuous population expansion and the threat of climate change, the past century has witnessed increasing occurrences of natural hazards, leading to significant global losses and requiring substantial restoration efforts. This issue challenges decision makers to act in a timely and effective manner to protect infrastructure systems from fu...
Conference Paper
Full-text available
Domain adaptation has emerged as a crucial technique to address the problem of domain shift, which exists when applying an existing model to a new population of data. Adversarial learning has made impressive progress in learning a domain invariant representation via building bridges between two domains. However, existing adversarial learning method...
Conference Paper
Transferring knowledge from an existing labeled domain to a new domain often suffers from domain shift in which performance degrades because of differences between the domains. Domain adaptation has been a prominent method to mitigate such a problem. There have been many pre-trained neural networks for feature extraction. However, little work discu...
Article
Full-text available
Recent years have seen the success of applying word embedding algorithms to natural language processing (NLP) tasks. Most word embedding algorithms only produce a single embedding per word. This makes the learned embeddings indiscriminative since many words are polysemous. Some prior work utilizes the context in which the word resides to learn mult...
Article
Full-text available
The article Learning class-specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher’s Internet portal (currently SpringerLink) on 23 October 2019 with open access.
Preprint
Full-text available
Domain adaptation is one of the most crucial techniques to mitigate the domain shift problem, which exists when transferring knowledge from an abundant labeled source domain to a target domain with few or no labels. Partial domain adaptation addresses the scenario when target categories are only a subset of source categories. In this paper, to ena...
Article
Full-text available
Natural hazards have the potential to cause catastrophic damage and significant socioeconomic loss. The actual damage and loss observed in the recent decades has shown an increasing trend. As a result, disaster managers need to take a growing responsibility to proactively protect their communities by developing efficient management strategies. A nu...
Preprint
Full-text available
Extreme multi-label text classification (XMTC) is a task for tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation for the input...
Article
Infrastructure interdependencies have been widely recognized, especially in the postdisaster restoration process. It is essential to develop models to simulate interdependencies and quantify their impact on the functionality recovery of infrastructures. This study presents a generalized simulator to investigate the impact of different types of inte...
Preprint
Full-text available
Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for...
Chapter
A search engine’s ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation mo...
Article
Full-text available
Deep neural networks are widely used in image classification problems. However, little work addresses how features from different deep neural networks affect the domain adaptation problem. Existing methods often extract deep features from one ImageNet model, without exploring other neural networks. In this paper, we investigate how different ImageN...
Preprint
Full-text available
Deep neural networks are widely used in image classification problems. However, little work addresses how features from different deep neural networks affect the domain adaptation problem. Existing methods often extract deep features from one ImageNet model, without exploring other neural networks. In this paper, we investigate how different ImageN...
Preprint
Full-text available
A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation mod...
Conference Paper
Full-text available
As infrastructure systems are highly interdependent, one needs to analyze their disaster resilience and develop restoration plans with the consideration of infrastructure interdependencies. This study presents two probabilistic models for infrastructure decision-makers to simulate the recovery of interdependent systems in a post-disaster scenario....
Preprint
Deep neural networks have been widely used in computer vision. There are several well trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaption. In this paper, we are the firs...
Preprint
Learning the underlying geometry from the source domain to the target domain is essential to solving classification problems using domain adaptation. However, existing approaches are constrained to one manifold, and generated samples do not follow the geometry of the manifold, that means the sampling results are sub-optimal. In this paper, we defin...
Preprint
Full-text available
IP Geolocation databases are widely used in online services to map end user IP addresses to their geographical locations. However, they use proprietary geolocation methods and in some cases they have poor accuracy. We propose a systematic approach to use publicly accessible reverse DNS hostnames for geolocating IP addresses. Our method is designed...
Article
Full-text available
The transportation infrastructure plays an important role in supporting the national economy and ensuring the well-being of its citizenry. Extreme events (including both natural hazards and man-made disasters) have caused terrible physical damages to the transportation infrastructure, long-term socioeconomic impacts, and psychological damages. Ther...
Conference Paper
Impoverished descriptions and convoluted schema labels are common challenges in data-centric tasks such as schema matching and data linking, especially when datasets can span domains. To address these issues, we consider the task of schema label generation. Typically, schema labels are created by dataset providers and are useful for users to unders...
Article
Full-text available
Twitter is a popular source for the monitoring of healthcare information and public disease. However, there exists much noise in the tweets. Even though appropriate keywords appear in the tweets, they do not guarantee the identification of a truly health-related tweet. Thus, the traditional keyword-based classification task is largely ineffective....
Article
It has been shown that top-k retrieval quality can be considerably improved by taking not only relevance but also diversity into account. However, currently proposed diversification approaches have not put much attention on practical usability in large-scale settings, such as modern web search systems. In this work, we make two contributions toward...
Article
Search satisfaction is defined as the fulfillment of a user's information need. Characterizing and predicting the satisfaction of search engine users is vital for improving ranking models, increasing user retention rates, and growing market share. This article provides an overview of the research areas related to user satisfaction. First, we show t...
Article
Crowdsourcing is emerging as a powerful paradigm to help perform a wide range of tedious tasks in various enterprise applications. As such applications become more complex, crowdsourcing systems often require the collaboration of several experts connected through professional/social networks and organized in various teams. For instance, a well-know...
Conference Paper
IP geolocation databases map IP addresses to their geographical locations. These databases are important for several applications such as local search engine relevance, credit card fraud protection, geotargeted advertising, and online content delivery. While they are the most popular method of geolocation, they can have low accuracy at the city le...
Article
Topic modeling has emerged as a popular learning technique not only in mining text representations, but also in modeling authors’ interests and influence, as well as predicting linkage among documents or authors. However, few existing topic models distinguish and make use of the prior knowledge in regard to the different importance of documents (au...
Article
Analysis of content streams gathered from social networking sites such as Twitter has several applications ranging from content search and recommendation, news detection to business analytics. However, processing large amounts of data generated on these sites in real-time poses a difficult challenge. To cope with the data deluge, analytics companie...
Article
Today’s web personalization technologies use approaches like user categorization, configuration, and customization but do not fully support individualized requirements. As a significant portion of our social and working interactions are migrating to the web, we can expect an increase in these kinds of minority requirements. Browser-side transcoding...
Conference Paper
Full-text available
Sponsored search is the primary business for today's commercial search engines. Accurate prediction of the Click-Through Rate (CTR) for ads is key to displaying relevant ads to users. In this paper, we systematically study the two kinds of contextual factors influencing the CTR: 1) In micro factors, we focus on the factors for mainline ads, includi...
Conference Paper
Full-text available
In modern commercial search engines, the pay-per-click (PPC) advertising model is widely used in sponsored search. The search engines try to deliver ads which can produce greater click yields (the total number of clicks for the list of ads per impression). Therefore, predicting user clicks plays a critical role in sponsored search. The current ad-d...
Conference Paper
We propose a novel probabilistic topic model that jointly models authors, documents, cited authors, and venues simultaneously in one integrated framework, as compared to previous work which embeds fewer components. This model is designed for three typical applications in academic network analysis: the problems of expert ranking, cited author predic...
Conference Paper
Collaborative tagging systems are now deployed extensively to help users share and organize resources. Tag prediction and recommendation can simplify and streamline the user experience, and by modeling user preferences, predictive accuracy can be significantly improved. However, previous methods typically model user behavior based only on a log of...
Conference Paper
Users of popular services like Twitter and Facebook are often simultaneously overwhelmed with the amount of information delivered via their social connections and miss out on much content that they might have liked to see, even though it was distributed outside of their social circle. Both issues serve as difficulties to the users and drawbacks to...
Conference Paper
As early as the late nineteenth century, scientists began research in author attribution, mostly by identifying the writing styles of authors. Following research over centuries has repeatedly demonstrated that people tend to have distinguishable writing styles. Today we not only have more authors, but we also have all different kinds of publication...
Conference Paper
One of the principal goals for most research scientists is to publish. There are many thousands of publications: journals, conferences, workshops, and more, covering different topics and requiring different writing formats. However, when a researcher that is new to a certain research domain finishes the work, it is sometimes difficult to find a pro...
Article
Full-text available
As online social media further integrates deeper into our lives, we spend more time consuming social update streams that come from our online connections. Although social update streams provide a tremendous opportunity for us to access information on-the-fly, we often complain about its relevance. Some of us are flooded with a steady stream of info...
Article
Full-text available
A principal goal for most research scientists is to publish. There are different kinds of publications covering different topics and requiring different writing formats. While authors tend to have unique personal writing styles, no work has been carried out to find out whether publication venues are distinguishable by their writing styles. Our work...
Article
Purpose: This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation. Design/methodology/approach: The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retri...
Conference Paper
With hundreds of millions of participants, social media services have become commonplace. Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are importan...
Conference Paper
Hierarchical classification has been shown to have superior performance to flat classification. It is typically performed on hierarchies created by and for humans rather than for classification performance. As a result, classification based on such hierarchies often yields suboptimal results. In this paper, we propose a novel genetic algorithm-ba...
Conference Paper
Full-text available
Text corpora with documents from a range of time epochs are natural and ubiquitous in many fields, such as research papers, newspaper articles and a variety of types of recently emerged social media. People not only would like to know what kind of topics can be found from these data sources but also wish to understand the temporal dynamics of these...
Article
Collaborative tagging systems are now deployed extensively to help users share and organize resources. Tag prediction and recommendation systems generally model user behavior as research has shown that accuracy can be significantly improved by modeling users’ preferences. However, these preferences are usually treated as constant over time, neglecting th...
Conference Paper
Each year many ACM SIG communities will recognize an outstanding researcher through an award in honor of his or her profound impact and numerous research contributions. This work is the first to investigate an automated mechanism to help in selecting future award winners. We approach the problem as a researchers' expertise ranking problem, and prop...
Conference Paper
Full-text available
Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks such as friend recommendation, community detection, and network growth mo...
