Sean Moran's research works | Jpmorgan Chase & Co., NY and other places

Estimating class separability of text embeddings with persistent homology

Conference Paper

Full-text available

Jun 2024

This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view. Using persistent homology, we demonstrate how tracking the evolution of embedding manifolds during training can inform about class separability. More specifically, we show how this technique can be applied to detect wh...

Model-Agnostic Utility-Preserving Biometric Information Anonymization

Preprint

Full-text available

May 2024

The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophistic...

Model-Agnostic Utility-Preserving Biometric Information Anonymization

Our utility-preserving biometric information anonymization transforms...

Varying set purity t. Higher t leads to better attribute-of-interest...

Varying weight w. a–d Under rp=1%...

Varying set size g. With larger set size, the level of mixture...

Model-Agnostic Utility-Preserving Biometric Information Anonymization

Article

Full-text available

May 2024

The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people’s biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophistic...

Mixing Gradients in Neural Networks as a Strategy to Enhance Privacy in Federated Learning

Conference Paper

Jan 2024

DeepClean: Machine Unlearning on the Cheap by Resetting Privacy Sensitive Weights using the Fisher Diagonal

Preprint

Full-text available

Nov 2023

Machine learning models trained on sensitive or private data can inadvertently memorize and leak that information. Machine unlearning seeks to retroactively remove such details from model weights to protect privacy. We contribute a lightweight unlearning algorithm that leverages the Fisher Information Matrix (FIM) for selective forgetting. Prior wo...

A Baseline Generative Probabilistic Model for Weakly Supervised Learning

Chapter

Sep 2023

Finding relevant and high-quality datasets to train machine learning models is a major bottleneck for practitioners. Furthermore, to address ambitious real-world use-cases there is usually the requirement that the data come labelled with high-quality annotations that can facilitate the training of a supervised model. Manually labelling data with hi...

Estimating Class Separability of Datasets Using Persistent Homology with Application to LLM Fine-Tuning

Preprint

Full-text available

May 2023

This paper proposes a method to estimate the class separability of an unlabeled text dataset by inspecting the topological characteristics of sentence-transformer embeddings of the text. Experiments conducted involve both binary and multi-class cases, with balanced and imbalanced scenarios. The results demonstrate a clear correlation and a better c...

Code Librarian: A Software Package Recommendation System

Conference Paper

May 2023

Fig. 1: Flowchart of the traditional preprocessing steps used for text...

Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

Preprint

Full-text available

Apr 2023

This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM, as baseline methods...

Figure 1. The covariance heatmaps of the three labelling matrices (Λ)....

Figure 2. Jointplots between the two factors from the FA model. The top...

Figure 3. Comparison of classification performance of Snorkel and...

Dataset Statistics. λ is the labelling function. Absent, shows the...

A Benchmark Generative Probabilistic Model for Weak Supervised Learning

Preprint

Full-text available

Mar 2023

CV4Code: Sourcecode Understanding via Visual Code Representations

Chapter

Feb 2023

We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an exp...

Figure 1: Illustration of a gating module with binary decision into the...

Figure 2: Illustration of layer-pruning gating modules in ResNet.

Figure 5: Comparison between our scheme and related methods in...

Figure 6: Layer opening ratio during training under different λ values....

A New Baseline for GreenAI: Finding the Optimal Sub-Network via Layer and Channel Pruning

Preprint

Full-text available

Feb 2023

The concept of Green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Some large models have billions of parameters causing the training time to take up to hundreds of GPU/TPU-days. The estimated energy consumption can be comparable to the annual total ene...

Fig. 3: Featurization of endpoint specifications

API-Spector: an API-to-API Specification Recommendation Engine

Preprint

Full-text available

Dec 2022

When designing a new API for a large project, developers need to make smart design choices so that their code base can grow sustainably. To ensure that new API components are well designed, developers can learn from existing API components. However, the lack of a standardized method for comparing API designs makes this learning process time-consuming...

Senatus: a fast and accurate code-to-code recommendation engine

Conference Paper

Oct 2022

Ledgit: A Service to Diagnose Illicit Addresses on Blockchain using Multi-modal Unsupervised Learning

Conference Paper

Oct 2022

Code Librarian: A Software Package Recommendation System

Preprint

Oct 2022

The use of packaged libraries can significantly shorten the software development cycle by improving the quality and readability of code. In this paper, we present a recommendation engine called Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported lib...

Utility-Preserving Biometric Information Anonymization

Chapter

Sep 2022

The use of biometrics such as fingerprints, voices, and images is becoming increasingly ubiquitous in people’s daily lives, in applications ranging from authentication and identification to much more sophisticated analytics, thanks to the recent rapid advances in both sensing hardware technologies and machine learning techniques. While...

Topical: Learning Repository Embeddings from Source Code using Attention

Preprint

Aug 2022

Machine learning on source code (MLOnCode) promises to transform how software is delivered. By mining the context and relationship between software artefacts, MLOnCode augments the software developer's capabilities with code auto-generation, code recommendation, code auto-tagging and other data-driven enhancements. For many of these tasks a script l...

CV4Code: Sourcecode Understanding via Visual Code Representations

Preprint

May 2022

We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an expl...

Enhancing Privacy against Inversion Attacks in Federated Learning by using Mixing Gradients Strategies

Preprint

Apr 2022

Federated learning reduces the risk of information leakage, but remains vulnerable to attacks. We investigate how several neural network design decisions can defend against gradient inversion attacks. We show that overlapping gradients provides numerical resistance to gradient inversion on the highly vulnerable dense layer. Specifically, we propos...

ST-FL: style transfer preprocessing in federated learning for COVID-19 segmentation

Conference Paper

Apr 2022

ST-FL: Style Transfer Preprocessing in Federated Learning for COVID-19 Segmentation

Preprint

Mar 2022

Chest Computed Tomography (CT) scans offer low cost, speed and objectivity for COVID-19 diagnosis, and deep learning methods have shown great promise in assisting the analysis and interpretation of these images. Most hospitals or countries can train their own models using in-house data; however, empirical evidence shows that those models perfo...

Improving Streaming Cryptocurrency Transaction Classification via Biased Sampling and Graph Feedback

Conference Paper

Dec 2021

DeSkew-LSH based Code-to-Code Recommendation Engine

Preprint

Nov 2021

Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code sn...