Benjamin M. Schmidt's scientific contributions

What is this page?


This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

ResearchGate created this page automatically to record this author's body of work. We create such pages to advance our goal of building and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (2)


The landscape of biomedical research
  • Article

April 2024 · 9 Reads · 8 Citations

Patterns

Rita González-Márquez · Benjamin M. Schmidt · [...] · Dmitry Kobak

Quality metrics for the embeddings. Acc.: kNN accuracy (k = 10) of label prediction. RMSE: root-mean-squared error of kNN prediction of publication year. Recall: overlap between the k nearest neighbours in the 2D embedding and in the high-dimensional space. See Methods for details.
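The three metrics in the caption above can be illustrated with a small brute-force sketch. This is not the authors' code: it assumes Euclidean distances, leave-one-out neighbourhoods (each point is excluded from its own neighbour list), a majority vote for label prediction, and the neighbours' mean year as the publication-year prediction. The function names are hypothetical, and the paper's Methods may use a different distance, split, or neighbour search. Because it builds the full n × n distance matrix, it only suits small samples, not a 21-million-abstract corpus.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (Euclidean, self excluded)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def knn_accuracy(X, labels, k=10):
    """Fraction of points whose label matches the majority label of their k neighbours."""
    nn = knn_indices(X, k)
    preds = []
    for row in labels[nn]:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[np.argmax(counts)])  # ties go to the smallest label
    return float(np.mean(np.array(preds) == labels))

def knn_year_rmse(X, years, k=10):
    """RMSE when each point's year is predicted as the mean year of its k neighbours."""
    nn = knn_indices(X, k)
    pred = years[nn].mean(axis=1)
    return float(np.sqrt(np.mean((pred - years) ** 2)))

def knn_recall(X_high, X_2d, k=10):
    """Average fraction of high-dimensional k neighbours that are also 2D k neighbours."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_2d, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlaps))

# Synthetic example: two well-separated clusters in 50-D with cluster-dependent "years".
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X_high = rng.normal(size=(200, 50)) + 5 * labels[:, None]
X_2d = X_high[:, :2]  # hypothetical stand-in for a real 2D embedding such as t-SNE
years = 2000 + 10 * labels + rng.normal(size=200)

print(knn_accuracy(X_high, labels), knn_year_rmse(X_high, years), knn_recall(X_high, X_2d))
```

Comparing `knn_accuracy` on the high-dimensional vectors and on the 2D coordinates gives the kind of "how much does 2D lose" comparison the caption describes; recall directly measures how much local neighbourhood structure survives the projection.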
The landscape of biomedical research
  • Preprint
  • File available

April 2023 · 178 Reads · 6 Citations

The number of publications in biomedicine and the life sciences has grown rapidly over the last decades, with over 1.5 million papers now published every year. This makes it difficult to keep track of new scientific work and to maintain an overview of the evolution of the field as a whole. Here we present a 2D atlas of the entire corpus of biomedical literature, and argue that it provides a unique and useful overview of life-sciences research. We base our atlas on the abstracts of 21 million English-language articles from the PubMed database. To embed the abstracts into 2D, we use the large language model PubMedBERT, combined with t-SNE tailored to handle samples of this size. We use our atlas to study the emergence of the Covid-19 literature, the evolution of the neuroscience discipline, the uptake of machine learning, and the distribution of gender imbalance in academic authorship. Furthermore, we present an interactive web version of our atlas that allows easy exploration and will enable further insights and facilitate future research.


Citations (2)


... We found words like ebola with r = 9.9 in 2015 and zika with r = 40.4 in 2017, but from 2013 until 2019, no single word showed an excess frequency gap δ > 0.01. This changed during the Covid pandemic: in 2020-2022, words like coronavirus, lockdown, and pandemic showed very large excess usage (up to r > 1000 and δ = 0.037), in agreement with the observation that the Covid pandemic had an unprecedented effect on biomedical publishing (González-Márquez et al., 2024). ...

Reference:

Delving into ChatGPT usage in academic writing through excess vocabulary
The landscape of biomedical research
  • Citing Article
  • April 2024

Patterns

... In addition, low-dimensional t-SNE representations have been shown to be relevant for representing a knowledge space. In a recent study leveraging a 2D t-SNE embedding of 21 million PubMed articles [62], the accuracy of a k-nearest-neighbors prediction in the 2D t-SNE embedding (63%) was found to be very close to that obtained with a 768-dimensional BERT embedding (69.7%) and a 4,679,130-dimensional TF-IDF representation (65.2%). This finding reflects the high information preservation of 2D t-SNE embeddings for representing scientific landscapes, and the good performance of t-SNE compared with alternative, higher-dimensional methods. ...

The landscape of biomedical research