Article

ChatGPT and conversational artificial intelligence: Friend, foe, or future of research?


Abstract

Artificial intelligence (AI) and machine learning are increasingly utilized across healthcare. More recently, there has been a rise in the use of AI within research, particularly through novel conversational AI platforms, such as ChatGPT. In this Controversies paper, we discuss the advantages, limitations, and future directions for ChatGPT and other forms of conversational AI in research and scholarly dissemination.


... AI-driven mechanisms can offer flexibility and possibilities that help teachers reduce their cognitive load and stress by saving time and energy and offering suggestions tailored to teachers' needs. Further, there have been discussions about the stress-reducing and well-being-enhancing functions of AI-based protocols, along with debates concerning the negative aspects of overreliance on AIs (Gottlieb et al., 2023). AI-based protocols also have potential drawbacks, such as creating more problems for teachers who overuse them or lack digital literacy, and confusing learners in their language learning process. ...
... ChatGPT can comprehend natural, slang, and idiomatic language and provide coherent and contextually appropriate responses to its users. The user-friendliness, flexibility, versatility, and language-processing capabilities of ChatGPT were among the reasons why the platform attracted over 1 million users within 5 days of its launch in November 2022 (Gottlieb et al., 2023; Ray, 2023). Introducing itself, ChatGPT indicates that its functionality ranges from answering questions, offering suggestions, and assisting people with their language-related needs to translation, writing assistance, language learning support, and idea generation. ...
... Early investigations suggest that ChatGPT and similar AIs can play four roles in educational contexts: material suppliers, interlocutors, assessors, and assistants (Jeon & Lee, 2023). Creating written content, translating, generating study documents, and disseminating knowledge are possible benefits of AI usage (Gottlieb et al., 2023). Contextualizing ChatGPT in healthcare education, Sallam (2023) noted that AI boosts critical thinking and personalized learning while saving time and money. ...
Article
Language teaching is a highly emotional profession that can affect teachers' well-being and learners' achievement. However, studies have yet to explore the potential of positive psychology interventions and artificial intelligence (AI) tools to promote the psycho-emotional aspects of second language (L2) teachers and learners, and research on the effectiveness of AI in promoting learners' language skills remains limited. Responding to these gaps, the researchers chose ChatGPT, an AI-powered chatbot capable of generating natural and coherent texts, as a potential tool to foster positive emotions and interactions between Iranian English language teachers (n = 12) and learners (n = 48) in the L2 writing context. We operationalized ChatGPT in a three-phased writing instruction protocol (CGWIP): (1) a planning phase, where teachers used ChatGPT to brainstorm ideas and generate outlines for each session; (2) an instruction phase, where teachers used ChatGPT to engage the learners in the writing process and to analyse and reflect on their drafts; and (3) an assessment phase, where teachers used ChatGPT to simulate the IELTS writing exam and provide detailed and constructive feedback to the learners. We further tested the effectiveness of CGWIP on teachers' self-efficacy and learners' writing skills before and after a 10-week instruction program. The Independent Samples t-test results showed that CGWIP significantly enhanced teachers' self-efficacy compared to the control group. Also, the results of a One-Way ANCOVA revealed that CGWIP significantly improved learners' writing skills and that these effects persisted over time. The study implies that the protocol can nurture teachers' efficacy by helping them in various aspects of L2 writing instruction, including brainstorming, revising, providing feedback, and assessment, which, in turn, improves learners' writing skills.
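The between-group comparison reported above (an independent-samples t-test on teachers' self-efficacy) can be illustrated with a minimal sketch. The scores below are invented for illustration only and are not the study's data; the study's actual scale and sample sizes differ.

```python
from statistics import mean, stdev

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    na, nb = len(sample_a), len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / (va / na + vb / nb) ** 0.5

# Hypothetical post-test self-efficacy scores (0-100); NOT the study's data.
cgwip_group = [78, 82, 75, 88, 80, 85]
control_group = [70, 68, 74, 66, 72, 71]

t_stat = welch_t(cgwip_group, control_group)
```

A positive t statistic favors the CGWIP group; in practice the statistic would then be compared against a t distribution with the appropriate degrees of freedom to obtain the significance level the abstract reports.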
... 53 records were included in the dataset, encompassing 23 original articles (theoretical or empirical work), 11 letters,60-70 six editorials,71-76 four reviews,8,77-79 three comments,22,80,81 one report82 and five unspecified articles.83-87 Most works focus on applications utilizing ChatGPT across various healthcare fields, as indicated in Table II. During analysis, four general themes emerged in our dataset, which we use to structure reporting. ...
... Gottlieb et al. 85 Using Conversational AI to create study documents by translating complex concepts into simpler ones or designing informed consent documents for patients. ...
... 8,83 To certain authors, this could involve condensing crucial aspects of their work, like crafting digestible research documents for ethics reviews or consent forms. 85 However, LLMs' capacities are also critically examined, with Tang et al. emphasizing ChatGPT's tendency to produce attribution and misinterpretation errors, potentially distorting original source information, echoing concerns over interpretability, reproducibility, uncertainty handling, and transparency. ...
Preprint
Full-text available
Background: With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite their potential benefits, researchers have underscored various ethical implications. While individual instances have drawn much attention, the debate lacks a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them. Against this background, this work aims to map the ethical landscape surrounding the current stage of deployment of LLMs in medicine and healthcare. Methods: Electronic databases and commonly used preprint servers were queried using a comprehensive search strategy which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Results: Four general fields of applications emerged and testify to a vivid phase of exploration. Advantages of using LLMs are attributed to their capacity in data analysis, personalized information provisioning, and support in decision-making, as well as to mitigating information loss and enhancing medical information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful misinformation or convincing but inaccurate content. A recurrent plea for ethical guidance and human oversight is evident. Discussion: Given the variety of use cases, it is suggested that the ethical guidance debate be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in diverse healthcare settings.
In addition, a critical inquiry is necessary to determine the extent to which the current experimental use of LLMs is both necessary and justified.
... 6,7 Inevitably, it has sparked numerous discussions on the objectives and ethics of employing LLMs for research purposes. 8 Given that this form of computing is here to stay and will likely only become more advanced and ubiquitous, it is imperative to analyze the potentials, obstacles, and future directions of GPT-4 for research, with an emphasis on medical research. ...
... 10 • The neural machine translation algorithm behind GPT-4 can modify the outcomes for researchers who desire to publish in a non-native language to improve communication fluency, vocabulary, and syntax at a minimal cost. 8 • In addition to finding lacunae in the existing literature across multiple databases to guide future investigations, GPT-4 could likewise be employed to detect notions in the literature for theoretical articles and narrative reviews. 11 ...
... GPT-4 can translate full articles into several languages, promptly increasing an article's worldwide accessibility, which may lead to more journal citations. 8 • LLMs might compose the abstract of an academic paper from the finished manuscript. 3 ...
Article
In its broadest sense, Artificial Intelligence (AI) describes any machine or computer capable of carrying out operations that normally demand human intellect, such as comprehension, perception, problem-solving, and judgement. In the emerging discipline of generative AI, several large language models (LLMs) have evolved as promising tools in the recent decade. ChatGPT is an advanced development in Large Language Model (LLM) technology that employs Deep Learning (DL) techniques to generate human-like responses to natural language inputs. ChatGPT is among the most extensive language models made accessible to the public and belongs to the family of OpenAI's Generative Pre-trained Transformer (GPT) models. With the help of an extensive text database, ChatGPT can produce reasonable and contextually relevant responses to a wide range of inquiries by comprehending the intricacies and complexity of human language and holding interactive discussions. In March 2023, OpenAI introduced GPT-4 as the latest version of the fine-tuned ChatGPT, which can execute an array of real-world tasks significantly faster than an individual by replicating distinct human cognitive abilities, including rapid computation, scientific reasoning, visuospatial ability, memory, image analysis, and comprehension proficiencies.
... It could generate false or fabricated information, and ChatGPT's training data extend only to 2021 (Gottlieb, 2023). ...
... The first central theme is that ChatGPT needs to abide by codes of ethics and guidelines. Specifically, ChatGPT was listed in some sources as a tool that carries risks of plagiarism, loss of integrity, and over-reliance (Lund et al., 2023; Sallam, 2023; Hosseini et al., 2023; Gottlieb, 2023; Rahimi et al., 2023; Mijwil et al., 2023; Sok, 2023). The cited references recognize that ChatGPT presents noteworthy prospects for educational institutions to furnish researchers with a diverse range of research avenues (Hong, 2023). ...
Article
Full-text available
This paper reports the findings of a systematic review of research management practices in Higher Learning Institutions in Tanzania through the use of Artificial Intelligence (AI) technologies such as ChatGPT. AI technologies have gained significant popularity in recent times. However, their integration into academic settings raises concerns, especially in terms of potential ethical considerations. The systematic review used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to retrieve English records in Google Scholar under the phrase "ChatGPT in research". Eligibility criteria included published research papers on ChatGPT and research practices. A total of 28 documents were retrieved, of which only 20 met the inclusion criteria after full screening. The findings indicate that setting a code of ethics for using AI is paramount, and that further research is needed to gain detailed insights into this new technology. It was concluded that ChatGPT use in research has to be validated with other methods.
... [Figure 2] illustrates the pros and cons of AI in research and analysis, leaving open to debate whether open AI is a boon or a bane to the scientific community. [6,42-44] As a useful guide to comprehending the features which can limit confidence in the contemporary literature, Moore et al., alongside narrating the multitude of relevant factors, present a context-appropriate classification of these into the "3Fs", culminating in research that is either (F)lawed, (F)utile, or (F)abricated [Table 2]. [45] FUTURE DIRECTIONS: The quintessential altercation regarding the peak of the evidence pyramid continues to intensify, with some arguing in favor of well-conducted RCTs mimicking real-world situations better than any other study design, thus minimizing the likelihood of confounding. ...
... [46] We believe the "3Es" would be instrumental in achieving this pinnacle, i.e., care adequately backed by robust (E)vidence, physician (E)xperience, and meeting the patient (E)xpectations. [Table 2 excerpt: Flawed research is prone to imperfections/errors, owing to bias in the study, a low standard of study design and execution, and the risk of carrying flaws forward into SRs and MAs; Futile research is unnecessary, irrelevant, and adds no value.] ...
Article
Full-text available
Evidence-based medicine (EBM) undeniably classifies as a pre-eminent advance in the clinical approach to decision-making. Although EBM as a topic has been discussed at length, it is the process of integrating EBM into practice where the actual debate becomes even more interesting, with unique roadblocks cropping up at the very end of the translational highway. While the core concept of EBM has stood firm over decades, the research landscape and the corresponding intricacies continue to evolve at a rather rampant pace. Evidence-based practice is thus best elaborated in close conjunction with the recent advent of precision medicine, the impact of the coronavirus disease 2019 pandemic, and the ever-compounding present-age research concerns. In this reference, randomized controlled trials and now meta-analyses (second-order analyses of analyses) are also being increasingly scrutinized for their contextual veracities and for how the quality of the former can be rendered more robust to strengthen our epic pyramid of EBM. Notwithstanding, the index narrative article is a modern-day take on EBM keeping abreast of the evolving opportunities and challenges, with the noble objective of deliberating a standpoint that aims to potentially bridge some of the existing gaps in the translation of research to patient care and outcome improvement at large. Keywords: Coronavirus disease 2019, Evidence-based medicine, Meta-analysis, Precision medicine, Randomized controlled trials, Systematic reviews, Research
... malignant disorders (OPMD) or oral squamous cell carcinoma (OSCC) [7]. Cancerophobic individuals, or patients diagnosed with or treated for OSCC who are psychologically distressed, may also gain from ChatGPT interactions, though it could also be counterproductive in aggravating their psychological issues [8]. ...
... For some scholars, ChatGPT and similar AIs mark the start of an era where technology precedes pedagogy (Al-Kadi, 2018). In other words, the ever-updating nature of ChatGPT and AI-wired mechanisms challenges the existing educational discourse by posing the idea that AI could replace teachers (Ausat, Massang, Efendi, Nofirman, & Riady, 2023); however, some believe that the use of ChatGPT should be controlled and structured (Gottlieb, Kline, Schneider, & Coates, 2023; Perry, 2021). ...
Article
Language education as a dynamic field of study requires constant innovation to meet L2 needs in the classroom milieu. Additionally, the surge of technological advancement signals the significance of studying the possible beneficial roles of artificial intelligence in language teaching and learning, an area in which the field is still in its infancy. In this vein, the present study examined the effectiveness of a four-staged ChatGPT-based rapport-building protocol (CGRBP) on teacher-student rapport and L2 grit, not only to provide non-correlational evidence for L2 emotion studies, but also to link the realm of artificial intelligence with positive psychology in order to find practical ways of cultivating an emotionally supportive learning context. To do so, 30 intermediate-level Iranian EFL learners participated in experimental (n = 15) and control (n = 15) groups in a 16-week instruction program. Data gathered from a pre-test post-test experimental design were analyzed by One-Way ANCOVA, and the analyses showed that students who were taught English through CGRBP outperformed the students in the control group on L2 grit. The results verified the mediating role of CGRBP in the L2 context by suggesting that the application of a well-structured and staged ChatGPT-based instruction would possibly lead to enhanced L2 grit. Since grit is an integral part of one's positive psycho-emotional network, several theoretical and pedagogical implications are discussed and directions for future exploration suggested.
... ChatGPT is an advanced large language model that employs machine learning algorithms to generate text of near-human quality on a multitude of topics. 5 Recent iterations have excelled in knowledge benchmarks such as the Uniform Bar Examination and even produced academic writing virtually indistinguishable from human-authored work. 6,7 It is yet to be determined if ChatGPT could aid in crafting LORs, particularly in high-stakes contexts like faculty promotion. ...
Article
Full-text available
Objectives Letters of recommendation (LORs) are essential within academic medicine, affecting a number of important decisions regarding advancement, yet these letters take significant amounts of time and labor to prepare. The use of generative artificial intelligence (AI) tools, such as ChatGPT, is gaining popularity for a variety of academic writing tasks and offers an innovative solution to relieve the burden of letter writing. It is yet to be determined if ChatGPT could aid in crafting LORs, particularly in high-stakes contexts like faculty promotion. To determine the feasibility of this process and whether there is a significant difference between AI- and human-authored letters, we conducted a study aimed at determining whether academic physicians can distinguish between the two. Methods A quasi-experimental study was conducted using a single-blind design. Academic physicians with experience in reviewing LORs were presented with LORs for promotion to associate professor, written by either humans or AI. Participants reviewed LORs and identified the authorship. Statistical analysis was performed to determine accuracy in distinguishing between human- and AI-authored LORs. Additionally, the perceived quality and persuasiveness of the LORs were compared based on suspected and actual authorship. Results A total of 32 participants completed letter review. The mean accuracy of distinguishing between human- versus AI-authored LORs was 59.4%. The reviewer's certainty and time spent deliberating did not significantly impact accuracy. LORs suspected to be human-authored were rated more favorably in terms of quality and persuasiveness. A difference in gender-biased language was observed in our letters: human-authored letters contained significantly more female-associated words, while the majority of AI-authored letters tended to use more male-associated words. Conclusions Participants were unable to reliably differentiate between human- and AI-authored LORs for promotion.
AI may be able to generate LORs and relieve the burden of letter writing for academicians. New strategies, policies, and guidelines are needed to balance the benefits of AI while preserving integrity and fairness in academic promotion decisions.
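Whether a 59.4% mean accuracy is meaningfully different from the 50% expected by pure guessing can be illustrated with an exact binomial tail probability. This is a deliberate simplification of the paper's analysis: it treats the figure as roughly 19 correct calls out of 32 independent judgments, which is our assumption for illustration, not the study's design.

```python
from math import comb

def binomial_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# 59.4% of 32 judgments is about 19 correct; chance level is p = 0.5.
p_value = binomial_tail(19, 32)
```

The resulting one-sided tail probability is well above 0.05, which is consistent with the paper's conclusion that reviewers could not reliably tell the letters apart.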
... The language model ChatGPT has taken the world by storm with its impressive conversational abilities and has rapidly become a widely used tool [5]. ChatGPT can engage in human-like interactions, providing an opportunity for personalized and discreet advice [6]. ...
... In recent years, the influence of AI has expanded across various aspects of human life, with ChatGPT emerging as a widely used and powerful tool. While ChatGPT has been helpful in healthcare education and research [11,32,33], concerns about the reliability and accuracy of data, particularly in nephrology, have arisen [34,35]. This study aims to evaluate the effectiveness of ChatGPT in identifying authentic references for literature reviews in the various fields of nephrology and to determine the accuracy of each component in nephrology and specific nephrology areas provided by ChatGPT. ...
Article
Full-text available
Literature reviews are valuable for summarizing and evaluating the available evidence in various medical fields, including nephrology. However, identifying and exploring the potential sources requires focus and time devoted to literature searching for clinicians and researchers. ChatGPT is a novel artificial intelligence (AI) large language model (LLM) renowned for its exceptional ability to generate human-like responses across various tasks. However, whether ChatGPT can effectively assist medical professionals in identifying relevant literature is unclear. Therefore, this study aimed to assess the effectiveness of ChatGPT in identifying references for literature reviews in nephrology. We keyed the prompt “Please provide the references in Vancouver style and their links in recent literature on… name of the topic” into ChatGPT-3.5 (03/23 Version). We selected all the results provided by ChatGPT and assessed them for existence, relevance, and author/link correctness. We recorded each resource’s citations, authors, title, journal name, publication year, digital object identifier (DOI), and link. The relevance and correctness of each resource were verified by searching on Google Scholar. Of the total 610 references in the nephrology literature, only 378 (62%) provided by ChatGPT existed, while 31% were fabricated and 7% were incomplete. Notably, only 122 (20%) of references were authentic. Additionally, 256 (68%) of the links in the references were found to be incorrect, and the DOI was inaccurate in 206 (54%) of the references. Moreover, among those with a link provided, the link was correct in only 20% of cases, and 3% of the references were irrelevant. Notably, an analysis of specific topics in electrolytes, hemodialysis, and kidney stones found that >60% of the references were inaccurate or misleading, with less reliable authorship and links provided by ChatGPT.
Based on our findings, the use of ChatGPT as a sole resource for identifying references to literature reviews in nephrology is not recommended. Future studies could explore ways to improve AI language models’ performance in identifying relevant nephrology literature.
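The percentages above mix two denominators, which a quick arithmetic check makes explicit. The counts are those reported in the abstract; the inference that the link and DOI percentages are taken over the existing references rather than all 610 is ours.

```python
def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts reported in the abstract.
total_refs = 610   # references produced across nephrology topics
existing = 378     # references that actually exist
authentic = 122    # references judged authentic
wrong_link = 256   # references with an incorrect link
wrong_doi = 206    # references with an inaccurate DOI

share_existing = pct(existing, total_refs)    # 62% of all references
share_authentic = pct(authentic, total_refs)  # 20% of all references
# The link and DOI figures match the abstract only when computed over
# the 378 existing references, not all 610:
share_wrong_link = pct(wrong_link, existing)  # 68%
share_wrong_doi = pct(wrong_doi, existing)    # 54%
```

Keeping track of the shifting denominator matters when comparing such error rates across studies: 68% of existing references is a much smaller absolute count than 68% of all references would be.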
... As the capabilities of these technologies grow exponentially, their potential applications in diverse fields warrant in-depth exploration. The academic review process within public administration research is among such fields ripe for re-imagination and innovation (Gottlieb et al., 2023). ...
Preprint
Full-text available
In the ever-evolving landscape of academia, artificial intelligence (AI) presents promising opportunities for enhancing the academic review process. In this study, we evaluated the proficiency of Bard and GPT-4, two of the most advanced AI models, in conducting academic reviews. Bard and GPT-4 were compared to human reviewers, highlighting their capabilities and potential areas for improvement. Through a mixed-methods approach of quantitative scoring and qualitative thematic analysis, we observed a consistent performance of the AI models surpassing human reviewers in comprehensibility, clarity of review, the relevance of feedback, and accuracy of technical assessments. Qualitative analysis revealed nuanced proficiency in evaluating structure, readability, argumentation, narrative coherence, attention to detail, data analysis, and implications assessment. While Bard exhibited exemplary performance in basic comprehension and feedback relevance, GPT-4 stood out in detailed analysis, showcasing impressive attention to minor discrepancies and meticulous scrutiny. The results underscore the potential of AI as an invaluable tool in the academic review process, capable of complementing human reviewers to improve the quality, efficiency, and effectiveness of reviews. However, we also identified areas where human reviewers excel, particularly in understanding complex academic language and intricate logical progressions, offering crucial insights for future AI model training and development.
Article
Classroom teachers are usually responsible for creating materials to meet students' needs and course requirements. The arrival of generative AI, such as ChatGPT, offers EFL teachers an opportunity to regularly collaborate with AI chatbots to create new source materials. The writing experiments in this study considered how to effectively and ethically work with generative AI to produce culturally appropriate EFL teaching materials. The experiments involved co-producing moral dilemmatic stories with ChatGPT to support a new Chinese EFL curriculum unit on “Morals and Virtues”. We draw on Lo's (2023a, 2023b) CLEAR framework for prompt engineering to generate appropriate stimulus materials. We refer to Durkheim's (1961) sociological theory on morality and Bernstein's (1971, 1981, 2003) concept of framing to unpack how moral decisions might be framed as dilemmas in particular sociocultural contexts. We then mobilize Martin and White's (2005) appraisal framework to uncover the patterns of cultural biases embedded in the AI-generated text. We propose a two-step “Navigation and Generation” method for effective prompt engineering with generative AI: first navigating the AI chatbot to a clear and consistent positioning around a concept, and then requesting the AI chatbot to generate text based on the clarifications generated in the first step. Our appraisal analysis indicates that WEIRD (western, educated, industrial, rich, and democratic) cultural values are embedded in moral dilemmatic stories generated by ChatGPT. EFL Teachers need to be aware of how these values are presented to encourage critical cultural awareness in their students. [Full text (open access) available at: https://doi.org/10.1016/j.caeai.2024.100223]
Article
Background Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. Objective Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs’ potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. Methods Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs’ use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. Results A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. 
We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs’ outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs’ capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. Conclusions LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians’ AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied.
Chapter
Full-text available
How do artificial neural networks and other forms of artificial intelligence interfere with methods and practices in the sciences? Which interdisciplinary epistemological challenges arise when we think about the use of AI beyond its dependency on big data? Not only the natural sciences, but also the social sciences and the humanities seem to be increasingly affected by current approaches of subsymbolic AI, which master problems of quality (fuzziness, uncertainty) in a hitherto unknown way. But what are the conditions, implications, and effects of these (potential) epistemic transformations and how must research on AI be configured to address them adequately?
Article
Full-text available
Background: ChatGPT is an artificial intelligence-based tool developed by OpenAI (California, USA). This systematic review examines the potential of ChatGPT in patient care and its role in medical research. Methods: The systematic review was done according to the PRISMA guidelines. Embase, Scopus, PubMed, and Google Scholar databases were searched. We also searched preprint databases. Our search aimed to identify all kinds of publications, without any restrictions, on ChatGPT and its application in medical research, medical publishing, and patient care. We used the search term "ChatGPT". We reviewed all kinds of publications, including original articles, reviews, editorials/commentaries, and even letters to the editor. Each selected record was analysed using ChatGPT, and the responses generated were compiled into a table. The Word table was converted into a PDF and further analysed using ChatPDF. Results: We reviewed the full texts of 118 articles. ChatGPT can assist with patient enquiries, note writing, decision-making, trial enrolment, data management, decision support, research support, and patient education. But the solutions it offers are usually insufficient and contradictory, raising questions about their originality, privacy, correctness, bias, and legality. Due to its lack of human-like qualities, ChatGPT's legitimacy as an author is questioned when it is used for academic writing. ChatGPT-generated content raises concerns about bias and possible plagiarism. Conclusion: Although it can help with patient treatment and research, there are issues with accuracy, authorship, and bias. ChatGPT can serve as a "clinical assistant" and be a help in research and scholarly writing.
Article
Full-text available
We reflect on our experiences of using ChatGPT, a Generative Pre-trained Transformer-based chatbot launched by OpenAI in November 2022, to draft a research article. We aim to demonstrate how ChatGPT could help researchers accelerate drafting their papers. We created a simulated data set of 100 000 health care workers with varying ages, Body Mass Index (BMI), and risk profiles. Simulated data allow analysts to test statistical analysis techniques, such as machine-learning based approaches, without compromising patient privacy. Infections were simulated with a randomized probability of hospitalisation. A subset of these fictitious people was vaccinated with a fictional vaccine that reduced the probability of hospitalisation after infection. We then used ChatGPT to help us decide how to handle the simulated data in order to determine vaccine effectiveness and to draft a related research paper. AI-based language models in data analysis and scientific writing are an area of growing interest, and this exemplar analysis aims to contribute to the understanding of how ChatGPT can be used to facilitate these tasks.
Article
Full-text available
To the Editor: Recently, there has been significant interest in ChatGPT, an artificial intelligence (AI) program from OpenAI that is making its way into various fields, including medicine.1, 2 The program has been used to write a scientific article3 and to pass a Master of Business Administration course exam,4 the law-school entrance exam,5 and the medical licensing exam,6 showcasing its remarkable capabilities and usability compared with previous AI models. Large language models, such as ChatGPT, are capable of learning and analyzing vast amounts of language data from various sources and of generating outputs in a human-like manner. Unlike traditional AI that simply analyzes objects and identifies patterns, ChatGPT can create new and unique objects and effects, making it a powerful generative AI. ChatGPT is equipped with a language model called Generative Pre-trained Transformer-3.5, which can output common knowledge quite accurately and can provide tailored responses to detailed questions related to resuscitation medicine (Table 1). These responses draw on online articles, books, and other sources that present cardiopulmonary resuscitation (CPR) guidelines. CPR guidelines should be easily accessible and include recommendations and content related to cardiopulmonary resuscitation for healthcare professionals and the general public. However, the guidelines can be difficult to understand for people who have not received basic life support education and lack specialized knowledge. While it is possible to search for resuscitation methods via web surfing, incorrect knowledge may be acquired from unreliable information with unclear sources. Compared with web surfing, ChatGPT allows users to quickly receive well-tailored answers to their questions, along with AI-based medical decision support grounded in the latest research and guidelines. Even without reviewing the guidelines and papers one by one, users can easily check the information summarized and extracted by the AI.
Of course, per "garbage in, garbage out," an algorithm built on inaccurate information can output inaccurate information, and that output is solely the result of a sophisticated algorithm without human oversight, which can be dangerous in medical situations. Despite this, its utilization value can be very high owing to its personalized interaction and quick response times. Information about CPR is gaining increasing interest among the general public. Previously, information about CPR was provided in the form of guidelines and articles, which were difficult for the general public to access and could only be reached through professional education. However, the development of chat-style programs that can provide the latest information about CPR in a way that is easily accessible and understandable to the general public now appears possible. Therefore, it is suggested that various options for quickly adopting and utilizing this technology in providing information on resuscitation and in CPR education be explored.
Preprint
Full-text available
Background Large language models such as ChatGPT can produce increasingly realistic text, with unknown information on the accuracy and integrity of using these models in scientific writing. Methods We gathered ten research abstracts from each of five high impact factor medical journals (n=50) and asked ChatGPT to generate research abstracts based on their titles and journals. We evaluated the abstracts using an artificial intelligence (AI) output detector and a plagiarism detector, and had blinded human reviewers try to distinguish whether abstracts were original or generated. Results All ChatGPT-generated abstracts were written clearly, but only 8% correctly followed the specific journal's formatting requirements. Most generated abstracts were detected using the AI output detector, with median [interquartile range] scores (higher meaning more likely to be generated) of 99.98% [12.73, 99.98], compared with a very low probability of AI-generated output in the original abstracts of 0.02% [0.02, 0.09]. The AUROC of the AI output detector was 0.94. Generated abstracts scored very high on originality using the plagiarism detector (100% [100, 100] originality). Generated abstracts had a similar patient cohort size as original abstracts, though the exact numbers were fabricated. When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, but that the generated abstracts were vaguer and had a formulaic feel to the writing. Conclusion ChatGPT writes believable scientific abstracts, though with completely generated data. These are original without any plagiarism detected but are often identifiable using an AI output detector and skeptical human reviewers.
Abstract evaluation for journals and medical conferences must adapt policy and practice to maintain rigorous scientific standards; we suggest inclusion of AI output detectors in the editorial process and clear disclosure if these technologies are used. The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.
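The detector evaluation described in this abstract reduces to a standard binary-classification metric: the AUROC is the probability that a randomly chosen generated abstract receives a higher detector score than a randomly chosen original one (ties counting as half). A minimal sketch of that calculation, using invented detector scores purely for illustration (none of these numbers are the study's data):

```python
def auroc(positive_scores, negative_scores):
    """Probability that a randomly chosen positive (generated) item
    scores higher than a randomly chosen negative (original) item;
    ties count as half a win."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Invented detector scores (percent "likely AI-generated"), for illustration only.
generated = [99.98, 99.9, 12.73, 99.98, 87.5]   # ChatGPT-written abstracts
original = [0.02, 0.09, 0.02, 50.0, 0.05]       # human-written abstracts

print(f"AUROC = {auroc(generated, original):.2f}")  # prints "AUROC = 0.96"
```

A perfect detector yields 1.0 and an uninformative one 0.5, so the reported 0.94 indicates strong but imperfect separation between generated and original abstracts.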
Article
Full-text available
Large language models utilizing transformer neural networks and other deep learning architectures have demonstrated unprecedented results in many tasks previously accessible only to human intelligence. In this article, we collaborate with ChatGPT, an AI model developed by OpenAI, to speculate on the applications of Rapamycin in the context of Pascal's Wager, a philosophical argument commonly used to justify belief in God. In response to the query "Write an exhaustive research perspective on why taking Rapamycin may be more beneficial than not taking Rapamycin from the perspective of Pascal's wager," ChatGPT provided the pros and cons of Rapamycin use, considering the preclinical evidence of potential life extension in animals. This article demonstrates the potential of ChatGPT to produce complex philosophical arguments; it should not be taken as support for any off-label use of Rapamycin.
Preprint
Full-text available
We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, even clinical decision-making.
Article
Full-text available
While the opportunities of ML and AI in healthcare are promising, the growth of complex data-driven prediction models requires careful quality and applicability assessment before they are applied and disseminated in daily practice. This scoping review aimed to identify actionable guidance for those closely involved in AI-based prediction model (AIPM) development, evaluation and implementation including software engineers, data scientists, and healthcare professionals and to identify potential gaps in this guidance. We performed a scoping review of the relevant literature providing guidance or quality criteria regarding the development, evaluation, and implementation of AIPMs using a comprehensive multi-stage screening strategy. PubMed, Web of Science, and the ACM Digital Library were searched, and AI experts were consulted. Topics were extracted from the identified literature and summarized across the six phases at the core of this review: (1) data preparation, (2) AIPM development, (3) AIPM validation, (4) software development, (5) AIPM impact assessment, and (6) AIPM implementation into daily healthcare practice. From 2683 unique hits, 72 relevant guidance documents were identified. Substantial guidance was found for data preparation, AIPM development and AIPM validation (phases 1–3), while later phases clearly have received less attention (software development, impact assessment and implementation) in the scientific literature. The six phases of the AIPM development, evaluation and implementation cycle provide a framework for responsible introduction of AI-based prediction models in healthcare. Additional domain and technology specific research may be necessary and more practical experience with implementing AIPMs is needed to support further guidance.
Article
Full-text available
Background The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase the accessibility to health care services for the public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and potential for sustainable adoption. Objective This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents. Methods PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by one reviewer and validated by another. Results A total of 31 studies were selected and included a variety of conversational agents, including 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents (3 of which were interactive voice response calls, virtual patients, and speech recognition screening systems), 1 contextual question-answering agent, and 1 voice recognition triage system. 
Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction were rated well (in 27/30 and 26/31 studies, respectively), and positive or mixed effectiveness was found in three-quarters of the studies (23/30). However, specific qualitative feedback highlighted several limitations of the agents. Conclusions The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfaction of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and to identify key areas for improvement. Further research should also analyze the cost-effectiveness, privacy, and security of the agents. International Registered Report Identifier (IRRID): RR2-10.2196/16934
Article
This letter to the editor suggests adding a technical point to the new editorial policy expounded by Hosseini et al. on the mandatory disclosure of any use of natural language processing (NLP) systems, or generative AI, in writing scholarly publications. Such AI systems should naturally also be forbidden from being named as authors, because they would not have fulfilled prevailing authorship guidelines (such as the widely adopted ICMJE authorship criteria).
Article
Conversational AI is a game-changer for science. Here’s how to respond.
Article
At least four articles credit the AI tool as a co-author, as publishers scramble to regulate its use.
Article
Infographics are a valuable tool for increasing knowledge translation and dissemination. They can be used to simplify complex topics and supplement the written text of a study. This Educator's Blueprint paper will provide 10 strategies for creating high‐quality infographics. These strategies include selecting appropriate content, defining the target audience, considering the format, selecting the software, using consistent font and color schemes, increasing image utilization, ensuring a consistent flow of ideas, avoiding copyright and HIPAA violations, getting feedback from others, and utilizing effective dissemination strategies. These strategies will help guide educators to increase their ability to create more effective infographics.
Article
Artificial intelligence and machine learning systems are increasingly replacing human decision makers in commercial, healthcare, educational, and government contexts. But rather than eliminating human errors and biases, these algorithms have in some cases been found to reproduce or amplify them. We argue that to better understand how and why these biases develop, and when they can be prevented, machine learning researchers should look to the decades-long literature on biases in human learning and decision-making. We examine three broad causes of bias: small and incomplete datasets, learning from the results of your decisions, and biased inference and evaluation processes. For each, findings from the psychology literature are introduced along with connections to the machine learning literature. We argue that rather than viewing machine systems as universal improvements over human decision makers, policymakers and the public should acknowledge that these systems share many of the same limitations that frequently inhibit human judgement, for many of the same reasons. Artificial intelligence and machine learning systems may reproduce or amplify biases. The authors discuss the literature on biases in human learning and decision-making, and propose that researchers, policymakers, and the public should be aware of such biases when evaluating the output and decisions made by machines.
Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review
  • A. A. H. de Hond
  • A. M. Leeuwenberg
  • L. Hooft
  • I. M. J. Kant
  • S. W. J. Nijman
  • H. J. A. van Os
de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med. 2022 Jan 10;5(1):2.
Remarks by the President in State of the Union Address
  • The White House
The White House. Remarks by the President in State of the Union Address. Available at: https://obamawhitehouse.archives.gov/the-press-office/2015/01/20/remarks-president-state-union-address-january-20-2015; January 20, 2015. Last accessed 4/27/2023.
Instructions for Authors
JAMA. Instructions for Authors. Updated January 30, 2023. Available at: https://jamanetwork.com/journals/jama/pages/instructions-for-authors. Last accessed 4/27/2023.
Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers
  • Gao