Artificial Intelligence is More Creative Than Humans: A Cognitive Science
Perspective on the Current State of Generative Language Models
Kent F. Hubert*+, Kim N. Awa+, and Darya L. Zabelina
University of Arkansas, Department of Psychological Sciences Fayetteville, AR 72701, USA
* Corresponding author
+ These authors contributed equally to this work
______________________________________________________________________________
Abstract
The emergence of publicly accessible artificial intelligence (AI) large language models such as
ChatGPT has given rise to global conversations on the implications of AI capabilities. Emergent
research on AI has challenged the assumption that creative potential is a uniquely human trait;
thus, there seems to be a disconnect between human perception and what AI is objectively
capable of creating. Here, we aimed to assess the creative potential of humans in comparison to
AI. In the present study, human participants (N = 151) and GPT-4 provided responses for the
Alternative Uses Task, Consequences Task, and Divergent Associations Task. We found that AI
was robustly more creative along each divergent thinking measurement in comparison to the
human counterparts. Specifically, when controlling for fluency of responses, AI was more
original and elaborate. The present findings suggest that the current state of AI language models
demonstrates higher creative potential than human respondents.
Keywords: Generative language models, Artificial intelligence, Creative potential, Divergent
thinking, Creative Cognition
______________________________________________________________________________
E-mail addresses: khubert@uark.edu (K.F. Hubert), knawa@uark.edu (K.N. Awa), dlzabeli@uark.edu
(D.L. Zabelina)
Publication status: Under review
Preprint upload date: September 18, 2023
APA Citation: Hubert, K. F., Awa, K. N., & Zabelina, D. L. (2023). Artificial Intelligence is More
Creative Than Humans: A Cognitive Science Perspective on the Current State of Generative Language
Models. PsyArXiv.
Introduction
The emergence of ChatGPT – a natural language processing (NLP) model developed by
OpenAI (2022) and released to the general public – has garnered global conversation on the utility of artificial
intelligence (AI). OpenAI’s Generative Pretrained Transformer (GPT) is a type of machine
learning that specializes in pattern recognition and prediction and has been further trained using
Reinforcement Learning from Human Feedback (RLHF) so that ChatGPT responses would be
indistinguishable from human responses. Recently, OpenAI (2023) has advertised the new model
(GPT-4) as “more creative” particularly “on creative and technical writing tasks” in comparison
to previous versions, although there are arguably semantic limitations such as nonsensical
answers or the possibilities of incorrect information generation (Rahaman et al., 2023). Given the
accessibility of AI models in the current climate, research across a variety of domains has started
to emerge, thus contributing to our growing understanding of the possibilities and potential
limitations of AI.
Creativity as a phenomenological construct is not immune to the effects of AI. For
example, researchers have begun to assess AI models to determine appropriate design solutions
(Lee & Lin, 2023) and logical reasoning (Liu et al., 2023). These assessments focus on
convergent thinking, i.e., determining an optimal solution to a pre-defined problem (Cropley,
2006). In contrast to convergent thinking, which suggests a single solution path, divergent
thinking allows for the flexibility to determine multiple solutions to an ill-defined problem
(Guilford, 1967). Accordingly, a person’s creative potential has been captured via divergent
thinking tasks such as the Alternative Uses Task (Guilford, 1967; Runco & Acar, 2012) or the
Consequences Task (Torrance, 1974; Wilson et al., 1954). Divergent thinking tasks can be
evaluated along three dimensions: fluency (number of responses), originality (response novelty),
and elaboration (length/detail of response). These scores have been used to assess individual
differences in creativity, but given the emergence of OpenAI’s GPT-4 as a large language model,
research has only just begun to examine the creative potential of artificial intelligence models.
While human assistance may aid in accounting for the semantic limitations of AI tools,
divergent AI creativity may not necessarily need to rely on inference or emotion to generate
novel products or ideas (Chatterjee, 2022). The notion that emotion is an integral component of
creativity (Kane et al., 2022) may now be a philosophical argument, rather than empirically
supported under the vein of artificial intelligence models (Boden, 2009). For instance, in two
studies, people were shown a series of AI-created artworks, but were told that the pieces were
either human-created or AI-created, with results showing that in general, people thought more
highly of the artworks if they were told the artworks were created by humans (Bellaiche et al.,
2023; Chiarella et al., 2022). The expectancy that AI-created products or ideas are less creative
or hold less aesthetic value than human-created artworks appears to depend on implicit anti-AI
biases (Chiarella et al., 2022; Fortuna & Modliński, 2021; Liu et al., 2022), as AI-created work has been found
to be indistinguishable from human-created products (Chamberlain et al., 2018; Gao et al., 2023;
Samo & Highhouse, 2023).
Indeed, AI has been found to generate novel connections in music (Yin et al., 2023),
science (Gao et al., 2023), medicine (Kumar et al., 2022), and visual art (Anantrasirichai & Bull,
2022) to name a few. In assessments of divergent thinking, humans outperformed AI on the
Alternative Uses Task (Stevenson et al., 2022), but it is noteworthy that the authors propose a
possible rise in AI capabilities given future progress of large language models. In fact, one study
found that AI creativity matched that of humans using a later version of GPT-4 (Haase & Hanel,
2023). Similarly, when scores were compared between humans and GPT-4 on a Divergent
Associations Task (DAT; Olson et al., 2021), the researcher found that GPT-4 was more creative
than human counterparts (Cropley, 2023). Recent research on OpenAI’s text-to-image platform
DALL·E has reported similar findings (Chen et al., 2023) and suggests that OpenAI models
could match or even outperform humans in combinational creativity tasks. Given the research on
AI creativity thus far, OpenAI’s advertorial claims that GPT-4 is “more creative” may hold more
merit than anticipated.
Current Research
Thus far, the novelty of OpenAI’s ChatGPT has raised questions that have yet to be
examined. Although creativity has been considered to be uniquely human (Sawyer, 2012), the
emergence of OpenAI’s generative models suggests a possible shift in how people may approach
tasks that require “out of the box” thinking. Thus, the current research aims to examine how
creativity (i.e., fluency, originality, elaboration) may differ between humans and AI on verbal
divergent thinking tasks. To our knowledge, this is the first study to comprehensively examine
the verbal responses across a battery of the most common divergent thinking tasks. We anticipate
that AI may demonstrate higher creative potential in comparison to humans, though given the
recency of AI-centered creativity research, our primary research questions are exploratory
in nature.
Methods
Participants
Human Participation
Human participants (N = 151) were recruited via the Prolific online data collection platform
in exchange for monetary compensation of $8.00. Participants were required to have a reported
approval rating above 97%, to be proficient English speakers, and to have been born in and reside in the USA.
Average total response time for completing the survey was 34.66 minutes. A statistical
sensitivity analysis indicated that we had sufficient power to detect small effects with the present
sample size (f² = 0.06, 1 - β = .80). All statistical analyses were conducted in RStudio (2021). See
Table 1 for participant demographics.
Table 1
Demographics of Human Sample

                                   M (SD) or n (%)
Gender
  Female                           58 (38%)
  Male                             93 (62%)
Age                                41.21 (12.18)
Ethnicity
  White or European American       102 (68%)
  Black or African American        21 (14%)
  Asian or Asian American          11 (7.1%)
  Hispanic or Latinx               7 (5%)
  Multiracial                      10 (7%)
Education
  Less than high school            3 (2%)
  High school graduate             26 (17%)
  Some college                     28 (19%)
  2 year degree                    12 (8%)
  4 year degree                    62 (41%)
  Professional degree              18 (12%)
  Doctorate                        2 (1%)

Note. N = 151. Percentages depict approximations.
AI Participation
Artificial participants were operationalized as ChatGPT’s instancing feature. Each
ChatGPT session was considered an independent interaction between the user and GPT interface.
Here, we prompted separate instances per creativity measure (as detailed below), with each
instance constituting one artificial participation session. Each prompt was fed to a single session
instance, and each response was aggregated into a data file. In total, we collected 151 instances,
representing AI participation balanced against the human sample. For two of the creativity measures
(AUT and CT), which are the only timed tasks, fluency was matched 1:1 such that the number of
responses for both groups is equal on these timed tasks. Fluency scores of each human
respondent were first calculated to match 1:1 for each GPT-4 instance for the Alternative Uses
Task and Consequences Task (detailed below). Only valid responses were retained. For example,
human participant #52 had a total fluency score of 6, thus GPT-4 instance #52 was instructed to
provide 6 responses.
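The 1:1 fluency matching described above can be sketched as follows (a minimal illustration; the participant ID and response list are hypothetical, not the study's data):

```python
def fluency_targets(valid_responses):
    """Number of valid responses per human participant; each matched GPT-4
    instance is instructed to produce exactly this many responses."""
    return {pid: len(responses) for pid, responses in valid_responses.items()}

# Hypothetical data: participant 52 gave six valid AUT responses,
# so GPT-4 instance 52 is asked for six responses.
human_aut = {52: ["comb hair", "knitting needle", "back scratcher",
                  "bookmark", "plant stake", "zipper pull"]}
print(fluency_targets(human_aut))  # {52: 6}
```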
Creativity Measures
Alternative Uses Task
The Alternate Uses Task (AUT; Guilford, 1967) was used to test divergent thinking. In
this task, participants were presented with a common object (‘fork’ and ‘rope’) and were asked to
generate as many creative uses as possible for these objects. Responses were scored for fluency
(i.e., number of responses), originality (i.e., uniqueness of responses), and elaboration (i.e.,
number of words per valid response). Participants were given 3 minutes to generate their
responses for each item. Instructions for human respondents on the AUT followed Nusbaum and
colleagues (2014) protocol. See Appendix A.
Because the goal was to control for fluency, we excluded prompt parameters such as
'quantity' from the GPT-4 instructions. Similarly, GPT does not need timing parameters in
comparison to humans because we denoted the specific number of responses required. See
Appendix B for adapted instructions.
Consequences Task
The Consequences Task (CT; Torrance, 1974; Wilson et al., 1954) is part of the verbal
section of the Torrance Test of Creative Thinking (TTCT). Responses were scored for fluency
(i.e., number of responses), originality (i.e., uniqueness of responses), and elaboration (i.e.,
number of words per valid response). See Appendix A.
Participants were given two prompts shown independently: “Imagine humans no longer
needed sleep,” and “Imagine humans walked with their hands.” The two CT prompts have been
extensively used in research on divergent thinking (Acar et al., 2021; Hass & Beaty, 2018; Urban
& Urban, 2022). Similar to the AUT, fluency and timing parameters were excluded from the
GPT instructions on the CT. See Appendix B for adapted instructions.
Divergent Associations Task
The Divergent Association Task (DAT; Olson et al., 2021) is a task of divergent and
verbal semantic creative ability. This task asks participants to come up with 10 nouns as different
from each other as possible. These nouns must not be proper nouns or any type of technical term.
Pairwise comparisons of semantic distance between the 10 nouns are calculated using cosine
distance. The average distance score across all pairwise comparisons is then multiplied by
100, resulting in the final DAT score (https://osf.io/bm5fd/). High scores indicate longer
distances (i.e., words are not similar).
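The DAT scoring described above can be sketched in code (a minimal illustration: real scoring uses GloVe word embeddings, so the toy vectors below are placeholders):

```python
import itertools
import math

def cosine_distance(u, v):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def dat_score(vectors):
    """Average pairwise cosine distance across all word pairs, times 100."""
    dists = [cosine_distance(vectors[w1], vectors[w2])
             for w1, w2 in itertools.combinations(vectors, 2)]
    return 100.0 * sum(dists) / len(dists)

# Placeholder 3-d vectors; mutually orthogonal words score the maximum.
toy = {"dog": (1, 0, 0), "justice": (0, 1, 0), "volcano": (0, 0, 1)}
print(dat_score(toy))  # 100.0
```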
There were no time constraints for this task. The average human response time was
126.19 seconds (SD = 90.62) and the average DAT score was 76.95 (SD = 6.13). We scored all
appropriate words that participants gave. Participants with fewer than 7 responses were excluded
from data analysis (n = 2). Instructions for GPT-4 were identical to those given to human participants.
See Appendix A and Appendix B.
Procedure
Human participants’ responses were collected online via Qualtrics. The entire study took
on average 34 minutes (SD = 13.64). The order of the creativity tasks was counterbalanced. The
online study used two attention checks randomly presented throughout the study. Each attention
check allowed one additional attempt. Participants who failed two attention checks were
removed from all analyses (n = 2). After providing their responses to each task, participants
answered demographics questions.
GPT-4 procedural responses were generated through human-assistance facilitated by the
first author, who provided each prompt in the following order: AUT, CT, and DAT. We did not
have to account for typical human-centered confounds such as feelings of fatigue (Day et al.,
2012; Igorov et al., 2016) and order biases (Day et al., 2012) as these states are not relevant
confounds in AI, thus the order of tasks was not counterbalanced.
Results
Creativity Scoring
Both human and GPT-4 responses were cleaned to remove any instances that were
incomplete or inappropriate at two stages: First, human responses that did not follow instructions
from the task or were not understandable as a use (AUT; 0.96% removed) or a consequence (CT;
4.83%) were removed. Only valid human responses were used in matching for GPT fluency;
Second, inappropriate or incomplete GPT responses for the AUT (< .001% removed) and CT (<
.001% removed) were removed. Despite matching for fluency, only valid responses in both
groups were used in subsequent analyses.
The Open Creativity Scoring tool (OCS; Organisciak & Dumas, 2020) was used to score
both the AUT and CT tasks. Specifically, the semantic distance scoring tool (Dumas et al., 2021)
was used, which applies the GLoVe 840B text-mining model (Pennington et al., 2014) to assess
originality of responses by representing a prompt and response as vectors in semantic space and
calculates the cosine of the angle between the vectors. The prompts for the AUT were “rope” and
“fork” and the prompts for the CT were “humans no sleep” and “humans walked hands.” The
OCS tool also scores for elaboration by using the stoplist method (Organisciak & Dumas, 2020).
Automated scoring of semantic distance objectively captures the originality of ideas by assigning
scores of the remoteness (uniqueness) of responses, and circumvents the issues of potential
confounds such as fatigue or implicit biases of subjective human creativity scoring (Beaty et
al., 2022).
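The prompt-to-response semantic distance described above can be sketched as follows (a simplified illustration: mean-pooling the response's word vectors is an assumption here, and the actual OCS/GLoVe implementation details may differ):

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def originality(prompt_vec, response_vecs):
    """Semantic-distance originality: cosine distance between the prompt
    vector and the mean of the response's word vectors (mean-pooling is an
    assumption; OCS implementation details may differ)."""
    n = len(response_vecs)
    mean_vec = [sum(v[i] for v in response_vecs) / n
                for i in range(len(prompt_vec))]
    return 1.0 - cosine_similarity(prompt_vec, mean_vec)
```

A response semantically unrelated to the prompt scores near 1 (more original); a response that merely restates the prompt scores near 0.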
Preliminary Results
Descriptive statistics for all tasks are reported in Table 2 and Table 3. Fluency descriptive
statistics are reported in Table 2. Semantic distance descriptive statistics are reported in Table 3.
Table 2
Descriptive statistics of fluency for Alternative Uses Task, Consequences Task, and
Divergent Associations Task responses for human and GPT-4 samples

Prompt                  M (SD)        Median   Skew    Kurtosis
Human
  Fork (AUT)            6.82 (3.67)   6         1.79     4.67
  Rope (AUT)            7.06 (3.92)   6         1.07     1.17
  No more sleep (CT)    5.98 (3.09)   5         1.45     3.48
  Walk on hands (CT)    5.44 (3.30)   5         2.73    15.20
  DAT                   9.72 (.62)    10       -2.73     8.18
GPT-4
  Fork (AUT)            6.87 (3.66)   6         1.80     4.69
  Rope (AUT)            7.13 (3.95)   6         1.03     1.01
  No more sleep (CT)    5.72 (3.03)   5         1.39     3.28
  Walk on hands (CT)    5.27 (3.26)   5         2.87    16.60
  DAT                   9.97 (.18)    10       -5.25    25.93

Note. Skewness and kurtosis of DAT fluency were expected due to the task requiring 10 responses.
Only valid and legible DAT responses were retained between both groups. AUT = Alternative Uses
Task; CT = Consequences Task; DAT = Divergent Associations Task.
Table 3
Descriptive statistics of originality using semantic distance for Alternative Uses Task,
Consequences Task, and Divergent Associations Task responses for human and GPT-4
samples

Prompt                  M (SD)         Median   Skew    Kurtosis
Human
  Fork (AUT)            .79 (.04)      .79      -.35      .50
  Rope (AUT)            .68 (.06)      .68       .03      .03
  No more sleep (CT)    .67 (.05)      .67       .18     -.28
  Walk on hands (CT)    .67 (.06)      .67      -.58     1.27
  DAT                   76.95 (6.13)   77.58    -.85     1.5
GPT-4
  Fork (AUT)            .84 (.02)      .84      -.14     -.48
  Rope (AUT)            .79 (.02)      .80      -.59     1.00
  No more sleep (CT)    .71 (.02)      .71       .05      .34
  Walk on hands (CT)    .73 (.01)      .73      -.13      .61
  DAT                   84.56 (3.05)   84.79    -.29     -.48

Note. AUT = Alternative Uses Task; CT = Consequences Task; DAT = Divergent Associations Task.
Primary Results
Alternative Uses Task
As expected, given that fluency was controlled (as detailed above), an independent samples
t-test revealed no significant difference in total fluency between humans (M = 6.94, SD = 3.80)
and GPT-4 (M = 7.01, SD = 3.81), t(602) = .21, 95% CI [-.54, .67], p = .83.
To assess originality of responses via semantic distance scores, we conducted a 2 (group:
human, GPT-4) × 2 (prompt: ‘fork,’ ‘rope’) analysis of variance. The model revealed significant
main effects of group (F(1, 600) = 622.10, p < .001, η² = .51) and prompt (F(1, 600) = 584.50, p
< .001, η² = .49) on originality of responses. Additionally, there were significant interaction
effects between group and prompt, F(1, 600) = 113.80, p < .001, η² = .16. Particularly, both
samples had higher originality scores for the prompt ‘fork’ in comparison to ‘rope,’ but GPT-4
scored higher in originality, regardless of prompt. Tukey’s HSD post hoc analysis showed that
all pairwise comparisons were significantly different (p < .001) aside from the human ‘fork’ and
GPT-4 ‘rope’ originality (p = .989). Overall, GPT-4 was more successful at coming up with
divergent responses given the same number of opportunities to generate answers compared to the
human counterparts, though the size of its originality advantage depended on the prompt (Figure 1).
Figure 1
Analysis of Variance of Originality on the Alternative Uses Task
Next, we compared elaboration scores between humans and GPT-4. Fluency scores differ
from elaboration in the sense that fluency accounts for each coherent response whereas
elaboration quantifies the number of words per valid response. For example, a person could
respond “you could use a fork to knit or as a hair comb.” In this example, the fluency would be 2
(knitting instrument and comb), but the elaboration would be 12 (number of words used in the
response). The results of an independent t-test revealed that elaboration was significantly higher
for GPT-4 (M = 15.45, SD = 6.74) in comparison to humans (M = 3.38, SD = 2.91), t(602) =
28.57, 95% CI [11.24, 12.90], p < .001.
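The fluency/elaboration distinction above can be illustrated with a simplified word count (the OCS tool additionally applies a stoplist adjustment before counting):

```python
def elaboration(response):
    """Elaboration as the number of words in a valid response (simplified;
    OCS applies a stoplist method before counting)."""
    return len(response.split())

response = "you could use a fork to knit or as a hair comb"
print(elaboration(response))  # 12 (fluency for this response would be 2)
```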
Consequences Task
As expected, an independent t-test revealed no significant differences in total fluency
between humans (M = 5.71, SD = 3.20) and GPT-4 (M = 5.50, SD = 3.15), t(621) = .82, 95% CI
[-.29, .71], p = .41.
To assess originality of responses via semantic distance scores, we conducted a 2 (group:
human, GPT) × 2 (prompt: ‘no more sleep,’ ‘walk on hands’) analysis of variance. The model
revealed significant main effects of group (F(1, 619) = 622.10, p < .001, η² = .51) and prompt
(F(1, 619) = 584.50, p < .001, η² = .49) on the originality of responses. Additionally, there were
significant interaction effects between group and prompt, F(1, 619) = 113.80, p < .001, η² = .16.
Particularly, originality was marginally higher for the prompt ‘walk on hands’ in the GPT
sample, although there were no significant differences in originality in the human sample
between the two prompts. Tukey’s HSD post hoc analysis showed that all pairwise comparisons
were significantly different (p < .001) aside from the human responses for both prompts (p =
.607). Overall, GPT-4 was more successful at coming up with more divergent responses given
the same number of opportunities compared to the human counterparts, and also showed higher
originality dependent on prompt type (Figure 2).
Figure 2
Analysis of Variance of Originality on the Consequences Task
Next, we calculated the difference in elaboration between humans and GPT-4. The results
of an independent t-test revealed that elaboration was significantly higher in the GPT-4 sample
(M =38.69, SD = 15.60) than in the human sample (M = 5.45, SD = 4.04), t(621) = -36.04, 95%
CI [-35.04, -31.45], p < .001.
Divergent Associations Task
Overall, humans had a higher number of single-occurrence words (n = 523), accounting
for 69.92% of the total group responses, in comparison to GPT’s single-occurrence words
(n = 152), which accounted for 47.95% (Table 4). In total, 9.11% (n = 97) of responses
overlapped between the two groups. Words occurring exclusively in the human responses
accounted for 87.03% (n = 651), compared to 69.40% (n = 220) for GPT. A chi-square test
of independence was performed to examine the relationship between group (GPT vs. human)
and word type (single occurrence vs. unique occurrence). The relationship between these
variables was not significant, χ²(1, N = 302) = 1.56, p = .211. This suggests that the uniqueness
and occurrence of words may not have necessarily aided either group in originality, but rather aided
in word complexity.
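A chi-square test of independence on a 2×2 contingency table of this kind can be computed directly from the cell counts (generic formula; the counts below are illustrative only, not the study's observed frequencies):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]
    (e.g., group x word type), df = 1, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))

# Illustrative counts only (not the study's data).
print(round(chi_square_2x2(10, 20, 20, 10), 2))  # 6.67
```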
Table 4
Top 20 most frequent words on the Divergent Association Task in human and GPT-4 samples

Human                   GPT-4
Word        Frequency   Word          Frequency
Dog         28          Elephant      98
Car         25          Symphony      55
Book        25          Microscope    51
Cloud       22          Quasar        44
Tree        21          Freedom       44
Computer    20          Dream         43
Water       16          Democracy     43
Chair       16          Love          40
Cat         16          Volcano       39
Moon        13          Quantum       39
Table       12          Philosophy    31
Sky         12          Microbe       27
Ocean       12          Galaxy        27
Mountain    12          Desert        26
Grass       12          Compass       22
Elephant    11          Microchip     19
Paper       10          Ocean         16
Flower      10          Justice       15
Fire        10          Harmony       15
Shoe        9           Dolphin       15
Differences in semantic distance scores were calculated between human and GPT-4 DAT
responses. An independent sample t-test revealed that GPT responses (M = 84.56, SD = 3.05)
had higher semantic distances in comparison to human responses (M = 76.95, SD = 6.13), t(300)
= 13.65, 95% CI [6.51, 8.71], p < .001. Despite human participants having a broader range of
unique responses, this uniqueness did not appear to confer an advantage in semantic distance
scores when comparing groups.
Discussion
The present study offers novel evidence on the current state of large language models
(i.e., GPT-4) and the capabilities of divergent creative output in comparison to human
participants. Overall, GPT-4 was more original and elaborate than humans on each of the
divergent thinking tasks, even when controlling for fluency of responses. In other words, GPT-4
demonstrated higher creative potential across an entire battery of divergent thinking tasks. To our
knowledge, this is the first evidence in the field of artificial intelligence and creativity research
that demonstrates the creative potential of AI as superior to human potential. Notably, no other
study has comprehensively assessed multiple dimensions of the most frequently used divergent
thinking tasks and AI. One previous study showed that humans outperformed GPT on the AUT
(GPT-3; Stevenson et al., 2022), while another study reported that a later version of GPT (GPT-4)
showed similar, albeit slightly less, creative potential in comparison to humans (Haase & Hanel,
2023). Considering the findings of the present study, the current state of LLMs has surpassed
human-level creative potential. Indeed, only one other study thus far has reported similar results,
finding that GPT outperformed humans on the DAT (Cropley, 2023), but the DAT is only one aspect of
divergent thinking. The novelty of the present findings thus gives a foundation for future
research to continue to examine multiple dimensions of creativity and artificial intelligence.
While the present results suggest that the current state of AI models outperform humans
on creativity tasks by a significant margin, there are methodological considerations that could
have contributed to the present results. To comprehensively examine creativity requires not only
an assessment of originality, but also of the usefulness and appropriateness of an idea or product
(Runco & Jaeger, 2012). Traditionally, this has proven difficult to standardize in comparison to
assessing originality given the multifaceted dimensions that contribute to assessments of
appropriateness such as accounting for sociocultural and historical contexts. Semantic distance
scores do not take into consideration the aforementioned variables; instead, the scores reflect the
relative distance between seemingly related (or unrelated) ideas. In this instance, GPT-4’s
answers yielded higher originality than human counterparts, but the feasibility or appropriateness
of an idea could be vastly inferior to that of humans. Thus, we need to consider that the results
reflect only a single aspect of divergent creativity, rather than a generalization that AI is indeed
more creative across the board. Future research on AI and creativity needs to not only account
for the traditional measurements of creativity (i.e., fluency, elaboration, originality) but also for
the usefulness and appropriateness of the ideas.
Interestingly, GPT-4 used a higher frequency of repeated words in comparison to human
respondents. Although the vocabulary in human responses was much broader, this did not
necessarily result in higher semantic distance scores. The complexity of words chosen by AI,
albeit more concentrated in occurrence, could have more robustly contributed to the originality
effects. For example, only AI used words denoting non-tangible concepts (i.e., freedom,
philosophy), whereas humans may have experienced a fixedness on generating ideas that are
appropriate and observable. The differences between generated lists (incorporating tangible and
non-tangible words) could inflate originality in a way biased toward AI.
Similarly, we need to critically consider the uniqueness of words generated in DAT
responses. There was a marginal overlap of responses between the human and the AI samples
(9.11%), but humans responded with a higher number of single-occurrence words. Despite these
differences, AI still had a higher semantic distance score. Prior research shows that in human
respondents originality increases over time (Beaty & Silvia, 2012). This increase is seen as an
expansion of activation in an individual’s semantic network, which leads to more original
responses (Mednick, 1962). Human responses on these divergent thinking tasks tend to follow a diminishing
returns curve before reaching a plateau for an individual’s more original responses (Hubert et al.,
2023). The higher levels of elaboration and semantic distance in AI responses suggest that
LLM processing possibly does not need this ramp-up time seen in human responses; therefore,
LLMs can respond with their highest level of original responses when prompted. Whereas
humans may fixate on more obvious responses at first, this algorithmic trait could then serve as
an aid in overcoming ideation fixedness in humans.
It is important to note that the measures used in this study are all measures of creative
potential, but involvement in creative activities or achievements is another aspect of
measuring a person’s creativity. Particularly, researchers have examined the interplay between
creative potential and real-world creative achievements (Carson et al., 2005; Jauk et al., 2014)
but this approach assumes human level creativity, and is not able to account for artificial
intelligence. AI is able to come up with creative ideas, but we cannot assume that this potential
would translate to achievement. Thus, future research should consider the conceptual
implications of current measurements of creativity and how generalizability across creative
domains may be a human-centric consideration.
The prevalence and accessibility of the internet has drastically shaped the way in which
humans interact with language processing systems and search engines. LLMs such as GPT-4 are
now no exception in their ubiquity. Searching for information now has multiple channels that were
not previously available, and with these functions come an array of strategies to best find the
desired information. Research has shown that younger people are better and more efficient in
their search strategies online to find the information they want (Chevalier et al., 2015), which
suggests that exposure to search platforms acts as a practice in efficiency. Similar to interactions
with GPT-4 and other AI platforms, humans may gradually learn how to best utilize LLMs.
For information seeking tools like GPT-4, the creative potential has shown clear progression in
capabilities, albeit there are still limitations such as response appropriateness. Regardless,
approaching AI as a tool of inspiration, as an aid in a person’s creative process, or to overcome
fixedness is promising.
Funding
This research was funded by the Robert C. and Sandra Connor Endowed Faculty Fellowship
(GF002580) to DLZ.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
Research Disclosure Statement
All variables, measurements, and exclusions for this article’s target research question have been
reported in the methods section.
Author contributions
All authors contributed to the conceptualization and methodology. KFH and KNA contributed to
formal analysis and investigation. All authors contributed to writing and revision.
References
Anantrasirichai, N., & Bull, D. (2022). Artificial intelligence in the creative industries: A review.
Artificial Intelligence Review, 1–68. https://doi.org/10.1007/s10462-021-10039-7
Acar, S., Berthiaume, K., Grajzel, K., Dumas, D., Flemister, C. “Tedd,” & Organisciak, P.
(2021). Applying automated originality scoring to the verbal form of Torrance tests of
creative thinking. Gifted Child Quarterly, 67(1), 3–17.
https://doi.org/10.1177/00169862211061874
Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive
interpretation of the serial order effect in divergent thinking tasks. Psychology of
Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171
Beaty, R. E., Johnson, D. R., Zeitlen, D. C., & Forthmann, B. (2022). Semantic distance and the
alternate uses task: Recommendations for reliable automated assessment of originality.
Creativity Research Journal, 34(3), 245–260.
https://doi.org/10.1080/10400419.2022.2025720
Bellaiche, L., Shahi, R., Turpin, M. H., Ragnhildstveit, A., Sprockett, S., Barr, N., ... & Seli, P.
(2023). Humans versus AI: whether and why we prefer human-created compared to AI-
created artwork. Cognitive Research: Principles and Implications, 8(1), 1-22.
https://doi.org/10.1186/s41235-023-00499-6
Boden, M. A. (2009). Computer models of creativity. AI Magazine, 30(3), 23-23.
https://doi.org/10.1609/aimag.v30i3.2254
Carson, S. H., Peterson, J. B., & Higgins, D. M. (2005). Reliability, validity, and factor
structure of the Creative Achievement Questionnaire. Creativity Research Journal, 17(1),
37–50. https://doi.org/10.1207/s15326934crj1701_4
Chamberlain, R., Mullin, C., Scheerlinck, B., & Wagemans, J. (2018). Putting the art in artificial:
Aesthetic responses to computer-generated art. Psychology of Aesthetics, Creativity, and
the Arts, 12(2), 177–192. https://doi.org/10.1037/aca0000136
Chatterjee, A. (2022). Art in an age of artificial intelligence. Frontiers in Psychology, 13,
1024449. https://doi.org/10.3389/fpsyg.2022.1024449
Chen, L., Sun, L., & Han, J. (2023). A Comparison Study of Human and Machine-Generated
Creativity. Journal of Computing and Information Science in Engineering, 23(5),
051012. https://doi.org/10.1115/1.4062232
Chevalier, A., Dommes, A., & Marquié, J.-C. (2015). Strategy and accuracy during
information search on the web: Effects of age and complexity of the search questions.
Computers in Human Behavior, 53, 305–315. https://doi.org/10.1016/j.chb.2015.07.017
Chiarella, S., Torromino, G., Gagliardi, D., Rossi, D., Babiloni, F., & Cartocci, G. (2022).
Investigating the negative bias towards artificial intelligence: Effects of prior assignment
of AI-authorship on the aesthetic appreciation of abstract paintings. Computers in Human
Behavior, 137, 107406. https://doi.org/10.1016/j.chb.2022.107406
Cropley, A. (2006). In praise of convergent thinking. Creativity Research Journal, 18(3),
391–404. https://doi.org/10.1207/s15326934crj1803_13
Cropley, D. H. (2023). Is AI more creative than humans? ChatGPT and the Divergent
Association Task. https://psyarxiv.com/jzt72/
Day, B., Bateman, I. J., Carson, R. T., Dupont, D., Louviere, J. J., Morimoto, S., Scarpa, R., &
Wang, P. (2012). Ordering effects and choice set awareness in repeat-response stated
preference studies. Journal of Environmental Economics and Management, 63(1), 73–91.
https://doi.org/10.1016/j.jeem.2011.09.001
Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality
with human raters and text-mining models: A psychometric comparison of methods.
Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663.
https://doi.org/10.1037/aca0000319
Fortuna, P., & Modliński, A. (2021). A(I)rtist or counterfeiter? Artificial intelligence as
(d)evaluating factor on the art market. The Journal of Arts Management, Law, and Society,
51(3), 188–201. https://doi.org/10.1080/10632921.2021.1887032
Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A. T.
(2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using
an artificial intelligence output detector, plagiarism detector, and blinded human
reviewers. bioRxiv. https://doi.org/10.1016/j.patter.2023.100706
Guilford, J. P. (1967). The nature of human intelligence. McGraw-Hill.
Haase, J., & Hanel, P. H. (2023). Artificial muses: Generative artificial intelligence chatbots
have risen to human-level creativity. arXiv preprint arXiv:2303.12003.
https://doi.org/10.48550/arXiv.2303.12003
Hass, R. W., & Beaty, R. E. (2018). Use or consequences: Probing the cognitive difference
between two measures of divergent thinking. Frontiers in Psychology, 9, 2327.
https://doi.org/10.3389/fpsyg.2018.02327
Hubert, K. F., Finch, A., & Zabelina, D. (2023). Diminishing creative returns: Predicting optimal
creative performance via individual differences in executive functioning. Manuscript
submitted for publication.
Igorov, M., Predoiu, R., Predoiu, A., & Igorov, A. (2016). Creativity, resistance to mental fatigue
and coping strategies in junior women handball players. European Proceedings of Social
& Behavioural Sciences. https://doi.org/10.15405/epsbs.2016.06.39
Jauk, E., Benedek, M., & Neubauer, A. C. (2014). The road to creative achievement: A latent
variable model of ability and personality predictors. Personality and Individual
Differences, 60. https://doi.org/10.1016/j.paid.2013.07.129
Kane, S., Awa, K., Upshaw, J., Hubert, K., Stevens, C., & Zabelina, D. (2023). Attention, Affect,
and Creativity, from Mindfulness to Mind-Wandering. In Z. Ivcevic, J. Hoffmann, & J.
Kaufman (Eds.), The Cambridge Handbook of Creativity and Emotions (Cambridge
Handbooks in Psychology, pp. 130-148). Cambridge: Cambridge University Press.
https://doi.org/10.1017/9781009031240.010
Kumar, Y., Koul, A., Singla, R., & Ijaz, M. F. (2022). Artificial intelligence in disease diagnosis:
a systematic literature review, synthesizing framework and future research agenda.
Journal of Ambient Intelligence and Humanized Computing, 1-28.
https://doi.org/10.1007/s12652-021-03612-z
Lee, Y. H., & Lin, T. H. (2023, July). The Feasibility Study of AI Image Generator as Shape
Convergent Thinking Tool. In International Conference on Human-Computer
Interaction (pp. 575–589). Cham: Springer Nature Switzerland.
https://doi.org/10.1007/978-3-031-35891-3_36
Liu, Y., Mittal, A., Yang, D., & Bruckman, A. (2022). Will AI console me when I lose my pet?
Understanding perceptions of AI-mediated email writing. In CHI Conference on Human
Factors in Computing Systems. https://doi.org/10.1145/3491102.3517731
Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69(3),
220–232. https://doi.org/10.1037/h0048850
Nusbaum, E. C., Silvia, P. J., & Beaty, R. E. (2014). Ready, set, create: What instructing people
to “be creative” reveals about the meaning and mechanisms of divergent thinking.
Psychology of Aesthetics, Creativity, and the Arts, 8(4), 423.
https://doi.org/10.1037/a0036549
Olson, J. A., Nahas, J., Chmoulevitch, D., Cropper, S. J., & Webb, M. E. (2021). Naming
unrelated words predicts creativity. Proceedings of the National Academy of Sciences,
118(25). https://doi.org/10.1073/pnas.2022340118
OpenAI. (2022). ChatGPT: Optimizing language models for dialogue. Available at:
https://openai.com/blog/chatgpt/ (Accessed July 2023).
Organisciak, P. & Dumas, D. (2020). Open Creativity Scoring [Computer software]. Denver,
CO: University of Denver. https://openscoring.du.edu/
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word
representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP) (pp. 1532–1543).
Rahaman, M. S., Ahsan, M. T., Anjum, N., Terano, H. J. R., & Rahman, M. M. (2023). From
ChatGPT-3 to GPT-4: A significant advancement in AI-driven NLP tools. Journal of
Engineering and Emerging Technologies, 2(1), 1–11.
https://doi.org/10.52631/jeet.v2i1.188
R Core Team. (2021). R: A language and environment for statistical computing (Version 4.1.0)
[Computer software]. Retrieved from http://www.R-project.org
Runco, M. A., & Acar, S. (2012). Divergent thinking as an indicator of creative potential.
Creativity Research Journal, 24(1), 66–75. https://doi.org/10.1080/10400419.2012.652929
Runco, M. A., & Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research
Journal, 24(1), 92–96. https://doi.org/10.1080/10400419.2012.650092
Samo, A., & Highhouse, S. (2023). Artificial intelligence and art: Identifying the aesthetic
judgment factors that distinguish human- and machine-generated artwork. Psychology of
Aesthetics, Creativity, and the Arts. Advance online publication.
https://doi.org/10.1037/aca0000570
Sawyer, R. K. (2012). Explaining creativity: The science of human innovation. Oxford
University Press.
Stevenson, C., Smal, I., Baas, M., Grasman, R., & van der Maas, H. (2022). Putting GPT-3's
Creativity to the (Alternative Uses) Test. arXiv preprint arXiv:2206.08932
Torrance, E. P. (1974). The Torrance tests of creative thinking: Norms-technical manual.
Princeton, NJ: Personnel Press.
Urban, M., & Urban, K. (2023). Orientation Toward Intrinsic Motivation Mediates the
Relationship Between Metacognition and Creativity. The Journal of Creative Behavior,
57(1), 6-16. https://doi.org/10.1002/jocb.558
Wilson, R. C., Guilford, J. P., Christensen, P. R., & Lewis, D. J. (1954). A factor-analytic study
of creative-thinking abilities. Psychometrika, 19(4), 297–311.
https://doi.org/10.1007/bf02289230
Yin, Z., Reuben, F., Stepney, S., & Collins, T. (2023). Deep learning’s shallow gains: a
comparative evaluation of algorithms for automatic music generation. Machine Learning,
112(5), 1785-1822. https://doi.org/10.1007/s10994-023-06309-w
Appendix A
Instructions for human sample
Alternative Uses Task
“For this task, you'll be asked to come up with as many original and creative uses for [item]
as you can. The goal is to come up with creative ideas, which are ideas that strike people as
clever, unusual, interesting, uncommon, humorous, innovative, or different.
Your ideas don't have to be practical or realistic; they can be silly or strange, even, so long as
they are CREATIVE uses rather than ordinary uses.
You can enter as many ideas as you like. The task will take 3 minutes. You can type in as many
ideas as you like until then, but creative quality is more important than quantity. It's better to
have a few really good ideas than a lot of uncreative ones. List as many ORIGINAL and
CREATIVE uses for a [item].”
Consequences Task
“In this task, a statement will appear on the screen. The statement might be something like
"imagine gravity ceases to exist". For 3 minutes, try and think of any and all consequences
that might result from the statement. Please be as creative as you like. The goal is to come up
with creative ideas, which are ideas that strike people as clever, unusual, interesting,
uncommon, humorous, innovative, or different.
Your responses will be scored based on originality and quality. Remember, it is important to
try to keep thinking of responses and to type them in for the entire time for the prompt.
REMINDER: In this task, a statement will appear on the screen. The statement might be
something like "imagine gravity ceases to exist". For 3 minutes, try and think of any and all
consequences that might result from the statement. Do this as many times as you can in 3 min.
The screen will automatically change when the time is completed. Remember, it is important to
try to keep thinking of responses and to type them in for the entire time for the prompt.”
Divergent Associations Task
“Please enter 10 words that are as different from each other as possible, in all meanings and
uses of the words. The rules: Only single words in English. Only nouns (e.g., things, objects,
concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary
(e.g., no technical terms). Think of the words on your own (e.g., do not just look at objects in
your surroundings).”
Appendix B
Instructions for AI sample
Alternative Uses Task
“For this task, you'll be asked to come up with as original and creative uses for [item] as you
can. The goal is to come up with creative ideas, which are ideas that strike people as clever,
unusual, interesting, uncommon, humorous, innovative, or different.
Your ideas don't have to be practical or realistic; they can be silly or strange, even, so long as
they are CREATIVE uses rather than ordinary uses. List [insert fluency number] ORIGINAL
and CREATIVE uses for a [item].”
Consequences Task
“In this task, a statement will appear on the screen. The statement might be something like
"imagine gravity ceases to exist". Please be as creative as you like. The goal is to come up with
creative ideas, which are ideas that strike people as clever, unusual, interesting, uncommon,
humorous, innovative, or different. Your responses will be scored based on originality and
quality.
Try and think of any and all consequences that might result from the statement. [Insert
scenario]. What problems might this create? List [insert fluency number] CREATIVE
consequences.”
Divergent Associations Task
“Please enter 10 words that are as different from each other as possible, in all meanings and
uses of the words. The rules: Only single words in English. Only nouns (e.g., things, objects,
concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary (e.g.,
no technical terms). Think of the words on your own (e.g., do not just look at objects in your
surroundings).”