Artificial Intelligence is More Creative Than Humans: A Cognitive Science
Perspective on the Current State of Generative Language Models
Kent F. Hubert*+, Kim N. Awa+, and Darya L. Zabelina
University of Arkansas, Department of Psychological Sciences, Fayetteville, AR 72701, USA
* Corresponding author
+ These authors contributed equally to this work
______________________________________________________________________________
Abstract
The emergence of publicly accessible artificial intelligence (AI) large language models such as
ChatGPT has given rise to global conversations on the implications of AI capabilities. Emergent
research on AI has challenged the assumption that creative potential is a uniquely human trait;
thus, there seems to be a disconnect between human perceptions of AI and what AI is objectively
capable of creating. Here, we aimed to assess the creative potential of humans in comparison to
AI. In the present study, human participants (N = 151) and GPT-4 provided responses for the
Alternative Uses Task, Consequences Task, and Divergent Associations Task. We found that AI
was robustly more creative on each divergent thinking measure in comparison to its human
counterparts. Specifically, when controlling for fluency of responses, AI was more original and
elaborate. The present findings suggest that the current state of AI language models demonstrates
higher creative potential than human respondents.
Keywords: Generative language models, Artificial intelligence, Creative potential, Divergent
thinking, Creative cognition
______________________________________________________________________________
E-mail addresses: khubert@uark.edu (K.F. Hubert), knawa@uark.edu (K.N. Awa), dlzabeli@uark.edu
(D.L. Zabelina)
Publication status: Under review
Preprint upload date: September 18, 2023
APA Citation: Hubert, K. F., Awa, K. N., & Zabelina, D. L. (2023). Artificial Intelligence is More
Creative Than Humans: A Cognitive Science Perspective on the Current State of Generative Language
Models. PsyArXiv.
Introduction
The release of ChatGPT, a natural language processing (NLP) model developed by
OpenAI (2022), to the general public has garnered global conversation on the utility of artificial
intelligence (AI). OpenAI's Generative Pretrained Transformer (GPT) is a machine learning model
that specializes in pattern recognition and prediction, and it has been further trained using
Reinforcement Learning from Human Feedback (RLHF) so that ChatGPT's responses would be
indistinguishable from human responses. Recently, OpenAI (2023) has advertised the new model
(GPT-4) as "more creative," particularly "on creative and technical writing tasks," in comparison
to previous versions, although semantic limitations arguably remain, such as nonsensical
answers or the possibility of generating incorrect information (Rahaman et al., 2023). Given the
accessibility of AI models in the current climate, research across a variety of domains has started
to emerge, contributing to our growing understanding of the possibilities and potential
limitations of AI.
Creativity as a phenomenological construct is not immune to the effects of AI. For
example, researchers have begun to assess AI models to determine appropriate design solutions
(Lee & Lin, 2023) and logical reasoning (Liu et al., 2023). These assessments focus on
convergent thinking, i.e., determining an optimal solution to a pre-defined problem (Cropley,
2006). In contrast to convergent thinking, which suggests a single solution path, divergent
thinking allows for the flexibility to determine multiple solutions to an ill-defined problem
(Guilford, 1967). Accordingly, a person’s creative potential has been captured via divergent
thinking tasks such as the Alternative Uses Task (Guilford, 1967; Runco & Acar, 2012) or the
Consequences Task (Torrance, 1974; Wilson et al., 1954). Divergent thinking tasks can be
evaluated along three dimensions: fluency (number of responses), originality (response novelty),
and elaboration (length/detail of response). These scores have been used to assess individual
differences in creativity, but given the emergence of OpenAI’s GPT-4 as a large language model,
research has just begun to account for the creative potential of artificial intelligence models.
While human assistance may help account for the semantic limitations of AI tools,
divergent AI creativity may not necessarily need to rely on inference or emotion to generate
novel products or ideas (Chatterjee, 2022). The notion that emotion is an integral component of
creativity (Kane et al., 2023) may now be a philosophical argument rather than one that is
empirically supported in the context of artificial intelligence models (Boden, 2009). For instance,
in two studies, people were shown a series of AI-created artworks but were told that the pieces
were either human-created or AI-created; in general, people thought more highly of the artworks
when told they were created by humans (Bellaiche et al., 2023; Chiarella et al., 2022). The
expectancy that AI-created products or ideas are less creative or hold less aesthetic value than
human-created artworks appears to depend on implicit anti-AI biases (Chiarella et al., 2022;
Fortuna & Modliński, 2021; Liu et al., 2022), as AI-created products have been found to be
indistinguishable from human-created products (Chamberlain et al., 2018; Gao et al., 2023;
Samo & Highhouse, 2023).
Indeed, AI has been found to generate novel connections in music (Yin et al., 2023),
science (Gao et al., 2023), medicine (Kumar et al., 2022), and visual art (Anantrasirichai & Bull,
2022), to name a few. In assessments of divergent thinking, humans outperformed AI on the
Alternative Uses Task (Stevenson et al., 2022), but it is noteworthy that the authors proposed a
possible rise in AI capabilities given future progress of large language models. In fact, one study
using a more recent GPT model found that AI creativity matched that of humans (Haase & Hanel,
2023). Similarly, when scores were compared between humans and GPT-4 on the Divergent
Associations Task (DAT; Olson et al., 2021), the researcher found that GPT-4 was more creative
than its human counterparts (Cropley, 2023). Recent research on OpenAI's text-to-image platform
DALL·E has reported similar findings (Chen et al., 2023), and suggests that OpenAI models
could match or even outperform humans in combinational creativity tasks. Given the research on
AI creativity thus far, OpenAI's promotional claim that GPT-4 is "more creative" may hold more
merit than anticipated.
Current Research
Thus far, the novelty of OpenAI's ChatGPT has raised more questions than have yet been
examined. Although creativity has been considered uniquely human (Sawyer, 2012), the
emergence of OpenAI's generative models suggests a possible shift in how people may approach
tasks that require "out of the box" thinking. Thus, the current research aims to examine how
creativity (i.e., fluency, originality, elaboration) may differ between humans and AI on verbal
divergent thinking tasks. To our knowledge, this is the first study to comprehensively examine
verbal responses across a battery of the most common divergent thinking tasks. We anticipate
that AI may demonstrate higher creative potential in comparison to humans, though given the
recency of AI-centered creativity research, our primary research questions are exploratory in
nature.
Methods
Participants
Human Participation
Human participants (N = 151) were recruited via the Prolific online data collection platform
in exchange for monetary compensation of $8.00. Participants were required to have a reported
approval rating above 97%, to be proficient English speakers, and to have been born in and reside
in the USA. Average total response time for completing the survey was 34.66 minutes. A statistical
sensitivity analysis indicated that we had sufficient power to detect small effects with the present
sample size (f² = 0.06, 1 - β = 0.80). All statistical analyses were conducted in R (R Core Team,
2021). See Table 1 for participant demographics.
Table 1
Demographics of Human Sample

Variable                          M (SD) or n (%)
Gender
  Female                          58 (38%)
  Male                            93 (62%)
Age                               41.21 (12.18)
Ethnicity
  White or European American      102 (68%)
  Black or African American       21 (14%)
  Asian or Asian American         11 (7.1%)
  Hispanic or Latinx              7 (5%)
  Multiracial                     10 (7%)
Education
  Less than high school           3 (2%)
  High school graduate            26 (17%)
  Some college                    28 (19%)
  2-year degree                   12 (8%)
  4-year degree                   62 (41%)
  Professional degree             18 (12%)
  Doctorate                       2 (1%)

Note. N = 151. Percentages depict approximations.
AI Participation
Artificial participants were operationalized via ChatGPT's instancing feature. Each
ChatGPT session was considered an independent interaction between the user and the GPT
interface. We prompted separate instances for each creativity measure (as detailed below), with
each instance constituting one artificial participation session; each prompt was fed to a single
session instance, and the prompt responses were aggregated into a data file. In total, we collected
151 instances representing AI's participation, yielding a sample balanced with the human sample.
For two of the creativity measures (the AUT and CT), which are the only timed tasks, fluency was
matched 1:1 such that the number of responses was equal for both groups. Fluency scores of each
human respondent were first calculated and matched 1:1 to a GPT-4 instance for the Alternative
Uses Task and the Consequences Task (detailed below). Only valid responses were retained. For
example, if human participant #52 had a total fluency score of 6, GPT-4 instance #52 was
instructed to provide 6 responses.
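To illustrate the matching procedure, the following is a minimal sketch of how per-participant
fluency counts could be turned into GPT-4 prompts; the file name and column names are
hypothetical, and this is not the authors' actual pipeline.

```python
import pandas as pd

# Hypothetical file of cleaned, valid human responses (one row per response).
# Assumed columns: participant_id, item (e.g., "fork"), response.
human = pd.read_csv("human_aut_responses.csv")

# Fluency = number of valid responses per participant per item.
fluency = (
    human.groupby(["participant_id", "item"])
    .size()
    .reset_index(name="fluency")
)

# Build one GPT-4 prompt per human participant and item, requesting the
# same number of ideas so that fluency is matched 1:1 across groups.
fluency["gpt_prompt"] = fluency.apply(
    lambda row: f"List {row['fluency']} ORIGINAL and CREATIVE uses for a {row['item']}.",
    axis=1,
)

print(fluency.head())
```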
Creativity Measures
Alternative Uses Task
The Alternative Uses Task (AUT; Guilford, 1967) was used to test divergent thinking. In
this task, participants were presented with a common object ('fork' and 'rope') and were asked to
generate as many creative uses as possible for these objects. Responses were scored for fluency
(i.e., number of responses), originality (i.e., uniqueness of responses), and elaboration (i.e.,
number of words per valid response). Participants were given 3 minutes to generate their
responses for each item. Instructions for human respondents on the AUT followed Nusbaum and
colleagues' (2014) protocol. See Appendix A.
Because the goal was to control for fluency, we excluded prompt parameters such as
'quantity' from the GPT-4 instructions. Similarly, GPT-4 did not need timing parameters, because
we denoted the specific number of responses required. See Appendix B for adapted instructions.
Consequences Task
The Consequences Task (CT; Torrance, 1974; Wilson et al., 1954) is part of the verbal
section of the Torrance Test of Creative Thinking (TTCT). Responses were scored for fluency
(i.e., number of responses), originality (i.e., uniqueness of responses), and elaboration (i.e.,
number of words per valid response). See Appendix A.
Participants were given two prompts shown independently: “Imagine humans no longer
needed sleep,” and “Imagine humans walked with their hands.” The two CT prompts have been
extensively used in research on divergent thinking (Acar et al., 2021; Hass & Beaty, 2018; Urban
& Urban, 2023). Similar to the AUT, fluency and timing parameters were excluded from the
GPT instructions on the CT. See Appendix B for adapted instructions.
Divergent Associations Task
The Divergent Association Task (DAT; Olson et al., 2021) is a measure of divergent and
verbal semantic creative ability. The task asks participants to come up with 10 nouns that are as
different from each other as possible. These nouns must not be proper nouns or any type of
technical term. Pairwise semantic distances between the 10 nouns are calculated using cosine
distance. The average distance across all pairwise comparisons is then multiplied by 100,
resulting in the final DAT score (https://osf.io/bm5fd/). Higher scores indicate greater semantic
distance (i.e., the words are less similar).
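As an illustration of this scoring rule, the sketch below computes a DAT-style score from
pretrained GloVe vectors (average pairwise cosine distance multiplied by 100). It uses a smaller
GloVe model available through gensim rather than the GloVe 840B model and word-cleaning
steps of the official scorer, so scores will not match exactly.

```python
import itertools

import gensim.downloader as api
import numpy as np
from scipy.spatial.distance import cosine

# Smaller pretrained GloVe model for illustration; the official DAT scorer
# (https://osf.io/bm5fd/) uses glove.840B.300d plus extra word validation.
glove = api.load("glove-wiki-gigaword-300")

def dat_score(words):
    """Average pairwise cosine distance between word vectors, scaled by 100."""
    vectors = [glove[w.lower()] for w in words if w.lower() in glove]
    distances = [cosine(a, b) for a, b in itertools.combinations(vectors, 2)]
    return 100 * float(np.mean(distances))

nouns = ["dog", "symphony", "volcano", "justice", "microscope",
         "ocean", "freedom", "compass", "dream", "desert"]
print(round(dat_score(nouns), 2))
```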
There were no time constraints for this task. The average human response time was
126.19 seconds (SD = 90.62) and the average DAT score was 76.95 (SD = 6.13). We scored all
appropriate words that participants gave. Participants with fewer than 7 responses were excluded
from data analysis (n = 2). Instructions for GPT-4 were identical to those given to human
participants. See Appendix A and Appendix B.
Procedure
Human participants’ responses were collected online via Qualtrics. The entire study took
on average 34 minutes (SD = 13.64). The order of the creativity tasks was counterbalanced. The
online study included two attention checks randomly presented throughout the study. Each
attention check allowed one additional attempt. Participants who failed two attention checks were
removed from all analyses (n = 2). After providing their responses to each task, participants
answered demographic questions.
GPT-4 procedural responses were generated through human assistance facilitated by the
first author, who provided each prompt in the following order: AUT, CT, and DAT. We did not
have to account for typical human-centered confounds such as feelings of fatigue (Day et al.,
2012; Igorov et al., 2016) and order biases (Day et al., 2012), as these states are not relevant
confounds for AI; thus, the order of tasks was not counterbalanced.
Results
Creativity Scoring
Both human and GPT-4 responses were cleaned to remove any instances that were
incomplete or inappropriate at two stages: First, human responses that did not follow instructions
from the task or were not understandable as a use (AUT; 0.96% removed) or a consequence (CT;
4.83%) were removed. Only valid human responses were used in matching for GPT fluency;
Second, inappropriate or incomplete GPT responses for the AUT (< .001% removed) and CT (<
.001% removed) were removed. Despite matching for fluency, only valid responses in both
groups were used in subsequent analyses.
The Open Creativity Scoring tool (OCS; Organisciak & Dumas, 2020) was used to score
both the AUT and CT tasks. Specifically, the semantic distance scoring tool (Dumas et al., 2021)
was used, which applies the GloVe 840B text-mining model (Pennington et al., 2014) to assess
the originality of responses by representing a prompt and a response as vectors in semantic space
and calculating the cosine of the angle between the vectors. The prompts for the AUT were
"rope" and "fork," and the prompts for the CT were "humans no sleep" and "humans walked
hands." The OCS tool also scores elaboration using the stoplist method (Organisciak & Dumas,
2020). Automated scoring of semantic distance objectively captures the originality of ideas by
assigning scores for the remoteness (uniqueness) of responses, and it circumvents potential
confounds such as fatigue or the implicit biases of subjective human creativity scoring (Beaty et
al., 2022).
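The two automated scores can be sketched as follows: originality as the cosine distance between
averaged word vectors for the prompt and the response, and elaboration as a stoplist-filtered
word count. This is a simplified approximation of what the OCS tool does (OCS uses the GloVe
840B model and its own stoplist), not its exact implementation.

```python
import gensim.downloader as api
import numpy as np
from scipy.spatial.distance import cosine

glove = api.load("glove-wiki-gigaword-300")  # stand-in for the GloVe 840B model

# Tiny illustrative stoplist; OCS applies its own, much longer stoplist.
STOPWORDS = {"a", "an", "the", "to", "of", "as", "or", "you", "could", "use", "it", "for"}

def mean_vector(text):
    """Average GloVe vector of the non-stoplist words in a text."""
    words = [w for w in text.lower().split() if w not in STOPWORDS and w in glove]
    return np.mean([glove[w] for w in words], axis=0)

def originality(prompt, response):
    """Semantic distance: cosine distance between prompt and response vectors."""
    return cosine(mean_vector(prompt), mean_vector(response))

def elaboration(response):
    """Stoplist method (simplified): count of non-stoplist words in the response."""
    return len([w for w in response.lower().split() if w not in STOPWORDS])

print(originality("fork", "untangle a knotted necklace chain"))
print(elaboration("you could use a fork to comb your hair"))
```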
Preliminary Results
Descriptive statistics for all tasks are reported in Tables 2 and 3: fluency descriptive statistics are
reported in Table 2, and semantic distance descriptive statistics are reported in Table 3.
Table 2
Descriptive statistics of fluency for Alternative Uses Task, Consequences Task, and
Divergent Associations Task responses for human and GPT-4 samples

Prompt                    Median    Skew     Kurtosis
Human
  Fork (AUT)              6         1.79     4.67
  Rope (AUT)              6         1.07     1.17
  No more sleep (CT)      5         1.45     3.48
  Walk on hands (CT)      5         2.73     15.20
  DAT                     10        -2.73    8.18
GPT-4
  Fork (AUT)              6         1.80     4.69
  Rope (AUT)              6         1.03     1.01
  No more sleep (CT)      5         1.39     3.28
  Walk on hands (CT)      5         2.87     16.60
  DAT                     10        -5.25    25.93

Note. Skewness and kurtosis of DAT fluency was expected due to the task requiring 10 responses.
Only valid and legible DAT responses were retained between both groups. AUT = Alternative Uses
Task; CT = Consequences Task; DAT = Divergent Associations Task.
Table 3
Descriptive statistics of originality using semantic distance for Alternative Uses Task,
Consequences Task, and Divergent Associations Task responses for human and GPT-4
samples

Prompt                    Median    Skew     Kurtosis
Human
  Fork (AUT)              .79       -.35     .50
  Rope (AUT)              .68       .03      .03
  No more sleep (CT)      .67       .18      -.28
  Walk on hands (CT)      .67       -.58     1.27
  DAT                     77.58     -.85     1.5
GPT-4
  Fork (AUT)              .84       -.14     -.48
  Rope (AUT)              .80       -.59     1.00
  No more sleep (CT)      .71       .05      .34
  Walk on hands (CT)      .73       -.13     .61
  DAT                     84.79     -.29     -.48

Note. AUT = Alternative Uses Task; CT = Consequences Task; DAT = Divergent Associations Task.
Primary Results
Alternative Uses Task
As expected, because fluency was controlled (as detailed above), an independent samples
t-test revealed no significant difference in total fluency between humans (M = 6.94, SD = 3.80)
and GPT-4 (M = 7.01, SD = 3.81), t(602) = .21, 95% CI [-.54, .67], p = .83.
To assess the originality of responses via semantic distance scores, we conducted a 2 (group:
human, GPT-4) × 2 (prompt: 'fork', 'rope') analysis of variance. The model revealed significant
main effects of group (F(1, 600) = 622.10, p < .001, η² = .51) and prompt (F(1, 600) = 584.50, p
< .001, η² = .49) on originality of responses. Additionally, there was a significant interaction
effect between group and prompt, F(1, 600) = 113.80, p < .001, η² = .16. Specifically, both
samples had higher originality scores for the prompt 'fork' than for 'rope,' but GPT-4 scored
higher in originality regardless of prompt. Tukey's HSD post hoc analysis showed that all
pairwise comparisons were significantly different (p < .001), aside from human 'fork' versus
GPT-4 'rope' originality (p = .989). Overall, GPT-4 was more successful at coming up with
divergent responses than its human counterparts given the same number of opportunities to
generate answers, and showed higher originality within each prompt (Figure 1).
Figure 1
Analysis of Variance of Originality on the Alternative Uses Task
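A minimal sketch of this 2 (group) × 2 (prompt) analysis and the Tukey HSD follow-up is shown
below, assuming a long-format table with hypothetical columns originality, group, and prompt;
it mirrors the reported analysis but is not the authors' script (their analyses were run in R).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical long-format data: one originality score per respondent per prompt.
aut = pd.read_csv("aut_originality.csv")  # columns: originality, group, prompt

# 2 (group: human, GPT-4) x 2 (prompt: fork, rope) between-subjects ANOVA.
model = ols("originality ~ C(group) * C(prompt)", data=aut).fit()
print(sm.stats.anova_lm(model, typ=2))

# Tukey HSD across the four group-by-prompt cells.
aut["cell"] = aut["group"] + "_" + aut["prompt"]
print(pairwise_tukeyhsd(aut["originality"], aut["cell"]))
```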
Next, we compared elaboration scores between humans and GPT-4. Fluency scores differ
from elaboration in the sense that fluency accounts for each coherent response whereas
elaboration quantifies the number of words per valid response. For example, a person could
respond “you could use a fork to knit or as a hair comb.” In this example, the fluency would be 2
(knitting instrument and comb), but the elaboration would be 12 (number of words used in the
response). The results of an independent samples t-test revealed that elaboration was significantly
higher for GPT-4 (M = 15.45, SD = 6.74) than for humans (M = 3.38, SD = 2.91), t(602) =
28.57, 95% CI [11.24, 12.90], p < .001.
Consequences Task
As expected, an independent samples t-test revealed no significant differences in total fluency
between humans (M = 5.71, SD = 3.20) and GPT-4 (M = 5.50, SD = 3.15), t(621) = .82, 95% CI
[-.29, .71], p = .41.
To assess the originality of responses via semantic distance scores, we conducted a 2 (group:
human, GPT-4) × 2 (prompt: 'no more sleep,' 'walk on hands') analysis of variance. The model
revealed significant main effects of group (F(1, 619) = 622.10, p < .001, η² = .51) and prompt
(F(1, 619) = 584.50, p < .001, η² = .49) on the originality of responses. Additionally, there was a
significant interaction effect between group and prompt, F(1, 619) = 113.80, p < .001, η² = .16.
Specifically, originality was marginally higher for the prompt 'walk on hands' in the GPT-4
sample, whereas there was no significant difference in originality between the two prompts in
the human sample. Tukey's HSD post hoc analysis showed that all pairwise comparisons were
significantly different (p < .001), aside from the human responses for the two prompts (p =
.607). Overall, GPT-4 was more successful at coming up with divergent responses than its human
counterparts given the same number of opportunities, and also showed higher originality, with
the size of the difference dependent on prompt type (Figure 2).
Figure 2
Analysis of Variance of Originality on the Consequences Task
Next, we calculated the difference in elaboration between humans and GPT-4. The results
of an independent samples t-test revealed that elaboration was significantly higher in the GPT-4
sample (M = 38.69, SD = 15.60) than in the human sample (M = 5.45, SD = 4.04), t(621) = -36.04,
95% CI [-35.04, -31.45], p < .001.
Divergent Associations Task
Overall, humans produced a larger number of single-occurrence words (n = 523),
accounting for 69.92% of that group's total responses, than GPT-4 (n = 152; 47.95% of that
group's total responses; Table 4). In total, 9.11% (n = 97) of responses overlapped between the
two groups. Words that occurred exclusively in the human responses accounted for 87.03% (n =
651), compared with 69.40% (n = 220) for exclusively GPT-4 responses. A chi-square test of
independence was performed to examine the relationship between group (GPT-4 vs. human)
and word type (single occurrence vs. unique occurrence). The relationship between these
variables was not significant, χ²(1, N = 302) = 1.56, p = .211. This suggests that the uniqueness
and occurrence of words may not have necessarily aided either group in originality, but rather
aided in word complexity.
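A chi-square test of independence of this kind can be computed as sketched below; the 2 × 2
contingency table shown is a placeholder, since the exact tabulation the authors used to arrive at
N = 302 is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Placeholder 2 x 2 table: rows = group (human, GPT-4),
# columns = word type (single occurrence, unique occurrence).
# These counts are illustrative only, not the study's tabulation.
table = np.array([[90, 61],
                  [83, 68]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}, N = {table.sum()}) = {chi2:.2f}, p = {p:.3f}")
```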
Table 4
Top 20 most frequent words on the Divergent Association Task in human and GPT-4 samples

Human                          GPT-4
Word         Frequency         Word           Frequency
Dog          28                Elephant       98
Car          25                Symphony       55
Book         25                Microscope     51
Cloud        22                Quasar         44
Tree         21                Freedom        44
Computer     20                Dream          43
Water        16                Democracy      43
Chair        16                Love           40
Cat          16                Volcano        39
Moon         13                Quantum        39
Table        12                Philosophy     31
Sky          12                Microbe        27
Ocean        12                Galaxy         27
Mountain     12                Desert         26
Grass        12                Compass        22
Elephant     11                Microchip      19
Paper        10                Ocean          16
Flower       10                Justice        15
Fire         10                Harmony        15
Shoe         9                 Dolphin        15
Differences in semantic distance scores were calculated between human and GPT-4 DAT
responses. An independent samples t-test revealed that GPT-4 responses (M = 84.56, SD = 3.05)
had higher semantic distances than human responses (M = 76.95, SD = 6.13), t(300) = 13.65,
95% CI [6.51, 8.71], p < .001. Despite human participants having a broader range of unique
responses, this greater uniqueness did not appear to confer an advantage in semantic distance
scores when comparing the groups.
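A group comparison like this can be sketched with scipy as follows; the score vectors below are
simulated placeholders generated from the reported means and SDs, not the study's data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Placeholder per-respondent DAT scores simulated from the reported
# descriptive statistics; substitute the real score vectors in practice.
rng = np.random.default_rng(0)
human_dat = rng.normal(loc=76.95, scale=6.13, size=151)
gpt_dat = rng.normal(loc=84.56, scale=3.05, size=151)

res = ttest_ind(gpt_dat, human_dat)   # independent samples t-test
ci = res.confidence_interval()        # 95% CI for the mean difference (recent SciPy)
print(f"t({len(gpt_dat) + len(human_dat) - 2}) = {res.statistic:.2f}, "
      f"p = {res.pvalue:.3g}, 95% CI [{ci.low:.2f}, {ci.high:.2f}]")
```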
Discussion
The present study offers novel evidence on the current state of large language models
(i.e., GPT-4) and the capabilities of divergent creative output in comparison to human
participants. Overall, GPT-4 was more original and elaborate than humans on each of the
divergent thinking tasks, even when controlling for fluency of responses. In other words, GPT-4
demonstrated higher creative potential across an entire battery of divergent thinking tasks. To our
knowledge, this is the first evidence in the field of artificial intelligence and creativity research
that demonstrates the creative potential of AI as superior to human potential. Notably, no other
study has comprehensively assessed multiple dimensions of the most frequently used divergent
thinking tasks and AI. One previous study showed that humans outperformed GPT on the AUT
(GPT-3; Stevenson et al., 2022), while another study reported that later versions of GPT (GPT-4
showed similar, albeit slightly less, creative potential in comparison to humans (Haase & Hanel,
2023). Considering the findings of the present study, the current state of LLM’s has surpassed
human-level creative potential. Indeed, only one other study thus far has reported similar results
that GPT outperformed humans on th e D AT ( Cropley, 2023), but the DAT is only one aspect of
divergent thinking. Instead, the novelty of the present findings give a foundation for future
research to continue to examine multiple dimensions of creativity and artificial intelligence.
While the present results suggest that current AI models outperform humans on creativity
tasks by a significant margin, there are methodological considerations that could have contributed
to the present results. Comprehensively examining creativity requires not only
an assessment of originality, but also of the usefulness and appropriateness of an idea or product
(Runco & Jaeger, 2012). Traditionally, this has proven difficult to standardize in comparison to
assessing originality given the multifaceted dimensions that contribute to assessments of
appropriateness such as accounting for sociocultural and historical contexts. Semantic distance
scores do not take into consideration the aforementioned variables; instead, the scores reflect the
relative distance between seemingly related (or unrelated) ideas. In this instance, GPT-4’s
answers yielded higher originality than human counterparts, but the feasibility or appropriateness
of an idea could be vastly inferior to that of humans. Thus, we need to consider that the results
reflect only a single aspect of divergent creativity, rather than a generalization that AI is indeed
more creative across the board. Future research on AI and creativity needs to not only account
for the traditional measurements of creativity (i.e., fluency, elaboration, originality) but also for
the usefulness and appropriateness of the ideas.
Interestingly, GPT-4 repeated words at a higher frequency than human respondents did.
Although the vocabulary used in human responses was much broader, this did not necessarily
result in higher semantic distance scores. The complexity of the words chosen by the AI, albeit
more concentrated in occurrence, could have contributed more robustly to the originality effects.
For example, only the AI used words denoting non-tangible concepts (e.g., freedom, philosophy),
whereas humans may have fixated on generating ideas that are appropriate and observable. The
differences between the generated lists (incorporating tangible versus non-tangible words) could
inflate originality scores in favor of AI.
Similarly, we need to critically consider the uniqueness of words generated in DAT
responses. There was a marginal overlap of responses between the human and the AI samples
(9.11%), but humans responded with a higher number of single-occurrence words. Despite these
differences, AI still had a higher semantic distance score. Prior research shows that in human
respondents originality increases over time (Beaty & Silvia, 2012). This increase is seen as an
expansion of activation in an individual’s semantic network, which leads to more original
responses (Mednick, 1962). Human responses on these DT tasks tend to follow a diminishing
returns curve before reaching a plateau for an individual’s more original responses (Hubert et al.,
2023). The higher levels of elaboration and semantic distance in AI responses suggest that LLM
processing may not need the ramp-up time seen in human responses; therefore, LLMs can
produce their most original responses as soon as they are prompted. Whereas humans may fixate
on more obvious responses at first, this algorithmic trait could serve as an aid in overcoming
ideation fixedness in humans.
It is important to note that the measures used in this study are all measures of creative
potential; involvement in creative activities or achievements is another aspect of measuring a
person's creativity. In particular, researchers have examined the interplay between creative
potential and real-world creative achievements (Carson et al., 2005; Jauk et al., 2014), but this
approach assumes human-level creativity and is not able to account for artificial
intelligence. AI is able to come up with creative ideas, but we cannot assume that this potential
would translate to achievement. Thus, future research should consider the conceptual
implications of current measurements of creativity and how generalizability across creative
domains may be a human-centric consideration.
The prevalence and accessibility of the internet have drastically shaped the way in which
humans interact with language processing systems and search engines, and LLMs such as GPT-4
are no exception in their ubiquity. Information seeking now has multiple channels that were not
previously available, and with these functions come an array of strategies for finding the desired
information. Research has shown that younger people are better and more efficient in their online
search strategies for finding the information they want (Chevalier et al., 2015), which suggests
that exposure to search platforms acts as practice in efficiency. Similarly, through interactions
with GPT-4 and other AI platforms, humans may gradually learn how best to utilize LLMs. For
information-seeking tools like GPT-4, creative potential has shown clear progression in
capability, albeit with remaining limitations such as response appropriateness. Regardless,
approaching AI as a tool of inspiration, as an aid in a person's creative process, or as a means to
overcome fixedness is promising.
Funding
This research was funded by the Robert C. and Sandra Connor Endowed Faculty Fellowship
(GF002580) to DLZ.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
Research Disclosure Statement
All variables, measurements, and exclusions for this article’s target research question have been
reported in the methods section.
Author contributions
All authors contributed to the conceptualization and methodology. KFH and KNA contributed to
formal analysis and investigation. All authors contributed to writing and revision.
References
Anantrasirichai, N., & Bull, D. (2022). Artificial intelligence in the creative industries: A review.
Artificial Intelligence Review, 1–68. https://doi.org/10.1007/s10462-021-10039-7
Acar, S., Berthiaume, K., Grajzel, K., Dumas, D., Flemister, C. “Tedd,” & Organisciak, P.
(2021). Applying automated originality scoring to the verbal form of Torrance tests of
creative thinking. Gifted Child Quarterly, 67(1), 3–17.
https://doi.org/10.1177/00169862211061874
Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive
interpretation of the serial order effect in divergent thinking tasks. Psychology of
Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171
Beaty, R. E., Johnson, D. R., Zeitlen, D. C., & Forthmann, B. (2022). Semantic distance and the
alternate uses task: Recommendations for reliable automated assessment of originality.
Creativity Research Journal, 34(3), 245–260.
https://doi.org/10.1080/10400419.2022.2025720
Bellaiche, L., Shahi, R., Turpin, M. H., Ragnhildstveit, A., Sprockett, S., Barr, N., ... & Seli, P.
(2023). Humans versus AI: whether and why we prefer human-created compared to AI-
created artwork. Cognitive Research: Principles and Implications, 8(1), 1-22.
https://doi.org/10.1186/s41235-023-00499-6
Boden, M. A. (2009). Computer models of creativity. AI Magazine, 30(3), 23-23.
https://doi.org/10.1609/aimag.v30i3.2254
Carson, S. H., Peterson, J. B., & Higgins, D. M. (2005). Reliability, validity, and Factor
Structure of the creative achievement questionnaire. Creativity Research Journal, 17(1),
37–50. https://doi.org/10.1207/s15326934crj1701_4
Chamberlain, R., Mullin, C., Scheerlinck, B., & Wagemans, J. (2018). Putting the art in artificial:
Aesthetic responses to computer-generated art. Psychology of Aesthetics, Creativity, and
the Arts, 12(2), 177–192. https://doi.org/10.1037/aca0000136
Chatterjee, A. (2022). Art in an age of artificial intelligence. Frontiers in Psychology, 13,
1024449. https://doi.org/10.3389/fpsyg.2022.1024449
Chen, L., Sun, L., & Han, J. (2023). A Comparison Study of Human and Machine-Generated
Creativity. Journal of Computing and Information Science in Engineering, 23(5),
051012. https://doi.org/10.1115/1.4062232
Chevalier, A., Dommes, A., & Marquié, J.-C. (2015). Strategy and accuracy during
information search on the web: Effects of age and complexity of the search questions.
Computers in Human Behavior, 53, 305–315. https://doi.org/10.1016/j.chb.2015.07.017
Chiarella, S., Torromino, G., Gagliardi, D., Rossi, D., Babiloni, F., & Cartocci, G. (2022).
Investigating the negative bias towards artificial intelligence: Effects of prior assignment
of AI-authorship on the aesthetic appreciation of abstract paintings. Computers in Human
Behavior, 137, 107406. https://doi.org/10.1016/j.chb.2022.107406
Cropley, A. (2006). In praise of convergent thinking. Creativity Research Journal, 18(3), 391-
404. https://doi.org/10.1207/s15326934crj1803_13
Cropley, D. H. (2023). Is AI More Creative Than Humans? ChatGPT and the Divergent
Association Task. https://psyarxiv.com/jzt72/
Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality
with human raters and text-mining models: A psychometric comparison of methods.
Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663.
https://doi.org/10.1037/aca0000319
Day, B., Bateman, I. J., Carson, R. T., Dupont, D., Louviere, J. J., Morimoto, S., Scarpa, R., &
Wang, P. (2012). Ordering effects and choice set awareness in repeat-response stated
preference studies. Journal of Environmental Economics and Management, 63(1), 73–91.
https://doi.org/10.1016/j.jeem.2011.09.001
Fortuna, P., & Modliński, A. (2021). A(I)rtist or counterfeiter? Artificial intelligence as (D)
evaluating factor on the art market. The Journal of Arts Management, Law, and Society,
51(3), 188-201. https://doi.org/10.1080/10632921.2021.1887032
Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A. T.
(2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using
an artificial intelligence output detector, plagiarism detector, and blinded human
reviewers. bioRxiv. https://doi.org/10.1016/j.patter.2023.100706
Guilford, J. P. (1967). The nature of human intelligence. McGraw-Hill.
Haase, J., & Hanel, P. H. (2023). Artificial muses: Generative Artificial Intelligence Chatbots
Have Risen to Human-Level Creativity. arXiv preprint arXiv:2303.12003.
https://doi.org/10.48550/arXiv.2303.12003
Hass, R. W., & Beaty, R. E. (2018). Use or consequences: Probing the cognitive difference
between two measures of divergent thinking. Frontiers in Psychology, 9, 2327.
https://doi.org/10.3389/fpsyg.2018.02327
Hubert K. F., Finch A., Zabelina D. (2023). Diminishing Creative Returns: Predicting Optimal
Creative Performance via Individual Differences in Executive Functioning. Manuscript
submitted for publication.
Igorov, M., Predoiu, R., Predoiu, A., & Igorov, A. (2016). Creativity, resistance to mental fatigue
and coping strategies in junior women handball players. European Proceedings of Social
& Behavioural Sciences. https://doi.org/10.15405/epsbs.2016.06.39
Jauk, E., Benedek, M., & Neubauer, A. C. (2014). The road to creative achievement: A latent
variable model of ability and personality predictors. Personality and Individual
Differences, 60. https://doi.org/10.1016/j.paid.2013.07.129
Kane, S., Awa, K., Upshaw, J., Hubert, K., Stevens, C., & Zabelina, D. (2023). Attention, Affect,
and Creativity, from Mindfulness to Mind-Wandering. In Z. Ivcevic, J. Hoffmann, & J.
Kaufman (Eds.), The Cambridge Handbook of Creativity and Emotions (Cambridge
Handbooks in Psychology, pp. 130-148). Cambridge: Cambridge University Press.
https://doi.org/10.1017/9781009031240.010
Kumar, Y., Koul, A., Singla, R., & Ijaz, M. F. (2022). Artificial intelligence in disease diagnosis:
a systematic literature review, synthesizing framework and future research agenda.
Journal of Ambient Intelligence and Humanized Computing, 1-28.
https://doi.org/10.1007/s12652-021-03612-z
Liu, Y., Mittal, A., Yang, D., & Bruckman, A. (2022). Will AI console me when I lose my pet?
Understanding perceptions of AI-mediated email writing. In CHI Conference on Human
Factors in Computing Systems. https://doi.org/10.1145/3491102.3517731
Lee, Y. H., & Lin, T. H. (2023, July). The Feasibility Study of AI Image Generator as Shape
Convergent Thinking Tool. In International Conference on Human-Computer
Interaction (pp. 575-589). Cham: Springer Nature Switzerland.
https://doi.org/10.1007/978-3-031-35891-3_36
Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69(3),
220–232. https://doi.org/10.1037/h0048850
Nusbaum, E. C., Silvia, P. J., & Beaty, R. E. (2014). Ready, set, create: What instructing people
to “be creative” reveals about the meaning and mechanisms of divergent thinking.
Psychology of Aesthetics, Creativity, and the Arts, 8(4), 423.
https://doi.org/10.1037/a0036549
Olson, J. A., Nahas, J., Chmoulevitch, D., Cropper, S. J., & Webb, M. E. (2021). Naming
unrelated words predicts creativity. Proceedings of the National Academy of Sciences,
118(25). https://doi.org/10.1073/pnas.2022340118
OpenAI. (2022). ChatGPT: Optimizing Language Models for Dialogue. Available at:
https://openai.com/blog/chatgpt/ (Accessed July, 2023).
Organisciak, P. & Dumas, D. (2020). Open Creativity Scoring [Computer software]. Denver,
CO: University of Denver. https://openscoring.du.edu/
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word
representation. In Proceedings of the 2014 conference on empirical methods in natural
language processing (EMNLP) (pp. 1532-1543).
Rahaman, M. S., Ahsan, M. T., Anjum, N., Terano, H. J. R., & Rahman, M. M. (2023). From
ChatGPT-3 to GPT-4: a significant advancement in ai-driven NLP tools. Journal of
Engineering and Emerging Technologies, 2(1), 1-11.
https://doi.org/10.52631/jeet.v2i1.188
R Core Team. (2021). R: A language and environment for statistical computing (Version 4.1.0)
[Computer software]. Retrieved from http://www.R-project.org
Runco, M. A., & Acar, S. (2012). Divergent thinking as an indicator of creative potential.
Creativity Research Journal, 24(1), 66-75. https://doi.org/10.1080/10400419.2012.652929
Runco, M. A., & Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research
Journal, 24(1), 92–96. https://doi.org/10.1080/10400419.2012.650092
Samo, A., & Highhouse, S. (2023). Artificial intelligence and art: Identifying the aesthetic
judgment factors that distinguish human- and machine-generated artwork. Psychology of
Aesthetics, Creativity, and the Arts. Advance online publication.
https://doi.org/10.1037/aca0000570
Sawyer, R. K. (2012). Explaining creativity: The science of human innovation. Oxford University
Press.
Stevenson, C., Smal, I., Baas, M., Grasman, R., & van der Maas, H. (2022). Putting GPT-3's
Creativity to the (Alternative Uses) Test. arXiv preprint arXiv:2206.08932
Torrance, E. P. (1974). The Torrance tests of creative thinking: Norms-technical manual.
Princeton, NJ: Personal Press.
Urban, M., & Urban, K. (2023). Orientation Toward Intrinsic Motivation Mediates the
Relationship Between Metacognition and Creativity. The Journal of Creative Behavior,
57(1), 6-16. https://doi.org/10.1002/jocb.558
Wilson, R. C., Guilford, J. P., Christensen, P. R., & Lewis, D. J. (1954). A factor-analytic study
of creative-thinking abilities. Psychometrika, 19(4), 297–311.
https://doi.org/10.1007/bf02289230
Yin, Z., Reuben, F., Stepney, S., & Collins, T. (2023). Deep learning’s shallow gains: a
comparative evaluation of algorithms for automatic music generation. Machine Learning,
112(5), 1785-1822. https://doi.org/10.1007/s10994-023-06309-w
Appendix A
Instructions for human sample
Alternative Uses Task
“For this task, you'll be asked to come up with as many original and creative uses for [item]
as you can. The goal is to come up with creative ideas, which are ideas that strike people as
clever, unusual, interesting, uncommon, humorous, innovative, or different.
Your ideas don't have to be practical or realistic; they can be silly or strange, even, so long as
they are CREATIVE uses rather than ordinary uses.
You can enter as many ideas as you like. The task will take 3 minutes. You can type in as many
ideas as you like until then, but creative quality is more important than quantity. It's better to
have a few really good ideas than a lot of uncreative ones. List as many ORIGINAL and
CREATIVE uses for a [item].”
Consequences Task
“In this task, a statement will appear on the screen. The statement might be something like
"imagine gravity ceases to exist". For 3 minutes, try and think of any and all consequences
that might result from the statement. Please be as creative as you like. The goal is to come up
with creative ideas, which are ideas that strike people as clever, unusual, interesting,
uncommon, humorous, innovative, or different.
Your responses will be scored based on originality and quality. Remember, it is important to
try to keep thinking of responses and to type them in for the entire time for the prompt.
REMINDER: In this task, a statement will appear on the screen. The statement might be
something like "imagine gravity ceases to exist". For 3 minutes, try and think of any and all
consequences that might result from the statement. Do this as many times as you can in 3 min.
The screen will automatically change when the time is completed. Remember, it is important to
try to keep thinking of responses and to type them in for the entire time for the prompt.”
Divergent Associations Task
“Please enter 10 words that are as different from each other as possible, in all meanings and
uses of the words. The rules: Only single words in English. Only nouns (e.g., things, objects,
concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary
(e.g., no technical terms). Think of the words on your own (e.g., do not just look at objects in
your surroundings).”
Appendix B
Instructions for AI sample
Alternative Uses Task
“For this task, you'll be asked to come up with as original and creative uses for [item] as you
can. The goal is to come up with creative ideas, which are ideas that strike people as clever,
unusual, interesting, uncommon, humorous, innovative, or different.
Your ideas don't have to be practical or realistic; they can be silly or strange, even, so long as
they are CREATIVE uses rather than ordinary uses. List [insert fluency number] ORIGINAL
and CREATIVE uses for a [item].”
Consequences Task
“In this task, a statement will appear on the screen. The statement might be something like
"imagine gravity ceases to exist". Please be as creative as you like. The goal is to come up with
creative ideas, which are ideas that strike people as clever, unusual, interesting, uncommon,
humorous, innovative, or different. Your responses will be scored based on originality and
quality.
Try and think of any and all consequences that might result from the statement. [Insert
scenario]. What problems might this create? List [insert fluency number] CREATIVE
consequences.”
Divergent Associations Task
“Please enter 10 words that are as different from each other as possible, in all meanings and
uses of the words. The rules: Only single words in English. Only nouns (e.g., things, objects,
concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary (e.g.,
no technical terms). Think of the words on your own (e.g., do not just look at objects in your
surroundings).”