UNLEASHING COGNITIVE SYNERGY IN LARGE LANGUAGE MODELS: A TASK-SOLVING AGENT THROUGH MULTI-PERSONA SELF-COLLABORATION

Zhenhailong Wang1*, Shaoguang Mao2, Wenshan Wu2, Tao Ge2, Furu Wei2, Heng Ji1
1University of Illinois Urbana-Champaign, 2Microsoft Research Asia
{wangz3,hengji}@illinois.edu
{shaoguang.mao,wenshan.wu,tage,fuwei}@microsoft.com
*Work was done while interning at Microsoft Research Asia.
ABSTRACT

Human intelligence thrives on the concept of cognitive synergy, where collaboration and information integration among different cognitive processes yield superior outcomes compared to individual cognitive processes in isolation. Although Large Language Models (LLMs) have demonstrated promising performance as general task-solving agents, they still struggle with tasks that require intensive domain knowledge and complex reasoning. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist refers to an intelligent agent that collaborates with multiple minds, combining their individual strengths and knowledge, to enhance problem-solving and overall performance in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. We have discovered that assigning multiple, fine-grained personas in LLMs elicits better problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, SPP effectively elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.
[Figure: (a) Standard Prompting (a single persona maps input to output), (b) Chain-of-Thought Prompting (CoT) (a single persona generates intermediate thoughts), (c) Solo Performance Prompting (SPP) (a single LLM simulates multiple personas, including an AI Assistant and domain experts / audiences).]
Figure 1: Schematic illustration of Solo Performance Prompting (SPP) and the difference compared to previous prompting methods. SPP transforms a single LLM into a cognitive synergist that dynamically identifies personas and engages in multi-turn self-collaboration to solve various tasks effectively.
[Figure: The task input asks for a short, one-paragraph background story of an NPC for the next Legend of Zelda game that mentions (1) the incantation of the Patronus Charm in Harry Potter, (2) the name of a character beheaded in the ninth episode of the Game of Thrones TV series, and (3) the name of the last song in the second album by Jay Chou. The Standard Prompting result omits the incantation and invents a non-existent song title, "The Final Battle" (factual error). SPP identifies the participants AI Assistant (you), Game Designer, Harry Potter Fan, Game of Thrones Fan, and Jay Chou Fan; after beginning remarks supplying the correct facts ("Expecto Patronum", Eddard Stark, "An Jing") and iterative drafting and feedback, the final story incorporates all three correctly.]
Figure 2: Task example with Solo Performance Prompting in action. Participants are automatically identified by the LLM based on the task input. This example demonstrates that Standard prompting may result in factual errors, whereas expert personas in SPP assist in accurate knowledge acquisition, contributing to a coherent and informative final answer.
1 INTRODUCTION

Although large language models (LLMs) have demonstrated impressive performance as general task-solving agents, they still encounter challenges (Qin et al., 2023; Bang et al., 2023; OpenAI, 2023; Bubeck et al., 2023) in various knowledge-intensive and reasoning-intensive tasks due to hallucination (Maynez et al., 2020) and a lack of slow-thinking (Sloman, 1996) capabilities. Unlike humans, who can leverage the power of collaboration and information integration among different cognitive processes and individuals (referred to as cognitive synergy (Curşeu et al., 2015; Goertzel, 2009; 2017)), current LLMs are akin to "jack-of-all-trades" models with a vast mixture of knowledge and characteristics. Recent advancements, such as Chain-of-Thought (CoT) prompting (Wei et al., 2023; Kojima et al., 2022) and Self-refinement (Madaan et al., 2023; Shinn et al., 2023), have successfully enhanced the reasoning abilities of LLMs by simulating slow thinking through the generation of intermediate steps or iterative revision. However, hallucination and factual errors in internal knowledge acquisition continue to pose major challenges for state-of-the-art LLMs.

A cognitive synergist denotes an intelligent agent that works in conjunction with several minds, merging their unique abilities and expertise to improve problem-solving and overall efficacy on intricate tasks. In this work, we aim to develop a cognitive synergist based on a single LLM that can "split into" multiple personas and engage in multi-persona self-collaboration to address both knowledge-intensive and reasoning-intensive tasks. The underlying intuition stems from the significance of pretend play and role-playing (Pellegrini, 2009) in a child's cognitive development. According to Piaget's developmental theory (Piaget, 1954), engaging in pretend play and taking on different roles allows children to cultivate essential skills such as problem-solving, critical thinking, empathy, and cooperation.

The main inspiration for this work originates from recent findings (Deshpande et al., 2023; Xu et al., 2023) suggesting that assigning personas to an LLM can elicit specific behaviors. For instance,
Xu et al. (2023) demonstrate that when conditioned on a task-specific expert identity, an LLM can generate superior answers compared to having no assigned persona. Another closely related line of work (Park et al., 2023; Schick et al., 2022; Li et al., 2023; Cai et al., 2023) hints at the possibility of constructing an AI society with multiple LLM agents collaborating in different roles. However, these previous works share some lingering limitations: (1) personas are typically fixed or task-specific, necessitating human supervision; (2) such collaboration often requires multiple individual LLM instances, resulting in a doubling or tripling of inference costs.
To unleash the potential of cognitive synergy in LLMs, we propose Solo Performance Prompting (SPP), which prompts a single LLM to identify, simulate, and collaborate with multiple personas to solve challenging tasks. Figure 1 provides a high-level overview of SPP. Here, a persona can represent either a domain expert, such as a movie enthusiast, or a target audience, such as a ten-year-old child. Through the dynamic identification of various personas, we empower a single LLM to acquire diverse domain knowledge accurately without additional retrieval systems. By facilitating multi-turn self-collaboration, we enable self-revision and self-feedback from various perspectives without requiring additional agents.

In real-world scenarios, particularly in creative industries, there is often a need to incorporate diverse information from different domains. Figure 2 presents a concrete example of how SPP operates on a challenging task that necessitates creative integration of information from various domains, such as the Legend of Zelda games, the Harry Potter movies, and Jay Chou's albums. Standard prompting fails to generate satisfactory output due to missing essential information and factual errors. In contrast, SPP correctly provides all the necessary information by automatically identifying participants with special personas, such as a Harry Potter Fan and a Jay Chou Fan. A leader persona, AI Assistant, then initiates a multi-turn dialogue with all participants, in which it iteratively writes drafts of the story, solicits feedback, and revises. Once all participants provide positive feedback, the collaboration concludes and a final answer is provided.
To summarize, the key contributions of this paper are as follows:

• We present Solo Performance Prompting (SPP), a novel approach that leverages a single LLM as a cognitive synergist to solve tasks by dynamically identifying personas and engaging in multi-turn self-collaboration.

• We evaluate SPP on three challenging tasks, Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, spanning both knowledge- and reasoning-intensive domains. SPP significantly enhances both knowledge acquisition and reasoning abilities in LLMs, without the need for external resources.

• We conduct an in-depth analysis of the impact of identified personas and provide insights into why dynamic, fine-grained personas are necessary, as opposed to fixed, coarse-grained personas.
2 SOLO PERFORMANCE PROMPTING

2.1 SPP TASK-SOLVING PROCEDURE
To unleash the power of synergizing different personas to tackle complex problems within a single LLM, we propose Solo Performance Prompting (SPP), which instructs a model to perform the following procedure for solving general tasks: (1) Persona Identification: identify multiple participants with special personas (including a leader persona, AI Assistant) that are essential for solving the particular task. (2) Beginning Remarks: each participant delivers a beginning remark providing suggestions or information on how to approach the task based on their own expertise. (3) Multi-Persona Iterative Collaboration: the leader persona, AI Assistant, proposes initial solutions, consults the other participants for feedback, and revises the answer iteratively. Figure 2 shows a working example of SPP during inference. Next, we formally describe the SPP procedure in detail.
Given an input sequence $x$ and a model $\mathcal{M}$, let $p$ be a prompt (including demonstration examples) prepended to the input and $y$ be the final output. Denote an intermediate generation before the final $y$ as $z$. Under this formulation, Standard Prompting and Chain-of-Thought (CoT) Prompting can be described as:

$$\text{Standard Prompting:}\quad y = \mathcal{M}(x)$$

$$\text{CoT Prompting:}\quad y = \mathcal{M}(p_{\text{cot}} \parallel x \parallel \{z_1, z_2, \ldots, z_n\})$$

where $p_{\text{cot}}$ is the CoT prompt, e.g., "Solve the task step-by-step", and $\{z_1, z_2, \ldots, z_n\}$ are the intermediate steps. In contrast, our proposed Solo Performance Prompting can be described as follows:

$$\text{Solo Performance Prompting:}\quad y = \mathcal{M}\big(p_{\text{spp}} \parallel x \parallel z_p \parallel \{z_b^1, z_b^2, \ldots, z_b^m\} \parallel \{z_s^0, z_f^1, \ldots, z_f^m\}_{j=1\ldots n}\big)$$

where the SPP prompt ($p_{\text{spp}}$) includes a high-level instruction and two carefully crafted demonstration examples¹ that showcase the expected task-solving procedure of SPP. We describe the design details of the prompt in §2.2. The corresponding intermediate generations ($z$) of SPP are detailed below.
Persona Identification ($z_p$). Given an input task, SPP first generates a list of participants with different personas that can potentially contribute to solving the task. The personas can be either domain experts or target audiences whose feedback is important. For example, in Figure 2, the model identifies a Jay Chou Fan persona to help retrieve the knowledge of "the last song in the second album by Jay Chou". For tasks involving special audiences, e.g., "Explain quantum computing to a ten-year-old kid", including a ten-year-old kid as a participant can provide valuable feedback from the audience's perspective. We let the language model identify the personas dynamically instead of manually defining them. Given only two demonstration examples, we observe that a state-of-the-art large language model, e.g., GPT-4 (OpenAI, 2023), can identify accurate and meaningful personas for diverse tasks. We denote this part of the intermediate generation as $z_p$.
Beginning Remarks ($z_b^i$). Among the identified participants, "AI Assistant (you)" is treated as a leader persona that initiates the collaboration and generates initial solutions. Before the initial answer is generated, each persona gives a beginning remark on how to approach the task from their own perspective. In the example in Figure 2, the Jay Chou Fan gives a beginning remark pointing out that the last song in Jay Chou's second album is "An Jing" ("Silence"). We find that this effectively improves the quality of the initial solution generated by the AI Assistant. We use $i = 0$ to denote the "AI Assistant" persona and $i \geq 1$ for the other dynamically identified personas. The beginning remarks can thus be denoted as $\{z_b^1, z_b^2, \ldots, z_b^m\}$, where $m$ is the number of personas excluding the "AI Assistant".
Multi-Persona Iterative Collaboration ($z_s^0$, $z_f^i$). Based on the beginning remarks, the AI Assistant persona generates an initial solution, denoted as $z_s^0$, and then consults each of the other participants for feedback $\{z_f^i\}$. For example, in Figure 2, the Jay Chou Fan persona checks whether the song "An Jing" ("Silence") is nicely included in the story. The participants are also encouraged to critique the current generation and give revision suggestions. This process can be repeated multiple times until every participant is satisfied with the current solution. We denote the intermediate generations of the multi-turn dialogue as $\{z_s^0, z_f^1, \ldots, z_f^m\}_{j=1\ldots n}$, where $n$ is the number of iterations before reaching the final answer. The collaboration is marked complete by "Finish collaboration!", after which the final solution is generated.
Based on only a single large language model, SPP enables multi-persona self-collaboration, which effectively elicits domain knowledge and reduces hallucination. Meanwhile, the iterative procedure inherits the benefit of CoT prompting for eliciting reasoning ability. The main advantage over CoT is that at each step we can receive feedback from diverse perspectives thanks to the dynamically assigned personas. A comprehensive comparison with previous prompting methods can be found in Table 1.
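To make the procedure concrete, below is a minimal sketch of SPP inference, assuming a generic text-in/text-out completion wrapper; the `complete` callable and the "Final answer:" parsing convention are our assumptions based on the demonstration format, not the authors' released code. The entire multi-persona collaboration happens inside a single model call; only prompt assembly and answer parsing happen outside.

```python
# Minimal SPP inference sketch: one completion call yields the personas (z_p),
# the beginning remarks (z_b), the iterative dialogue (z_s, z_f), and the answer.

def spp_inference(task_input: str, spp_prompt: str, complete) -> str:
    """Run Solo Performance Prompting on one task input with a single LLM call."""
    # p_spp (system principle + two demonstrations) is prepended to the input x.
    full_prompt = f"{spp_prompt}\n\nTask: {task_input}"
    transcript = complete(full_prompt)

    # The demonstrations end the dialogue with an explicit marker, so the
    # final solution can be parsed from the tail of the transcript.
    if "Finish collaboration!" in transcript:
        transcript = transcript.split("Finish collaboration!", 1)[1]

    # Assumed convention from the demonstrations: the answer follows
    # "Final answer:"; otherwise fall back to the remaining text.
    marker = "Final answer:"
    if marker in transcript:
        return transcript.split(marker, 1)[1].strip()
    return transcript.strip()
```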
2.2 SPP PROMPT DESIGN
To prompt an LLM to behave as a cognitive synergist that follows the expected task-solving procedure described in §2.1, we carefully design the structure of the SPP prompt as follows. The full prompt can be found in Appendix A.²

¹The tasks we use in the demonstration examples do not overlap with the evaluation tasks.
²We use the same prompt for arbitrary tasks.
Table 1: Comparison with previous prompting methods.

Methods | Has multiple personas? | Personas dynamically identified? | Has iterative refinement? | Needs only a single LLM?
Chain-of-Thought (Wei et al., 2023) | ✗ | ✗ | ✗ | ✓
Inner Monologue (Huang et al., 2022) | ✗ | ✗ | ✓ | ✓
ReAct (Yao et al., 2022) | ✗ | ✗ | ✓ | ✓
Self-refine (Madaan et al., 2023) | ✗ | ✗ | ✓ | ✓
Reflexion (Shinn et al., 2023) | ✗ | ✗ | ✓ | ✓
Tree-of-thought (Yao et al., 2023) | ✗ | ✗ | ✓ | ✓
Peer (Schick et al., 2022) | ✓ | ✗ | ✓ | ✗
Camel (Li et al., 2023) | ✓ (fixed to 2) | ✗ | ✓ | ✗
GPT-bargaining (Fu et al., 2023) | ✓ (fixed to 3) | ✗ | ✓ | ✗
ExpertPrompting (Xu et al., 2023) | ✗ | ✓ | ✗ | ✓
Solo Performance Prompting (ours) | ✓ (varied) | ✓ | ✓ | ✓
System Principle. The first part of the prompt contains a high-level instruction: "When faced with a task, begin by identifying the participants who will contribute to solving the task. Then, initiate a multi-turn collaboration process until a final solution is reached. The participants will give critical comments and detailed suggestions whenever necessary."
Demonstration Examples. We then include two manually crafted demonstration examples to showcase the expected task-solving behavior. The first example describes a Game of 24 task, where we include only two personas: an AI Assistant and a Math Expert. This task serves as an example of a reasoning-intensive task, where the AI Assistant needs to make multiple proposals and the other participants need to give fine-grained feedback on where the current solution went wrong and how to improve it. The second example describes a poem-writing task with diverse requirements, including lexical constraints, semantic constraints, and audience awareness. This task serves as an example of a knowledge-intensive task, where diverse personas are required to collaboratively solve the task. This example also demonstrates a case where it is important to assign a dedicated persona to the audience, e.g., a ten-year-old child.
Task Prefix. The last part of the prompt reminds the model to "identify the participants and collaboratively solve the following task step by step", followed by task-specific format instructions and the input.
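To illustrate how these three parts compose into $p_{\text{spp}}$, here is a minimal assembly sketch; the demonstration strings are placeholders standing in for the full examples in Appendix A.

```python
# Sketch of assembling the SPP prompt. The demo strings below are placeholders,
# not the paper's actual demonstration examples.

SYSTEM_PRINCIPLE = (
    "When faced with a task, begin by identifying the participants who will "
    "contribute to solving the task. Then, initiate a multi-turn collaboration "
    "process until a final solution is reached. The participants will give "
    "critical comments and detailed suggestions whenever necessary."
)

DEMO_GAME_OF_24 = "..."    # reasoning-intensive demo (AI Assistant + Math Expert)
DEMO_POEM_WRITING = "..."  # knowledge-intensive demo with an audience persona

TASK_PREFIX = (
    "Now, identify the participants and collaboratively solve the following "
    "task step by step."
)

def build_spp_prompt(format_instructions: str, task_input: str) -> str:
    """Concatenate the prompt parts; the same prompt is reused for all tasks."""
    return "\n\n".join(
        [SYSTEM_PRINCIPLE, DEMO_GAME_OF_24, DEMO_POEM_WRITING,
         TASK_PREFIX, format_instructions, task_input]
    )
```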
3 EXPERIMENTS
We explore the effectiveness of Solo Performance Prompting for versatile task solving on three challenging tasks that encompass both knowledge-intensive and reasoning-intensive domains. We introduce the Trivia Creative Writing task, which requires the model to internally acquire and integrate diverse information from various fields; we observe that even the most advanced LLMs, such as GPT-4 (OpenAI, 2023), frequently exhibit hallucination and factuality errors on this task. We also propose the Codenames Collaborative task, an extension of the Codenames task from BigBench (Srivastava et al., 2022) that features a two-role collaboration setup; it demands creative reasoning across a broad range of related knowledge and challenges the model's theory-of-mind skills. Lastly, we include a challenging pure-reasoning task, Logic Grid Puzzle, also from BigBench (Srivastava et al., 2022), which necessitates complex multi-step reasoning.
Methods. We primarily compare our approach with Standard Prompting and Chain-of-Thought (CoT) prompting (outlined in §2). For CoT, we employ a prompt design similar to Yao et al. (2023), where the model is prompted to generate a plan or a series of steps before producing the final output. We examine two variants of Solo Performance Prompting: SPP and SPP-Profile. Inspired by Xu et al. (2023), which suggests that a detailed expert description may help elicit distinguished abilities, SPP-Profile additionally generates profiles for each persona during the Persona Identification phase. Full prompts for the methods can be found in Appendix A.
Inference Configurations. All experiments are conducted using the GPT-4-32k API.³ The temperature is set to 1.0 and top_p to 1.0 for all generations. To evaluate the potential impact of initial persona assignment through a system message, we consider two inference settings: with or without the default system message, "You are an AI assistant that helps people find information". We observe divergent patterns across tasks and methods regarding the use of the system message, and report the average metric scores across both inference settings in Tables 2, 3, and 4. Full results for each setting can be found in Appendix B.
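For concreteness, the two inference settings could be reproduced along the lines of the sketch below, assuming the OpenAI Python client; the model name is a placeholder for the GPT-4-32k deployment used in the paper.

```python
# Sketch of the two inference settings: with or without the default system
# message. Assumes the OpenAI Python client; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def generate(prompt: str, use_system_message: bool) -> str:
    messages = []
    if use_system_message:
        # Default system message whose presence is ablated in the paper.
        messages.append({
            "role": "system",
            "content": "You are an AI assistant that helps people find information",
        })
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4-32k",  # placeholder for the exact API deployment
        messages=messages,
        temperature=1.0,
        top_p=1.0,
    )
    return response.choices[0].message.content
```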
3.1 TRIVIA CREATIVE WRITING: A KNOWLEDGE-INTENSIVE TASK
[Figure: An input example (N=5) asks for a short, coherent story about Harry Potter that incorporates the answers to five trivia questions (e.g., "Who was the man behind The Chipmunks?"). The target answers are David Seville, Sunset Boulevard, Henry Campbell-Bannerman, Exile, and Cancer. The example output correctly mentions 4 of the 5 answers (it attributes The Chipmunks to Ross Bagdasarian Sr. rather than the target alias David Seville), yielding a score of 4/5 = 0.8.]
Figure 3: Trivia Creative Writing task example.
Task Description. The Trivia Creative Writing task aims to push the limits of large language models in retrieving internal, self-compressed knowledge and incorporating diverse information. As a scalable extension of the example task shown in Figure 2, Trivia Creative Writing asks a model to write a coherent story around a topic while incorporating the answers to $N$ trivia questions. We consider two evaluation settings, $N = 5$ and $N = 10$, where a larger $N$ involves more trivia questions and thus requires the model to elicit more diverse domain knowledge. We built a benchmark with 100 instances for each $N$, covering a total of 1000 trivia questions⁴ extracted from the TriviaQA (Joshi et al., 2017) dataset. The topic list is automatically generated by prompting GPT-4 to provide 100 nouns from pop culture that are PG or PG-13 rated.⁵ Figure 3 shows an example instance of Trivia Creative Writing.
Evaluation Metrics. Instead of evaluating the coherence of the generation, which can be highly subjective, we employ an automatic metric to detect factual errors and quantify a model's ability to incorporate diverse domain knowledge. As shown in Figure 3, we perform string matching against the ground-truth target answers for each question on the output generation. The target answers are provided by the TriviaQA dataset, and each question can have a list of answer aliases; a match to any of the answer aliases of a question is considered a correct mention. The metric score is computed as follows:

$$\text{Trivia Creative Writing Metric Score} = \frac{\#\,\text{correct answer mentions}}{\#\,\text{trivia questions}}$$

³The specific model version we employ is "2023-3-15-preview". There are rare cases where a generation triggers the content filter of the API; we exclude those instances from our results.
⁴To select difficult question instances that pose challenges to GPT-4, we use a smaller open-source LLM, fastchat_t5_3b (Zheng et al., 2023), to obtain preliminary performance on the validation set, and then choose the failure cases as our question selection.
⁵The full prompt for generating the topic list can be found in Figure 15. We performed further human curation to avoid potentially harmful content.
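A minimal sketch of this metric under the stated string-matching rule (lower-casing is our assumption; the released code may normalize differently):

```python
# Trivia Creative Writing metric sketch: a question counts as correctly
# mentioned if ANY of its TriviaQA answer aliases appears in the generation.

def trivia_creative_writing_score(output: str,
                                  answer_aliases: list[list[str]]) -> float:
    """answer_aliases[i] holds every accepted alias for trivia question i."""
    text = output.lower()
    correct_mentions = sum(
        any(alias.lower() in text for alias in aliases)
        for aliases in answer_aliases
    )
    return correct_mentions / len(answer_aliases)
```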
Table 2: Trivia Creative Writing main results. Δ indicates the relative gain/loss compared with Standard Prompting (first row).

Methods | N = 5: Score (%) | Δ vs. Standard | N = 10: Score (%) | Δ vs. Standard
Standard | 74.6 | 0.0% | 77.0 | 0.0%
CoT | 67.1 | -10.0% | 68.5 | -11.1%
SPP-Profile (ours) | 79.1 | +5.9% | 83.0 | +7.8%
SPP (ours) | 79.9 | +7.1% | 84.7 | +10.0%
[Figure: Two qualitative instances (N=5). Instance 10, question "In which country is the Sky Train Rail bridge?": SPP's Geography Expert correctly states that the bridge is in Canada, while CoT's plan asserts it is in Thailand (factual error). Instance 12, question "Which 70s show was based on the British show Till Death Us Do Part?": SPP's TV Show Expert correctly names All in the Family, while CoT's story invents a show called "Undersea Squabbles" (hallucination).]
Figure 4: SPP vs. CoT qualitative examples on Trivia Creative Writing (N=5). We find that although CoT generates reasonable plans or steps, it tends to suffer from factual errors and hallucination.
Results. Table 2 shows the results of the four methods on the Trivia Creative Writing task. We make the following main observations. (1) Chain-of-Thought (CoT) does not outperform Standard prompting, indicating that CoT may not be effective at eliciting an LLM's knowledge abilities. As shown in Figure 4, although CoT generates reasonable plans for solving the task, the final generation still suffers from factual errors and hallucination. (2) Our proposed SPP and SPP-Profile significantly outperform both Standard and CoT. The improvement is more pronounced in the N = 10 setting than in N = 5 (10.0% vs. 7.1%), indicating that when the task requires incorporating knowledge from a larger number of domains, Solo Performance Prompting is particularly helpful, identifying different personas to elicit different expertise.
3.2 CODENAMES COLLABORATIVE: A KNOWLEDGE + REASONING TASK
Task Description. Codenames Collaborative is a challenging task that requires the model to reason over a wide range of knowledge while collaborating with another agent. We use this task to investigate the effectiveness of SPP on collaborative tasks that require knowledge, reasoning, and theory-of-mind abilities. Codenames Collaborative involves two player roles: a Spymaster and a Guesser. The Spymaster is given a set of target words along with some distractor words; the Guesser does not know which words are the targets. The goal of the Spymaster is to come up with a single hint word that is closely related to the target words while being only remotely related to the distractor words. The goal of the Guesser is to find the target subset of words from the entire word set based on the hint given by the Spymaster. Finding a good hint word or guessing the target subset both require a strong capability of selecting, composing, and reasoning over various knowledge related to a given word. For example, "director" and "popcorn" can be linked by the word "movie", because movies are created by a director and people often eat popcorn when watching movies in a cinema. We use the same LLM (GPT-4 (OpenAI, 2023)) to play the Spymaster and the Guesser sequentially: each game instance involves one inference as the Spymaster and then another inference as the Guesser, where the Guesser's input depends on the Spymaster's output (a sketch of this pipeline follows the figure below). We construct a dataset with 50 instances based on the data from the Codenames task in BigBench (Srivastava et al., 2022). Figure 5 shows an example of the Codenames Collaborative task.

[Figure: Spymaster input: find a single-word hint that links the 4 words "director, kiss, popcorn, street" without causing confusion with the other words in a 23-word list. Spymaster output: "movie". Guesser input: identify the 4 words best associated with "movie" from the same list. Guesser output: "director, popcorn, cowboy, bride". Evaluation: 2 of the 4 targets recovered, score 2/4 = 0.5.]

Figure 5: Codenames Collaborative task example.
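The sequential two-role inference could look like the following sketch; `generate` is a hypothetical single-call helper (e.g., the one sketched under Inference Configurations), and the prompt strings paraphrase the templates in Figure 5.

```python
# Codenames Collaborative pipeline sketch: the same model plays the Spymaster,
# then the Guesser conditioned on the Spymaster's hint.

def play_codenames(targets: list[str], word_list: list[str], generate) -> list[str]:
    spymaster_prompt = (
        f"Try to find a single word hint that can accurately represent and "
        f"link the {len(targets)} given words: {', '.join(targets)}. The key is "
        f"to select a hint that does not cause confusion with other words from "
        f"the following list: {', '.join(word_list)}."
    )
    hint = generate(spymaster_prompt).strip()

    guesser_prompt = (
        f"Try to identify the {len(targets)} words best associated with the "
        f"word '{hint}' from the following list: {', '.join(word_list)}."
    )
    # Assumed output convention: a comma-separated list of guessed words.
    return [w.strip() for w in generate(guesser_prompt).split(",")]
```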
Evaluation Metrics. As illustrated in Figure 5, we compute the overlap ratio between the words predicted by the Guesser and the target words given to the Spymaster. A major limitation of the original Codenames task in the BigBench dataset is that it only considers the Spymaster role and provides a single ground-truth hint word, which can be highly subjective and excludes many potentially good alternatives. Our Codenames Collaborative task addresses this issue by making the evaluation setting self-contained, faithfully reflecting the model's capability without the need for human annotation.
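A sketch of this overlap-ratio metric (set semantics and lower-casing are our assumptions):

```python
# Codenames Collaborative metric sketch: the fraction of target words that
# the Guesser recovers.

def codenames_score(guesses: list[str], targets: list[str]) -> float:
    guessed = {g.strip().lower() for g in guesses}
    hits = sum(t.strip().lower() in guessed for t in targets)
    return hits / len(targets)
```

For the Figure 5 example, `codenames_score(["director", "popcorn", "cowboy", "bride"], ["director", "kiss", "popcorn", "street"])` returns 2/4 = 0.5.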
Table 3: Codenames Collaborative main results. Δ indicates the relative gain/loss compared with Standard Prompting (first row).

Methods | Score (%) | Δ vs. Standard
Standard | 75.4 | 0.0%
CoT | 72.7 | -3.6%
SPP-Profile (ours) | 76.7 | +1.7%
SPP (ours) | 79.0 | +4.8%
Results. Table 3 shows the results on the Codenames Collaborative task. As in the Trivia Creative Writing task, CoT does not bring positive gains compared with Standard prompting. In contrast, SPP brings significant improvements (~5%), indicating the effectiveness of the proposed Solo Performance Prompting on collaborative tasks that require knowledge, reasoning, and theory-of-mind skills. Figure 6 provides a qualitative example illustrating that SPP generates detailed and interpretable intermediate dialogues, contributing to superior performance compared with CoT.
[Figure: A qualitative instance (index 9; targets "kick, rope" from a 19-word list). The SPP Spymaster identifies an AI Assistant, a Language Expert, and a Word Association Expert; after their remarks it proposes the hint "jump", and the SPP Guesser (AI Assistant, Linguist, Athlete) recovers both targets (score 2/2 = 1.0). The CoT Spymaster outputs the vaguer hint "Sport", and the CoT Guesser answers "gear, kick", recovering only one target (score 1/2 = 0.5).]
Figure 6: SPP vs. CoT qualitative examples on Codenames Collaborative. We find that SPP provides much more detailed and interpretable intermediate discussions from various perspectives, which leads to stronger knowledge selection, integration, and theory-of-mind capabilities.
3.3 LOGIC GRID PUZZLE: A REASONING-INTENSIVE TASK
Task Description. We leverage the Logic Grid Puzzle task from the BigBench (Srivastava et al., 2022) dataset, which contains 200 instances. Each instance describes a logic puzzle typically involving 2 to 5 houses, each inhabited by a person with certain characteristics, e.g., having a vase of tulips or being a pianist. Given partial clues, such as "the flutist lives in the second house," the goal is to answer a final question that queries the house number of the person with a specific characteristic. To obtain the final answer, the model is required to perform multi-step reasoning and to select the most relevant clue at each step. Challenging instances may require considering multiple clues simultaneously to deduce the next useful piece of information. Figure 7 shows an example input and output of the Logic Grid Puzzle task.

[Figure: An example puzzle with 4 houses, where each person has a different foyer flower (carnations, daffodils, tulips, lilies) and plays a different instrument (guitar, piano, percussion, flute). Clues: (1) the flutist lives in the second house; (2) the person with tulips lives directly left of the guitarist; (3) the person with lilies lives directly left of the person with carnations; (4) there is one house between the flutist and the pianist. Question: the house number of the person with the vase of tulips (choices 1-4). The model outputs house 3, but the target is 2, so the instance scores 0.]

Figure 7: Logic Grid Puzzle task example.
Evaluation Metrics. We compute the accuracy of the predicted house numbers by comparing them
with the ground truth targets provided by the dataset.
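A sketch of this accuracy computation:

```python
# Logic Grid Puzzle evaluation sketch: exact-match accuracy of the predicted
# house numbers against the dataset targets.

def logic_grid_accuracy(predictions: list[int], targets: list[int]) -> float:
    assert len(predictions) == len(targets)
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```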
Table 4: Logic Grid Puzzle main results. Δ indicates the relative gain/loss compared with Standard Prompting (first row).

Methods | Score (%) | Δ vs. Standard
Standard | 57.7 | 0.0%
CoT | 65.8 | +14.1%
SPP-Profile (ours) | 64.8 | +12.4%
SPP (ours) | 68.3 | +18.5%
Results. Table 4 presents the results on Logic Grid Puzzle. In contrast to the previous two tasks, and as expected, CoT brings significant improvements over Standard prompting, verifying the observation from previous work that CoT elicits better abilities on reasoning-intensive tasks. Furthermore, SPP also outperforms CoT on this task, indicating competitive capabilities on pure-reasoning tasks. This result demonstrates that the increased number of personas does not degrade the model's reasoning abilities.
4 ANALYSIS
SPP effectively improves internal knowledge acquisition and reasoning in LLMs. As demonstrated by the results in §3, Solo Performance Prompting (SPP) not only brings significant improvements on knowledge-intensive tasks such as Trivia Creative Writing and Codenames Collaborative without relying on external knowledge bases, but also achieves strong performance on reasoning-intensive tasks like Logic Grid Puzzle. This indicates the potential of using LLM-based cognitive synergists as a default paradigm for general task solving via Solo Performance Prompting.
[Figure: Word clouds of SPP-identified personas for Trivia Creative Writing (N=5), Codenames Collaborative, and Logic Grid Puzzle.]
Figure 8: Visualization of the SPP-identified personas for each task. We find that personas in knowledge-intensive tasks, such as Trivia Creative Writing, tend to be more diverse and specific, whereas in reasoning-intensive tasks, like Logic Grid Puzzle, they appear more homogeneous.
Figure 9: Comparison between SPP (with dynamically identified personas) and SPP-Fixed-Persona (with fixed personas). The results demonstrate that dynamic, fine-grained personas consistently outperform fixed, general personas. Another observation is that SPP-Fixed-Persona seems more sensitive to system messages and exhibits a unique early-termination problem (detailed in the text) that leads to unexpectedly low performance on certain tasks, such as Codenames Collaborative.
LLMs can effectively identify useful personas without additional fine-tuning. We visualize the personas⁶ automatically identified by SPP using a word cloud for each task in Figure 8, where a larger font indicates a higher frequency. The identified personas are closely correlated with the particular task; for example, on Logic Grid Puzzle, even though "logic puzzle" is not mentioned in the input, the LLM frequently assigns the persona "Logic Puzzle Expert" to a participant. This indicates that current LLMs are inherently capable of identifying useful expert personas for diverse tasks. We also find that on knowledge-intensive tasks, such as Trivia Creative Writing, SPP identifies more diverse and specific personas, while on reasoning-intensive tasks, such as Logic Grid Puzzle, the personas are more homogeneous. Moreover, the fact that SPP-Profile does not outperform SPP on two of the three tasks suggests that a fine-grained persona name without a detailed description may already be sufficient for eliciting certain domain knowledge.

⁶The visualization excludes the default persona, AI Assistant.
[Figure: Three qualitative instances. Instance 56 ("What sport or pastime would you be playing if your balls were black and blue and your opponent's were red and yellow?"): SPP's Sports Enthusiast answers Croquet, while SPP-Fixed-Persona's generic Expert answers Snooker. Instance 7 ("Which musician founded the Red Hot Peppers?"): SPP's Music Enthusiast answers Jelly Roll Morton, while SPP-Fixed-Persona answers Louis Armstrong. Instance 30 ("Which orchestral conductor was married to one of the subjects of the film Hillary and Jackie?"): SPP's Film Expert answers Daniel Barenboim, while SPP-Fixed-Persona's Expert answers Sir Simon Rattle.]
Figure 10: SPP vs. SPP-Fixed-Persona qualitative examples on Trivia Creative Writing (N=5). Each example shows one of the trivia questions in the input instance, the identified participants, and the provided answer. We observe that dynamically identified fine-grained personas, such as "Film Expert", tend to outperform the fixed general persona "Expert".
Dynamic personas vs. fixed personas. To further investigate the importance of dynamically identifying personas (synergizing dynamic cognitive processes) for each task instance, rather than fixing a general persona (synergizing fixed cognitive processes), we introduce an ablated variant of SPP, SPP-Fixed-Persona. For SPP-Fixed-Persona, we modify the SPP prompt to force the personas to be fixed as an "AI Assistant" and an "Expert", while keeping all the information in the demonstration examples intact. The full prompt of SPP-Fixed-Persona can be found in Figure 13. Figure 9 shows the comparison between SPP and SPP-Fixed-Persona, from which we draw the following main insights. (1) SPP consistently outperforms SPP-Fixed-Persona across all tasks, suggesting that dynamic, fine-grained personas are more effective than fixed, general personas. Figure 10 shows qualitative examples from Trivia Creative Writing, where fine-grained personas such as "Film Expert" and "Sports Enthusiast" correctly find the answers while the fixed persona "Expert" fails. (2) SPP-Fixed-Persona suffers from a unique problem we refer to as early termination, where the LLM stops generating after the Expert persona gives the beginning remarks, behaving as if it were waiting for input from a user instead of simulating the response itself. An example of the early-termination problem can be found in Figure 16. The problem is particularly severe on certain tasks, e.g., Codenames Collaborative, resulting in unexpectedly low performance. It can be largely alleviated, but not entirely eliminated, by removing the system message "You are an AI assistant that helps people find information". Table 8 shows the number of early-termination instances for each task and method. In contrast, we did not observe early termination with SPP, SPP-Profile, Standard, or CoT prompting.
5 RELATED WORK
LLMs as role-playing agents. Recent work (Deshpande et al., 2023; Xu et al., 2023; Fu et al., 2023; Auto-GPT, 2023; Li et al., 2023) has shown that assigning personas or roles to LLMs can significantly influence their generation behavior. Deshpande et al. (2023) demonstrated that assigning specific personas, such as the boxer Muhammad Ali, to an LLM can increase the toxicity of its generated content. Inspired by how humans form societies to collaborate effectively on complex tasks, recent work (Park et al., 2023; Schick et al., 2022; Li et al., 2023; Cai et al., 2023) has explored the possibility of creating an AI society in which model agents with distinct personas or occupations collaborate with one another. Generative Agents (Park et al., 2023) prototyped a small AI neighborhood where generative models simulate believable human behavior and collaborate on complex tasks, such as throwing a Valentine's Day party. However, current studies on LLMs as role-playing agents have several limitations: previous work on persona assignment is either limited to a single persona per agent (Xu et al., 2023) or to a fixed number of human-defined personas (Fu et al., 2023; Schick et al., 2022; Li et al., 2023), and current research on multi-agent collaboration often requires multiple LLM instances, which significantly increases the inference cost.

In this work, we investigate the possibility of using a single LLM to simulate multi-persona collaboration. Instead of fixing the personas, we allow the LLM to dynamically identify useful personas for each task instance. Our approach, SPP, effectively outperforms the fixed-persona variant (as shown in §4) without additional computational overhead.
Improving reasoning and knowledge acquisition abilities in LLMs. Although LLMs have demonstrated impressive performance across a wide range of natural language understanding and generation tasks, they still face challenges with complex knowledge-intensive tasks due to hallucination (Maynez et al., 2020) and with reasoning-intensive tasks due to the lack of human-like slow thinking (Sloman, 1996; Kahneman, 2011). Representative works aimed at enhancing LLMs' reasoning abilities include Chain-of-Thought (CoT) and Self-Refinement. CoT prompting (Wei et al., 2023; Kojima et al., 2022) and its variants (Zhang et al., 2022; Fu et al., 2022; Xue et al., 2023) encourage LLMs to solve tasks step by step instead of directly generating the final answer. By generating intermediate steps, the model effectively "slows down" its thinking process, resulting in improved reasoning ability. Yao et al. (2023) recently extended the linear thought process in CoT to a tree-like structure, demonstrating enhanced performance on complex reasoning tasks that require trial and error. Self-Refinement (Madaan et al., 2023; Shinn et al., 2023; Gou et al., 2023; Chen et al., 2023; Huang et al., 2022; Yao et al., 2022) focuses on enabling LLMs to "talk" to themselves, provide feedback on their own generations, and iteratively revise their answers. Madaan et al. (2023) proposed a three-step framework in which a single LLM iteratively plays the roles of generator, feedback provider, and refiner, showing consistent improvements on seven diverse tasks. Shinn et al. (2023) further incorporated an episodic memory for self-feedback, demonstrating promising results on decision-making and reasoning tasks. Despite their impressive improvements on reasoning-intensive tasks, CoT and Self-Refinement do not necessarily reduce hallucination or improve factuality in generated content, as shown by our results in Tables 2 and 3. On the other hand, retrieval-augmented LLMs (Borgeaud et al., 2022; Izacard et al., 2022; Wang et al., 2022; Shuster et al., 2021) have shown promising results in enhancing LLMs' knowledge acquisition from external knowledge resources. However, retrieving from external sources does not improve a model's reasoning abilities, posing challenges for tasks that require both intensive knowledge and multi-step reasoning.

To elicit both internal knowledge acquisition and reasoning abilities in LLMs, we propose Solo Performance Prompting (SPP), which significantly improves factuality while maintaining strong performance on pure-reasoning tasks. The key difference from previous prompting methods is that SPP dynamically identifies multiple personas instead of one and simulates iterative collaboration to generate intermediate "thoughts".
6 DISCUSSION
Limitations and future work. Although Solo Performance Prompting exhibits promising improvements in acquiring factually correct knowledge compared to Standard prompting, it has some limitations. For instance, even when a fine-grained persona is assigned, the answer may still be incorrect, and it remains unclear to what extent assigning a persona can help enhance domain knowledge in a specific area. Dedicated diagnostic experiments and theoretical efforts are needed to quantify the impact of having a persona or not.
Furthermore, we currently adopt an identical SPP prompt with the same two demonstration examples for all task inputs, which may be suboptimal. Future work investigating how to find better demonstration examples conditioned on each input could further improve the effectiveness of SPP.

Last but not least, given sufficient computational budget, a natural variant of SPP could extend to a multi-agent cognitive synergist setup, where a leader persona identifies several expert agents and forms a cabinet to collaboratively solve a task. The multi-agent setup allows for leveraging richer computational power, larger local memory, and more flexible human-computer interaction, which could be essential for deployment in real-world applications.
Conclusion. In this work, we have made an initial attempt to mimic the cognitive synergy of human intelligence using a single large language model (LLM). We introduced an LLM-based cognitive synergist via Solo Performance Prompting, which effectively improves both internal knowledge acquisition and reasoning abilities compared to the native LLM. With SPP, a single LLM can dynamically identify, engage, and collaborate with multiple personas to solve general tasks. To assess the performance of LLMs in terms of factuality, knowledge integration, and theory-of-mind reasoning, we created two novel and challenging tasks, Trivia Creative Writing and Codenames Collaborative. Our results demonstrate superior performance compared to Standard and CoT prompting on both knowledge-intensive and reasoning-intensive tasks, indicating the promising potential of unleashing the power of cognitive synergy in LLMs with Solo Performance Prompting.
REFERENCES
Auto-gpt. https://github.com/Significant-Gravitas/Auto-GPT, 2023. 13
Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia,
Ziwei Ji, Tiezheng Yu, Willy Chung, et al. A multitask, multilingual, multimodal evaluation of
chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023, 2023. 2
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican,
George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al.
Improving language models by retrieving from trillions of tokens. In International conference on
machine learning, pp. 2206–2240. PMLR, 2022. 13
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar,
Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence:
Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023. 2
Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, and Denny Zhou. Large language models as
tool makers. arXiv preprint arXiv:2305.17126, 2023. 3,13
Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. Teaching large language models to
self-debug. arXiv preprint arXiv:2304.05128, 2023. 13
Petru L Cur¸seu, Nicoleta Meslec, Helen Pluut, and Gerardus JM Lucas. Cognitive synergy in groups
and group-to-individual transfer of decision-making competencies. Frontiers in psychology, 6:
1375, 2015. 2
Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik
Narasimhan. Toxicity in chatgpt: Analyzing persona-assigned language models. arXiv preprint
arXiv:2304.05335, 2023. 2,13
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting
for multi-step reasoning. arXiv preprint arXiv:2210.00720, 2022. 13
Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving language model negotiation with
self-play and in-context learning from ai feedback. arXiv preprint arXiv:2305.10142, 2023. 5,13
Ben Goertzel. Cognitive synergy: A universal principle for feasible general intelligence. In 2009 8th
IEEE International Conference on Cognitive Informatics, pp. 464–468. IEEE, 2009. 2
14
Preprint
Ben Goertzel. A formal model of cognitive synergy. In Artificial General Intelligence: 10th International Conference, AGI 2017, Melbourne, VIC, Australia, August 15-18, 2017, Proceedings 10, pp. 13–22. Springer, 2017.
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. CRITIC: Large language models can self-correct with tool-interactive critiquing, 2023.
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, et al. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022.
Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299, 2022.
Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1147. URL https://aclanthology.org/P17-1147.
Daniel Kahneman. Thinking, Fast and Slow. Macmillan, 2011.
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916, 2022.
Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large scale language model society. arXiv preprint arXiv:2303.17760, 2023.
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651, 2023.
Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. On faithfulness and factuality in abstractive summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1906–1919, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.173. URL https://aclanthology.org/2020.acl-main.173.
OpenAI. GPT-4 technical report, 2023.
Joon Sung Park, Joseph C O'Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442, 2023.
Anthony D Pellegrini. The Role of Play in Human Development. Oxford University Press, USA, 2009.
Jean Piaget. The Construction of Reality in the Child. 1954.
Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, and Diyi Yang. Is ChatGPT a general-purpose natural language processing task solver? arXiv preprint arXiv:2302.06476, 2023.
Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, and Sebastian Riedel. PEER: A collaborative language model. arXiv preprint arXiv:2208.11663, 2022.
Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.
Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567, 2021.
Steven A Sloman. The empirical case for two systems of reasoning. Psychological Bulletin, 119(1):3, 1996.
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615, 2022.
Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, and Heng Ji. Zemi: Learning zero-shot semi-parametric language models from multiple tasks. arXiv preprint arXiv:2210.00185, 2022.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023.
Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Yongdong Zhang, and Zhendong Mao. ExpertPrompting: Instructing large language models to be distinguished experts. arXiv preprint arXiv:2305.14688, 2023.
Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, and Heng Ji. RCoT: Detecting and rectifying factual inconsistency in reasoning by reversing chain-of-thought. arXiv preprint arXiv:2305.11499, 2023.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023.
Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. Automatic chain of thought prompting in large language models, 2022.
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P Xing, Hao Zhang, Joseph E Gonzalez, and Ion Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena, 2023.
A PROMPTS
Figures 11, 12 and 13 show the full prompts for SPP, SPP-Profile and SPP-Fixed-Persona, respectively.
Figure 14 shows the full prompts for Chain-of-Thought (CoT) prompting.
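Each template ends with two task-specific placeholders, {Task-specific Formatting Instruction} and {Task input}. For readers who want to reuse the templates, the sketch below shows one plausible way to fill them in; the helper name, the truncated template literal, and the example strings are illustrative assumptions, not taken from the released code.

```python
# A minimal sketch of assembling an SPP prompt from the template in Figure 11.
# The full template text is the prompt shown in the figure; it is truncated
# here with "[...]" for brevity.
SPP_TEMPLATE = """When faced with a task, begin by identifying the participants who will contribute to solving the task. [...]
Now, identify the participants and collaboratively solve the following task step by step. {format_instruction}
Task: {task_input}"""

def build_spp_prompt(task_input: str, format_instruction: str) -> str:
    # Substitute the two task-specific placeholders from Figure 11.
    return SPP_TEMPLATE.format(format_instruction=format_instruction,
                               task_input=task_input)

# Example usage on a Codenames-style instance (instruction text illustrative):
prompt = build_spp_prompt(
    task_input='Try to identify the 4 words best associated with the word "film" ...',
    format_instruction='Remember to provide the final solution with the following '
                       'format "Final answer: (a list of words here).".',
)
```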
B FULL RESULTS
Full results of the three tasks, Trivia Creative Writing, Codenames Collaborative and Logic Grid Puzzle, can be found in Tables 5, 6 and 7, respectively.
C EARLY-TERMINATION WITH SPP-FIXED-PERSONA
Figure 16 shows an example of the early-termination problem, where the generation stops before reaching the final solution, as if the model were waiting for input from an external user.
Table 8 shows the number of instances that suffer from early-termination (defined in § 4) with SPP-Fixed-Persona for each task. We find that removing the system message largely reduces the problem but does not eliminate it.
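Concretely, a generation can be flagged as early-terminated by checking for the final-answer marker that the prompts in Figures 11–13 instruct the model to emit. The sketch below illustrates this check; the function names are ours, not from the released code.

```python
# Minimal sketch, assuming (per the prompts in Figures 11-13) that a completed
# dialogue ends with a line beginning "Final answer:". Names are illustrative.
def is_early_terminated(generation: str) -> bool:
    """True if the model stopped before emitting a final solution."""
    return "Final answer:" not in generation

def early_termination_count(generations: list[str]) -> str:
    """Report counts in the same 'k / n' style as Table 8."""
    k = sum(is_early_terminated(g) for g in generations)
    return f"{k} / {len(generations)}"
```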
When faced with a task, begin by identifying the participants who will contribute to solving the task. Then, initiate a multi-round collaboration process until a final solution is reached. The participants will
give critical comments and detailed suggestions whenever necessary.
Here are some examples:
---
Example Task 1: Use numbers and basic arithmetic operations (+ - * /) to obtain 24. You need to use all numbers, and each number can only be used once.
Input: 6 12 1 1
Participants: AI Assistant (you); Math Expert
Start collaboration!
Math Expert: Let's analyze the task in detail. You need to make sure that you meet the requirement, that you need to use exactly the four numbers (6 12 1 1) to construct 24. To reach 24, you can think
of the common divisors of 24 such as 4, 6, 8, 3 and try to construct these first. Also you need to think of potential additions that can reach 24, such as 12 + 12.
AI Assistant (you): Thanks for the hints! Here's one initial solution: (12 / (1 + 1)) * 6 = 24
Math Expert: Let's check the answer step by step. (1+1) = 2, (12 / 2) = 6, 6 * 6 = 36 which is not 24! The answer is not correct. Can you fix this by considering other combinations? Please do not make
similar mistakes.
AI Assistant (you): Thanks for pointing out the mistake. Here is a revised solution considering 24 can also be reached by 3 * 8: (6 + 1 + 1) * (12 / 4) = 24.
Math Expert: Let's first check if the calculation is correct. (6 + 1 + 1) = 8, 12 / 4 = 3, 8 * 3 = 24. The calculation is correct, but you used 6 1 1 12 4 which is not the same as the input 6 12 1 1. Can you
avoid using a number that is not part of the input?
AI Assistant (you): You are right, here is a revised solution considering 24 can be reached by 12 + 12 and without using any additional numbers: 6 * (1 - 1) + 12 = 24.
Math Expert: Let's check the answer again. 1 - 1 = 0, 6 * 0 = 0, 0 + 12 = 12. I believe you are very close, here is a hint: try to change the "1 - 1" to "1 + 1".
AI Assistant (you): Sure, here is the corrected answer: 6 * (1+1) + 12 = 24
Math Expert: Let's verify the solution. 1 + 1 = 2, 6 * 2 = 12, 12 + 12 = 24. You used 1 1 6 12 which is identical to the input 6 12 1 1. Everything looks good!
Finish collaboration!
Final answer: 6 * (1 + 1) + 12 = 24
---
Example Task 2: Write a poem that meets the following requirements: (1) the poem has seven lines and the first letters of each line form the word "CHATGPT"; (2) the poem is about explaining what a quantum computer is; (3) the poem needs to be easy to understand by a ten-year-old kid.
Participants: AI Assistant (you); Poet; Computer Scientist; Ten year old child
Poet: Make sure that you write the poem with seven lines, and the first letters of the lines should be C, H, A, T, G, P, T.
Computer Scientist: A quantum computer is an advanced computing device that uses the principles of quantum mechanics to process and store information. Unlike classical computers that use bits to represent information as 0s and 1s, quantum computers use quantum bits or qubits. Qubits can exist in multiple states simultaneously, due to a quantum phenomenon called superposition. You can consider using this information for the poem.
Ten year old child: I hope the poem is fun and easy to understand. I don't want to see a lot of jargon or complicated concepts.
AI Assistant (you): Thanks for the guidance! Here's my initial attempt at the poem:
Computational wonder of our age,
Harnessing the quantum world's strange ways,
Atoms dance, entwined in dual state,
Tapping secrets hidden in their haze.
Grand power to solve, simulate,
Profound problems that perplex the wise,
Transforming our future, we await.
Poet: Let's verify if the poem meets the requirements. The first letters are CHATGPT which is correct! And the poem rhymes well. Good job!
Computer Scientist: Everything looks good to me!
Ten year old child: I don't know what perplex means. Can you use words that are easier to understand?
AI Assistant (you): Sure, let me revise the poem by using more common words. Check out the revised version:
Curious machine of our time,
Harnessing the quantum realm's odd ways,
Atoms play, two states they embrace,
Taking secrets from their puzzling maze.
Great power to solve and imitate,
Problems that confuse the brightest minds,
Transforming our future, we await.
Poet: Let's check again if the poem meets the requirements. The first letters are C H A T G P T. And now the poem is more accessible to children. Everything looks good to me.
Computer Scientist: Looking good!
Ten year old child: I like this version a lot!
Finish collaboration!
Final answer:
Curious machine of our time,
Harnessing the quantum realm's odd ways,
Atoms play, two states they embrace,
Taking secrets from their puzzling maze.
Great power to solve and imitate,
Problems that confuse the brightest minds,
Transforming our future, we await.
---
Now, identify the participants and collaboratively solve the following task step by step. {Task-specific Formatting Instruction}
Task: {Task input}
SPP Prompt
Figure 11: SPP full prompt.
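The two in-context examples above hinge on mechanical checks that the persona turns carry out by hand: that an expression uses exactly the input numbers and evaluates to 24, and that the poem's line initials spell "CHATGPT". A small sketch of those checks follows; the helper names are our own, not part of the SPP method.

```python
# Illustrative re-implementation of the checks performed by the Math Expert
# and Poet personas in Figure 11; not part of the SPP prompt itself.
import re
from collections import Counter

def check_24(expression: str, numbers: list[int]) -> bool:
    """Expression must evaluate to 24 and use exactly the given numbers."""
    used = [int(tok) for tok in re.findall(r"\d+", expression)]
    if Counter(used) != Counter(numbers):
        return False  # e.g. (6 + 1 + 1) * (12 / 4) sneaks in a 4
    # eval is acceptable here because the expression is a fixed example,
    # not untrusted user input.
    return abs(eval(expression) - 24) < 1e-9

def check_acrostic(poem: str, target: str = "CHATGPT") -> bool:
    """Line initials must spell the target word."""
    lines = [ln.strip() for ln in poem.strip().splitlines() if ln.strip()]
    return "".join(ln[0].upper() for ln in lines) == target

assert check_24("6 * (1 + 1) + 12", [6, 12, 1, 1])        # accepted final answer
assert not check_24("(12 / (1 + 1)) * 6", [6, 12, 1, 1])  # evaluates to 36
```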
When faced with a task, begin by identifying the participants who will contribute to solving the task. Provide profiles of the participants, describing their expertise or needs. Then, initiate a multi-round
collaboration process until a final solution is reached. The participants will give critical comments and detailed suggestions whenever necessary.
Here are some examples:
[...]
Participants: AI Assistant (you); Math Expert
Profiles:
- AI Assistant (you): A super-intelligent AI assistant capable of performing tasks more effectively than humans.
- Math Expert: A person who is good at math games, arithmetic calculation, and long-term planning.
[...]
Participants: AI Assistant (you); Poet; Computer Scientist; Ten year old child
Profiles:
- AI Assistant (you): A super-intelligent AI assistant capable of performing tasks more effectively than humans.
- Poet: A person who studies and creates poetry. The poet is familiar with the rules and formats of poetry and can provide guidance on how to write a poem.
- Computer Scientist: A scholar who specializes in the academic study of computer science. The computer scientist is familiar with the concept of a quantum computer and can provide
guidance on how to explain it.
- Ten year old child: A child with a limited English vocabulary and little knowledge about complicated concepts, such as a quantum computer.
[...]
---
Now, identify the participants, provide their profiles, and collaboratively solve the following task step by step. {Task-specific Formatting Instruction}
Task: {Task input}
SPP-Profile Prompt
Figure 12: SPP-Profile full prompt. "[...]" indicates parts identical to SPP. Green text indicates the key difference between SPP-Profile and SPP.
Table 5: Trivia Creative Writing full results, including two inference settings: with system message and without system message. "average" and "max" indicate the mean and max score across the two settings. The system message we use is: "You are an AI assistant that helps people find information."
Scores (N = 5) (%)
Methods              w/ system message  w/o system message  average  max
Standard             75.6               73.6                74.6     75.6
CoT                  68.8               65.6                67.1     68.8
SPP-Fixed-Persona    66.1               79.6                72.9     79.6
SPP-Profile (ours)   79.8               78.3                79.1     79.8
SPP (ours)           80.0               79.8                79.9     80.0

Scores (N = 10) (%)
Methods              w/ system message  w/o system message  average  max
Standard             77.2               76.8                77.0     77.2
CoT                  71.6               65.3                68.5     71.6
SPP-Fixed-Persona    70.5               81.3                75.9     81.3
SPP-Profile (ours)   82.3               83.8                83.0     83.8
SPP (ours)           85.2               84.2                84.7     85.2
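The two settings in Tables 5–7 differ only in whether the quoted system message is prepended to the conversation. A minimal sketch with an OpenAI-style chat API (the client construction, model name, and function name are assumptions for illustration, not the authors' exact inference code):

```python
# Sketch of the with/without system message settings used in Tables 5-7.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
SYSTEM_MSG = "You are an AI assistant that helps people find information."

def run(prompt: str, with_system_message: bool) -> str:
    # Toggle the system message; everything else is identical across settings.
    messages = ([{"role": "system", "content": SYSTEM_MSG}]
                if with_system_message else [])
    messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content
```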
Table 6: Codenames Collaborative full results, including two inference settings: with system message and without system message. "average" and "max" indicate the mean and max score across the two settings. The system message we use is: "You are an AI assistant that helps people find information."
Scores (%)
Methods              w/ system message  w/o system message  average  max
Standard             74.5               76.3                75.4     76.3
CoT                  71.4               74.0                72.7     74.0
SPP-Fixed-Persona    10.1               66.0                38.1     66.0
SPP-Profile (ours)   80.4               72.9                76.7     80.4
SPP (ours)           82.5               75.5                79.0     82.5
When faced with a task, begin by identifying the participants who will contribute to solving the task. Note that the participants can only be either AI Assistant (you) or Expert. Then, initiate a multi-round
collaboration process until a final conclusion is reached. The Expert will give critical comments and detailed suggestions whenever necessary.
Here are some examples:
---
Example Task 1: Use numbers and basic arithmetic operations (+ - * /) to obtain 24. You need to use all numbers, and each number can only be used once.
Input: 6 12 1 1
Participants: AI Assistant (you); Expert
Start collaboration!
Expert: Let's analyze the task in detail. You need to make sure that you meet the requirement, that you need to use exactly the four numbers (6 12 1 1) to construct 24. To reach 24, you can think of
the common divisors of 24 such as 4, 6, 8, 3 and try to construct these first. Also you need to think of potential additions that can reach 24, such as 12 + 12.
AI Assistant (you): Thanks for the hints! Here's one initial solution: (12 / (1 + 1)) * 6 = 24
Expert: Let's check the answer step by step. (1+1) = 2, (12 / 2) = 6, 6 * 6 = 36 which is not 24! The answer is not correct. Can you fix this by considering other combinations? Please do not make
similar mistakes.
AI Assistant (you): Thanks for pointing out the mistake. Here is a revised solution considering 24 can also be reached by 3 * 8: (6 + 1 + 1) * (12 / 4) = 24.
Expert: Let's first check if the calculation is correct. (6 + 1 + 1) = 8, 12 / 4 = 3, 8 * 3 = 24. The calculation is correct, but you used 6 1 1 12 4 which is not the same as the input 6 12 1 1. Can you avoid
using a number that is not part of the input?
AI Assistant (you): You are right, here is a revised solution considering 24 can be reached by 12 + 12 and without using any additional numbers: 6 * (1 - 1) + 12 = 24.
Expert: Let's check the answer again. 1 - 1 = 0, 6 * 0 = 0, 0 + 12 = 12. I believe you are very close, here is a hint: try to change the “1 - 1” to “1 + 1”.
AI Assistant (you): Sure, here is the corrected answer: 6 * (1+1) + 12 = 24
Expert: Let's verify the solution. 1 + 1 = 2, 6 * 2 = 12, 12 + 12 = 24. You used 1 1 6 12 which is identical to the input 6 12 1 1. Everything looks good!
Finish collaboration!
Final answer: 6 * (1 + 1) + 12 = 24
---
Example Task 2: Write a poem that meets the following requirements: (1) the poem has seven lines and the first letters of each line form the word "CHATGPT"; (2) the poem is about explaining what a quantum computer is; (3) the poem needs to be easy to understand by a ten-year-old kid.
Participants: AI Assistant (you); Expert
Expert: Make sure that you write the poem with seven lines, and the first letters of the lines should be C, H, A, T, G, P, T. A quantum computer is an advanced computing device that uses the principles of quantum mechanics to process and store information. Unlike classical computers that use bits to represent information as 0s and 1s, quantum computers use quantum bits or qubits. Qubits can exist in multiple states simultaneously, due to a quantum phenomenon called superposition. You can consider using this information for the poem. I hope the poem is fun and easy to understand. I don't want to see a lot of jargon or complicated concepts.
AI Assistant (you): Thanks for the guidance! Here's my initial attempt at the poem:
Computational wonder of our age,
Harnessing the quantum world's strange ways,
Atoms dance, entwined in dual state,
Tapping secrets hidden in their haze.
Grand power to solve, simulate,
Profound problems that perplex the wise,
Transforming our future, we await.
Expert: Let's verify if the poem meets the requirements. The first letters are CHATGPT which is correct! And the poem rhymes well. Good job! I don't know what perplex means. Can you use words that are easier to understand?
AI Assistant (you): Sure, let me revise the poem by using more common words. Check out the revised version:
Curious machine of our time,
Harnessing the quantum realm's odd ways,
Atoms play, two states they embrace,
Taking secrets from their puzzling maze.
Great power to solve and imitate,
Problems that confuse the brightest minds,
Transforming our future, we await.
Expert: Let's check again if the poem meets the requirements. The first letters are C H A T G P T. And now the poem is more accessible to children. Everything looks good to me. I like this version a
lot!
Finish collaboration!
Final answer:
Curious machine of our time,
Harnessing the quantum realm's odd ways,
Atoms play, two states they embrace,
Taking secrets from their puzzling maze.
Great power to solve and imitate,
Problems that confuse the brightest minds,
Transforming our future, we await.
---
Now, identify the participants and collaboratively solve the following task step by step. {Task-specific Formatting Instruction}
Task: {Task input}
SPP-Fixed-Persona Prompt
Figure 13: SPP-Fixed-Persona full prompt. Red text indicates the key difference between SPP-Fixed-Persona and SPP.
Figure 14: CoT prompts.
Provide 100 nouns from pop culture that are PG or PG 13 rated. Try not to include any adult, racial or harmful
content. Try to be as diverse as possible, including movies, books, games, shows, etc. Do not include duplicates.
Figure 15: Prompt for generating the topic list for the Trivia Creative Writing task.
Table 7: Logic Grid Puzzle full results, including two inference settings: with system message and without system message. "average" and "max" indicate the mean and max score across the two settings. The system message we use is: "You are an AI assistant that helps people find information."
Scores (%)
Methods              w/ system message  w/o system message  average  max
Standard             56.8               58.6                57.7     58.6
CoT                  69.5               62.1                65.8     69.5
SPP-Fixed-Persona    63.3               65.3                64.3     65.3
SPP-Profile (ours)   65.7               64.0                64.8     65.7
SPP (ours)           66.3               70.4                68.3     70.4
Input:
[... SPP-Fixed-Persona Prompt instructions and in-context examples ...]
Now, identify the participants and collaboratively solve the following task step by step. Note that the participants can only be either AI Assistant (you) or Expert. Remember to provide the final solution with the following format "Final answer: (a list of words here).".
Task: {Try to identify the 4 words best associated with the word "film" from the following list: locust, fever, street, sherwood, kiss, popcorn, craft, ant, crystal, tear, cowboy, government, pine, mountie, soap, genie, king arthur, sphinx, director, bride, razor, fog, whistle. Your answer should be a comma-separated list of words.}

SPP-Fixed-Persona generation:
Participants: AI Assistant (you); Expert
AI Assistant (you): Let's first analyze the task. We need to find the 4 words that are most closely associated with the word "film" from the given list. We can start by identifying words that are directly related to films, such as those related to movie genres, settings, or elements commonly found in films.
Expert: That's a good approach. You can also consider words that are related to the experience of watching a film, such as items that are commonly consumed during a movie or words that describe the atmosphere of a film. Make sure to pick the 4 words that have the strongest association with "film".
[The generation stopped here without providing the final answer.]

Figure 16: Early-termination example on Codenames Collaborative with SPP-Fixed-Persona.
Table 8: Early-termination counts with SPP-Fixed-Persona.
Task                             added system message  # early-termination
Trivia Creative Writing (N=5)    yes                   18 / 100
Trivia Creative Writing (N=5)    no                     0 / 100
Trivia Creative Writing (N=10)   yes                   16 / 100
Trivia Creative Writing (N=10)   no                     1 / 100
Codenames Collaborative          yes                   37 / 50
Codenames Collaborative          no                     4 / 50
Logic Grid Puzzle                yes                   11 / 200
Logic Grid Puzzle                no                    15 / 200