When Machines Argue: AI Hive Minds and Strategic Decisions

What happens when you make AI models debate each other, and why the results might be better than what your strategy team could produce.

Digital twins have become indispensable tools across industries. Powered by AI, these virtual constructs mirror physical systems in complex manufacturing facilities, supply chains and operational workflows. By continuously monitoring their physical counterparts and feeding back recommendations, digital twins allow us to predict maintenance needs, optimise production schedules and prevent disruptions before they occur.

Yet, for all their power, digital twins have largely remained confined to a particular domain: structured, operational decisions with well-defined outcomes. Could companies also use these digital constructs to tackle unstructured strategic challenges, such as market entry decisions and long-term planning?

During a recent INSEAD Tech Talk X, I explored this question with Hamza Mudassir, a strategy lecturer at Cambridge University and co-founder and CEO of Strategize Labs, a start-up that specialises in digital twins and AI.

Mudassir proposed a solution that resembles science fiction at first glance but is eminently feasible today: Run a debate among several AI agents, each with its own strengths, and, once they converge on a set of recommendations, test their recommendations on the digital twin of your business. 

The hive mind

Businesses are already turning to AI to support decision-making. However, relying on a single model often misses the bigger picture. Mudassir observed that breakthrough strategies at companies like Apple and Microsoft didn't emerge from standard consulting frameworks. Instead, they came from a different process: Having a small team of highly tuned people with different viewpoints – but a shared goal of doing what’s best for the organisation – debate each other. 

This was what Mudassir and his team did, except instead of humans, they employed multiple large language models (LLMs). They dubbed this process the “hive mind”, after the Borg, the Star Trek cyborg collective whose members appear to act independently but are united in a common objective. 

Engineering disagreement

Mudassir’s team took four different state-of-the-art LLMs, gave each a distinct personality, then tasked them with solving unstructured problems on two topics – a strategy question and a human resources question. Finally, the team told them to debate.

“An LLM arguing with another LLM to get to a good answer is effectively taking on the role of the user,” said Mudassir. “As a user, when you're talking to ChatGPT, you are giving it feedback. It's an iterative system… and most of an LLM's capabilities are based on its ability to be challenged.”

In another nod to science fiction, Mudassir’s team added what they call "the inception layer". Their setup tricks each LLM into believing it is interacting with humans, not fellow machines. Moreover, rather than hand-crafting personalities, a master AI analyses each problem and dynamically generates synthetic personas optimised for that specific challenge.

The system also incorporates "temperature" settings to control creativity. Low settings produce consistent and repeatable, albeit not very novel, results. High settings generate more innovative outputs as patterns collide in unexpected ways. The system manages both semantic and technical dimensions simultaneously.
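The debate mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not Strategize Labs’ actual implementation: `ask_model` is a stub standing in for a real LLM API call, and the persona names, temperature values and round structure are invented for the example.

```python
# Minimal sketch of a multi-agent "hive mind" debate loop.
# Assumption: ask_model() is a stub; in practice it would be a
# chat-completion call with temperature=agent.temperature.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str            # synthetic persona, e.g. "risk-averse CFO"
    temperature: float   # low = consistent output, high = more novel

def ask_model(agent: Agent, prompt: str) -> str:
    # Stub standing in for a real LLM call.
    return f"[{agent.name} @ T={agent.temperature}] response to: {prompt}"

def debate(agents: list[Agent], question: str, rounds: int = 3) -> list[str]:
    transcript = [question]
    for _ in range(rounds):
        for agent in agents:
            # Each agent sees the full transcript so far and must
            # challenge or refine the answers given before it.
            context = "\n".join(transcript)
            transcript.append(ask_model(agent, context))
    return transcript

agents = [Agent("growth strategist", 0.9), Agent("risk-averse CFO", 0.2)]
result = debate(agents, "Should we enter the Southeast Asian market?", rounds=2)
print(len(result))  # question + 2 rounds x 2 agents = 5 entries
```

In a real setup, the "master AI" persona generation would replace the hand-written agent list, and a final step would test the converged recommendations against the digital twin.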

Strategy vs. culture: two experiments

To test their approach, Mudassir's team pitted the AI hive mind against humans. First, the team came up with two thorny questions: a strategy question about turning around a brewery and a human resources question involving harassment in an organisation.

They recruited two groups of participants and assigned each group to one question. The group that got the complex but relatively conventional strategy case was made up of MBA students with five to six years of work experience. The second group, comprising chief human resources officers (CHROs) and directors with at least 10 years of experience across different geographies, was tasked to solve the HR problem. Then, Mudassir’s team created two AI hive minds, one for each question.

For the strategy case, the AI hive mind trumped the humans hands-down. In 10 minutes, the hive mind produced four times the output that humans generated in 45 minutes – if for no other reason than “there's only so much that three or four people can say physically in 45 minutes”. 

But word count wasn’t the only dimension on which the AI hive mind excelled. It also generated complete “McKinsey-looking” presentation decks, financial models and even an unsolicited, but much-needed, supply chain analysis that Mudassir’s team had not thought to request.

Mudassir's verdict: "On very generic problems that are about brute-force sort of intelligence and computation on standard frameworks, theory, etc., [problems that] don’t have a lot of variance in terms of culture, geography, specialties and places that the training data cannot touch – you are probably better off running a hive mind first."

But the HR harassment case revealed the AI hive mind’s limitations. While the system performed on a par with the British CHRO in the group, it struggled to keep up with regional experts from Pakistan, Bangladesh and the Middle East. These CHROs surfaced gaps the LLMs could not have caught because the models had never been trained on such contexts. Think cultural concepts as idiosyncratic as Pakistan's "seth company culture", which even Indians may have little clue about.

Mudassir's assessment of the AI hive mind: "Figuring out the nuance, which is idiosyncratic to a city, to a town, to a state, in a country which is not heavily digitised – you’ll probably get wrong answers."

The surprising discovery

One finding caught Mudassir’s team by surprise. At high creativity settings, they expected incoherent outputs from random pattern combinations. Instead, the agents' arguments pre-filtered the garbage at source. Because multiple agents challenged each idea, weak concepts got pushed down in priority while strong and useful ideas rose to the top. 

What’s more, the infrastructure barriers are lower than you might think. Tools like Langflow enable anyone to get started on creating their own AI hive mind today. Engineers and the more technically inclined might use more sophisticated tools like LangGraph.

But remember: this is augmentation, not replacement. The goal is to harness AI's computational power while preserving the idiosyncratic, culturally embedded wisdom that no training data can yet capture. 

As Mudassir put it: "Ultimately, we think of AI not as a decision maker, but a decision enabler. It's a sandbox... We should not treat it as anything more than that."

 

Edited by:

Seok Hwai Lee



About the series

AI: Disruption and Adaptation
Summary
Delve deeper into how artificial intelligence is disrupting and enhancing sectors – including business consulting, education and the media – and learn more about the associated regulatory and ethical issues.