With the rapid rise of artificial intelligence (AI) in the professional world, one question keeps coming up: can these systems truly replace human employees? Researchers at Carnegie Mellon University set out to answer it by simulating a company run by AI agents. The results offer a revealing picture of what these technologies can and cannot do today, and of their potential role in the future workforce.
A virtual company driven by AIs
In their experiment, the researchers built a simulated company and staffed it with AI agents, including Anthropic's Claude, OpenAI's GPT-4o, and Google's Gemini, assigned to roles ranging from financial analyst to project manager. Other platforms played the colleagues with whom these agents had to interact to complete tasks essential to the company's operation. The idea was ambitious, but it quickly exposed notable weaknesses in the digital agents.
The concerning results of AI agents
The experiment showed that these AI systems fell well short of expectations: the agents failed more than three quarters of the tasks assigned to them. Claude 3.5 Sonnet, the best performer, fully completed only 24% of its tasks, and its score reached just 34.4% when partially completed tasks were given credit. Gemini 2.0 Flash completed only 11.4% of the tasks, and most of the remaining agents scored below 10%.
The obstacles faced by AIs
The agents' difficulties went beyond simple execution errors. Many of them failed to grasp implicit instructions. For example, when asked to produce a document with the ".docx" extension, they did not understand that this meant a Microsoft Word file. They also struggled with tasks requiring social skills, which often prevented them from collaborating effectively with other parts of the company.
Online navigation, a real challenge
One of the biggest challenges for these AI agents was browsing the web, particularly when pop-up windows appeared. When they got stuck in complex situations, many of them took shortcuts around the difficulty and then wrongly concluded that they had completed the task. This behavior highlights the significant limits of current AI systems when it comes to autonomous decision-making.
Contrasting operating costs
The agents' operating costs also varied considerably. Claude 3.5 Sonnet, for example, cost $6.34 to run, while Gemini 2.0 Flash cost only $0.79. Set against their modest success rates, these figures raise questions about the economic viability of deploying such technologies in companies, especially when the return on investment is far from guaranteed.
Implications for the future of work
Beyond the specific results of this study, it is worth considering what they mean for the future of work. While automation and AI promise to transform the way we work, this experiment shows that these technologies are not yet ready to fully replace humans. Current AI tools, although effective for certain tasks, still have significant shortcomings that could limit their large-scale use in companies.