AI-driven company: what the results reveal about the future of work

Discover how AI is transforming business management and what current results reveal about the future of work, from innovation and efficiency to new skills.

Whether artificial intelligence (AI) is on the verge of transforming the world of work is the subject of passionate debate. Some believe that AI could replace humans in a variety of roles; others argue that it cannot yet take on critical ones. Researchers at Carnegie Mellon University put the question to the test by simulating a company operated by AI agents, and the results raise questions about the future of work.

AI agents at work in a business simulation

In this study, the researchers placed several advanced AI agents, including Claude from Anthropic and GPT-4o from OpenAI, in professional roles such as financial analyst, project manager, and software engineer. The agents also interacted with simulated colleagues to perform specific tasks. The objective was to assess their performance in a simulated work environment.

An alarming failure rate

The results of the study show that the AI agents failed in more than 75% of the tasks assigned to them. The best agent, Claude 3.5 Sonnet, managed to complete only 24% of the tasks, and even accounting for tasks that were only partially completed, its score did not exceed 34.4%. Other agents, such as Gemini 2.0 Flash, showed even more disappointing results, with only 11.4% of the tasks successfully completed.

The limits of artificial intelligence

One of the most striking observations was the agents’ failure to grasp certain implicit instructions. For example, when a task required saving a document in a specific format, the agents were unable to deduce that this involved using Microsoft Word. The agents also showed weak social skills, a fundamental part of any professional environment.

Navigation and decision-making: major challenges

The agents also had significant difficulty navigating the web, particularly on interfaces with pop-ups or complex menus. When they became lost, they tended to skip steps, which led to further errors. These findings show that, despite their remarkable capabilities, AIs are not yet ready to replace human intelligence in a complex professional environment.

Costs and profitability of AI agents

Another aspect to consider is operating cost. Although Claude 3.5 Sonnet achieved the best results, it cost $6.34 per task, whereas Gemini 2.0 Flash, despite its mediocre performance, cost far less to run at $0.79 per task. This trade-off between cost and performance raises crucial questions for companies considering integrating AI into their business model.
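One rough way to weigh that trade-off is to divide each agent's cost per task by its full-completion rate, which gives an approximate cost per successfully completed task. The sketch below uses only the figures quoted in this article. The metric itself, the function name, and the assumption that the quoted cost applies to every attempted task (successful or not) are illustrative choices, not something the study is known to report.

```python
# Figures as quoted in the article. The derived metric is an illustrative
# assumption, not a number taken from the study.
AGENTS = {
    "Claude 3.5 Sonnet": {"cost_per_task": 6.34, "completion_rate": 0.24},
    "Gemini 2.0 Flash": {"cost_per_task": 0.79, "completion_rate": 0.114},
}


def cost_per_completed_task(cost_per_task: float, completion_rate: float) -> float:
    """Average spend needed to obtain one fully completed task.

    Assumes the quoted cost is paid for every attempted task,
    whether or not the agent completes it.
    """
    if completion_rate <= 0:
        raise ValueError("completion_rate must be positive")
    return cost_per_task / completion_rate


for name, stats in AGENTS.items():
    value = cost_per_completed_task(stats["cost_per_task"], stats["completion_rate"])
    print(f"{name}: ${value:.2f} per completed task")
```

Under these assumptions, the calculation gives roughly $26.42 per completed task for Claude 3.5 Sonnet and $6.93 for Gemini 2.0 Flash. The cheaper agent comes out ahead on this metric alone, but the figure ignores partially completed tasks and the human effort needed to catch and redo the failures, so it should be read as a starting point rather than a verdict.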

What this means for the future of work

Everything suggests that we are not yet on the brink of a revolution where AI would completely dominate the labor market. The results of this study reveal that, even though AI technologies can optimize certain specific tasks, they struggle to operate autonomously in a complex and dynamic work context. In the long term, this could mean a collaboration between humans and AI, where each entity could leverage the strengths of the other.

As artificial intelligence becomes increasingly integrated into the workplace, it is also crucial to consider the implications for employment policies and for the skills workers will need. Some sectors are in flux, as the criticism surrounding the stock market turmoil affecting tech giants shows, while other players, such as young entrepreneurs passionate about AI, continue to forge their own path and help redefine the contours of tomorrow’s work.

Discussions around the challenges related to AI continue, particularly regarding access restrictions to certain digital tools, which could influence behaviors and expectations regarding technology. The current inability of AI agents to handle complex tasks and their need for human supervision add an essential nuance to these debates, confirming that the future of work is far from being entirely under the control of machines.
