AI Underperforms Human Workers By A Big Margin, Government Tests Have Found

A recent government trial in Australia has raised concerns that generative AI may create more work for people rather than reduce it. Since ChatGPT’s launch in November 2022, AI has rapidly gained popularity, with companies worldwide scrambling to incorporate it into their operations. However, a joint trial by Amazon and the Australian Securities and Investments Commission (ASIC) has called the technology’s efficiency in workplace settings into question.

The trial tested Meta’s open-source Llama2-70B model, which was asked to summarize submissions with a focus on mentions of ASIC and on recommendations. Ten human employees were given the same task, and their outputs were compared with the AI-generated summaries. A group of reviewers assessed the responses on coherence, length, references to ASIC, regulation mentions, and recommendations.

Surprisingly, the human-written summaries significantly outperformed the AI-generated ones, scoring 81% against the AI’s 47%. The AI struggled particularly with identifying references to ASIC within documents, a task complicated by the limits of AI models’ context windows and embedding strategies. When a PDF is processed as plain text, page numbers are not retained as metadata, so the model, unlike a human reader, cannot reliably say where in a document a reference appears, which led to inaccurate document references.
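
To illustrate the page-reference problem, here is a minimal Python sketch of one way to keep page numbers attached to text while searching for mentions of a term. It is not the method used in the trial. It assumes the document has already been extracted into one string per page, and the `Mention` class, the `find_mentions` function, and the sample pages are all hypothetical names and data made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    page: int      # 1-based page number where the term was found
    snippet: str   # short stretch of surrounding text for context


def find_mentions(pages, term, context=40):
    """Return every whole-word occurrence of `term`, tagged with its page.

    `pages` is assumed to be a list with one extracted string per page
    (a hypothetical input format, not something the trial specified).
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    mentions = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            mentions.append(Mention(page_number, text[start:end].strip()))
    return mentions


# Made-up sample pages, for illustration only.
sample_pages = [
    "This submission concerns audit quality.",
    "We recommend that ASIC publish clearer guidance.",
    "ASIC should also consult industry. Basic rules apply.",
]

for mention in find_mentions(sample_pages, "ASIC"):
    print(f"page {mention.page}: {mention.snippet}")
```

Because each match keeps the number of the page it came from, a reviewer can go straight to the cited page instead of rereading the whole document. If the pages were joined into a single string before processing, that information would be lost.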

Additionally, AI-generated summaries were criticized for being overly verbose and lacking proper formatting. Reviewers often had to revisit the original documents to verify AI outputs. This added layer of fact-checking and corrections potentially increased the workload instead of reducing it.

The trial suggests that while AI technology holds promise, its current limitations in handling large, complex documents may necessitate human intervention. In its present state, AI could actually create more work for employees by requiring additional checks and improvements. As businesses continue to adopt AI, these findings highlight the need for caution and further refinement of the technology before fully integrating it into workflows.
