A recent research paper from Stanford University and UC Berkeley has lent weight to concerns raised by ChatGPT Plus users regarding the diminishing capabilities of the AI-powered chatbot.
The paper presents an in-depth analysis of GPT-4, the language model behind ChatGPT Plus, comparing its behaviour to that of its predecessor, GPT-3.5.
According to the findings presented by researchers Lingjiao Chen, Matei Zaharia, and James Zou, the performance of both GPT-3.5 and GPT-4 varied significantly between the models' March 2023 and June 2023 releases, with noticeable declines on certain tasks over time.
The researchers stated, "We find that the performance and behavior of both GPT-3.5 and GPT-4 vary significantly across these two releases and that their performance on some tasks have gotten substantially worse over time."
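For readers who want to probe this kind of drift themselves, the paper's core method amounts to sending identical prompts to dated model snapshots and scoring the answers. The snippet below is a minimal sketch of that idea, not the researchers' actual evaluation harness; it assumes the OpenAI Python SDK and that the dated snapshot identifiers gpt-4-0314 and gpt-4-0613 (the March and June versions compared in the paper) are still served for your account.

```python
# Minimal sketch: send the same prompt to two dated GPT-4 snapshots
# and compare the answers. Assumes the `openai` Python SDK (v1+) and
# an OPENAI_API_KEY in the environment; these snapshots may since
# have been retired by OpenAI.
from openai import OpenAI

client = OpenAI()

PROMPT = "Is 17077 a prime number? Think step by step and then answer [Yes] or [No]."
SNAPSHOTS = ["gpt-4-0314", "gpt-4-0613"]  # March vs June 2023 releases

for model in SNAPSHOTS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # minimise sampling variation between runs
    )
    answer = response.choices[0].message.content
    print(f"{model}: {answer[:200]}")
```

Pinning the temperature to 0 makes the comparison as repeatable as the API allows, although outputs can still vary slightly between calls.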
Specifically, the paper highlights a stark example: when asked whether 17077 is a prime number, GPT-4's accuracy plummeted from 97.6 per cent in March to 2.4 per cent in June, a drop of 95.2 percentage points. Conversely, GPT-3.5, which powers the free version of ChatGPT, surged from 7.4 per cent to 86.8 per cent accuracy on the same question over the same period.
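For reference, the correct answer is "yes": 17077 is prime, which is easy to confirm with a few lines of Python using trial division up to the square root of the number.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: a number n > 1 is prime if no integer
    in [2, sqrt(n)] divides it evenly."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True -- no divisor found up to isqrt(17077) = 130
```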
For the past several weeks, users have been expressing their discontent with ChatGPT's diminished performance on various platforms, including OpenAI's official forums. OpenAI's VP of Product, Peter Welinder, responded to these claims by denying that GPT-4 had been made "dumber".
He contended that each new version is made smarter than the last, and that heavier usage may simply surface issues that were not noticed before. In a follow-up tweet, Welinder challenged users to share examples of GPT-4's performance deteriorating.