{"id":21600,"date":"2025-09-09T21:09:50","date_gmt":"2025-09-09T21:09:50","guid":{"rendered":"https:\/\/thisbiginfluence.com\/?p=21600"},"modified":"2025-09-09T21:09:50","modified_gmt":"2025-09-09T21:09:50","slug":"gpt-5-is-making-huge-factual-errors-users-say","status":"publish","type":"post","link":"https:\/\/thisbiginfluence.com\/?p=21600","title":{"rendered":"GPT-5 Is Making Huge Factual Errors, Users Say"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div id=\"incArticle\">\n<p>It has been just over a month since OpenAI dropped its <a href=\"https:\/\/futurism.com\/openai-releases-gpt-5\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">long-awaited GPT-5<\/a> large language model (LLM) \u2014 and it hasn&#8217;t stopped spewing an astonishing number of strange falsehoods since then.<\/p>\n<p>From the <a href=\"https:\/\/mindmatters.ai\/2025\/09\/gpt-5-0-doesnt-understand-but-is-eager-to-please\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">AI experts<\/a> at the Discovery Institute&#8217;s Walter Bradley Center for Artificial Intelligence and irked Redditors on r\/ChatGPTPro, to even OpenAI CEO <a href=\"https:\/\/futurism.com\/sam-altman-admits-openai-screwed-up\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">Sam Altman himself<\/a>, there&#8217;s plenty of evidence to suggest that OpenAI&#8217;s claim that GPT-5 boasts &#8220;<a href=\"https:\/\/openai.com\/index\/introducing-gpt-5\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">PhD-level intelligence<\/a>&#8221; comes with some serious asterisks.<\/p>\n<p>In a Reddit post, a <a 
href=\"https:\/\/www.reddit.com\/r\/ChatGPTPro\/comments\/1n890r6\/chatgpt_5_has_become_unreliable_getting_basic\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">user realized<\/a> not only that GPT-5 had been producing &#8220;incorrect information on basic facts over half the time,&#8221; but that without fact-checking, they may have missed other hallucinations.<\/p>\n<p>The Reddit user&#8217;s experience highlights just how common it is for chatbots to hallucinate, which is AI-speak for confidently making stuff up. While the issue is <a href=\"https:\/\/futurism.com\/ai-industry-problem-smarter-hallucinating\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">far from exclusive to ChatGPT<\/a>, OpenAI&#8217;s latest LLM appears to have a <a href=\"https:\/\/futurism.com\/the-byte\/researchers-ai-chatgpt-hallucinations-terminology\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">particular penchant for BS<\/a> \u2014 a reality that challenges the company&#8217;s claim that <a href=\"https:\/\/mashable.com\/article\/openai-gpt-5-hallucinates-less-system-card-data\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">GPT-5 hallucinates less<\/a> than its predecessors.<\/p>\n<p>In a <a href=\"https:\/\/openai.com\/index\/why-language-models-hallucinate\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">recent blog post about hallucinations<\/a>, in which OpenAI once again claimed that GPT-5 produces &#8220;significantly fewer&#8221; of them, the firm tried to explain how and why these 
falsehoods happen.<\/p>\n<p>&#8220;Hallucinations persist partly because current evaluation methods set the wrong incentives,&#8221; the September 5 post reads. &#8220;While evaluations themselves don&#8217;t directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.&#8221;<\/p>\n<p>Translation: LLMs hallucinate because they&#8217;re trained to get things right, even if it means guessing. Though some models, <a href=\"https:\/\/futurism.com\/ai-makes-up-answers\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">like Anthropic&#8217;s Claude<\/a>, have been trained to admit when they don&#8217;t know an answer, OpenAI&#8217;s haven&#8217;t \u2014 so they hazard wrong guesses.<\/p>\n<p>As the Reddit user indicated (backed up with a <a href=\"https:\/\/chatgpt.com\/share\/68b99a61-5d14-800f-b2e0-7cfd3e684f15\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">link to their conversation log<\/a>), they received some huge factual errors when asking about the gross domestic product (GDP) of various countries, with the chatbot presenting &#8220;figures that were literally double the actual values.&#8221;<\/p>\n<p>Poland, for instance, was listed as having a GDP of more than two <em>trillion<\/em> dollars, when in reality its GDP, <a href=\"https:\/\/www.imf.org\/external\/datamapper\/profile\/POL\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">per the International Monetary Fund<\/a>, is currently hovering around $979 billion. 
Were we to hazard a guess, we&#8217;d say that hallucination may be attributed to <a href=\"https:\/\/www.gov.pl\/web\/primeminister\/poland-joins-the-trillionaires-club-a-historic-entry-into-the-worlds-top-20-economies#:~:text=Poland%20Among%20Economic%20Leaders,exclusive%20club%20of%20trillionaire%20countries.\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">recent boasts from the country&#8217;s president<\/a> saying its economy (and not its GDP) has exceeded $1 trillion.<\/p>\n<p>&#8220;The scary part? I only noticed these errors because some answers seemed so off that they made me suspicious,&#8221; the user continued. &#8220;For instance, when I saw GDP numbers that seemed way too high, I double-checked and found they were completely wrong.&#8221;<\/p>\n<p>&#8220;This makes me wonder: How many times do I NOT fact-check and simply accept the wrong information as fact?&#8221; they mused.<\/p>\n<p>Meanwhile, AI skeptic Gary Smith of the Walter Bradley Center noted that he has run three simple experiments with GPT-5 since its launch \u2014 a <a href=\"https:\/\/futurism.com\/gpt-5-simple-question-confusion\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">modified game of tic-tac-toe<\/a>, <a href=\"https:\/\/mindmatters.ai\/2025\/08\/what-kind-of-a-phd-level-expert-is-chatgpt-5-0-i-tested-it\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">questions about financial advice<\/a>, and a <a href=\"https:\/\/mindmatters.ai\/2025\/08\/what-kind-of-a-phd-level-expert-is-chatgpt-5-0-i-tested-it\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" 
style=\"text-decoration-color:blue\">request to draw a possum<\/a> with five of its body parts labeled \u2014 to &#8220;demonstrate that GPT 5.0 was far from PhD-level expertise.&#8221;<\/p>\n<p>The <a href=\"https:\/\/mindmatters.ai\/2025\/08\/what-kind-of-a-phd-level-expert-is-chatgpt-5-0-i-tested-it\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">possum example<\/a> was particularly egregious, technically coming up with the right names for the animal&#8217;s parts but pinning them in strange places, such as marking its leg as its nose and its tail as its back left foot. When attempting to replicate the experiment for a <a href=\"https:\/\/mindmatters.ai\/2025\/09\/gpt-5-0-doesnt-understand-but-is-eager-to-please\/\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">more recent post<\/a>, Smith found that even when he made a typo \u2014 &#8220;posse&#8221; instead of &#8220;possum&#8221; \u2014 GPT-5 mislabeled the parts in a similarly bizarre fashion.<\/p>\n<p>Instead of the intended possum, the LLM generated an image of its apparent idea of a posse: five cowboys, some toting guns, with lines indicating various parts. 
Some of these parts \u2014 the head, foot, and possibly the ear \u2014 were accurate, while the shoulder pointed to one of the cowboys&#8217; ten-gallon hats and the &#8220;fand,&#8221; which may be a mix-up of foot and hand, pointed at one of their shins.<\/p>\n<p>We decided to run the same test, asking GPT-5 to provide an image of &#8220;<a href=\"https:\/\/chatgpt.com\/share\/68c04916-1618-800f-b396-cb0cde072a02\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\">a posse with six body parts labeled<\/a>.&#8221; After clarifying that <em>Futurism<\/em> wanted a labeled image and not a text description, ChatGPT went off to work \u2014 and what it spat out was, as you can see below, even more hilariously incorrect than what Smith got.<\/p>\n<figure\/>\n<p>It seems pretty clear from this side of the GPT-5 launch that it&#8217;s nowhere near as smart as a doctoral candidate \u2014 or, at the very least, one who has any chance of actually attaining their PhD.<\/p>\n<p>The moral of this story, it seems, is to fact-check anything a chatbot spits out \u2014 or forgo using AI and do the research yourself.<\/p>\n<p class=\"\"><strong>More on GPT-5: <\/strong><a href=\"https:\/\/futurism.com\/disastrous-gpt-5-sam-altman-hyping-up-gpt-6\" class=\"underline hover:text-futurism hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:blue\"><em>After Disastrous GPT-5, Sam Altman Pivots to Hyping Up GPT-6<\/em><\/a><\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/futurism.com\/gpt-5-huge-factual-errors\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It has been just over a month since OpenAI dropped its long-awaited GPT-5 large language model (LLM) \u2014 and it hasn&#8217;t stopped spewing an astonishing number of strange falsehoods since then. 
From the AI experts at the Discovery Institute&#8217;s Walter Bradley Center for Artificial Intelligence and irked Redditors on r\/ChatGPTPro, to even OpenAI CEO Sam [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":21602,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[14286,14285,14284,3394,4634,2735],"class_list":["post-21600","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-errors","tag-factual","tag-gpt5","tag-huge","tag-making","tag-users"],"_links":{"self":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21600"}],"version-history":[{"count":1,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21600\/revisions"}],"predecessor-version":[{"id":21601,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21600\/revisions\/21601"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/media\/21602"}],"wp:attachment":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21600"}],"curies":[{"name":"wp"
,"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}