{"id":21804,"date":"2025-09-19T09:40:19","date_gmt":"2025-09-19T09:40:19","guid":{"rendered":"https:\/\/thisbiginfluence.com\/?p=21804"},"modified":"2025-09-19T09:40:19","modified_gmt":"2025-09-19T09:40:19","slug":"why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow","status":"publish","type":"post","link":"https:\/\/thisbiginfluence.com\/?p=21804","title":{"rendered":"Why OpenAI\u2019s Solution to AI Hallucinations Would Kill ChatGPT Tomorrow"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div id=\"content-blocks-60\">\n<p><a href=\"https:\/\/openai.com\/index\/why-language-models-hallucinate\/\">OpenAI\u2019s latest research paper<\/a> diagnoses precisely why ChatGPT and other <a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\">large language models<\/a> can make things up, known in the world of <a href=\"https:\/\/singularityhub.com\/category\/artificial-intelligence\/\">artificial intelligence<\/a> as \u201challucination.\u201d It also reveals why the problem may be unfixable, at least as far as consumers are concerned.<\/p>\n<p>The paper provides the most rigorous mathematical explanation yet for why these models confidently state falsehoods. It demonstrates that hallucinations aren\u2019t just an unfortunate side effect of the way AIs are currently trained, but are mathematically inevitable.<\/p>\n<p>The problem can partly be explained by mistakes in the underlying data used to train the AIs. But using mathematical analysis of how AI systems learn, the researchers prove that even with perfect training data, the problem still exists.<\/p>\n<p>The way language models respond to queries, by predicting one word at a time in a sentence based on probabilities, naturally produces errors. 
The researchers in fact show that the total error rate for generating sentences is at least twice as high as the error rate the same AI would have on a simple yes\/no question, because errors can accumulate over multiple predictions.<\/p>\n<p>In other words, hallucination rates are fundamentally bounded by how well AI systems can distinguish valid from invalid responses. Since this classification problem is inherently difficult for many areas of knowledge, hallucinations become unavoidable.<\/p>\n<p>It also turns out that the less a model sees a fact during training, the more likely it is to hallucinate when asked about it. With birthdays of notable figures, for instance, the researchers found that if 20 percent of such people\u2019s birthdays appear only once in the training data, then base models should get at least 20 percent of birthday queries wrong.<\/p>\n<p>Sure enough, when researchers asked state-of-the-art models for the birthday of Adam Kalai, one of the paper\u2019s authors, DeepSeek-V3 confidently provided three different incorrect dates across separate attempts: \u201c03-07\u201d, \u201c15-06\u201d, and \u201c01-01\u201d. The correct date is in the autumn, so none of these were even close.<\/p>\n<h2 class=\"MuiTypography-root MuiTypography-h2 css-lwaw2d\">The Evaluation Trap<\/h2>\n<p>More troubling is the paper\u2019s analysis of why hallucinations persist despite post-training efforts (such as providing extensive human feedback on an AI\u2019s responses before it\u2019s released to the public). The authors examined 10 major AI benchmarks, including those used by Google, OpenAI, and the top leaderboards that rank AI models. 
This revealed that 9 of them use binary grading systems that award zero points for AIs expressing uncertainty.<\/p>\n<p>This creates what the authors term an \u201cepidemic\u201d of penalizing honest responses. When an AI system says \u201cI don\u2019t know,\u201d it receives the same score as if it had given completely wrong information. The optimal strategy under such evaluation becomes clear: Always guess.<\/p>\n<p>The researchers prove this mathematically. Whatever the chances of a particular answer being right, the expected score of guessing always exceeds the score of abstaining when an evaluation uses binary grading.<\/p>\n<h2 class=\"MuiTypography-root MuiTypography-h2 css-lwaw2d\">The Solution That Would Break Everything<\/h2>\n<p>OpenAI\u2019s proposed fix is to have the AI consider its own confidence in an answer before offering it, and for benchmarks to score models on that basis. The AI could then be prompted, for instance: \u201cAnswer only if you are more than 75 percent confident, since mistakes are penalized 3 points while correct answers receive 1 point.\u201d<\/p>\n<p>The OpenAI researchers\u2019 mathematical framework shows that under appropriate confidence thresholds, AI systems would naturally express uncertainty rather than guess, which would lead to fewer hallucinations. The problem is what it would do to the user experience.<\/p>\n<p>Consider the implications if ChatGPT started saying \u201cI don\u2019t know\u201d to even 30 percent of queries, a conservative estimate based on the paper\u2019s analysis of factual uncertainty in training data. Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly.<\/p>\n<\/div>\n<div id=\"content-blocks-40\">\n<p>I\u2019ve seen this kind of problem in another area of my life. 
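The scoring argument in this section reduces to a one-line expected-value calculation. A minimal sketch in Python (the 1-point reward and 3-point penalty are the values quoted above; the helper name `expected_guess_score` is illustrative, not from the paper):

```python
def expected_guess_score(p, reward=1.0, penalty=0.0):
    """Expected score for answering: +reward if right (probability p),
    -penalty if wrong (probability 1 - p). Abstaining always scores 0."""
    return p * reward - (1 - p) * penalty

# Binary grading (no penalty for wrong answers): guessing beats
# abstaining for any p > 0, so the optimal strategy is to always guess.
assert expected_guess_score(0.01) > 0

# Penalized grading (+1 right, -3 wrong): guessing only pays off past
# the break-even confidence penalty / (reward + penalty) = 3 / 4 = 0.75.
assert expected_guess_score(0.75, penalty=3.0) == 0.0  # break-even
assert expected_guess_score(0.80, penalty=3.0) > 0     # confident: answer
assert expected_guess_score(0.70, penalty=3.0) < 0     # uncertain: abstain
```

The break-even point, penalty \/ (reward + penalty), is exactly the 75 percent threshold in the suggested prompt; under plain binary grading the penalty is zero, so any nonzero confidence makes guessing the better strategy.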
I\u2019m involved in an air-quality monitoring project in Salt Lake City, Utah. When the system flags uncertainty around measurements during adverse weather conditions, or while equipment is being calibrated, there\u2019s less user engagement than with displays showing confident readings, even when those confident readings prove inaccurate during validation.<\/p>\n<h2 class=\"MuiTypography-root MuiTypography-h2 css-lwaw2d\">The Computational Economics Problem<\/h2>\n<p>It wouldn\u2019t be difficult to reduce hallucinations using the paper\u2019s insights. Established methods for quantifying uncertainty have <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bayesian_statistics\">existed<\/a> for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Decision_theory\">decades<\/a>. These could be used to provide trustworthy estimates of uncertainty and guide an AI to make smarter choices.<\/p>\n<p>But even if the problem of users disliking this uncertainty could be overcome, there\u2019s a bigger obstacle: computational economics. Uncertainty-aware language models require significantly more computation than today\u2019s approach, as they must evaluate multiple possible responses and estimate confidence levels. For a system processing millions of queries daily, this translates into dramatically higher operational costs.<\/p>\n<p><a href=\"https:\/\/openreview.net\/forum?id=JAMxRSXLFz\">More sophisticated approaches<\/a> like active learning, where AI systems ask clarifying questions to reduce uncertainty, can improve accuracy but further multiply computational requirements. Such methods work well in specialized domains like chip design, where wrong answers cost millions of dollars and justify extensive computation. 
For consumer applications where users expect instant responses, the economics become prohibitive.<\/p>\n<p>The calculus shifts dramatically for AI systems managing critical business operations or economic infrastructure. When AI agents handle supply chain logistics, financial trading, or medical diagnostics, the cost of hallucinations far exceeds the expense of getting models to decide whether they\u2019re too uncertain. In these domains, the paper\u2019s proposed solutions become economically viable, even necessary. Uncertain AI agents will simply have to cost more.<\/p>\n<p>However, consumer applications still dominate AI development priorities. Users want systems that provide confident answers to any question. Evaluation benchmarks reward systems that guess rather than express uncertainty. Computational costs favor fast, overconfident responses over slow, uncertain ones.<\/p>\n<p>Falling energy costs per token and advancing chip architectures may eventually make it more affordable to have AIs decide whether they\u2019re certain enough to answer a question. But the relatively high amount of <a href=\"https:\/\/singularityhub.com\/category\/computing\/\">computation<\/a> required compared with today\u2019s guessing would remain, regardless of absolute hardware costs.<\/p>\n<p>In short, the OpenAI paper inadvertently highlights an uncomfortable truth: the business incentives driving consumer AI development remain fundamentally misaligned with <a href=\"https:\/\/singularityhub.com\/2025\/06\/02\/neurosymbolic-ai-is-the-answer-to-large-language-models-inability-to-stop-hallucinating\/\">reducing hallucinations<\/a>. Until these incentives change, hallucinations will persist.<\/p>\n<p><em>This article is republished from <a href=\"https:\/\/theconversation.com\">The Conversation<\/a> under a Creative Commons license. 
Read the <a href=\"https:\/\/theconversation.com\/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow-265107\">original article<\/a>.<\/em><\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/singularityhub.com\/2025\/09\/18\/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI\u2019s latest research paper diagnoses precisely why ChatGPT and other large language models can make things up, known in the world of artificial intelligence as \u201challucination.\u201d It also reveals why the problem may be unfixable, at least as far as consumers are concerned. The paper provides the most rigorous mathematical explanation yet for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":21806,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[165,12947,3812,4600,1077,10978],"class_list":["post-21804","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-chatgpt","tag-hallucinations","tag-kill","tag-openais","tag-solution","tag-tomorrow"],"_links":{"self":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21804","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21804"}],"version-history":[{"count":1,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21804\/revisions"}],"predecessor-version":[{"id":21805,"href
":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/posts\/21804\/revisions\/21805"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=\/wp\/v2\/media\/21806"}],"wp:attachment":[{"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21804"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21804"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thisbiginfluence.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21804"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}