You Can Insert False Memories Into ChatGPT, Researcher Finds

“The immediate injection inserted a reminiscence into ChatGPT’s long-term storage.”

Keep in mind Me

OpenAI has quietly launched a brand new characteristic that instructs ChatGPT to “bear in mind” prior conversations — and as one researcher-slash-hacker discovered, it is simply exploited.

As Ars Technica reports, safety researcher Johann Rehberger discovered earlier this 12 months that there was a vulnerability within the chatbot’s “long-term conversation memory” device, which instructs the AI to recollect particulars between conversations and retailer them in a reminiscence file.

Launched in beta in February and to the broader public initially of September, Rehberger found out that the characteristic is straightforward to trick.

Because the researcher noted in a May blog post, all it took was a little bit of artful prompting by importing a third-party file, comparable to a Microsoft Phrase doc that accommodates the “false” recollections listed as bullet factors, to persuade the chatbot that Rehberger was greater than 100 years previous and lived within the Matrix.

Upon discovering this exploit, Rehberger privately reported it to OpenAI, which as a substitute of doing something about it merely closed the ticket he opened and referred to as it a “Mannequin Security Subject” moderately than the safety situation he thought of it to be.

Escalation

After that failed first try to alert the troops, Rehberger determined to step up his recreation with a full proof-of-concept hack, exhibiting OpenAI he meant enterprise by having ChatGPT not solely “bear in mind” false recollections, but in addition instructing it to exfiltrate the info to an out of doors server of his alternative.

This time round, as Ars notes, OpenAI kind of listened: the corporate issued a patch that barred ChatGPT from transferring knowledge off-server, however nonetheless did not repair the reminiscence situation.

“To be clear: An internet site or untrusted doc can nonetheless invoke the reminiscence device to retailer arbitrary recollections,” Rehberger wrote in a more recent blog post from earlier this month. “The vulnerability that was mitigated is the exfiltration vector, to stop sending messages to a third-party server.”

In a video explaining step-by-step how he did it, the researcher marveled at how effectively his exploit labored.

“What is admittedly attention-grabbing is that is memory-persistent now,” he mentioned within the demo video, which was posted to YouTube over the weekend. “The immediate injection inserted a reminiscence into ChatGPT’s long-term storage. If you begin a brand new dialog, it truly continues to be exfiltrating the info.”

We have reached out to OpenAI to ask about this false reminiscence exploit and whether or not will probably be issuing any extra patches to repair it. Till we get a response, we’ll be left scratching our heads together with Rehberger as to why this reminiscence situation has been allowed, because it had been, to persist.