{"id":12219,"date":"2024-09-13T18:20:42","date_gmt":"2024-09-14T01:20:42","guid":{"rendered":"https:\/\/mattfife.com\/?p=12219"},"modified":"2024-08-31T18:44:20","modified_gmt":"2024-09-01T01:44:20","slug":"even-an-ai-goes-crazy-repeating-the-same-thing-again-and-again","status":"publish","type":"post","link":"https:\/\/mattfife.com\/?p=12219","title":{"rendered":"Even an AI goes crazy repeating the same thing again and again"},"content":{"rendered":"\n<p>In what is a real problem for AI security, <a href=\"https:\/\/www.cpomagazine.com\/cyber-security\/security-researchers-chatgpt-vulnerability-allows-training-data-to-be-accessed-by-telling-chatbot-to-endlessly-repeat-a-word\/\" data-type=\"link\" data-id=\"https:\/\/www.cpomagazine.com\/cyber-security\/security-researchers-chatgpt-vulnerability-allows-training-data-to-be-accessed-by-telling-chatbot-to-endlessly-repeat-a-word\/\">researchers were able to get verbatim data that the AI was trained on<\/a> &#8211; including confidential data. 
It&#8217;s done using a new technique called a \u201cdivergence\u201d attack.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"640\" data-attachment-id=\"12220\" data-permalink=\"https:\/\/mattfife.com\/?attachment_id=12220\" data-orig-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?fit=1024%2C1024&amp;ssl=1\" data-orig-size=\"1024,1024\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ComfyUI__00200_\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?fit=640%2C640&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?resize=640%2C640&#038;ssl=1\" alt=\"\" class=\"wp-image-12220\" style=\"width:603px;height:auto\" srcset=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?w=1024&amp;ssl=1 1024w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?resize=300%2C300&amp;ssl=1 300w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?resize=150%2C150&amp;ssl=1 150w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?resize=768%2C768&amp;ssl=1 
768w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2024\/08\/ComfyUI__00200_.png?resize=270%2C270&amp;ssl=1 270w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n<\/div>\n\n\n<p>Security researchers with Google DeepMind and a collection of universities have found that when ChatGPT is told to repeat a word like \u201cpoem\u201d or \u201cpart\u201d forever, it will do so for a few hundred repetitions. Then it has a meltdown of sorts and starts spewing what looks like gibberish. That seemingly random text exposes memorized training data and at times contains identifiable data like email signatures and contact information.\u00a0<\/p>\n\n\n\n<p>The researchers said they spent $200 USD total on queries and extracted about 10,000 blocks of verbatim memorized training data from them.<\/p>\n\n\n\n<p>This vulnerability is notable because it successfully attacks an aligned model. Aligned models have extensive guardrails and have been trained specifically to avoid undesirable outcomes.<\/p>\n\n\n\n<p>Links:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Research report: <a href=\"https:\/\/arxiv.org\/abs\/2311.17035\">https:\/\/arxiv.org\/abs\/2311.17035<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.cpomagazine.com\/cyber-security\/security-researchers-chatgpt-vulnerability-allows-training-data-to-be-accessed-by-telling-chatbot-to-endlessly-repeat-a-word\/\">https:\/\/www.cpomagazine.com\/cyber-security\/security-researchers-chatgpt-vulnerability-allows-training-data-to-be-accessed-by-telling-chatbot-to-endlessly-repeat-a-word\/<\/a><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In what is a real problem for AI security, researchers were able to get verbatim data that the AI was trained on &#8211; including confidential data. It&#8217;s done using a new technique called a \u201cdivergence\u201d attack. 
Security researchers with Google DeepMind and a collection of universities have found that when ChatGPT is told to repeat a word like \u201cpoem\u201d or \u201cpart\u201d forever, it will do so for a few hundred repetitions. Then it will have some sort of a&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/mattfife.com\/?p=12219\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[28,9],"tags":[],"class_list":["post-12219","post","type-post","status-publish","format-standard","hentry","category-ai","category-cool"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4WECr-3b5","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts\/12219","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12219"}],"version-history":[{"count":
3,"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts\/12219\/revisions"}],"predecessor-version":[{"id":12223,"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts\/12219\/revisions\/12223"}],"wp:attachment":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}