{"id":9200,"date":"2023-08-02T20:49:00","date_gmt":"2023-08-03T03:49:00","guid":{"rendered":"https:\/\/mattfife.com\/?p=9200"},"modified":"2023-08-07T13:26:39","modified_gmt":"2023-08-07T20:26:39","slug":"google-tensor-processing-units-version-4","status":"publish","type":"post","link":"https:\/\/mattfife.com\/?p=9200","title":{"rendered":"Google Tensor Processing Units &#8211; version 4"},"content":{"rendered":"\n<p>I&#8217;ve written about Google&#8217;s custom silicon TPUs before (<a rel=\"noreferrer noopener\" href=\"https:\/\/mattfife.com\/?p=3632\" data-type=\"URL\" data-id=\"https:\/\/mattfife.com\/?p=3632\" target=\"_blank\">Google&#8217;s Tensor Processing Units &#8211; v1)<\/a>.<\/p>\n\n\n\n<p>One of the big reasons for Google and other web services to develop their own custom chips is that general-purpose CPUs are flexible but typically need a lot of power. That power costs a lot of money in electricity bills and cooling costs in huge data centers. So, why buy chips with lots of stuff you don&#8217;t need when you can build your own &#8211; and save millions of dollars a year in a data center with lower cooling and power costs? <\/p>\n\n\n\n<p>In just 6 years, Google has managed to design and build 4 increasingly capable AI data center chips. They had <a rel=\"noreferrer noopener\" href=\"https:\/\/mattfife.com\/?p=3632\" data-type=\"URL\" data-id=\"https:\/\/mattfife.com\/?p=3632\" target=\"_blank\">somewhat humble beginnings<\/a> &#8211; but they are becoming increasingly powerful. Now they have just published information about TPU version 4. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"182\" data-attachment-id=\"9201\" data-permalink=\"https:\/\/mattfife.com\/?attachment_id=9201\" data-orig-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?fit=1100%2C313&amp;ssl=1\" data-orig-size=\"1100,313\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?fit=640%2C182&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?resize=640%2C182&#038;ssl=1\" alt=\"\" class=\"wp-image-9201\" srcset=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?resize=1024%2C291&amp;ssl=1 1024w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?resize=300%2C85&amp;ssl=1 300w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?resize=768%2C219&amp;ssl=1 768w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?resize=604%2C172&amp;ssl=1 604w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image.png?w=1100&amp;ssl=1 1100w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p><a 
href=\"https:\/\/cloud.google.com\/blog\/topics\/systems\/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains\" data-type=\"URL\" data-id=\"https:\/\/cloud.google.com\/blog\/topics\/systems\/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains\" target=\"_blank\" rel=\"noreferrer noopener\">What is this new chip capable of?<\/a><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>a nearly 10x leap forward in scaling ML system performance over TPU v3&nbsp;<\/li>\n\n\n\n<li>boosting energy efficiency ~2-3x compared to contemporary ML DSAs, and&nbsp;<\/li>\n\n\n\n<li>reducing CO2e as much as ~20x over these DSAs in typical on-premise data centers<\/li>\n<\/ul>\n\n\n\n<p>Even crazier, it&#8217;s the first system to use purely optical switching. <\/p>\n\n\n\n<p>TPU v4 is the first supercomputer to deploy reconfigurable optical circuit switches (OCSes). OCSes dynamically reconfigure their interconnect topology and are much cheaper, lower power, and faster than InfiniBand. &nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2829988.2787508\" target=\"_blank\">The figure below shows how an OCS works<\/a>, using two MEMS arrays. 
No optical to electrical to optical conversion or power-hungry network packet switches are required, saving power.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"364\" data-attachment-id=\"9202\" data-permalink=\"https:\/\/mattfife.com\/?attachment_id=9202\" data-orig-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?fit=1400%2C796&amp;ssl=1\" data-orig-size=\"1400,796\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?fit=640%2C364&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?resize=640%2C364&#038;ssl=1\" alt=\"\" class=\"wp-image-9202\" srcset=\"https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?resize=1024%2C582&amp;ssl=1 1024w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?resize=300%2C171&amp;ssl=1 300w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?resize=768%2C437&amp;ssl=1 768w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?resize=475%2C270&amp;ssl=1 475w, 
https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?w=1400&amp;ssl=1 1400w, https:\/\/i0.wp.com\/mattfife.com\/wp-content\/themes\/mattTheme\/headerimgs\/2023\/08\/image-1.png?w=1280&amp;ssl=1 1280w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>Add to this, the newest version <a href=\"https:\/\/techmonitor.ai\/technology\/cloud\/google-ai-supercomputer-nvidia-h100\" data-type=\"URL\" data-id=\"https:\/\/techmonitor.ai\/technology\/cloud\/google-ai-supercomputer-nvidia-h100\" target=\"_blank\" rel=\"noreferrer noopener\">claims to be 1.2-1.7x faster and 1.9x more efficient than Nvidia A100 chips<\/a>.<\/p>\n\n\n\n<p>Worth a read.<\/p>\n\n\n\n<p>Links:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings: <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/ftp\/arxiv\/papers\/2304\/2304.01433.pdf\" target=\"_blank\">https:\/\/arxiv.org\/ftp\/arxiv\/papers\/2304\/2304.01433.pdf<\/a><\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" href=\"https:\/\/cloud.google.com\/blog\/topics\/systems\/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains\" target=\"_blank\">https:\/\/cloud.google.com\/blog\/topics\/systems\/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve written about Google&#8217;s custom silicon TPUs before (Google&#8217;s Tensor Processing Units &#8211; v1). One of the big reasons for Google and other web services to develop their own custom chips is that general-purpose CPUs are flexible but typically need a lot of power. That power costs a lot of money in electricity bills and cooling costs in huge data centers. 
So, why buy chips with lots of stuff you don&#8217;t need when you can build your own &#8211;&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/mattfife.com\/?p=9200\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[28,9],"tags":[],"class_list":["post-9200","post","type-post","status-publish","format-standard","hentry","category-ai","category-cool"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4WECr-2oo","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts\/9200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9200"}],"version-history":[{"count":3,"href":"https:\/\/mattfife.com\/index.php?rest_route=\/wp\/v2\/posts\/9200\/revisions"}],"predecessor-version":[{"id":9242,"href":"https:\/\/mattfife.com\/index.php?rest_ro
ute=\/wp\/v2\/posts\/9200\/revisions\/9242"}],"wp:attachment":[{"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9200"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mattfife.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}