Which AIs are best?

The retired programmer behind Dave’s Garage looked into the major LLMs and shares his feedback after using them for the last few months.

Which should you use? It depends on what you’re trying to do. It also depends on how you test them, because other reviewers come up with different ratings.

Coding

  • Claude 3.7/4 is great for serious, production-oriented code (after a code review). Probably the winner here.
  • ChatGPT 4.1 is a good copilot for prototyping and exploration
  • Grok 3 – responds quickly
  • Gemini 2.5 Pro – does what you ask but not much more

Research

  • Claude 3.7/4 for carefully explained reasoning and good references
  • ChatGPT 4.1 for clear overviews
  • Grok 3 for current events
  • Gemini 2.5 Pro for large, structured input and extraction

Storytelling

  • Claude 3.7/4 – literary and reflective
  • ChatGPT 4.1 – most emotionally resonant
  • Grok 3 – flexible and imaginative
  • Gemini 2.5 Pro – informative and expandable

News

  • Grok 3 wins this easily – gives news and what people are saying about it
  • ChatGPT 4.1 – handles current events decently but is slower to pick up news
  • Claude 3.7/4 – largely sits out news and doesn’t comment unless a story is widely verified
  • Gemini 2.5 Pro – factual, accurate, but rarely first

He also discusses how context window size relates to these tasks. Bigger windows cost more but let you summarize huge codebases or complex 60-page legal documents.

ChatGPT can handle 128,000 tokens, or about 96,000 words (1 token is roughly 4 characters, or about three-quarters of a word). Claude has 200,000 tokens, or about 150,000 words. Gemini 2.5 Pro and Grok 3 claim to have 1 million tokens.

If all you’re doing is summarizing emails, ChatGPT could be just fine. But if you need to make sense of large codebases or summarize large legal briefs, Gemini or Grok will be better at avoiding hallucinations and gaps. Some believe that these windows might actually shrink when the systems are under heavy load (Grok in particular).
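
To get a rough feel for whether a document fits in a given window, here is a minimal Python sketch. It assumes the ratio of about 0.75 words per token implied by the figures above and uses the window sizes quoted in this post. The function names and the 4,000-token reply budget are made up for illustration, and real tokenizers will give somewhat different counts.

```python
# Context window sizes in tokens, as quoted in this post.
# These are vendor claims and may change over time.
CONTEXT_WINDOWS = {
    "ChatGPT": 128_000,
    "Claude": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Grok 3": 1_000_000,
}

# Rule of thumb implied by "128,000 tokens is about 96,000 words".
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text, given a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def models_that_fit(word_count: int, reply_budget: int = 4_000) -> list[str]:
    """Return the models whose window holds the document plus a reply.

    reply_budget is a hypothetical allowance for the model's answer,
    since the prompt and the response share the same window.
    """
    needed = estimate_tokens(word_count) + reply_budget
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


if __name__ == "__main__":
    # A 120,000-word document needs roughly 160,000 tokens plus the reply.
    print(models_that_fit(120_000))  # ['Claude', 'Gemini 2.5 Pro', 'Grok 3']
```

This is only an estimate for planning. If a document lands close to a limit, count tokens with the provider’s own tokenizer before relying on it.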

AI written mushroom foraging books will kill you

Atomic Shrimp noticed that a number of recent mushroom foraging books had errors. These weren’t simple errors: if you ate some of the things they say you can eat, you could destroy your kidneys or even die. How did this happen? He realized a lot of these books were being generated by AI, and it looks very much like the people who put them together didn’t even know how to fact-check them.

It’s a great discussion of how AI generated books have become very prevalent and the dangers of people just churning out AI slop without quality control.

25% of all new code at Google is AI generated

Google’s CEO revealed that AI systems now generate more than a quarter of new code for its products, with human programmers overseeing the computer-generated contributions. The statement, made during Google’s Q3 2024 earnings call, shows how AI tools are already having a sizable impact on software development.

According to Stack Overflow’s 2024 Developer Survey, over 76 percent of all respondents “are using or are planning to use AI tools in their development process this year,” with 62 percent actively using them. A 2023 GitHub survey found that 92 percent of US-based software developers are “already using AI coding tools both in and outside of work.”

https://arstechnica.com/ai/2024/10/google-ceo-says-over-25-of-new-google-code-is-generated-by-ai

AI shows you the faces of ‘average’ British cheaters

2,000 Brits recently took part in a study by online casino MrQ to see if AI could make a picture of the ‘average’ British cheater. They collected descriptions and then asked AI to make a composite image.

The average male cheater seemed to be someone in his forties, with blue-grey eyes, small lips, short facial hair and little to no head hair, as well as a larger nose and visible frown lines.

The typical woman who cheats, apparently, is dark-haired and in her early fifties. AI reckons that she is more likely to have a small nose and a medium-sized pout.

Students find it shockingly easy to create near realtime Facial Recognition Glasses

Kashif Hoda was waiting for a train near Harvard Square when a young man wearing glasses asked him for directions. A few minutes later, as Mr. Hoda’s train was pulling into the station, the young man, who was a junior at Harvard University named AnhPhu Nguyen, approached him again.

“Do you happen to be the person working on minority stuff for Muslims in India?” Mr. Nguyen asked.

Mr. Hoda was shocked. He worked in biotechnology, but had previously been a journalist and had written about marginalized communities in India.

AnhPhu Nguyen and Caine Ardayfio had created glasses that automatically identify people they look at. Nguyen and Ardayfio are both 21 and studying engineering at Harvard. They said in an interview that their system relied on already widely available technologies, including:

  • Meta glasses, which livestream video to Instagram.
  • Face detection software, which captures faces that appear on the livestream.
  • A face search engine called PimEyes, which finds sites on the internet where a person’s face appears.
  • A ChatGPT-like tool that was able to parse the results from PimEyes to suggest a person’s name and occupation, as well as look up the name on a people search site to find a home address, a phone number and relatives.

“All the tools were there,” Mr. Nguyen said. “We just had the idea to combine them together.” Nguyen posted a video of it working. Watching it is creepy to say the least. Imagine walking in public and anyone, at any time, can know exactly who you are and anything you’ve ever said or done.

AI hiring backfiring?

Shopify CEO Tobi Lutke says that employees and managers must prove jobs can’t be done by AI before asking for more headcount. Klarna CEO Sebastian Siemiatkowski says the company has shrunk its workforce by 40% by using AI.

But Klarna has changed its tone. It started rehiring human customer support staff after realizing its AI customer service agents weren’t cutting it.

Carnegie Mellon researchers tried to staff a fake software company with AI employees, and it went very poorly (paper here: https://arxiv.org/pdf/2412.14161).

An estimated 4 in 10 business leaders have laid off employees as a result of deploying AI, and of those, 55% admit they made the wrong decisions about it, according to a recent survey.

AI has its place, but knowing which jobs suit AI (and which do not) is the magic.

Newbie vibe coded a top-ranked mobile game

Ron decided to learn to code in 2024. He proceeded to use AI to vibe-code a game called Letterlike. It’s now one of the top-ranked mobile games on Steam and the #1 paid word game on Android.

He tells his story in this Reddit post.

Vibe coding is here. People are building viable commercial products with less than a year of coding experience. Sure, this isn’t the kind of product that needs a lot of security, the way an online service would, but here it is.

AI Snow White better than the remake?

The controversies and firebrand actors surrounding the Snow White remake turned it into a box office disaster and resulted in an apocalyptic round of firings at Disney (and rightly so).

It should tell you something when a decades-old IP powerhouse like Disney, with all its marketing efforts, could only generate 18M views in 4 months on its official trailer, while an AI-generated parody from what is likely a one-person YouTube channel gets 1.4M views in 12 days. And the AI content is honestly better.

Personally, I think Wicked AI’s live version of the Little Mermaid with Danny DeVito is even funnier.