Looks like Devin is not as powerful as Cognition makes it out to be.
A month ago, Cognition proudly presented Devin, "the first AI software engineer," which can allegedly not only solve engineering problems but also successfully complete tasks on freelance-focused websites. The creators showed off the AI's ability on a real Upwork case, wowing audiences and making real software engineers fear for their jobs.
However, it looks like they can breathe freely for a little longer as Cognition has recently been accused of lying about Devin's performance in its promo videos, including this particular task.
Full disclosure: I'm not a software engineer so I'll try to make it as simple as possible. If you'd like to learn the tech details, check out the sources listed in this article.
A YouTube channel called Internet of Bugs recently published a video titled "Debunking Devin: 'First AI Software Engineer' Upwork lie exposed." In it, the host dissects Cognition's example of Devin completing an Upwork project.
Later, the creator of the task, Felipe "Computer Vision Engineer," also took to YouTube to point out what the AI did wrong, and there are some crucial details to examine.
First of all, Devin failed the most important part of the job – understanding the problem. You see, the original post said: "I am looking to make inferences with the models in this repository. Your deliverable will be detailed instructions on how to do it in an EC2 instance in AWS. Please provide your estimate to complete this job."
Felipe couldn't meet the requirements and match different versions of software, so the AI needed to do it for him. However, Cognition fed only the first sentence to Devin and told it to "figure it out." Since the actual request was in the part that was left out, this is a significant error on the company's part, and the AI naturally couldn't deliver the expected result.
Moreover, as machine learning engineer and AI researcher Devansh pointed out, the job itself was seemingly "cherry-picked to put Devin in the best light": you can see "road damage" typed in the search box, meaning it wasn't a random issue Devin was given to solve. On the other hand, it's not unusual to see specific examples chosen for promo materials.
Another fantastic ability of the first AI software engineer is to find bugs that humans miss. And it did encounter an error in one of the files. The problem is that the file was not in the repository and was created by Devin itself, so it fixed its own error – admirable but not exactly groundbreaking.
Image credit: Devansh
So Devin does solve some kind of task, just not the one it was supposed to do. Devansh also noticed that the whole solution took the AI many hours. In comparison, Internet of Bugs managed to answer the real question in about 30 minutes. So I think human software engineers won't be out of work any time soon, even with tools as powerful as Devin.
And it is powerful, but this entire presentation was damaged by – ironically – a human mistake.
Devin will likely show us more of its capabilities soon. Meanwhile, I strongly suggest watching Internet of Bugs' video and reading Devansh's article if you'd like to know more.