Spot acts differently depending on the personality.
Robots have just become a little smarter. Boston Dynamics, an engineering company famous for its robotic inventions, integrated ChatGPT into one of its robots, Spot. The dog-like companion can offer you a guided tour around the building and tell you about each exhibit.
Spot has several personalities to choose from. Depending on who it "role-plays" as, its voice, tone, and some personalized remarks change. You can ask it anything, and the robot will reply as a 1920s archaeologist, a Shakespearean actor, or "Josh" – a "sarcastic personality", although it sounds like an overly dramatic teenager to me.
Image credit: Boston Dynamics
To see what's happening around it, Spot uses Visual Question Answering (VQA) models that can caption images and answer simple questions about them. The image is updated about once a second and is fed to the system in the form of a text prompt.
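As a rough illustration of how a caption might be turned into a text prompt, here is a minimal sketch. The function name, the prompt wording, and the fields are all hypothetical; the article does not show the prompt Boston Dynamics actually uses.

```python
def build_prompt(personality, caption, visitor_text=""):
    """Assemble a text prompt from the latest image caption (hypothetical format)."""
    lines = [
        f"You are a robot tour guide. Personality: {personality}.",
        # The caption would come from a VQA model and be refreshed about once a second.
        f"Camera caption: {caption}",
        f"Visitor said: {visitor_text or '(nothing)'}",
        "Reply in character with one short remark.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("1920s archaeologist", "a yellow robot dog in a hallway"))
```

Because the language model only ever sees text, anything the camera picks up has to pass through a caption like this before it can influence the reply.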
To let the robot hear and speak, Boston Dynamics 3D-printed a vibration-resistant mount for a Respeaker V2 speaker, a ring-array microphone with LEDs on it, and attached it via USB to Spot's EAP 2 payload.
"The actual control over the robot is delegated to an offboard computer, either a desktop PC or a laptop, which communicates with Spot over its SDK. We implemented a simple Spot SDK service to communicate audio with the EAP 2."
Spot's text-based responses are run through the ElevenLabs text-to-speech service. To reduce latency, the engineers stream text to the tool as “phrases” in parallel and play back the generated audio serially.
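The parallel-request, serial-playback idea can be sketched as follows. This is only an assumption about how such a pipeline might look: `fake_synthesize` is a stand-in for a real text-to-speech request, not the ElevenLabs API, and the phrase-splitting rule is invented for the example.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor


def split_into_phrases(text):
    """Split text after sentence-ending punctuation (an assumed splitting rule)."""
    parts = re.split(r"(?<=[.!?;])\s+", text.strip())
    return [p for p in parts if p]


def fake_synthesize(phrase):
    """Hypothetical stand-in for a TTS request; returns fake audio bytes."""
    time.sleep(0.01 * (len(phrase) % 5))  # simulate variable network latency
    return f"<audio:{phrase}>".encode("utf-8")


def speak(text, synthesize=fake_synthesize, play=None, max_workers=4):
    """Request audio for all phrases in parallel, then play clips in order."""
    played = []
    phrases = split_into_phrases(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map starts the requests concurrently but yields results
        # in the original phrase order, so playback stays serial.
        for clip in pool.map(synthesize, phrases):
            if play is not None:
                play(clip)
            played.append(clip)
    return played


if __name__ == "__main__":
    for clip in speak("Hello there. Follow me! This is the lab."):
        print(clip)
```

With this arrangement the first phrase can start playing as soon as its audio is ready, while later phrases are still being generated in the background.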
Finally, the robot got some body language. It can detect and track moving objects to guess where the nearest person is and turn its arm toward them. "We used a lowpass filter on the generated speech and turned this into a gripper trajectory to mimic speech sort of like the mouth of a puppet. This illusion was enhanced by adding silly costumes to the gripper and googly eyes."
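To give a feel for the puppet-mouth trick, here is a minimal sketch that turns audio samples into a gripper-opening trajectory. It assumes the samples are rectified before a simple one-pole low-pass filter is applied and that the result is normalized to a 0-to-1 opening fraction; the function name, the 5 Hz cutoff, and the normalization are illustrative guesses, not Boston Dynamics' implementation.

```python
import math


def speech_to_gripper(samples, sample_rate, cutoff_hz=5.0, max_open=1.0):
    """Map audio samples to gripper-opening fractions in [0, max_open]."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of a one-pole low-pass filter

    envelope = []
    level = 0.0
    for s in samples:
        # Rectify the sample, then move the smoothed level toward it.
        level += alpha * (abs(s) - level)
        envelope.append(level)

    peak = max(envelope, default=0.0)
    if peak == 0.0:
        return [0.0] * len(envelope)  # silence keeps the gripper closed
    return [max_open * e / peak for e in envelope]


if __name__ == "__main__":
    rate = 1000
    burst = [math.sin(2 * math.pi * 100 * n / rate) for n in range(500)]
    trajectory = speech_to_gripper(burst + [0.0] * 500, rate)
    print(round(max(trajectory), 3), round(trajectory[-1], 6))
```

The low cutoff smooths out the individual audio oscillations, so the gripper follows the loudness of the speech: it opens during syllables and closes during pauses.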
The best part of this whole experiment is the AI's logic, which didn't need much tinkering. For example, when asked who its “parents” were, it went to the place where the previous versions of Spot were and said they were its “elders”.
This, of course, doesn't mean the LLM is conscious; it just shows "the power of statistical association" between the concepts of "parents" and "old."
This all sounds delightful, but Boston Dynamics notes some limitations in the demo. For instance, Spot can hallucinate, a typical LLM problem in which the model simply makes up facts. An interesting example of this can be found in this article about The Sims-inspired town full of AI agents.
There is also an issue of latency: you sometimes have to wait about 6 seconds to get a reply from the robot.
Despite the problems, this is a great project that should push research forward. Boston Dynamics plans to keep exploring this combination of robotics and AI to make robots perform better when working with and around people.