YOLO

YOLO is (or was when I was keeping track) the state-of-the-art real-time object detection model. It (or Tiny YOLO, a lightweight variant) is capable of running at 30 FPS, with low latency, on hardware you can fit into a battery-powered robot. It has a somewhat limited set of labels - you can see in the sample video that it's struggling to decide whether a falcon is a dog or a cat. Nonetheless, it's an extremely impressive model.

OpenAI Five

OpenAI Five is a DOTA 2 bot made by the people who made GPT-3, with their insane amount of Musk/Gates money. There are a lot of smart people working at OpenAI, but their ability to just throw money (and therefore computing power) at the problem is what really puts them out ahead of the rest of the field. Anyways, this bot is as good as, or better than, the best DOTA 2 players in the world (at least when playing with a 17-character subset of the full 122-character roster). I find this a lot more impressive than similar victories in games like chess or Go, because DOTA 2 much more closely resembles robotics (i.e. real-world) problems: it's real-time, there are teams, and the set of options available is continuous (you can pick any angle to move in, shoot at, etc.) instead of discrete (you can only place a Go piece on one of 361 intersections). Robotics has this paradigm of sense-plan-act, and right now there's rapid progress being made on all three parts, but hardware limits on robots' ability to sense and act hold back development of the purely-software "plan" phase. Using a game like DOTA 2 seems like a very clever way to work on sense-plan-act without any hardware limits, since the world the AI inhabits is purely digital. So, given the ease with which they solved DOTA 2 by throwing neural nets and processing power at it, why does achieving human-level performance at "walking around and picking up objects" feel so much further off?

ANYmal

Okay, so for this one you gotta just read the whole article, and watch both videos. Something the article maybe doesn't get across is just how impressive this is - it's a massive breakthrough in robotic locomotion! This is the cutting edge of the field, and the closest humans have ever gotten to creating a mammal-level intelligence. But, critically, ANYmal is really, really far away from that goal. It's mostly human-controlled, and it can't recognize objects.
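The discrete/continuous distinction is easy to make concrete in a toy sketch. Nothing here resembles either game's actual interface; the names and numbers are purely illustrative:

```python
import math
import random

# Discrete action space: Go. A move is one of a finite, enumerable set.
BOARD = 19
go_actions = [(r, c) for r in range(BOARD) for c in range(BOARD)]  # 361 intersections

def sample_go_action():
    """Pick one of the 361 intersections (ignoring occupancy rules)."""
    return random.choice(go_actions)

# Continuous action space: a DOTA-like world. A move command is an
# arbitrary heading plus an arbitrary distance - uncountably many options,
# so you can't enumerate them or build a lookup table over them.
def sample_move_action(max_dist=100.0):
    """Pick any angle in [0, 2*pi) and any distance up to max_dist."""
    angle = random.uniform(0.0, 2.0 * math.pi)
    dist = random.uniform(0.0, max_dist)
    return (angle, dist)

print(len(go_actions))       # 361 - the whole space fits in a list
print(sample_move_action())  # e.g. (3.89, 42.7) - one point from a continuum
```

This is one reason techniques that exhaustively score every legal move work for Go but not for DOTA 2 or for steering a robot.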
The Ontology Of Robotics
- YOLO knows the names of things, and how to draw boxes around them, but it doesn't really know what they are
- YOLO knows that in this image, that part of the picture is a "car." This tricks humans into thinking it knows what a car is. It doesn't.
- YOLO doesn't know what driving is, or how a car works, or that they can move, or what "works" is, or what "move" is. It's probably not conscious.
- YOLO isn't cybernetic: there's no feedback loop. It's just a straight-through formula for identifying objects in pictures. YOLO does not have being-in-the-world.
- ANYmal, before its proprioception upgrade, only knows "terrain." Those gray polygon maps of the world represent its umwelt.
- But what's important is that it really does know what terrain is. It knows what terrain is ready-to-hand for (you can walk on it). ANYmal has a very limited kind of being-in-the-world.
- The software change that gave ANYmal the ability to ignore what it sees in favor of proprioceptive data gave it new concepts: the upgraded ANYmal understands something like "soft" or "illusory".
- What it means for something to be "soft" or "illusory" is that it can be seen, but not felt (or not felt the way it looks). It also knows "invisible": something that can be felt but not seen.
- To be clear, ANYmal does not know those English words, and those words (coming as they do from a language - think structuralism, etc.) carry much more meaning, and imply a deeper understanding of the world, than ANYmal has.
- ANYmal, in fact, does not possess language, and is not a social creature.
- Still, it has being-in-the-world, and it knows the for-what of illusory, terrain, invisible, etc.
- Five (like all other game AIs) is different. It doesn't have to take in raw sense data and then apply an ontology to it. It inhabits a world with a natural, complete, a priori ontology.
- DOTA 2 is a world of heroes (with exact positions, known numerically), effects, health points, etc. The same goes for chess: Deep Blue never had to learn the concept "rook"; that was never part of the problem of making a chess AI.
- There's no AI in existence right now that knows "tree" in the way a human does.
- Five shows us that once you have an ontology, once you can punctuate the stream of experience correctly, the rest of the problem is pretty much solved.
- And so, Five's gains are impressive, but there still remains an entire, very different sector of work to be done if we want AI that can live in the world.
- That's why ANYmal is so impressive to me - while it doesn't do much that's impressive yet, it's working in the unrestricted mammalian problem space. Any advances it makes are real advances - it's exactly as close to human-level intelligence as its behavior makes it appear.
- ANYmal is making progress towards understanding arbitrary space - towards being able to see a scene, and understand what distance is, and what's in the scene, and how one can move through it.
- Self-driving cars are somewhere in between Five and ANYmal. "Driving" is a restricted problem space, with a lot of the ontology one needs predefined, often in law (road sign, car, traffic light, pedestrian, street, lane).
- The promise of self-driving cars is the 100% autonomous car, capable of handling any scenario. I think that requires at least dog-level intelligence, which we really don't have. 99.9% autonomous cars are much easier, since you can give the car an a priori ontology of a few hundred types of things and have that cover 99.9% of cases. But you'll still need humans for the really weird cases that involve things outside your ontology - like a truck carrying traffic lights.
- Though it's mostly gone now, the hype for AI generated by good classifiers comes from a Cartesian ontology: one where the world is understood by a passive observer, sitting still and staring at a ball of wax.
- This is kind of related to the issue of "species", and to how people assume the categories and properties they use (tree, rock, dog, cat, soft, hard) are natural, inherent features of the world itself, instead of arising from their for-the-sake-of-which.
- The uncategorized world is the pleroma as Jung describes it.
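The core contrast in the list above - a world that arrives pre-carved into labeled entities versus one that arrives as bare numbers - can be sketched in a few lines. Every name and value here is invented for illustration; neither side resembles any real game or robot API:

```python
from dataclasses import dataclass

# A game AI's world comes pre-carved into typed entities: the ontology
# is handed over, a priori, by the game engine.
@dataclass
class Hero:
    name: str
    x: float
    y: float
    health: int

game_state = [Hero("hero_a", 10.0, 4.5, 320)]  # exact, labeled, complete

# A robot's world comes in as raw sense data: a grid of numbers with no
# labels attached. Carving it into "terrain", "soft", "illusory" is the
# hard, unsolved part.
sense_data = [[0.12, 0.90, 0.88],
              [0.15, 0.91, 0.02],
              [0.11, 0.89, 0.05]]  # e.g. depth readings; no entities, no names

# For the game AI, "what is where" is a field lookup; for the robot,
# it's a perception-and-ontology problem.
print(game_state[0].health)  # 320
```

The 99.9%-autonomous-car point is the same sketch with a closed label set: anything the predefined ontology can't name (the truck carrying traffic lights) has to be handed off to a human.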