What do you think requires more intelligence: opening a door, or playing the game of chess? Many times we think that chess is a matter of genius. So if chess is actually hard, then building machines which can play chess should also be way harder than building machines which can open doors.
Why It's Harder for AI to Open Doors Than Play Chess
But let's see what we have managed to do in artificial intelligence. You have probably heard about AI systems surpassing humans at the game of chess. This is not recent; it happened more than 20 years ago. Since then, we have had AI systems which can play complex multiplayer games, and which surpass humans even at the complex game of Go. But now let's take a look at opening doors. You might think I'm showing you bad videos, but let me assure you: these are the best teams competing in the DARPA Robotics Challenge Finals just seven years ago. Doing simple things like opening doors and climbing stairs is actually very hard.
So let's try to understand why this is the case. There seems to be a dichotomy between what we think is hard and what robots find hard, and what I'm going to communicate in this talk is that human intuition about what is hard really gets in our way and stops us from building truly intelligent systems. Many scientists have pondered this. I'll start with a quote from Hans Moravec, one of the people who thought about AI quite a lot. Here is what he had to say: reasoning, the kind of reasoning you do in chess, requires very little computation, but sensorimotor skills require enormous computational resources. To quote another scientist, Steven Pinker from Harvard: the main lesson of thirty-five years of AI research is that hard problems like chess are actually easy, and easy problems like walking and opening a door are actually hard. These observations came to be known as Moravec's paradox. Some people have gone on to speculate that machines are going to do jobs which we think are cognitively challenging quite soon, for example being a board member, being a data analyst, or being creative and making paintings, while jobs which require physical intelligence are not going to be done by machines for a long time to come. Now why is this? The reason is that we are least aware of the things that we do very well. For example,
our heart is beating, we are breathing, but are we aware of it? When you walk, are you aware of it? These systems are working all the time, they are flawless, and you don't even think about them. As Marvin Minsky, one of the co-founders of MIT's AI lab and a Turing Award winner, put it, we are more aware of simple processes that do not work well than of complex processes that work flawlessly. And when he says simple, think about what that means; this is not an abstract concept. I think all of us have experienced Moravec's paradox in our own lives. Imagine riding a bicycle. When you first learn how to ride, you probably pay attention to every movement of your feet and where the handlebars are going, but after some time it becomes natural, it becomes second nature, and you don't even think about it. So let's take this idea and apply it to evolution to understand the emergence of intelligence. Life started some 3.7 billion years ago with single-cell organisms. Then it took around 3.7 billion years to come to this: chimpanzees performing complex sensorimotor tasks, hanging from branches, picking up a fruit, throwing it, and so on. Then it took a few million more years for humans to walk upright, and then language came in; we are just 15,000 to 150,000 years from when language started.
So maybe there's a lesson over here: evolution spent a lot of time evolving sensorimotor skills, and relatively very little time developing the language and reasoning that we think of as complex. Just to give a sense of these numbers, imagine that we compress the entire history of Earth into one day. Earth is born at midnight, and we look at our 24-hour clock. Language arrived just 10 seconds ago. Humans arrived just one minute ago. I may be off by a few minutes, but life started 20 hours ago. That gives you a sense of how much time it took to get to these capabilities. Now, what is the implication for building robots? I, for one, want a robot which can do the mundane things that I do at my house today. If I say to the robot, "make me dinner," the first thing the robot needs to understand is: what is dinner? What do I eat? What are the recipes, and how do I make them? This is what language might provide. But there is another part, which is physical intelligence: how do I actually make dinner, how do I chop vegetables, and so on. Now let's look back at our timeline.
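As an aside, the clock analogy is easy to check with a quick calculation. Here is a minimal sketch in Python, assuming rough textbook ages that are my own fill-ins rather than numbers from the talk (Earth about 4.5 billion years old, first life about 3.7 billion years ago, upright humans about 4 million years ago, language about 500,000 years ago):

```python
# Back-of-the-envelope check of the "history of Earth in one day" analogy.
# The ages below are rough textbook figures assumed for illustration,
# not numbers stated in the talk.
EARTH_AGE = 4.5e9          # years; origin of Earth maps to midnight
DAY_SECONDS = 24 * 3600    # the whole history compressed into 86,400 s

def seconds_ago(years_ago):
    """Map an age in years onto seconds before 'now' on the 24-hour clock."""
    return years_ago / EARTH_AGE * DAY_SECONDS

print(f"life (~3.7 Bya):        {seconds_ago(3.7e9) / 3600:.1f} hours ago")   # ~19.7 hours
print(f"upright humans (~4 Mya): {seconds_ago(4e6):.0f} seconds ago")         # ~77 s, about a minute
print(f"language (~500 kya):     {seconds_ago(5e5):.1f} seconds ago")         # ~9.6 s, about 10 seconds
```

Under these assumptions the numbers come out close to the talk's "20 hours", "one minute", and "10 seconds".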
So how much time did it take for language understanding? 10 seconds. How much time for physical intelligence? Maybe 20 hours. Now, what have we done in AI so far? We have taken large amounts of data from the Internet and developed systems which can understand it. To give you a very quick summary of how they work: these systems are called language models. They consume a lot of data, and then, given a few words, they try to predict what words are going to come next. For example, if a question is posed, they can make a prediction of what the next words are going to be. Here is one prediction from one of these systems. Let's ask a different question and see what the prediction is. Sounds very reasonable, right? Here is another question; let's see what the answer is. Maybe a bit nonsensical. But the point is, yes, there are a few aberrations, but these systems are becoming really, really good. And we can also pair images with these systems.
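To make the next-word-prediction idea concrete, here is a toy sketch of my own; it only counts which word follows which in a tiny corpus, whereas the real systems described in the talk train large neural networks on Internet-scale text, but the training objective (predict the next token) is the same idea:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a corpus,
# then predict the most frequent follower of a given word.
corpus = "the robot opens the door and the robot makes dinner".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen right after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "robot" ("robot" follows "the" twice, "door" once)
```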
And it's not just about language. For example, we can ask an AI system to generate an image of a scene that has probably never been imagined before, and the system can do it. So what does it mean, that we have such good language understanding, in the context of building robots that could be in my house? Moravec's paradox was stated in 1988, after observing thirty-five years of AI research. We are now in 2023, which means almost another 35 years have passed. So let's look at an attempt to build a robotic system with language in the loop. This is a very impressive system put out by Google some time back. The task: someone spilled a Coke and wants the robot to clean up. Let's look at what the robot ends up doing. It reasons that it needs to find the can; it moves, and it grasps the can. Then it tries to throw the can in the trash, but it cannot. It moves ahead and says: I need something to wipe up the spill, so I'm going to pick up this sponge and take it to go wipe the table. But it cannot wipe the table.
OK, so what is the lesson? Moravec's paradox still holds. Thirty-five years have passed, and the same problem exists. Now, I don't want to be standing here 35 years from now telling you the same thing, so we need to fix this. So what actually is the problem? To get language understanding, which represents perhaps the 10 seconds of evolution, we have pretty much consumed all of the Internet. So how are we going to get to physical intelligence? Some people say that maybe we can get to artificial intelligence without doing the physical part at all. I could talk a lot about this, but in the interest of time I will just tell you my bet. My bet is that the paradoxes we have seen so far also appear within physical intelligence itself, and this is what makes physical intelligence challenging. Let me give you an example. Consider a robot doing a backflip. Impressive, right? But what about a behavior that feels very simple, like walking?
When you do a backflip, you are performing a specialized motion where you only have to reason about your own motor system. But when I walk, I have to walk on many different terrains, so I also have to reason about the environment. These systems have to generalize to a large variety of settings, and this is what makes them challenging. So the question my lab is trying to answer is: how do we get to physical intelligence? I'm going to briefly tell you some of the ideas and techniques we are pursuing. One thing we make heavy use of is simulation. Through simulation we can generate lots of data: in three hours of wall-clock time we can generate 100 days' worth of experience. Here is an example where we simulate many, many robots in parallel, and in simulation they learn how to walk. So we learn walking behaviors in simulation, then transfer them to the real world, where the robot faces different kinds of terrain: stairs, terrain with obstacles, slopes, and so on. Here are some results of these systems, trained in simulation but deployed in the real world. The robot is not just fast; it can traverse these challenging terrains and still stay stable. For example, here it tries to go under an obstacle, so it has to crouch to get below it. Or, for example, here it is going up this gravel slope. Sometimes while the robot is executing these behaviors, the environment is not forgiving. For example, once when we were running the robot outside this building, one of the screws in its leg came off. What you now see is the robot with a damaged leg, and it still walked. This is the kind of robustness and generalization that we are working towards.
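The arithmetic behind "100 days of experience in three hours" comes from running many simulated robots at once. Here is a sketch with hypothetical numbers chosen only to match that ratio; the real environment count and per-environment speed depend on the simulator and hardware, and are not stated in the talk:

```python
# How parallel simulation turns wall-clock hours into "robot-days" of data.
# All figures below are assumptions for illustration.
num_envs = 4000          # simulated robots stepped in parallel (assumed)
sim_speed = 0.2          # each env runs at 0.2x real time on this hardware (assumed)
wall_clock_hours = 3     # wall-clock training time

# Aggregate experience = environments x per-env speed x wall-clock time.
experience_hours = num_envs * sim_speed * wall_clock_hours
print(f"{experience_hours / 24:.0f} robot-days of experience")  # -> 100 robot-days
```

The point is that even if each simulated robot runs slower than real time, thousands of them together collect experience hundreds of times faster than a single physical robot could.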
And this is not just in the context of locomotion; we can also think about manipulation. Think of things we do every day: we pick up objects, we use them, and we reorient them in our hands all the time, sometimes for a purpose and sometimes just for fun. So we can also run simulations where we have lots of robot hands practicing this task of reorienting objects. Then we can take the system trained in simulation and deploy it in the real world. The way the system works is that it has a camera, and it gives commands to the fingers. We train it on one set of objects and evaluate it on new objects the system has never seen. So, for example, here on the top right is a goal orientation; let's look at what the system ends up doing.
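The observe-and-act loop just described can be sketched schematically. Everything in this sketch is a stand-in of my own: a one-dimensional "orientation" and a simple proportional policy replace the learned vision-based policy from the talk, and it only illustrates the shape of the loop (observe, compute a command toward the goal, actuate, repeat):

```python
import math

# Schematic observe->act loop for goal-directed reorientation.
# The policy here is a hand-written proportional controller standing in
# for the learned policy; the real system maps camera images to finger
# commands with a neural network.
def policy(observed_angle, goal_angle, gain=0.5):
    """Command a step toward the goal, proportional to the remaining error."""
    return gain * (goal_angle - observed_angle)

def run_loop(start_angle, goal_angle, steps=50):
    angle = start_angle
    for _ in range(steps):
        command = policy(angle, goal_angle)   # observe current state, pick action
        angle += command                      # "hand" executes the command
    return angle

final = run_loop(start_angle=0.0, goal_angle=math.pi / 2)
print(f"final error: {abs(final - math.pi / 2):.4f} rad")  # -> final error: 0.0000 rad
```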
So the hand has to reorient this object to the target shown at the top. Again, these examples show that in simulation we can leverage large amounts of data to perform things that seem simple to humans but are actually hard. Now, building physical intelligence is not just about the controller; it is also a lot about the hardware. For example, the hand I showed you had no touch sensing, so we need to add more sensing modalities so it can tell where it makes contact. We have been running experiments on problems like: if I give the hand an object, can it feel what the object is? And if I switch off the lights so it is completely dark, can it go and find that object again? For example, here we place multiple objects, shut off the light, and the hand still has to find these objects based on touch alone. Another question you can ask is: is the design of this hand even good? Maybe, maybe not. So we are also developing tools which can help us design better hands. Here are examples of four different hands we were able to design, each optimized for a particular task. For example, here is a hand which can cut things; it can use this to cut paper. So in summary, it is not just about control, but also about perception and about hardware, and we need a full-stack approach to physical intelligence. To end: what I discussed was the dichotomy between artificial and natural intelligence. What we have is the icing, the 10 seconds' worth of evolution, the models trained on Internet data. But the question is: where is the cake?
While there are some people who believe that we can just scrape the Internet, train bigger models, never be embodied, and get to artificial intelligence, I and some others are in the other camp. We think we need to build physical intelligence before we can get to artificial intelligence. And I hope that from now on more people think about physical intelligence, especially since there is a lot of excitement about embodied intelligence right now. I think that excitement is very good, but we cannot have the cake without baking it. Thank you.