Humans can look at a photograph and know what will happen immediately afterward. This comes so naturally to us that we rarely consider how much information the task requires. Teaching computers to do the same is a considerable challenge.
MIT researchers have trained computers to predict what will happen next in a given image. The machines were trained on over 2 million videos from series such as The Office, Desperate Housewives and The Big Bang Theory. The scenes used showed people hugging, kissing, shaking hands and high-fiving.
The algorithm created to predict these four interactions analyzes the video and compares it with what it has learned about the signs leading up to each of them. The final result is 43% accuracy, compared with 71% accuracy for humans given the same task. Still, it is a promising number for this emerging technology.
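To put the 43% figure in context, it helps to compare it with random guessing. The sketch below is only an illustration, not the researchers' code: it assumes the four interactions are equally likely, so a random guess would be right 25% of the time, and it uses a small made-up list of labels (the names `ACTIONS`, `accuracy`, `actual` and `predicted` are hypothetical) to show how such an accuracy score is computed.

```python
# Illustrative only: toy labels, not data from the MIT study.
ACTIONS = ("hug", "kiss", "handshake", "high-five")


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Assumption: all four interactions are equally likely, so guessing
# at random is right one time in four.
chance_baseline = 1 / len(ACTIONS)

# Hypothetical ground truth and predictions for eight video clips.
actual = ["hug", "kiss", "handshake", "high-five",
          "hug", "kiss", "handshake", "high-five"]
predicted = ["hug", "kiss", "hug", "high-five",
             "kiss", "hug", "handshake", "hug"]

toy_accuracy = accuracy(predicted, actual)

print(f"chance baseline: {chance_baseline:.0%}")
print(f"toy accuracy:    {toy_accuracy:.0%}")
```

Under that equal-likelihood assumption, the reported 43% sits well above the 25% chance level while still falling short of the 71% achieved by humans.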
Facebook’s head of AI, Yann LeCun, said: “If you’re watching a Hitchcock movie and I ask, ‘15 minutes from now, what is it going to look like in the movie?’ You have to figure out who the murderer is. Solving this problem completely will require knowing everything about the world and human nature. That’s what’s interesting about it.”
In the future, this algorithm could be used in hospitals and similar places to prevent people from getting injured. Cameras running it could also analyze situations and alert the relevant authorities. Built into robots, the system could even intervene in emergency situations.
More info: MIT, machine learning