When you see something happen, you can instantly describe it: “a goalkeeper in a black jersey caught a ball released by a striker,” or “a dog runs along the beach.” It’s a simple task for us, but an immensely hard one for computers — fortunately, IBM and MIT are partnering up to see whether they can make it a little easier.
The new IBM-MIT Laboratory for Brain-inspired Multimedia Machine Comprehension — we’ll just call it BM3C — is a multi-year collaboration between the two organizations focused specifically on the problems of computer vision and audition.
It’ll be led by Jim DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences; that department and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will contribute members to the new lab, as will IBM’s Watson team. No money will change hands and no specific product is being pursued; the idea is to engender jolly and hopefully fruitful mutual aid.
The problem of computer vision spans multiple disciplines, so it has to be attacked from multiple directions. Say your camera is good enough to track objects minutely — what good is it if you don’t know how to separate objects from their background? Say you can do that — what good is it if you can’t identify the objects? Then you need to establish relationships between them, intuit physical rules… all stuff our brains are especially good at.
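The chain of steps above — separate objects from the background, identify them, then work out how they relate — can be sketched as a toy pipeline. Everything here is an illustrative stand-in (the “frame,” the function names, the labels are all hypothetical); a real system would use trained models at every stage.

```python
# Hypothetical sketch of the stages described above:
# segment from background -> identify objects -> relate them.
# A toy "frame" is just a list of (x, y, label) pixels.

def segment(frame):
    """Drop background pixels, keeping only foreground 'blobs'."""
    return [p for p in frame if p[2] != "background"]

def identify(pixels):
    """Group foreground pixels by label and reduce each object
    to a centroid position."""
    grouped = {}
    for x, y, label in pixels:
        grouped.setdefault(label, []).append((x, y))
    return {name: (sum(x for x, _ in pts) / len(pts),
                   sum(y for _, y in pts) / len(pts))
            for name, pts in grouped.items()}

def relate(objects):
    """Emit simple spatial relations between identified objects."""
    relations = []
    names = sorted(objects)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rel = "left of" if objects[a][0] < objects[b][0] else "right of"
            relations.append(f"{a} is {rel} {b}")
    return relations

frame = [(1, 1, "dog"), (2, 1, "dog"), (8, 2, "ball"), (5, 5, "background")]
print(relate(identify(segment(frame))))  # ['ball is right of dog']
```

Each stage here is trivial, which is exactly the point: the hard part isn’t composing the pipeline, it’s making any single stage work on real images — the thing brains do effortlessly and BM3C wants to understand.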