A four-wheeled robot revs up and travels across the room as 22-year-old Samarth Shah tells it to move forward. The robot is connected to a computer and to the Microsoft Kinect, an accessory for the Xbox 360 video game console. Shah belongs to a growing breed of young indie developers who put the Kinect’s ability to recognise speech, gestures, and motion to use in applications beyond its intended purpose of gaming.

The robot, which Shah built with two friends, Devanshee Shah and Yagnik Suchak, recognises voice commands and gestures and responds with simple movements. Raise the right hand and it goes forward; a swipe of the right hand sends it to the right. It stops when the left hand is raised and moves to the left when the left hand is swiped. The gesture recognition program, written in Visual Basic using the Microsoft Kinect software development kit (SDK), relies on what Shah describes as “skeleton tracking”: it follows the positions of 20 joints of the body to detect a valid gesture. The Kinect’s sensors, an RGB camera alongside an infrared projector and infrared depth camera, together determine the angle, position, and orientation of the joints. The trio plans to adapt the robot’s design for other applications, such as a wheelchair operated by voice commands. Shah is also working on a Kinect application for surfing the web with a browser that recognises movement.
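The gesture-to-command mapping described above can be sketched in Python. This is not Shah’s Visual Basic code: it assumes each skeleton frame arrives as joint positions in metres with y pointing up, and the joint names (`head`, `hand_left`, `hand_right`), the 0.35 m swipe threshold and the “hand above head” rule for a raised hand are hypothetical choices made for illustration.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

# (x, y, z) joint position in metres; y is assumed to point upwards.
Point = Tuple[float, float, float]
# One skeleton frame: hypothetical joint names mapped to positions.
Frame = Dict[str, Point]

# Hypothetical threshold: sideways travel needed to count as a swipe.
SWIPE_DISTANCE = 0.35


class Command(Enum):
    FORWARD = "forward"
    RIGHT = "right"
    STOP = "stop"
    LEFT = "left"


def is_raised(hand: Point, head: Point) -> bool:
    """A hand counts as raised when it is higher than the head."""
    return hand[1] > head[1]


def is_swipe(track: List[Point]) -> bool:
    """A swipe is mostly horizontal hand travel across the frame window."""
    if len(track) < 2:
        return False
    dx = abs(track[-1][0] - track[0][0])
    dy = abs(track[-1][1] - track[0][1])
    return dx >= SWIPE_DISTANCE and dy < SWIPE_DISTANCE / 2


def classify(frames: List[Frame]) -> Optional[Command]:
    """Map a short window of skeleton frames to a robot command."""
    if not frames:
        return None
    latest = frames[-1]
    head = latest["head"]
    # STOP is checked first so a raised left hand always halts the robot.
    if is_raised(latest["hand_left"], head):
        return Command.STOP
    if is_raised(latest["hand_right"], head):
        return Command.FORWARD
    if is_swipe([f["hand_right"] for f in frames]):
        return Command.RIGHT
    if is_swipe([f["hand_left"] for f in frames]):
        return Command.LEFT
    return None
```

Checking a raised left hand before anything else is a deliberate safety choice in this sketch: if both hands are up, the robot stops rather than moves.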

Visual Basic code for the speech- and gesture-controlled robot built by Samarth Shah

Reaching the semifinals of the Kinect Fun Labs Challenge at the prestigious Microsoft Imagine Cup in 2012 drew Shah, then an engineering student, into the world of the Kinect. He procured a Kinect from the US, started out writing C++ code and later moved to Visual Basic. His entry at the Imagine Cup was a prototype that automated trivial, repetitive tasks for researchers at the Physical Research Laboratory, Ahmedabad, where he was working as a Research Fellow. Using gestures made in front of the Kinect, the researchers could schedule emails, save updates, share work with colleagues, and print and search documents. He was awarded a Kinect, which he later used to build more applications, such as a home automation system that pairs the Kinect with a computing board. Leave the room for five minutes or more and the Kinect detects your absence and switches off selected utilities in the room; return and they switch back on. Any appliance in the room can also be turned on or off with a speech command or a gesture.
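The absence rule can be expressed as a small timer. A minimal Python sketch, assuming person detection is already available as a yes/no value for each sensor reading (for example, whether the Kinect currently tracks any skeleton). The `PresenceController` class and its method names are hypothetical; only the five-minute timeout comes from the description above.

```python
from typing import Dict, List

# Five minutes of continuous absence, per the behaviour described above.
ABSENCE_TIMEOUT_S = 5 * 60


class PresenceController:
    """Hypothetical absence-based switching for a set of appliances."""

    def __init__(self, managed: List[str], start_time: float,
                 timeout_s: float = ABSENCE_TIMEOUT_S) -> None:
        self.managed = list(managed)      # appliances the absence rule controls
        self.timeout_s = timeout_s
        self.power: Dict[str, bool] = {name: True for name in self.managed}
        self.last_seen = start_time       # time a person was last detected
        self.auto_off: List[str] = []     # appliances the rule itself switched off

    def set_power(self, name: str, on: bool) -> None:
        """Manual switching, e.g. triggered by a speech command or gesture."""
        self.power[name] = on

    def update(self, person_detected: bool, now: float) -> None:
        """Call once per sensor reading with the current time in seconds."""
        if person_detected:
            self.last_seen = now
            # Restore only what the absence rule turned off.
            for name in self.auto_off:
                self.power[name] = True
            self.auto_off = []
        elif now - self.last_seen >= self.timeout_s:
            for name in self.managed:
                if self.power[name]:
                    self.power[name] = False
                    self.auto_off.append(name)
```

Remembering which appliances the timer switched off means that anything the user turned off by voice or gesture stays off when they walk back in.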

When he began interacting with developers around the world, he realised that most of them used C# for Kinect application development. Since more support and documentation was available for C# than for C++ or Visual Basic, Shah switched to the popular programming language.

Owing to his interest in image processing, Shah delved deeper and began exploring the Kinect for image processing tasks on Linux and Windows. Using the Kinect as a camera together with OpenCV, a computer vision library, RGB (colour) and depth data can be retrieved from the sensor. After some processing, this data can be used to perform face detection in near real time. He says his program could be modified further to run other, more complex algorithms on the Kinect’s data.

While learning how the Kinect works, Shah also created a few simple applications that could be of practical use or give other enthusiasts ideas for more advanced projects. One of his early projects, likely to interest geeks in the corporate world, is a gesture- and speech-controlled presentation program: you can move between slides with a swipe of the hand or by saying keywords such as ‘next’ and ‘previous’. Another fun application from that period is a Kinect-controlled mouse, which lets users move the pointer and perform left and right clicks with hand gestures. Applying the principles behind the presentation controller, Shah built a gesture- and speech-controlled music player and a speech-to-text converter. He also created a Kinect application for viewing and browsing online interactive maps such as Google Maps. Shah says he has tested all of these applications on both Windows 7 and Windows 8.
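At its core, the presentation controller maps recognised inputs onto slide navigation. A minimal Python sketch, assuming the speech and gesture recognisers already emit simple events such as `("speech", "next")` or `("swipe", "left")`; the event format, the `SlideController` class and the choice that a leftward swipe advances the slide are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input events: ("speech", keyword) or ("swipe", direction).
Event = Tuple[str, str]

# Assumed mapping: a right-to-left swipe advances, like turning a page.
ACTIONS: Dict[Event, int] = {
    ("speech", "next"): +1,
    ("speech", "previous"): -1,
    ("swipe", "left"): +1,
    ("swipe", "right"): -1,
}


class SlideController:
    """Hypothetical slide navigator driven by speech and gesture events."""

    def __init__(self, slide_count: int) -> None:
        if slide_count < 1:
            raise ValueError("a presentation needs at least one slide")
        self.slide_count = slide_count
        self.current = 0  # zero-based index of the slide on screen

    def handle(self, event: Event) -> int:
        """Apply one event; unknown events are ignored. Returns the slide index."""
        step = ACTIONS.get((event[0], event[1].lower()))
        if step is not None:
            # Clamp so the first and last slides act as hard stops.
            self.current = min(max(self.current + step, 0), self.slide_count - 1)
        return self.current
```

Because the behaviour lives in a lookup table, the same dispatcher could drive something like a music player by swapping in a different action table.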

The $150 (Rs 10,000) Kinect, with its compelling array of sensors and cameras, has attracted hackers, hobbyists, programmers, developers, roboticists, and scientists since its launch in 2010. Sensors that were once too expensive or too rare to tinker with became accessible through the Kinect, enabling young independent developers such as Samarth Shah to build impressive projects on top of it without large financial investments. Shah hopes to apply his Kinect development skills to integrating the device with open source environments and to social good: helping people with physical disabilities go about their everyday lives and use the Internet.

The source code for the gesture-controlled robot is available for download under a free license.
