Identifying appropriate grasp patterns for a wide variety of objects is a challenging task for computer vision. The problem becomes considerably more complex when it is modeled on human multi-fingered grasping. Beyond the properties of the object itself, many other factors must be considered. Scenes containing similar objects, and difficult lighting conditions such as shadows and reflections, are among the conditions that make grasping a nontrivial problem from a computer vision perspective. The literature review reveals that, despite an extensive body of work, alternative approaches still need to be explored in order to develop a control framework that can be applied across varied scenarios. In this thesis, solutions based on different implementations of artificial neural networks are proposed for generalized grasping. We take inspiration from the human visual system: humans use only visual information to select a suitable (pre)grasp hand shape for an object, based on learnt correspondences between visual appearance and successful grasps. Additionally, humans do not use any depth information about the object while grasping, other than stereoscopic cues, and are able to form successful grasps even from a monoscopic view. A novel method is presented to learn grasping patterns from the images and dataglove recordings provided in the Technical University of Berlin (TUB) dataset. We experiment with re-training a number of pre-trained Deep Convolutional Neural Networks (DCNNs), starting with the well-known AlexNet, to learn deep features from images that correspond to human grasps. The results show that it is possible to generate grasps from 2D visual information alone that have a high degree of similarity with the ground-truth grasps for those objects.
In addition, we use two methods, Support Vector Machines (SVM) and Hotelling’s T² test, to demonstrate that the dataset includes distinctive grasps for different objects. Images in the TUB dataset show very limited variation in viewpoint, lighting, and object and background colours and textures. To demonstrate the viability of the proposed method in more realistic and challenging conditions, we developed a synthetic image dataset based on the real dataset, but with varied viewpoints and with noise added to the environment by introducing different lights, textures and shading on the objects, in order to depict a more realistic environment for the robotic agent. The results demonstrate that the proposed methods can be trained to be robust to such variations, pointing the way to future work on integrating them into a full reaching and grasping pipeline.
|Supervisor||Bernie Tiddeman (Supervisor) & Patricia Shaw (Supervisor)|