Providing an artificial visual system to a robot is an important step toward implementing autonomous robotics applications. The goal of such applications is to build robots that can perform tasks without human intervention, reacting quickly, if necessary, to stimuli observed in the environment. One possible approach to implementing such an artificial visual system is to perform a complete analysis and indexing of the image, so that the image processing tasks that feed useful information to the decision-making process can operate on the full data. The problem becomes even more critical when the goal is to extract visual information (or features) that is useful for the several general-purpose activities of a robot.
A point we want to emphasize is that we do not want a system designed to perform only specific tasks. We want a behaviorally cooperative and active system that can perform several different tasks in different environments or situations, automatically responding, in real time, to environmental changes. We therefore believe that data reduction and feature abstraction are the keys to such a system, allowing recognition or on-line weight tuning (attention) that integrates the features extracted from sensory information according to the task being executed. Thus, the biologically inspired model for feature extraction proposed in this article, which has allowed us to develop a system with these requirements, is the main issue treated in this work.
In order to reduce the processing time, the image features are extracted in a foveated way (see Figure 1). This has two consequences:
In most top-down attention tasks it is important to keep track of the object. We therefore propose to select only the features that seem important for maintaining the tracking. Features can exist at different scales, and if the object is roughly parallel to the camera plane, the object features that will be matched can be estimated. For example, if the object is near the camera, the low-scale features of the object are matched with the high-scale features of the model. Conversely, if the object is far from the camera, the high-scale features of the object are matched with the low-scale features of the model.
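This distance-dependent scale pairing can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the reference-distance parameter, and the one-octave-per-doubling assumption are all our own.

```python
import math

def model_scale_for(object_scale, distance, reference_distance, num_scales):
    """Map an object feature scale to the model scale it should match.

    Following the text: a near object's low-scale features match the
    model's high scales; a far object's high-scale features match the
    model's low scales. We assume each halving of the distance shifts
    the correspondence by roughly one scale (octave).
    """
    shift = -round(math.log2(distance / reference_distance))
    m = object_scale + shift
    return m if 0 <= m < num_scales else None  # no valid counterpart

def select_scale_pairs(distance, reference_distance, num_scales=4):
    """Collect all valid (object_scale, model_scale) pairs to match."""
    pairs = []
    for s in range(num_scales):
        m = model_scale_for(s, distance, reference_distance, num_scales)
        if m is not None:
            pairs.append((s, m))
    return pairs
```

With the object at the model's reference distance, scales match one-to-one; at half the distance, object scale 0 pairs with model scale 1, and so on, discarding scales with no counterpart.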
The fovea position and foveation parameters depend on the task being performed. For example, if the task is tracking, it is usually best to keep the fovea over the most relevant features of the object. If the features are evenly distributed over the object, it is generally better to keep the fovea at the center of the detected object. A problem arises when the visual system loses the fovea: if the fovea is placed far from the object, the system can become unstable and fail to find the object again. Several strategies can be applied using the foveated model. Another example is the examination of the environment by the visual system; in this case, one can move the fovea among the most salient regions.
One possibility is to use bottom-up attention and move the fovea to the most salient region. Another option is to temporarily disable the fovea model and compute the features over the whole image. In this case, the extra processing time decreases the frame rate, but the fovea can be restored once the object is found.
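The placement and recovery strategies above can be summarized in a small decision routine. This is a hypothetical sketch under our own assumptions: the bounding-box representation, the saliency-peak input, and the full-search fallback interface are all illustrative.

```python
def update_fovea(object_bbox, saliency_peak, tracking_ok, full_search):
    """Choose the next fovea position according to the tracking state.

    - tracking succeeded: keep the fovea at the center of the detected
      object (suitable when features are evenly distributed over it);
    - tracking lost: jump to the most salient region (bottom-up attention);
    - no salient candidate: fall back to a costly full-image search,
      which lowers the frame rate for a while but recovers the fovea.
    """
    if tracking_ok and object_bbox is not None:
        x, y, w, h = object_bbox          # (top-left x, top-left y, width, height)
        return (x + w // 2, y + h // 2)   # center of the detected object
    if saliency_peak is not None:
        return saliency_peak              # bottom-up: most salient region
    return full_search()                  # disable the fovea for one frame
```

The tracking-centered branch covers the top-down case; the other two branches implement the recovery strategies discussed above.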
Efficient 3D object recognition using foveated point clouds
Rafael Beserra Gomes, Bruno Marques da Silva, Renato Gardiman, Rafael Vidal Aroca, Lourena Rocha Medeiros, Luiz Marcos
Recent hardware technologies have enabled the acquisition of 3D point clouds from real-world scenes in real time. A variety of interactive applications with the 3D world can be developed on top of this new technological scenario. However, a main problem that still remains is that most processing techniques for such 3D point clouds are computationally intensive, requiring optimized approaches to handle such images, especially when real-time performance is required. As a possible solution, we propose the use of a 3D moving fovea based on a multiresolution technique that processes parts of the acquired scene at multiple levels of resolution. This approach can be used to identify objects in point clouds efficiently. Experiments show that the moving fovea yields a sevenfold gain in processing time while keeping a 91.6% true recognition rate, in comparison with state-of-the-art 3D object recognition methods.
Computers & Graphics, 2013
Visual Attention Guided Features Selection with Foveated Images
Rafael Beserra Gomes, Bruno Motta de Carvalho, Luiz Marcos
Visual attention is a very important task in autonomous robotics, but, because of its complexity, the processing time required is significant. We propose an architecture for feature selection using foveated images that is guided by visual attention tasks and that reduces the processing time required to perform them. Our system can be applied to bottom-up or top-down visual attention. The foveated model determines which scales are used by the feature extraction algorithm. The system is able to discard features that are not strictly necessary for the tasks, thus reducing the processing time. If the fovea is correctly placed, it is possible to reduce the processing time without compromising the quality of the tasks' outputs. The distance of the fovea from the object is also analyzed. If the visual system loses the tracking in top-down attention, basic strategies of fovea placement can be applied. Experiments have shown that this approach can reduce the processing time by up to 60%. To validate the method, we tested it with the feature algorithm known as speeded up robust features (SURF), one of the most efficient approaches for feature extraction. With the proposed architecture, we can meet the real-time requirements of robotic vision, mainly for application in autonomous robotics.
Real time vision for robotics using a moving fovea approach with multi resolution
Rafael Beserra Gomes, Bruno Motta de Carvalho, Luiz Marcos
We propose a new approach to reduce and abstract visual data useful for robotics applications. Basically, a moving fovea combined with a multiresolution representation is created from a pair of input images given by a stereo head, reducing the amount of information from the original images by hundreds of times. With this new theoretical approach we are able to compute several feature maps, including several filters, stereo matching, and motion, in real time, that is, at more than 30 frames per second. As the main contribution, the moving fovea allows a robot, most of the time, to avoid physically moving the cameras in order to bring a desired region to the image center. We present the mathematical formalization of the moving fovea approach, the algorithms, and details of the implementation of this scheme, and validate it with experimental results. This approach has proven to be very useful for robotic vision.
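The moving-fovea multiresolution structure described in this abstract can be sketched as a set of concentric windows centered on the fovea. This is a minimal sketch under our own assumptions (window sizes doubling per level, a fixed per-level pixel budget), not the authors' formalization.

```python
def fovea_levels(image_w, image_h, fx, fy, num_levels=4, base=32):
    """Return (x0, y0, x1, y1, step) windows for each resolution level.

    Level 0 is a base x base window around the fovea (fx, fy) at full
    resolution; level k covers a window 2**k times larger but is
    subsampled with step 2**k, so every level carries the same small
    pixel budget. This is how a multiresolution fovea can reduce the
    data by orders of magnitude relative to the full image.
    """
    levels = []
    for k in range(num_levels):
        size = min(base << k, image_w, image_h)        # clamp to image bounds
        x0 = min(max(fx - size // 2, 0), image_w - size)
        y0 = min(max(fy - size // 2, 0), image_h - size)
        levels.append((x0, y0, x0 + size, y0 + size, 1 << k))
    return levels
```

Moving the fovea is then just recomputing these windows for a new (fx, fy), with no physical camera motion, matching the contribution highlighted in the abstract.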