CoSy: Cognitive Systems for Cognitive Assistants

Explorer

Objective

More and more robots find their way into environments where their primary purpose is to interact with humans and help solve a variety of service-oriented tasks. A service robot, particularly a mobile one, needs to understand the spatial and functional properties of the environment in which it operates. The problem we address is how a robot can acquire an understanding of the environment so that it can operate in it autonomously and talk about it with a human. One key issue in the Explorer scenario is how to establish a correspondence between how a human perceives spatial and functional aspects of an environment, and what the robot autonomously learns as a map. Successful interaction between human and robot relies on bridging the gap between the two representations.

Activities

SLAM

A key component in the system is the ability to build a map of the environment based on sensor data. This is done using so-called Simultaneous Localization And Mapping (SLAM). Most traditional systems rely on a laser scanner as the main sensor for SLAM. The laser scanner is expensive, and the information it provides about the environment is limited to distances. In comparison, a camera offers much richer information, including the appearance of the environment. However, building a map from camera data requires significantly more processing, and a number of challenging issues need to be addressed. In the Explorer scenario the main sensor is still the laser scanner, but a number of alternative approaches are being investigated, using either a combination of laser and vision or vision only.

Multi-level conceptual spatial representations

Most existing approaches to robot map building, or Simultaneous Localization And Mapping (SLAM), use a metric representation of space. Humans, though, have a more qualitative, topological perspective on spatial organization (McNamara 1986). We adopt an approach in which we build a multi-level representation of the environment, combining metrical maps and topological graphs (as an abstraction over metrical information), as in (Kuipers 2000). We extend these representations with structural descriptions that capture aspects of spatial and functional organization. The robot obtains these descriptions either through interaction with a human, or through inference combining its own observations ("I see a coffee machine") with ontological knowledge ("Coffee machines are usually found in kitchens, so this is likely to be a kitchen!"). We store objects in the spatial representations, and so associate the functionality of a location with the functions of the objects present there. A schematic view of the multi-level representation is given here.
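
The inference step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ontology table `TYPICAL_ROOM_FOR_OBJECT`, the `Place` class, and the function `infer_room_category` are hypothetical names, and the simple majority vote merely stands in for whatever reasoning the Explorer system actually performs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ontological knowledge: object type -> room type where it is usually found.
TYPICAL_ROOM_FOR_OBJECT = {
    "coffee machine": "kitchen",
    "microwave": "kitchen",
    "desk": "office",
    "computer": "office",
}


@dataclass
class Place:
    """A node of the topological graph, abstracting over a region of the metric map."""
    name: str
    metric_position: tuple                            # (x, y) in the metric map
    objects: list = field(default_factory=list)       # objects observed at this place
    neighbours: set = field(default_factory=set)      # names of connected places
    asserted_category: Optional[str] = None           # label given by a human, if any


def infer_room_category(place: Place) -> Optional[str]:
    """Prefer a human-given label; otherwise guess from the objects seen at the place."""
    if place.asserted_category is not None:
        return place.asserted_category
    votes = Counter(
        TYPICAL_ROOM_FOR_OBJECT[obj]
        for obj in place.objects
        if obj in TYPICAL_ROOM_FOR_OBJECT
    )
    if not votes:
        return None  # nothing known yet; a candidate for asking the human
    return votes.most_common(1)[0][0]


room = Place("area_3", (4.2, 1.5), objects=["coffee machine", "microwave"])
print(infer_room_category(room))  # prints "kitchen"
```

In this sketch a label obtained through dialogue always overrides the inferred one, which mirrors the two sources of structural descriptions named in the text.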

Situated dialogue

A core characteristic of the Explorer system is that each utterance is analyzed to obtain a representation of the meaning it expresses, and of how it (syntactically) conveys that meaning, rather than just doing, for example, keyword spotting. This way, we can properly handle the variety of ways in which people may express assertions, questions, and commands. Furthermore, having a representation of the meaning of the utterance, we can combine it with further inferences over ontologies to obtain a complete conceptual description of the location or object being talked about. This way we can ground situated dialogue in the situational awareness of the robot.

Human augmented mapping

Following (Topp and Christensen 2005) we talk about Human-Augmented Mapping (HAM) to indicate the active role that human-robot interaction plays in the robot's acquisition of qualitative spatial knowledge. Existing dialogue-based approaches to HRI usually implement a master/slave model of dialogue: the human speaks, the robot listens. However, situations naturally arise in which the robot needs to take the initiative, e.g. to clarify an issue with the human. This is one form of mixed-initiative interaction, enabling a robot to recognize when help is needed from a human, and to learn from this interaction (Bruemmer & Walton, 2003). One situation that may require clarification is when uncertainty arises in automatic area classification: doors provide important knowledge about spatial organization, but are difficult to recognize robustly and reliably. Clarification dialogues can help to improve the quality of the spatial representation the robot constructs, and to increase the robot's robustness in dealing with uncertain information. The basic idea is to allow for any modality to raise an issue. The image below shows the timeline for an example with clarification dialogue.
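
As a rough illustration of how a modality might raise an issue, the following Python sketch asks the human about a door only when the detector's confidence falls into an ambiguous band. The class `DoorHypothesis`, the thresholds `LOWER` and `UPPER`, and both functions are hypothetical and chosen for this example only; they are not taken from the Explorer implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoorHypothesis:
    location: tuple      # (x, y) position of the suspected door in the map
    confidence: float    # 0.0 (certainly no door) .. 1.0 (certainly a door)


# Hypothetical thresholds: outside this band the robot trusts its own classification.
LOWER, UPPER = 0.3, 0.7


def clarification_question(hyp: DoorHypothesis) -> Optional[str]:
    """Return a question for the human if the hypothesis is too uncertain, else None."""
    if LOWER < hyp.confidence < UPPER:
        return "Is there a door here?"
    return None


def integrate_answer(hyp: DoorHypothesis, answer_is_yes: bool) -> DoorHypothesis:
    """Treat the human's answer as authoritative and resolve the uncertainty."""
    return DoorHypothesis(hyp.location, 1.0 if answer_is_yes else 0.0)


uncertain = DoorHypothesis((2.0, 3.5), confidence=0.5)
question = clarification_question(uncertain)
if question is not None:
    # In a real system the question would be spoken and the answer parsed from dialogue.
    resolved = integrate_answer(uncertain, answer_is_yes=True)
```

The point of the sketch is the control flow: the robot takes the initiative only when its own evidence is inconclusive, and the answer feeds back into the spatial representation.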

Situation awareness

Situation awareness (SA) can be paraphrased as knowing [the important aspects of] what is going on around you, where importance is defined in terms of the goals and decision tasks for [the current] job (Endsley and Garland 2000). Endsley defines three levels of SA: perception, comprehension, and projection. A smart robot should be able to perceive and comprehend the situation and adapt its behavior accordingly. An example is the smart handling of doors: when the user approaches a door, the robot can cause problems if it continues in normal following mode. If the user intends to close an open door or open a closed door, the robot might end up blocking the user from, for example, swinging open a closed door leaf. A smart robot should be aware of this danger and take appropriate action.
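
One simple way to picture such door-aware behavior is to let the robot keep a larger distance whenever the user is near a known door. The Python sketch below is an assumption-laden illustration: the function `following_distance` and all distances and radii in it are hypothetical values, not parameters of the Explorer system.

```python
import math


def following_distance(user_xy, door_positions,
                       normal=1.0, near_door=2.0, door_radius=1.5):
    """Return the desired robot-user distance in metres.

    The distance is increased when the user is within door_radius of any
    known door, so the robot does not block the swinging door leaf.
    All numeric defaults are illustrative assumptions.
    """
    for door_xy in door_positions:
        if math.dist(user_xy, door_xy) <= door_radius:
            return near_door
    return normal


doors = [(5.0, 0.0)]                          # door positions taken from the map
print(following_distance((4.5, 0.2), doors))  # user close to the door
print(following_distance((0.0, 0.0), doors))  # user far from any door
```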

Visual place recognition

Current research on vision-based localization systems faces several issues, of which robustness and adaptability are probably the most challenging. The system should be robust to many types of variations such as changes in illumination conditions, people moving around, or objects being used and moved. Moreover, the visual appearance of indoor environments changes continuously in time. This poses serious problems for recognition algorithms trained off-line on data acquired once and for all during a fixed time span. At the same time, when used on a robot, the system must run in real-time on hardware with limited processing and memory resources. To cope with the variations the system must adapt over time and update the representation of the environment.

Object search and localization

Before specific objects in the environment can be used for navigation, manipulation, or information-gathering tasks, the robot must first find them. The system must be able both to plan its sensing, including movement around the map, in order to perform efficiently and quickly, and to use robust and effective recognition methods. All of this needs to be backed by a good representation of the distribution of the objects in space.

Hardware

The experiments with the integrated Explorer system are carried out at KTH and DFKI on the Performance PeopleBots Robone and Minnie, respectively. Both of these robots are equipped with SICK laser scanners. Both also have a camera mounted on a pan-tilt unit, which allows the camera to scan the environment while searching for visual information. This camera setup can also be used to give gaze feedback during interaction so that the user knows where the robot's attention is.

Development

Year 1

The emphasis for the first year was on integrating some of the basic building blocks to get an integrated system that could build a map of the environment, navigate through it either by following a user or by driving autonomously to a specified goal location, and interact with the user using spoken dialogue. The scenarios included clarification dialogue, annotation of the map using natural language, verification, and going to places.

Year 2

During the second year the laser-based place classification algorithm was integrated into the system. Furthermore, simple object recognition functionality was added. Combined, these two allowed the robot to infer knowledge and reason about space to a larger extent. Initial work on situation awareness was also integrated into the system so that the robot could adapt its person following behaviour when passing through doors.

Year 3

In the third year, methods for improving object recognition were investigated. View planning based on a map of the environment was used to cover the space more efficiently during search. In order to allow smaller objects to be detected with a relatively low-resolution camera, the object recognition process was divided into two parts, one for detection of objects and one for recognition. In the detection phase, object hypotheses are formed, and these are investigated by gradually zooming in on the objects. When the object fills enough of the image, recognition is performed. In addition, an enhanced visual distance estimate was implemented. Initial work was also started towards a more general, hierarchical SLAM framework along with an improved navigation graph.
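
The zoom-then-recognize idea can be sketched as a small loop. The Python example below is only a sketch under stated assumptions: the function name `zoom_steps_until_recognizable`, the threshold `min_fraction`, and the per-step `zoom_factor` are hypothetical, and the camera is idealized so that zooming by a linear factor z scales the image area covered by the object by z squared.

```python
def zoom_steps_until_recognizable(object_fraction, min_fraction=0.25,
                                  zoom_factor=1.5, max_steps=10):
    """Count the zoom steps needed before recognition can be attempted.

    object_fraction: fraction of the image area covered by the object
                     hypothesis (0..1) at the current zoom level.
    Returns the number of zoom steps after which the hypothesis covers at
    least min_fraction of the image, or None if max_steps is not enough.
    """
    steps = 0
    while object_fraction < min_fraction:
        if steps == max_steps:
            return None  # give up on this hypothesis
        # Idealized zoom: covered area grows with the square of the linear zoom.
        object_fraction = min(1.0, object_fraction * zoom_factor ** 2)
        steps += 1
    return steps


# A hypothesis covering 1% of the image, zooming by a factor of 2 per step.
print(zoom_steps_until_recognizable(0.01, zoom_factor=2.0))
```

Once the loop terminates with a step count, the recognition stage would be run on the zoomed-in view; hypotheses that never grow large enough are discarded.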

We have also worked on an approach to intelligent, interactive people following for autonomous robots. The approach combines robust methods for simultaneous localization and mapping and for people tracking in order to yield a socially and environmentally sensitive people following behavior. Unlike current purely reactive approaches ("nearest point following") it enables the robot to follow a human in a socially acceptable way, providing verbal and non-verbal feedback to the user where necessary. At the same time, the robot makes use of information about the spatial and functional organization of its environment, so that it can anticipate likely actions performed by a human, and adjust its motion accordingly. As a result, the robot's behaviors become less reactive and more intuitive when following people around an indoor environment. Below you can find two videos that contrast a purely reactive approach and our situation-aware approach in a corridor setting.

Results

Videos

Year 1
Year 2
Year 3
  • Year 3 Explorer demo (Exhaustive search; storing object location, Reacquisition, Reacquisition on retry, Failed reacquisition; reversion to exhaustive search, Exploiting serendipity: detecting objects in unexpected views) [.ogg, 8'34/57MB] [.avi, 8'34/64MB]
  • Bird's eye view of a people following run: the video visualizes the robot's internal representation of its surroundings: the robot's position with respect to a map acquired and maintained using SLAM, and the user's position extracted from laser range scans using a people tracking algorithm. In this run, the robot adapts to its situation. When operating in a corridor, it adopts an "optimal-lane" following behavior, which tries to find a smooth trajectory around possible obstacles along the corridor. As a result, the robot can safely increase its top speed and also maintain a higher average speed. [video1, 0'41]
  • Bird's eye view of a people following run: the video visualizes the robot's internal representation of its surroundings: the robot's position with respect to a map acquired and maintained using SLAM, and the user's position extracted from laser range scans using a people tracking algorithm. In this run, the robot just follows the user. It neither adapts its driving behavior to the kind of environment it is in, nor does it plan ahead to avoid obstacles. This results in far-from-optimal motion when driving down the corridor. It is especially evident when the robot is moving past an obstacle near the end of the corridor. [video2, 0'46]

Publications

Year 1
[1] Elena Pacchierotti, Henrik I. Christensen, and P. Jensfelt. Embodied social interaction for service robots in hallway environments. In Proc. of the International Conference on Field and Service Robotics (FSR'05), July 2005.
[2] O. Martinez Mozos, C. Stachniss, and W. Burgard. Supervised learning of places from range data using AdaBoost. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 1742-1747, Barcelona, Spain, April 2005.
Year 2
[3] E. Pacchierotti, H.I. Christensen, and P. Jensfelt. Evaluation of passing distance for social robots. In IEEE Workshop on Robot and Human Interactive Communication (ROMAN), Hertfordshire, UK, September 2006.
[4] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. Clarification dialogues in human-augmented mapping. In Proc. of the 1st Annual Conference on Human-Robot Interaction (HRI'06), Salt Lake City, UT, March 2006.
[5] Federico Bertolli, Patric Jensfelt, and Henrik I. Christensen. SLAM using visual scan-matching with distinguishable 3D points. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'05), 2006.
[6] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. Situated dialogue and understanding spatial organization: Knowing what is where and what you can do there. In Proc. of IEEE Workshop on Robot and Human Interactive Communication (ROMAN), 2006.
[7] O. Martinez Mozos and W. Burgard. Supervised learning of topological maps using semantic information extracted from range data. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2772-2777, Beijing, China, 2006.
[8] O. Martinez Mozos, A. Rottmann, R. Triebel, P. Jensfelt, and W. Burgard. Semantic labeling of places using information extracted from laser and vision sensor data. In Proc. of the IEEE/RSJ IROS 2006 Workshop: From Sensors to Human Spatial Concepts, Beijing, China, 2006.
[9] E. Pacchierotti, H.I. Christensen, and P. Jensfelt. Design of an office guide robot for social interaction studies. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'06), 2006.
[10] A. Pronobis, B. Caputo, P. Jensfelt, and H.I. Christensen. A discriminative approach to robust visual place recognition. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'06), 2006.
Year 3
[11] Oscar Martinez Mozos, Patric Jensfelt, Hendrik Zender, Geert-Jan M. Kruijff, and Wolfram Burgard. From labels to semantics: An integrated system for conceptual spatial representations of indoor environments for mobile robots. In Proc. of the IEEE ICRA 2007 Workshop: Semantic information in robotics (ICRA), Roma, Italy, 2007.
[12] K. O. Arras, O. Martinez Mozos, and W. Burgard. Using boosted features for the detection of people in 2D range data. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2007.
[13] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. Situated dialogue and spatial organization: What, where... and why? International Journal of Advanced Robotic Systems, 4(2), 2007.
[14] Hendrik Zender and Geert-Jan M. Kruijff. Multi-layered conceptual spatial mapping for autonomous mobile robots. In Holger Schultheis, Thomas Barkowsky, Benjamin Kuipers, and Bernhard Hommel, editors, Control Mechanisms for Spatial Knowledge Processing in Cognitive / Intelligent Systems, volume Technical Report SS-07-01 of Papers from the AAAI Spring Symposium, pages 62-66, Menlo Park, CA, USA, March 2007. AAAI Press.

 

Last modified: 26.2.2008 1:37:41