CoSy: Cognitive Systems for Cognitive Assistants

Explorer

Activities

SLAM

A key component in the system is the ability to build a map of the environment based on sensor data. This is done using so-called Simultaneous Localization And Mapping (SLAM). Most traditional systems rely on a laser scanner as the main sensor for SLAM. The laser scanner is expensive, and the information it provides about the environment is limited to distances. In comparison, a camera offers much richer information, including the appearance of the environment. However, building a map from camera data requires significantly more processing, and a number of challenging issues need to be addressed. In the explorer scenario the main sensor is still the laser scanner, but a number of alternative approaches are being investigated, using either a combination of laser and vision or vision only.

Multi-level conceptual spatial representations

Most existing approaches to robot map building, or Simultaneous Localization And Mapping (SLAM), use a metric representation of space. Humans, though, have a more qualitative, topological perspective on spatial organization (McNamara 1986). We adopt an approach in which we build a multi-level representation of the environment, combining metrical maps and topological graphs (as an abstraction over metrical information), as in (Kuipers 2000). We extend these representations with structural descriptions that capture aspects of spatial and functional organization. The robot obtains these descriptions either through interaction with a human, or through inference that combines its own observations ("I see a coffee machine") with ontological knowledge ("Coffee machines are usually found in kitchens, so this is likely to be a kitchen!"). We store objects in the spatial representations, and so associate the functionality of a location with the functions of the objects present there. A schematic view of the multi-level representation is given here.
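The combination of observation and ontological knowledge described above can be illustrated with a minimal sketch. It is not the Explorer implementation: the `Place` class, the `TYPICAL_LOCATIONS` table, and the simple voting rule are all assumptions made for illustration. Each place is a node of the topological graph anchored at a metric position, and its category comes either from a human-given label or from the objects observed there.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ontology fragment: object type -> room categories where
# such an object is usually found. The entries are illustrative only.
TYPICAL_LOCATIONS = {
    "coffee machine": ["kitchen"],
    "oven": ["kitchen"],
    "desk": ["office"],
    "printer": ["office", "corridor"],
}


@dataclass
class Place:
    """A node of the topological graph, anchored in the metric map."""
    name: str
    position: tuple                                  # (x, y) in the metric map frame
    objects: list = field(default_factory=list)      # object types observed here
    neighbours: set = field(default_factory=set)     # names of connected places
    label: Optional[str] = None                      # category given by a human, if any


def connect(a: Place, b: Place) -> None:
    """Add an undirected edge between two places in the topological graph."""
    a.neighbours.add(b.name)
    b.neighbours.add(a.name)


def infer_category(place: Place) -> Optional[str]:
    """Prefer a human-given label; otherwise let the observed objects vote."""
    if place.label is not None:
        return place.label
    votes = Counter(
        room
        for obj in place.objects
        for room in TYPICAL_LOCATIONS.get(obj, [])
    )
    if not votes:
        return None  # nothing observed that the ontology knows about
    return votes.most_common(1)[0][0]


area = Place("area1", (2.0, 3.5), objects=["coffee machine", "oven"])
hall = Place("hall", (0.0, 0.0))
connect(area, hall)
print(infer_category(area))
```

In this sketch a label obtained through dialogue always overrides the inferred category, which is one simple way to let human interaction and the robot's own inference feed the same representation.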

Situated dialogue

A core characteristic of the explorer system is that each utterance is analyzed to obtain a representation of the meaning it expresses and of how it (syntactically) conveys that meaning, rather than relying on, for example, keyword spotting. This way, we can properly handle the variety of ways in which people may express assertions, questions, and commands. Furthermore, given a representation of the meaning of the utterance, we can combine it with further inferences over ontologies to obtain a complete conceptual description of the location or object being talked about. This way we can ground situated dialogue in the situational awareness of the robot.

Human-augmented mapping

Following (Topp and Christensen 2005) we talk about Human-Augmented Mapping (HAM) to indicate the active role that human-robot interaction plays in the robot's acquisition of qualitative spatial knowledge. Existing dialogue-based approaches to HRI usually implement a master/slave model of dialogue: the human speaks, the robot listens. However, situations naturally arise in which the robot needs to take the initiative, e.g. to clarify an issue with the human. This is one form of mixed-initiative interaction, enabling a robot to recognize when help is needed from a human, and to learn from this interaction (Bruemmer & Walton, 2003). One situation that may require clarification is uncertainty in automatic area classification: doors provide important knowledge about spatial organization, but are difficult to recognize robustly and reliably. Clarification dialogues can help to improve the quality of the spatial representation the robot constructs, and to increase the robot's robustness in dealing with uncertain information. The basic idea is to allow any modality to raise an issue. The image below shows the timeline for an example with clarification dialogue.
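As a rough illustration of when a robot might take the initiative, the sketch below turns an uncertain hypothesis from some modality into a dialogue move. The function name, the modality strings, and the two thresholds are hypothetical; the actual system's decision procedure is not described here.

```python
def raise_issue(modality, hypothesis, confidence, accept=0.8, reject=0.2):
    """Decide how to handle a hypothesis reported by a modality.

    Thresholds are illustrative assumptions: confident hypotheses are
    accepted, very unlikely ones are dropped, and anything in between
    triggers a clarification question to the human.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= accept:
        return ("accept", hypothesis)
    if confidence <= reject:
        return ("discard", hypothesis)
    question = f"Is there a {hypothesis} here?"
    return ("ask", question, modality)


# A laser-based door detector that is unsure about what it has seen.
move = raise_issue("laser", "door", 0.55)
print(move)
```

Because the function only needs a modality name, a hypothesis, and a confidence, any modality can raise an issue in the same way, and the human's answer can then be stored in the spatial representation.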

Situation awareness

Situation awareness (SA) can be paraphrased as knowing [the important aspects of] what is going on around you, where importance is defined in terms of the goals and decision tasks for [the current] job (Endsley and Garland 2000). Endsley defines three levels of SA: perception, comprehension, and projection. A smart robot should be able to perceive and comprehend the situation and adapt its behavior accordingly. An example is smart handling of doors: when the user approaches a door, the robot can cause problems if it continues in normal following mode. If the user intends to close an open door or open a closed door, the robot might end up blocking the user from, for example, swinging open a closed door leaf. A smart robot should be aware of this danger and take appropriate action.

Visual place recognition

Current research on vision-based localization systems faces several issues, of which robustness and adaptability are probably the most challenging. The system should be robust to many types of variation, such as changes in illumination conditions, people moving around, or objects being used and moved. Moreover, the visual appearance of indoor environments changes continuously over time. This poses serious problems for recognition algorithms trained off-line on data acquired once and for all during a fixed time span. At the same time, when used on a robot, the system must run in real time on hardware with limited processing and memory resources. To cope with these variations the system must adapt over time and update its representation of the environment.

Object search and localization

To use specific objects in the environment, whether for navigation or for manipulation and information-gathering tasks, the robot must first find them. The system must be able both to plan its sensing, including movement around the map, so that the search is quick and efficient, and to use robust and effective recognition methods. All of this needs to be backed by a good representation of the distribution of objects in space.
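One simple way to think about planning where to look next is a greedy trade-off between how likely the object is to be at a location and how far the robot has to travel to get there. The sketch below assumes this particular utility function and a hypothetical list of candidate locations with prior probabilities; it is meant only to make the idea concrete, not to describe the method used in the project.

```python
import math


def next_search_target(robot_xy, candidates):
    """Pick the candidate with the best probability-to-travel-cost ratio.

    candidates: list of (name, (x, y), probability that the object is there).
    The utility p / (1 + distance) is an illustrative assumption.
    """
    def utility(candidate):
        _, xy, probability = candidate
        return probability / (1.0 + math.dist(robot_xy, xy))

    return max(candidates, key=utility)[0]


# Hypothetical prior over where a mug might be found.
candidates = [
    ("kitchen", (4.0, 0.0), 0.6),
    ("office", (1.0, 0.0), 0.3),
    ("corridor", (0.0, 2.0), 0.1),
]
target = next_search_target((0.0, 0.0), candidates)
print(target)
```

With these numbers the nearby office wins over the more probable but more distant kitchen, which shows how the representation of the object distribution and the cost of movement jointly shape the search.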


Last modified: 9.1.2009 9:58:18