Explorer
Objective
More and more robots find their way into environments where their
primary purpose is to interact with humans to help and solve a variety
of service-oriented tasks. Particularly if such a service robot is
mobile, it needs to have an understanding of the spatial and
functional properties of the environment in which it operates. The
problem we address is how a robot can acquire an understanding of the
environment so that it can autonomously operate in the environment,
and talk about it with a human.
One key issue in the Explorer scenario is how to establish a
correspondence between how a human perceives spatial and functional
aspects of an environment, and what the robot autonomously learns as a
map. Successful interaction between human and robot relies on bridging
the gap between the two representations.
Activities
SLAM
A key component in the system is the ability to build a map of the
environment based on sensor data. This is done using so-called
Simultaneous Localization And Mapping (SLAM). Most traditional systems
rely on a laser scanner as the main sensor for SLAM. The laser scanner
is expensive and provides only distance information about the
environment. In comparison, a camera offers much richer information,
including the appearance of the environment. However, building a map
from camera data requires significantly more processing, and a
number of challenging issues need to be addressed. In the Explorer
scenario the main sensor is still the laser scanner, but a number of
alternative approaches are being investigated, using either a
combination of laser and vision or vision only.
Multi-level conceptual spatial representations
Most existing approaches to robot map building, or Simultaneous
Localization And Mapping (SLAM), use a metric representation of
space. Humans, though, have a more qualitative, topological
perspective on spatial organization (McNamara 1986). We adopt an
approach in which we build a multi-level representation of the
environment, combining metrical maps and topological graphs (as an
abstraction over metrical information), as in (Kuipers 2000). We extend
these representations with structural descriptions that capture
aspects of spatial and functional organization. The robot obtains
these descriptions either through interaction with a human, or through
inference combining its own observations ("I see a coffee
machine") with ontological knowledge ("Coffee machines are
usually found in kitchens, so this is likely to be a kitchen!"). We
store objects in the spatial representations, and so associate the
functionality of a location with the functions of the objects
present there. A schematic view of the multi-level representation is
given here.
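The inference step described above can be illustrated with a minimal sketch. It is not the Explorer implementation: the ontology table, the class name TopologicalNode, and the function infer_room_concepts are hypothetical names chosen for illustration, and simple vote counting stands in for real ontological reasoning.

```python
from collections import Counter

# Hypothetical ontology fragment (assumption): object types that are
# typical for each room concept.
TYPICAL_OBJECTS = {
    "kitchen": {"coffee machine", "fridge", "microwave"},
    "office": {"desk", "computer", "bookshelf"},
}


def infer_room_concepts(observed_objects):
    """Rank room concepts by how many observed objects support them."""
    observed = set(observed_objects)
    votes = Counter()
    for concept, typical in TYPICAL_OBJECTS.items():
        support = len(typical & observed)
        if support:
            votes[concept] = support
    return [concept for concept, _ in votes.most_common()]


class TopologicalNode:
    """A place in the topological graph, anchored in the metric map."""

    def __init__(self, node_id, position):
        self.node_id = node_id
        self.position = position  # (x, y) in the metric map
        self.objects = []         # objects observed at this place
        self.human_label = None   # label given by a human, if any

    def concept(self):
        # A label obtained through interaction takes precedence over
        # a concept inferred from the objects present.
        if self.human_label is not None:
            return self.human_label
        ranked = infer_room_concepts(self.objects)
        return ranked[0] if ranked else None
```

In this sketch, a node where the robot has seen a coffee machine is classified as a kitchen until a human says otherwise.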
Situated dialogue
A core characteristic of the Explorer system is that each utterance is
analyzed to obtain a representation of the meaning it expresses, and
how it (syntactically) conveys that meaning, rather than just doing,
for example, keyword spotting. This way, we can properly handle the
variety of ways in which people may express assertions, questions, and
commands. Furthermore, having a representation of the meaning of the
utterance we can combine it with further inferences over ontologies to
obtain a complete conceptual description of the location or object
being talked about. This way we can ground situated dialogue in the
situational awareness of the robot.
Human augmented mapping
Following (Topp and Christensen 2005) we talk about
Human-Augmented Mapping (HAM) to indicate the active role that
human-robot interaction plays in the robot's acquisition of
qualitative spatial knowledge.
Existing dialogue-based approaches to HRI usually implement a
master/slave model of dialogue: the human speaks, the robot
listens. However, situations naturally arise in which the robot needs
to take the initiative, e.g. to clarify an issue with the human. This
is one form of mixed-initiative interaction, enabling a robot to
recognize when help is needed from a human, and learn from this
interaction (Bruemmer & Walton, 2003). One situation that may require
this is when uncertainty arises in automatic area classification:
doors provide important knowledge about spatial organization, but are
difficult to recognize robustly and reliably. Clarification dialogues
can help to improve the quality of the spatial representation the
robot constructs, and to increase the robot's robustness in dealing
with uncertain information. The basic idea is to allow for any
modality to raise an issue. The image below shows the timeline for an
example with clarification dialogue.
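As a rough illustration of a modality raising an issue, the sketch below turns an uncertain door hypothesis into a clarification question. The Issue class, the function name, the "laser" modality tag and the probability thresholds are all assumptions made for this example, not part of the Explorer system.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """A request for clarification raised by one modality (hypothetical)."""
    modality: str
    question: str


def check_door_hypothesis(probability, low=0.3, high=0.7):
    """Raise an issue only when the door detector is undecided.

    The thresholds are illustrative assumptions: outside the band
    (low, high) the robot trusts its own classification.
    """
    if low < probability < high:
        return Issue(modality="laser", question="Is there a door here?")
    return None
```

A dialogue component could then pick up the returned issue and ask the question at a suitable moment in the interaction.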
Situation awareness
Situation awareness (SA) can be paraphrased as knowing [the important
aspects of] what is going on around you, where importance is defined
in terms of the goals and decision tasks for [the current] job
(Endsley and Garland 2000). Endsley defines three levels of SA:
perception, comprehension, and projection. A smart robot should be
able to perceive and comprehend the situation and adapt its behavior
accordingly.
An example is smart handling of doors: when the user approaches a
door, the robot can cause problems if it continues in normal following
mode. If the user intends to close an open door or open a closed door
the robot might end up in a situation where it blocks the user from,
for example, swinging open a closed door leaf. A smart robot should be
aware of this danger and take appropriate action.
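One simple way to picture such door-aware behavior is to have the robot keep a larger distance from the user whenever the user is close to a known door. The sketch below is an assumption-laden illustration: the function name, the distances, and the door radius are hypothetical and not taken from the Explorer system.

```python
import math


def following_distance(user_pos, door_positions,
                       normal=1.0, near_door=2.0, door_radius=1.5):
    """Return the distance (in metres) the robot should keep to the user.

    All numeric values are illustrative assumptions. If the user is
    within door_radius of any known door, the robot hangs back so
    that it does not block the door leaf.
    """
    for door in door_positions:
        if math.dist(user_pos, door) <= door_radius:
            return near_door
    return normal
```

Door positions would come from the robot's map; here they are passed in as plain (x, y) tuples.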
Visual place recognition
Current research on vision-based localization systems faces several
issues, of which robustness and adaptability are probably the most
challenging. The system should be robust to many types of variations
such as changes in illumination conditions, people moving around, or
objects being used and moved. Moreover, the visual appearance of
indoor environments changes continuously in time. This poses serious
problems for recognition algorithms trained off-line on data acquired
once and for all during a fixed time span. At the same time, when used
on a robot, the system must run in real-time on hardware with limited
processing and memory resources. To cope with the variations the
system must adapt over time and update the representation of the
environment.
Object Search and localization
For the use of specific objects in the environment in navigation as
well as for manipulative or information-gathering tasks, it is
necessary to find them first. The system must be able both to plan
its sensing - including movement around the map - in order to perform
efficiently and quickly, and to use robust and effective recognition
methods. All of this needs to be backed by a good representation of
the distribution of the objects in space.
Hardware
The experiments with the integrated Explorer system were carried out
at KTH and DFKI on the Performance PeopleBots Robone and Minnie
respectively. Both of these robots are equipped with SICK laser
scanners. Both also have a camera mounted on a pan-tilt unit, which
allows the camera to scan the environment while searching for visual
information. This camera setup can also be used to give gaze feedback
during interaction so that the user knows where the robot's attention
is.
Development
Year 1 The emphasis for the first year was on integrating
some of the basic building blocks to get an integrated system that
could build a map of the environment, navigate through it either by
following a user or driving autonomously to a specified goal location,
and interact with the user using spoken dialogue.
The scenarios included clarification dialogue, annotation of the map
using natural language, verification, and going to places.
Year 2 During the second year the laser-based
place classification algorithm was integrated into the
system. Furthermore, simple object recognition functionality was
added. These two combined allowed the robot to infer knowledge and
reason about space to a larger extent. Initial work on situation
awareness was also integrated into the system so that the robot could
adapt its person following behaviour when passing through doors.
Year 3 In the third year, methods for improving object
recognition were investigated. View planning based on a map of the
environment was used to cover the space more efficiently during
search. In order to allow for smaller objects to be detected with a
relatively low-resolution camera, the object recognition process was
divided into two parts, one for detection of objects and one for
recognition. In the detection phase, object hypotheses are formed and
these are investigated by gradually zooming in on the objects. When
the object fills enough of the image, recognition is performed. In
addition, an enhanced visual distance estimate was implemented.
Initial work was also started towards a more general, hierarchical
SLAM framework along with an improved navigation graph.
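The detect-then-zoom-then-recognize procedure described for Year 3 can be sketched as a small loop. This is only an illustration under stated assumptions: the function name, the threshold min_fraction, the zoom parameters, and the simplification that apparent object size grows linearly with the zoom factor are all hypothetical.

```python
def search_by_zooming(hypothesis_fraction, recognize,
                      min_fraction=0.25, zoom_step=1.5, max_zoom=10.0):
    """Zoom in on an object hypothesis until recognition is worthwhile.

    hypothesis_fraction: fraction of the image the hypothesis fills
        at zoom factor 1.0 (as reported by the detection phase).
    recognize: callable taking the zoom factor and returning a label
        or None; it stands in for the recognition phase.
    Assumes apparent size scales linearly with zoom (a simplification).
    Returns (label, zoom); label is None if the object stays too small
    even at max_zoom.
    """
    zoom = 1.0
    while hypothesis_fraction * zoom < min_fraction:
        if zoom >= max_zoom:
            return None, zoom  # give up: object still too small
        zoom = min(zoom * zoom_step, max_zoom)
    return recognize(zoom), zoom
```

Separating the cheap detection phase from the more expensive recognition phase means recognition only runs on views in which the object is large enough to be identified.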
We have also worked on an approach to intelligent, interactive people following for autonomous robots. The approach combines robust methods for simultaneous localization and mapping and for people tracking in order to yield a socially and environmentally sensitive people following behavior. Unlike current purely reactive approaches ("nearest point following") it enables the robot to follow a human in a socially acceptable way, providing verbal and non-verbal feedback to the user where necessary. At the same time, the robot makes use of information about the spatial and functional organization of its environment, so that it can anticipate likely actions performed by a human, and adjust its motion accordingly. As a result, the robot's behaviors become less reactive and more intuitive when following people around an indoor environment. Below you can find two videos that contrast a purely reactive approach and our situation-aware approach in a corridor setting.
Results
Videos
Year 1
Year 2
Year 3
-
Year 3 Explorer demo (Exhaustive search; storing object location,
Reacquisition, Reacquisition on retry, Failed reacquisition; reversion to exhaustive search, Exploiting serendipity: detecting objects in unexpected views)
[.ogg, 8'34/57MB]
[.avi, 8'34/64MB]
-
Bird's eye view of a people following run: the video visualizes the robot's internal representation of its surroundings: the robot's position with respect to a map acquired and maintained using SLAM, and the user's position extracted from laser range scans using a people tracking algorithm.
In this run, the robot adapts to its situation. When operating in a corridor, it adopts an "optimal-lane" following behavior, which tries to find a smooth trajectory around possible obstacles along the corridor. As a result, the robot can safely increase its top speed and also maintain a higher average speed.
[video1, 0'41]
-
Bird's eye view of a people following run: the video visualizes the robot's internal representation of its surroundings: the robot's position with respect to a map acquired and maintained using SLAM, and the user's position extracted from laser range scans using a people tracking algorithm.
In this run, the robot just follows the user. It neither adapts its driving behavior on the basis of what kind of environment it is in, nor does it plan ahead to avoid obstacles. This results in a far-from-optimal motion when driving down the corridor. It is especially evident when the robot is moving past an obstacle near the end of the corridor.
[video2, 0'46]
Publications
Year 1

[1] Elena Pacchierotti, Henrik I. Christensen, and P. Jensfelt.
Embodied social interaction for service robots in hallway environments.
In Proc. of the International Conference on Field and Service
Robotics (FSR'05), July 2005.
[ bib | .pdf ]

[2] O. Martinez Mozos, C. Stachniss, and W. Burgard.
Supervised learning of places from range data using AdaBoost.
In Proc. of the IEEE Int. Conf. on Robotics & Automation
(ICRA), pages 1742-1747, Barcelona, Spain, April 2005.
[ bib | .pdf ]

Year 2

[3] E. Pacchierotti, H.I. Christensen, and P. Jensfelt.
Evaluation of passing distance for social robots.
In IEEE Workshop on Robot and Human Interactive Communication
(ROMAN), Hertfordshire, UK, September 2006.
[ bib | .pdf ]

[4] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I.
Christensen.
Clarification dialogues in human-augmented mapping.
In Proc. of the 1st Annual Conference on Human-Robot Interaction
(HRI'06), Salt Lake City, UT, March 2006.
[ bib | .pdf ]

[5] Federico Bertolli, Patric Jensfelt, and Henrik I. Christensen.
SLAM using visual scan-matching with distinguishable 3D points.
In Proc. of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS'05), 2006.
[ bib | .pdf ]

[6] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I.
Christensen.
Situated dialogue and understanding spatial organization: Knowing
what is where and what you can do there.
In Proc. of IEEE Workshop on Robot and Human Interactive
Communication (ROMAN), 2006.
[ bib ]

[7] O. Martinez Mozos and W. Burgard.
Supervised learning of topological maps using semantic information
extracted from range data.
In Proc. of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pages 2772-2777, Beijing, China, 2006.
[ bib | .pdf ]

[8] O. Martinez Mozos, A. Rottmann, R. Triebel, P. Jensfelt, and W. Burgard.
Semantic labeling of places using information extracted from laser
and vision sensor data.
In Proc. of the IEEE/RSJ IROS 2006 Workshop: From Sensors to
Human Spatial Concepts, Beijing, China, 2006.
[ bib | .pdf ]

[9] E. Pacchierotti, H.I. Christensen, and P. Jensfelt.
Design of an office guide robot for social interaction studies.
In Proc. of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS'06), 2006.
[ bib | .pdf ]

[10] A. Pronobis, B. Caputo, P. Jensfelt, and H.I. Christensen.
A discriminative approach to robust visual place recognition.
In Proc. of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS'06), 2006.
[ bib | .pdf ]

Year 3

[11] Oscar Martinez Mozos, Patric Jensfelt, Hendrik Zender, Geert-Jan M.
Kruijff, and Wolfram Burgard.
From labels to semantics: An integrated system for conceptual spatial
representations of indoor environments for mobile robots.
In Proc. of the IEEE ICRA 2007 Workshop: Semantic information in
robotics (ICRA), Roma, Italy, 2007.
[ bib | .pdf ]

[12] K. O. Arras, O. Martinez Mozos, and W. Burgard.
Using boosted features for the detection of people in 2D range data.
In Proc. of the IEEE Int. Conf. on Robotics & Automation
(ICRA), 2007.
[ bib | .pdf ]

[13] Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I.
Christensen.
Situated dialogue and spatial organization: What, where... and why?
International Journal of Advanced Robotic Systems, 4(2), 2007.
[ bib | .pdf ]

[14] Hendrik Zender and Geert-Jan M. Kruijff.
Multi-layered conceptual spatial mapping for autonomous mobile robots.
In Holger Schultheis, Thomas Barkowsky, Benjamin Kuipers, and
Bernhard Hommel, editors, Control Mechanisms for Spatial Knowledge
Processing in Cognitive / Intelligent Systems, volume Technical Report
SS-07-01 of Papers from the AAAI Spring Symposium, pages 62-66, Menlo
Park, CA, USA, March 2007. AAAI Press.
[ bib ]