CoSy: Cognitive Systems for Cognitive Assistants
 
 
 
[1] H. Zender, P. Jensfelt, Ó. Martínez Mozos, G.J.M. Kruijff, and W. Burgard. Conceptual spatial representations for indoor mobile robots. Robotics and Autonomous Systems, 56(6), June 2008. Special Issue From Sensors to Human Spatial Concepts.
We present an approach for creating conceptual representations of human-made indoor environments using mobile robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition. The system also incorporates a linguistic framework that actively supports the map acquisition process, and which is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.
[2] H. Jacobsson, N.A. Hawes, G.J.M. Kruijff, and J. Wyatt. Crossmodal content binding in information-processing architectures. In Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Amsterdam, The Netherlands, March 12-15 2008.
Operating in a physical context, an intelligent robot faces two fundamental problems. First, it needs to combine information from its different sensors to form a representation of the environment that is more complete than any representation a single sensor could provide. Second, it needs to combine high-level representations (such as those for planning and dialogue) with sensory information, to ensure that the interpretations of these symbolic representations are grounded in the situated context. Previous approaches to this problem have used techniques such as (low-level) information fusion, ontological reasoning, and (high-level) concept learning. This paper presents a framework in which these, and related approaches, can be used to form a shared representation of the current state of the robot in relation to its environment and other agents. Preliminary results from an implemented system are presented to illustrate how the framework supports behaviours commonly required of an intelligent robot.
[3] G.J.M. Kruijff, M. Brenner, and N.A. Hawes. Continual planning for cross-modal situated clarification in human-robot interaction. In Proceedings of the 17th International Symposium on Robot and Human Interactive Communication (RO-MAN 2008), Munich, Germany, 2008.
Cognitive robots typically operate in dynamic, open-ended environments. This may naturally lead to the robot not knowing how to understand the environment, or an agent acting therein. This raises the question of how a robot could then try and overcome its lack of understanding. The article focuses on mechanisms for overcoming failure to understand aspects of a physical situation. The article proposes an approach to situated clarification, in which, succinctly put, the robot tries to identify the issues that appear to give rise to the problem in situated understanding, and then creates a plan for addressing them. Addressing an issue may involve using dialogue with other agents. The strategies a robot can adopt in its clarification plan depend on how these issues refer to a physical situation, and acting therein. The article details the approach and its embedding in a framework for situated artificial cognition, and discusses its implementation for human-robot interaction.
[4] P. Lison and G.J.M. Kruijff. Salience-driven contextual priming of speech recognition for human-robot interaction. In Proceedings of ECAI 2008, Athens, Greece, 2008.
The paper presents an implemented model for priming speech recognition, using contextual information about salient entities. The underlying hypothesis is that, in human-robot interaction, speech recognition performance can be improved by exploiting knowledge about the immediate physical situation and the dialogue history. To this end, visual salience (objects perceived in the physical scene) and linguistic salience (objects, events already mentioned in the dialogue) are integrated into a single cross-modal salience model. The model is dynamically updated as the environment changes. It is used to establish expectations about which words are most likely to be heard in the given context. The update is realised by continuously adapting the word-class probabilities specified in a statistical language model. The paper discusses the motivations behind the approach, and presents the implementation as part of a cognitive architecture for mobile robots. Evaluation results on a test suite show a statistically significant improvement of salience-driven priming speech recognition (WER) over a commercial baseline system.
[5] H. Zender and G.J.M. Kruijff. Towards generating referring expressions in a mobile robot scenario. In Language and Robots: Proceedings of the Symposium, pages 101-106, Aveiro, Portugal, December 2007.
This paper describes an approach towards generating referring expressions that identify and distinguish spatial entities in large-scale space, e.g. in an office environment, for autonomous mobile robots. In such a scenario a dialogue is often about things and places outside the current perceptual fields of the interlocutors. One of the challenges therefore lies in determining an appropriate dialogue context. Other important issues are to have adequate models of both the large-scale spatial environment and of the user's knowledge.
[6] H. Zender, P. Jensfelt, Ó. Martínez Mozos, G.J.M. Kruijff, and W. Burgard. An integrated robotic system for spatial understanding and situated interaction in indoor environments. In Proc. of AAAI-07, pages 1584-1589, Vancouver, BC, Canada, July 2007.
A major challenge in robotics and artificial intelligence lies in creating robots that are to cooperate with people in human-populated environments, e.g. for domestic assistance or elderly care. Such robots need skills that allow them to interact with the world and the humans living and working therein. In this paper we investigate the question of spatial understanding of human-made environments. The functionalities of our system comprise perception of the world, natural language, learning, and reasoning. For this purpose we integrate state-of-the-art components from different disciplines in AI, robotics and cognitive systems into a mobile robot system. The work focuses on the description of the principles we used for the integration, including cross-modal integration, ontology-based mediation, and multiple levels of abstraction of perception. Finally, we present experiments with the integrated CoSy Explorer system and list some of the major lessons that were learned from its design, implementation, and evaluation.
[7] G.J.M. Kruijff and M. Brenner. Modelling spatio-temporal comprehension in situated human-robot dialogue as reasoning about intentions and plans. In Proceedings of the Symposium on Intentions in Intelligent Systems, AAAI Spring Symposium Series 2007, Stanford University, Palo Alto, CA, March 2007.
The article presents a cross-modal approach to modeling spatio-temporal comprehension in situated dialogue. The article proposes an approach for representing spatiotemporal-causal structure at the level of linguistically conveyed meaning, adopting the notion of event nucleus presented [?]. In the approach, basic tense, aspect and modality can be captured, as well as aspectual coercion, and temporal sequencing. The article then discusses how the incremental construction of such linguistic representations can be combined with continuous action planning. Through cross-modal integration of action planning representations into linguistic processing, the article explores how action planning can prime selectional attention in utterance comprehension by disambiguating linguistic analyses on the basis of plan availability, and by raising expectations what action(s) may be talked about next. Furthermore, planning can complement linguistic analyses with information about spatiotemporal-causal structure established in planning inferences. This makes such inferences available for future referencing in the discourse context, yet lessening the load on dialogue comprehension for having to establish them.
[8] M. Brenner, N.A. Hawes, J. Kelleher, and J. Wyatt. Mediating between qualitative and quantitative representations for task-orientated human-robot interaction. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2007.
In human-robot interaction (HRI) it is essential that the robot interprets and reacts to a human’s utterances in a manner that reflects their intended meaning. In this paper we present a collection of novel techniques that allow a robot to interpret and execute spoken commands describing manipulation goals involving qualitative spatial constraints (e.g. “put the red ball near the blue cube”). The resulting implemented system integrates computer vision, potential field models of spatial relationships, and action planning to mediate between the continuous real world, and discrete, qualitative representations used for symbolic reasoning.
[9] G.J.M. Kruijff, H. Zender, P. Jensfelt, and H.I. Christensen. Situated dialogue and spatial organization: What, where... and why? International Journal of Advanced Robotic Systems, 4(2), 2007.
The paper presents a model of situated dialogue processing. The underlying assumption is that to understand situated dialogue, communicated meaning needs to be related to situation(s) it refers to. The model couples incremental processing to a notion of bidirectional connectivity, inspired by how humans process visually situated language. Analyzing an utterance in a word-by-word fashion, a representation of possible utterance interpretations is gradually built up. In a top-down fashion, the model tries to ground these interpretations in situation awareness, through which they can prime what is focused on in a situation. In a bottom-up fashion, the (im)possibility to ground certain interpretations primes how the analysis of the utterance further unfolds. The paper discusses the implementation of the model in a distributed, cognitive architecture for human-robot interaction, and presents an evaluation on a test suite. The evaluation shows (and quantifies) the effects different levels of linguistic- and situation-relative interpretation have on priming utterance processing.
[10] J.D. Kelleher and G.J.M. Kruijff. Incremental generation of spatial referring expressions in situated dialog. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 1041-1048, 2006.
This paper presents an approach to incrementally generating locative expressions. It addresses the issue of combinatorial explosion inherent in the construction of relational context models by: (a) contextually defining the set of objects in the context that may function as a landmark, and (b) sequencing the order in which spatial relations are considered using a cognitively motivated hierarchy of relations, and visual and discourse salience.
[11] J.D. Kelleher, G.J.M. Kruijff, and F. Costello. Proximity in context: an empirically grounded computational model of proximity for processing topological spatial expressions. In Proceedings of ACL/COLING 2006, 2006.
The paper presents a new model for context-dependent interpretation of linguistic expressions about spatial proximity between objects in a natural scene. The paper discusses novel psycholinguistic experimental data that tests and verifies the model. The model has been implemented, and enables a conversational robot to identify objects in a scene through topological spatial relations (e.g. 'X near Y'). The model can help motivate the choice between topological and projective prepositions.
[12] G.J.M. Kruijff, J.D. Kelleher, G. Berginc, and A. Leonardis. Structural descriptions in human-assisted robot visual learning. In Proc. 1st Annual Conference on Human-Robot Interaction (HRI'06), 2006.
The paper presents an approach to using structural descriptions, obtained through a human-robot tutoring dialogue, as labels for the visual object models a robot learns. The paper shows how structural descriptions enable relating models for different aspects of one and the same object, and how being able to relate descriptions for visual models and discourse referents enables incremental updating of model descriptions through dialogue (either robot- or human-initiated). The approach has been implemented in an integrated architecture for human-assisted robot visual learning.
[13] G.J.M. Kruijff, J.D. Kelleher, and N. Hawes. Information fusion for visual reference resolution in dynamic situated dialogue. In E. André, L. Dybkjaer, W. Minker, H. Neumann, and M. Weber, editors, Perception and Interactive Technologies (PIT 2006). Springer Verlag, 2006.
Human-Robot Interaction (HRI) invariably involves dialogue about objects in the environment in which the agents are situated. The paper focuses on the issue of resolving discourse references to such visual objects. The paper addresses the problem using strategies for intra-modal fusion (identifying that different occurrences concern the same object), and inter-modal fusion (relating object references across different modalities). Core to these strategies are sensorimotoric coordination, and ontology-based mediation between content in different modalities. The approach has been fully implemented, and is illustrated with several working examples.
[14] G.J.M. Kruijff. Context-sensitive utterance planning for CCG. In Proceedings of the European Workshop on Natural Language Generation, Aberdeen, Scotland, 2005.
The paper presents an approach to utterance planning, which can dynamically use context information about the environment in which a dialogue is situated. The approach is functional in nature, using systemic networks to specify its planning grammar. The planner takes a description of a communicative goal as input, and produces one or more logical forms that can express that goal in a contextually appropriate way. Both the goal and the resulting logical forms are expressed in a single formalism as ontologically rich, relational structures. To realize the logical forms, OpenCCG is used. The paper focuses primarily on the implementation, but also discusses how the planning grammar can be based on the grammar used in OpenCCG, and trained on (parseable) data.



Last modified: 9.1.2009 16:50:17