2 Objective of project
2.1 The problem
3 Scenario-based Research
4 Examples of sub-topics
Planning and Acting in Dynamic Multiagent Environments
planning and acting
requirements for an active robot
language and social interaction
4.9 Software tools
This document is a summary of a proposal produced in October
2003, inspired by the visionary FP6 objective
"To construct physically instantiated ... systems
that can perceive, understand ... and interact with their environment,
and evolve in order to achieve human-like performance in activities
requiring context-(situation and task) specific knowledge"
We assume that this
is far beyond the current state of the art and will remain so for many
years. However we have devised a set of intermediate targets based on
that vision. Achieving these targets will provide a launch pad for
further work towards the long term vision.
In particular we aim to advance the science of cognitive
through a multi-disciplinary investigation of requirements,
design options and trade-offs for human-like,
integrated, physical (e.g. robot) systems, including requirements for
architectures, for forms of representation, for perceptual mechanisms,
for learning, planning, reasoning, motivation, action, and
The results of the investigation will provide the basis for a
of increasingly ambitious working systems to test and demonstrate the
ideas. Devising demanding but achievable test scenarios, including
scenarios in which a machine not only performs some task but
that it understands what it has done, and why, is one of the
challenges to be addressed in the project.
Preliminary scenarios have been proposed. Further scenarios, designs
and implementations will be developed on the basis of (a) their
potential contribution to the long term vision, (b) their achievability
(which may not be obvious in advance) and (c) the possibility of
practical applications, for instance in machines to help house-bound
invalids who prefer not to impose too much on fellow humans.
Tools will be developed to
support this exploration. The
work will use an `open' framework facilitating collaboration with a
variety of international projects with related objectives.
2 Objective of project
2.1 The problem
Despite impressive progress in many specific sub-topics in AI and
Cognitive Science, the field as a whole moves slowly. Most systems able
to perform complex tasks that humans and other animals can perform
easily, for instance robot manipulators, or intelligent advisers, have
to be very carefully crafted, normally their field of expertise is very
narrow, and they are hard to extend. Whatever intelligence they have
could be described as `insect-like' insofar as they have capabilities
that they do not understand, they do not know why they do things one
rather than another, they cannot explain what they are doing, they
cannot improve their performance by taking advice from a human, and
cannot give advice or help to someone else doing similar tasks. Part of
the reason for this is that over the last few decades research has
become highly fragmented: with many individuals and research teams
focusing their efforts on very narrowly defined problems, for instance
in vision, or learning, or language processing, or problem solving, or
We propose to try to overcome these limitations by using ideas from
several relevant disciplines to investigate an ambitious distant vision
of a highly competent robot, combining many different capabilities in a
coherent manner, for instance a non-trivial subset of the capabilities
of a typical human child a few years old.
The scientific importance of this
objective is that such a robot would require generic capabilities
providing a platform for many different sorts of subsequent
since such a child can develop in any human culture and benefit
from many forms of education. However, we do not underestimate the
profound difficulties of this challenge.
The project will make use of results in the various component
disciplines of AI and cognitive science, for instance, new results on
perception, learning, reasoning, language processing, memory, plan
execution, and studies of motivation and emotion. It should also
new substantive contributions to those disciplines in the form of new
theories and working models, and also new research questions.
The goal of producing a robot with many of the capabilities of a
child cannot be achieved in the time-frame of this project: it is an
enormous long term challenge. However, by analysing in great detail the
many requirements for moving in that direction, we can derive sets of
successively less challenging sub-goals that should provide significant
steps towards the distant goal. Some of these sub-goals are achievable
in the time-frame of the project.
The project has two main types of objectives concerned with theory
and implementation, and related
2.2 Theory objectives:
We aim to produce a body of theory, at different levels of abstraction,
requirements, architectures, forms of representation, kinds of
ontologies, types of reasoning, kinds of knowledge, and varieties of
mechanisms relevant to embodied, integrated, multi-functional
intelligent systems. The results should be useful both for enhancing
scientific understanding of naturally occurring intelligent systems
(e.g. humans) and for the design of artificial
We expect such a theory to be built around the core idea of a
self-modifying architecture comprising different sorts of capabilities
which develop over time. The theory would cover both analysis of
requirements for such an architecture and also design options
with their trade-offs. Sub-theories would be concerned with
different sorts of components of the architecture.
Key ideas for the architecture will be inspired by biological
Requirements for perceptual and motor systems that operate
concurrently with, and in close coordination with, processes in all
the different architectural layers will be analysed, as will varieties
of learning (discussed below).
Different varieties of communication and social interaction will be
related to the different architectural layers: for instance,
(a) dancing, fighting and moving heavy objects require coupled
reactive systems; (b) linked collaborative actions spanning
spatial and temporal gaps, e.g. in building houses and bridges,
require deliberative capabilities; (c) the ability to
empathise, exhort, persuade, may require extensions of
self-understanding in a meta-management (reflective) system
other-understanding. (All of these influences can go both ways: e.g.
meeting requirements for social developments may enhance individual
The theory will also have to account for affective and motivational
mechanisms that allow an individual to exist as an autonomous agent
instead of always having to be told exactly what to do or how to deal
with conflicts and choices.
Since different sorts of designs are
possible the theory will include an analysis of architectural
options and trade-offs as well as design-options and trade-offs
2.3 Implementation objectives
We expect to produce well-documented implementations of working systems
demonstrating applications of parts of the theory, e.g. in a robot
capable of performing a collection of diverse tasks in a variety of
challenging scenarios, including various combinations of visual and
other forms of perception, learning, reasoning, communication and goal
Distinctive features of such a robot will include integration of
sub-functions (e.g. vision and other senses can be combined in making
sense of a scene, vision can be used to disambiguate a sentence by
looking at what the sentence refers to, and learning processes can
enhance different kinds of capabilities, including linguistic, visual,
reasoning, planning, and motor skills), and also self-understanding.
Central to all of this will be an understanding of affordances.
Nature vs. Nurture: How much should be programmed into such a
robot and how much will have to be learnt by interacting with the
environment, including teachers and other agents? Projects aiming to
develop intelligent systems on the basis of powerful and general
learning mechanisms starting from something close to a "Tabula rasa"
risk being defeated by explosive search spaces requiring evolutionary
time-scales for success.
Biological evolution enables individuals to avoid this problem by
providing large amounts of "innate" information in the genomes of all
species. In the case of humans this seems to include meta-level
information about what kinds of things are good to learn, helping to
drive the learning processes as well as specific mechanisms, forms
of representation, and architectures to enable them to work.
We shall avoid dogmatism on what needs to be innate, and
explore various alternatives for amounts and types of innate knowledge
and produce an analysis of the trade-offs.
2.4 Subsidiary activities
The project will also produce a succession of workshops and summer
schools, publications, and an `open' web site containing code,
development tools, theoretical papers, various kinds of re-usable
libraries, demonstration packages, etc, including contributions from
external collaborators, academic and industrial. We expect to have to
share development of tools with other projects.
3 Scenario-based Research
Three scenarios have been identified for the study of systems
integrating many functions within a single architecture. The first one
The Explorer is concerned with a trainable robot able to learn
how to find its way around a building or some terrain. The second
scenario The PlayMate1
is concerned with a robot that is able to
manipulate 3-D structures, for instance in order to build a copy of
a structure already built by someone else. The third scenario The
Philosopher concerns the ability of a robot to reflect on what it
done, explain what is done and why, answer questions about why it did
not do something and about what would have happened if it had done
something different, or about what someone else has done wrong.
The third scenario will be built on top of the first two: each of them
will provide a test-bed for the mechanisms and representations proposed
for acquiring and using reflective-understanding of both actions and
In all of the scenarios we shall investigate various options for innate
knowledge and capabilities and for kinds of learning that can arise
out of and build on what is innate.
The first two scenarios make use of very different spatial properties
and relationships because they involve different spatial scales and
different relationships between percepts, body-parts and the actions
performed, implying very different requirements for understanding the
structure of space and the positive and negative affordances relevant
the tasks. Different forms of representation may be useful (a) for
thinking about an individual moving around in a (mostly) 2-D space and
(b) for thinking about complex objects being simultaneously moved and
rotated in a 3-D space by an agent which itself has spatial structure
which changes during actions. Large-scale and small-scale
spatial actions also have different requirements.
Different learning processes are needed because of
different time scales and different relationships to perceived and
remembered information: the Explorer, unlike the PlayMate involves
constantly relating a small region of space to a much larger enclosing
region, whereas the PlayMate involves constantly relating visible
surfaces to invisible surfaces and perceived spatial relationships to
possible future spatial relationships. Different planning formalisms
strategies may be required.
Combining the two in a single scenario integrating the different forms
of understanding of space and motion will be a major challenge. Even
uses of language in the two contexts will have interesting differences
(e.g., different interpretations for "here" in different contexts:
"Fetch a hammer from the store-room and bring it here", vs
the hammer here where I can reach it").
We also intend to explore the relationship between self-understanding
and other-understanding in such contexts.
These scenarios raise many difficult problems whose solution will
require interdisciplinary advances.
The next section illustrates our approach to some of the problems.
4 Examples of sub-topics
For many years, research in AI and computational cognitive science
focused on forms of representation, algorithms to operate on them, and
knowledge to be encoded and deployed or derived. In the last decade or
two it has become clear that there is also a need to investigate
alternative ways of putting pieces together into a complex functioning
system, possibly including parts that operate concurrently and
asynchronously on different sub-tasks, for instance, perception,
action, reasoning and communicating.
Unfortunately this has led to a plethora of architectures being
proposed, and much ambiguity in the terminology used to describe them.
One reason for this is lack of agreement on what the space of possible
architectures is like, or on the terminology for describing
architectures or on criteria for evaluating and comparing them.
We aim to produce a framework
for describing and comparing architectures. A first draft and
simple example of such a framework is the CogAff schema
described in ,
partly inspired by 
and work by Minsky. The schema classifies components of an architecture
in terms of their functional role, using different functional
dimensions, including a crude three-way division between perceptual,
central and action components, and another three-way division between
components concerned with reactive, deliberative or meta-management
functions. Superimposing those divisions gives a grid of nine types of
components which may or may not be present in an architecture, and
may be connected in various ways to other components. Other ways of
distinguishing architectural components will be needed. E.g. different
sorts of developmental and learning processes, and also different types
of motivational and emotional processes, will be associated with
different sorts of components.
Another architectural possibility is the inclusion of very fast
pattern recognition mechanisms connected to many parts of the
architecture making it possible to detect problems that require rapid
and global reorganisation of behaviour, e.g. freezing, fleeing,
fighting, or pouncing on prey.
Such an "alarm" mechanism could account for several types of
emotions and is reminiscent of functions of brain-stem and amygdala.
In recent years many researchers have attempted to design robots using
only reactive mechanisms, arguing that either features of the
environment or emergent interactions between many individuals will
produce effects that were thought to require deliberative and other
mechanisms. Others have argued that this suffices only for simple
organisms and insect-like robots. Instead of engaging in such battles
shall try to understand under which conditions the various types of
architectural components are useful.
A difficult challenge will be designing different parts of the
architecture so that they can interact unexpectedly while running. A
visual system may need to switch from looking ahead for a gap in a
to looking down at uneven terrain, so as to guide walking actions.
Likewise detailed walking actions may have to be modulated or
on the basis of high level perceptual processes, e.g. noticing evidence
of a slippery surface. Likewise speech may need to be modulated or
re-directed on the basis of visual processes that detect puzzlement in
the face of the listener nor notice something that answers a question
before it is fully formulated.
One of our tasks is to explore whether the self-understanding
that most AI systems lack can be based on an
architectural layer permitting self-observation, classification,
evaluation and possibly some control of internal states and processes,
especially deliberative processes that are capable of getting stuck in
loops, wasting resources by repeating sub-tasks or not noticing
opportunities. An important form of learning might include
detecting such cases and finding out how to reduce their
This is related to notions of "executive function" used in psychology
and psychiatry. Empirical research on executive functions in humans and
their development may both illuminate and be illuminated by exploratory
designs of artificial cognitive systems with similar functions.
Recently some researchers have claimed that animals or robots need
Our response is that all organisms use sesnory information to determine
how to select actions that use internal energy. Biological evolution
discovered many variations on that theme, depending on the kind of
information acquired, how it is processed, how it is used, when it is
used (e.g. long-term storage may be required), how it is transformed,
how it is combined with other information, and how it is communicated.
In all cases there is some medium used for the information,
there are great differences between different media, including whether
they are discrete or continuous, one-dimensional or multidimensional,
what sorts of structures they can have, and so on. We can avoid
about whether some of them are or are not really
by investigating what kinds of representations they are, and
their costs and benefits are to the organism.
Our proposal emphasises perception of affordances, namely the
ability of an agent not merely to see what already exists
(objects, handles, surfaces, gaps, holes, etc.) but also to see
the possibilities for action and the constraints on possible
actions, e.g. movements, grasping, folding, joining, separating,
lifting, dropping, etc. This leads to novel requirements for
perceptual mechanisms. Most work on perception considers how to
represent the entities that exist and are
perceived, whereas affordances are concerned with what does not
not exist but might exist. We need to find ways of perceiving them
without generating combinatorial explosions of
possibilities. This may require new forms of representation of
possibilities and constraints on possibilities.
Part of the research will be on requirements for
representations and the trade-offs between different forms of
representation in different parts of an integrated system. There has
already been much investigation of representations suitable for very
specific tasks (e.g. extracting structure from motion in order to
produce a graphical display of a scene from novel viewpoints), but the
task of designing representations for systems with multiple
(e.g. supporting verbal descriptions of the scene. or controlling
actions, or aiding causal understanding, or allowing performance to
improve with practice) may lead to new, more challenging, requirements.
In a complex architecture, there may be different kinds of learning
mechanisms in different components. Current theories of learning will
need to be substantially extended to explain, for instance, kinds of
learning that extend the individual's ontology for perceiving and
thinking about the environment, and kinds of learning that develop
fluency and speed in motor performance, e.g. because reactive
are trained by processes in deliberative components. For the
system as a whole we shall investigate different sorts of learning
within our planned scenarios, including tutor-driven learning
where a tutor gives various kinds of tasks, explanations,
demonstrations, corrections, etc., tutor-supervised learning
the learner (the robot) takes most of the initiative and requests help
or advice when difficulties are encountered, and exploratory
learning, where the robot notices new phenomena and categorizes
using its previously acquired knowledge, using whatever mode of
categorization (as a type of object, a type of event, a type of
difficulty, a type of solution, etc.) is appropriate.
Requirements for continuous, incremental, open-ended, life-long
will be analysed. These requirements rule out forms of learning which
separate a training phase from a phase in which information is used.
Humans, like many other animals, continue extending and refining skills
of many kinds for many years. Our robot should be able to do the same.
That implies that the early knowledge, both about the environment and
about oneself, while useable is incomplete in many ways. This
requirement for indefinite learning will probably provide important
clues as to the nature of some of aspects of self knowledge. Obviously
not everything improves over time: you know exactly how many arms,
and fingers you have at a relatively early stage, whereas developing
ball-catching, stone throwing, berry picking, tool-manipulating and
violin-playing skills may go on for a long time thereafter. Such
continuous improvements in precision and speed might be produced by
feedback-driven partly probabilistic adaptive mechanisms. However some
kinds of learning involve development of new large scale `chunks' that
are re-usable, such as the actions appropriate to a particular tool, or
playing a particular chord on the piano, or fluently typing a certain
syllable, or a whole word, on a keyboard, where each chunked action
allows quite a lot of variation in detailed movements according to
context. Such re-usable chunks require at least two distinct types of
learning (a) whatever has to be learnt in order to perform them, and
whatever has to be learnt in order to make plans in advance of
A major challenge is detecting and removing, or preventing
inconsistencies, for instance where learning occurs at different levels
in abstraction hierarchies. It may be that in view of the explosive
combinatorics the system will have to tolerate some undetected
inconsistencies and take remedial action only when contradictions are
These and other considerations suggest that different forms of learning
about the same objects and actions may happen in different parts of the
architecture. In particular, it may be useful to have different
perceptual and learning processes going on concurrently in a reactive
layer, in a deliberative layer and in a meta-management
(self-reflective) layer that includes observation of the processes in
the other layers. Part of the challenge to be address is how these
different processes share the same physical sensors and motors for
All these requirements constrain the sorts of ontologies that develop,
the sorts of representations that facilitate learning, and mechanisms
for making results of learning usable in different tasks. This should
provide new ways of testing and evaluating previous theories and
4.4 Perception-Action Modelling
State-of-the-art approaches to perception, action, and planning, fall
into two broad classes: abstract relational representations of the
effects of actions as used in classical AI planning and probabilistic
models of action effects in continuous spaces as used in robot
localisation and mapping. The former are general, and non-task
but assume either powerful symbolic perceptual systems, or information
provided from elsewhere, and if not carefully designed can lead to
explosive search spaces. The latter capture uncertainty in both action
and observation, and for some-problems they can converge to solutions
without massive search. They are, however, typically tied to geometric
representations of space, and to specific types of sensors used and
specific uses of perceptual information. One of our tasks is to connect
these different types of representation in such a way that updates to
one representation can be propagated to other representations. We may
find that neither mode of representation as currently used is adequate
for some of the tasks, for instance representation of affordances,
involve multiple possibilities for changing relationships, or coping
with problems requiring significant extensions of the robot's
current ontology - e.g. learning new concepts of physics or chemistry,
or learning to think about goals and thoughts of other agents.
4.5 Continuous Planning and
Acting in Dynamic Multiagent Environments
Realistic dynamic and partially observable environments pose great
difficulties. Other agents' actions as well as naturally occurring
events (e.g. sunset) may change the agent's surroundings in ways it
cannot foresee, control or even perceive. So plan-execution must be
modulated in the light of perceived changes (e.g. stop moving when your
path or your line of site is blocked). With increasing dynamics of the
world an agent's knowledge will become less accurate, and its plans
likely to require modification during execution - yet not all
not all plans can easily be repaired during plan execution. Switching
purely reactive forms of planning is no solution
since there are situations in which how best to react cannot be decided
without thinking several steps ahead. Constructing conditional plans
that work under all possible circumstances is both computationally
explosive and may require unrealistic prophetic capabilities.
One solution may be to allow agents to postpone resolution of
contingencies and handle them only if they occur. The robot may be able
to learn which actions are not worth planning in great detail, and how
to use planned and unplanned acquisition of new information during
execution to check the applicability of plans, to fill gaps in abstract
plans and to help with plan revision. This requires an architecture in
which unplanned-for perceptual processes can cause current external and
internal behaviours to be interrupted or modulated. This should include
the ability to detect new malfunctions in sensors or motors which may
require either repair or use of alternative strategies.
4.6 Collaborative planning and
Further complications and further opportunities arise when other agents
are in the environment. They can produce many surprises. In general it
is difficult or impossible to predict everything that other intelligent
systems will do. However, friendly others may be willing to give
provide useful factual information or collaborate either in forming
plans or executing them, or both. All this requires
communication. However, different groups of agents may have
different ways to communicate. Groups of artificial agents can
communicate using special-purpose formal languages, while human-robot
interactions should allow the human to use more convenient methods.
The project will investigate requirements for various kinds of
communication in different sorts of contexts and will analyse
between different solutions, including trade-offs concerning forms of
representation to be used within individual agents and forms of
communication between agents in multi-agent scenarios of different
kinds. For instance, requirements for agents collaborating on
"Explorer" tasks that require moving between different rooms of a
building where contact may be temporarily lost are different from the
requirements for "PlayMate" tasks where two or more agents are jointly
building some structure where they differ in which parts and
relationships are visible and what sorts of actions they are performing
at any one time, e.g. picking up, putting down, holding together,
holding something out of the way, etc. In the latter, communication may
require more subtle inferences about what the other can see or do.
4.7 Further requirements for an
An active robot has to be able to control and make use of its own body,
and the relation of its body to the environment. In the PlayMate
the agent has to be able to move its arm to a target position, do
eye-hand coordination, and do these things irrespective of whether the
arm is carrying a load, impeded by some obstacle, or moving in unusual
conditions such as injury or mechanical, or sensor dysfunction. How
should such an agent represent information about its own body? It is
unlikely that animals have full 3-D geometric models of their bodies.
One possibility is to use dynamically changing affordances, i.e.
information about possibilities for and constraints on, possible
as a kind of knowledge combining things in the environment and oneself.
So different kinds of self-knowledge will be relevant in the Explorer
and the PlayMate scenarios.
4.8 Meaning, language and
In all the scenarios the robot will have to be able to acquire,
manipulate, store, combine and use information, about the environment
and about itself and other agents. Some information may be expressed
only in internal forms, others in external communications and some in
both forms. This raises deep questions about how it comes about that
internal or external structures can be treated by the robot as having
semantic content. This is sometimes referred to in AI circles as the
problem of `symbol-grounding',
but is much older in the history of philosophy.
We expect to show that no simple answers are correct, since in order to
be able to do anything at all, including being able to perceive and
learn, the robot, like a new-born animal, will require some "innate"
information which implies that not all information can come from
perceiving and acting in the world. However, it is clear that animals
learn about new things through interacting with the world so that the
innate mechanisms must allow bootstrapping of new ontologies driven at
least in part by interacting with the environment.
It is likely that several different kinds of semantic development will
be required, including discovery of new hierarchies of sub-categories
through self-organising classifiers, and also high level conceptual
extensions through discovery of structural inadequacies in an existing
ontology - e.g. the need to explain why two things that appear very
similar to the senses behave in very different ways, perhaps because
they are similar agents with different beliefs and desires, or because
they have different, unobservable, physical structures or internal
mechanisms. Another process that can drive ontological extension is
discovering bugs and features in the agent's own planning and thinking
strategies that are not objects of ordinary perception.
Finally, language-based social processes can drive semantic
as happens when humans learn school and university subjects using
ontologies that extend far beyond what the learner can sense or act on.
Sometimes a precursor for this is learning a new more appropriate form
of representation, e.g. in learning to use mathematical notations,
circuit diagrams, chemical formulae, maps of various kinds, and most
recently programming languages. It can even include acquiring explicit
knowledge about the language used for communication, after the
has been acquired implicitly through learning to use the language.
Several aspects of social interaction implicit in our descriptions
above, will need to be made explicit as the project progresses. For
instance, if the robot is to be able to communicate effectively whether
as slave, pupil, collaborator, negotiator, or teacher it will have to
acquire an understanding of a number or facts about language users,
as that they have percepts, beliefs, desires, intentions, preferences,
principles, etc. that they have a variety of types of knowledge and
skill and can differ in their capabilities and also differ over time as
a result of learning. It will also have to understand various kinds of
dialogue structures and how they can be used (or abused) to achieve
various kinds of goals.
The fact that fairly abstract dialogue structures (e.g. requesting
clarification before answering a question) can coexist with other kinds
of processes (e.g. listening to and watching other agents,
completing some action,
planning the next sentence,
noticing that the hearer looks puzzled)
helps to determine requirements for the architecture.
Very young children cannot do such things at all, let alone do them
concurrently. Yet they seem to develop those abilities over time.
This is one of many indications that the information-processing
architecture itself develops over time. How to achieve that in our
is one of the hard questions to be investigated.
4.9 Software tools
Success of a project like this will depend on tools that support
rapid-prototyping for exploratory construction of complex architectures
with many interacting, concurrently active components performing
different tasks, possibly at different levels of abstraction.
Existing toolkits are mostly either committed to a particular sort of
architecture or else aimed at multi-agent systems composed of lots of
relatively simple agents perhaps distributed over
many machines. More general and open-ended toolkits will be needed,
including tools for developing mechanisms that allow self-observation
and self-criticism during program execution (meta-management), and
that support design and implementation of architectures that develop
within an individual, something not achieved by
current learning mechanisms.
We do not claim that we can achieve our long term targets within the
scope of this project - or even a large subset. However, unless
researchers at least try to assemble all the various pieces of the
puzzle that they have been mostly studying in isolation they will fail
to see even the trees properly because they don't see the larger wood
which they are part.
The problems are so difficult that many will regard even thinking about
them as a waste of time. Our answer is that by carefully analysing the
long term goal and working back from it to intermediate goals we can
define short-term and intermediate objectives that are attainable and
take us in the right direction.
- Nilsson, N. Artificial Intelligence: A New Synthesis.
Morgan Kaufmann, San Francisco, 1998.
- Sloman, A. Beyond shallow models of emotion. Cognitive
Processing: International Quarterly of Cognitive Science 2, 1
to as "CopyCat" in
File translated from
On 16 Mar 2004, 00:08.