CoSy: Cognitive Systems for Cognitive Assistants
 
 
 
[1] Michael Stark, Philipp Lies, Michael Zillich, Jeremy Wyatt, and Bernt Schiele. Functional object class detection based on learned affordance cues. In 6th International Conference on Computer Vision Systems (ICVS), May 2008. Accepted.
Current approaches to visual object class detection mainly focus on the recognition of abstract object categories, such as cars, motorbikes, mugs and bottles. Although these approaches have demonstrated impressive recognition performance, their restriction to abstract categories seems artificial and inadequate in the context of embodied, cognitive agents. In that context, distinguishing objects according to functional aspects based on object affordances is vital for meaningful human-machine interaction. In this paper, we propose a complete system for the detection of functional object classes, based on a representation of visually distinct hints on object affordances (affordance cues). It spans the complete cycle, from tutor-driven acquisition of affordance cues, through one-shot learning of the corresponding object models, to the detection of novel instances of functional object classes in real images.
[2] M. Kristan, D. Skocaj, and A. Leonardis. Incremental learning with Gaussian mixture models. In Computer Vision Winter Workshop CVWW 2008, pages 25-32, Moravske Toplice, Slovenia, February 2008.
In this paper we propose a new method for incremental estimation of Gaussian mixture models that can be used in online learning applications. Our approach allows new samples to be added incrementally and parts of the mixture to be removed through a process of unlearning. Low complexity of the mixtures is maintained through a novel compression algorithm. Unlike existing approaches, ours does not require fine-tuning of parameters for a specific application, does not assume a specific form of the target distribution, and imposes no temporal constraints on the observed data. The strength of the proposed approach is demonstrated with an example of online estimation of a complex distribution, an example of unlearning, and an example of interactive learning of basic visual concepts.
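The abstract names three operations (adding samples, unlearning and compression) without giving their details. The Python sketch below only illustrates how they can fit together, under assumptions that are not taken from the paper: a one-dimensional mixture, a fixed kernel variance (whereas the paper stresses that no application-specific tuning is needed), unlearning modelled as a negatively weighted kernel, and compression replaced by a simple greedy moment-matching merge of the two closest components of the same sign. The names IncrementalMixture1D, Component, gauss, kernel_var and max_components are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # negative weights encode "unlearned" mass (assumption)
    mean: float
    var: float


def gauss(x, mean, var):
    # 1-D Gaussian probability density.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


class IncrementalMixture1D:
    """Hypothetical sketch of incremental mixture learning with unlearning."""

    def __init__(self, kernel_var=0.25, max_components=4):
        self.kernel_var = kernel_var          # assumed fixed kernel variance
        self.max_components = max_components  # assumed complexity budget
        self.components = []

    def add(self, x, weight=1.0):
        # Learning: place a Gaussian kernel on the new sample, then compress.
        self.components.append(Component(weight, x, self.kernel_var))
        self._compress()

    def unlearn(self, x, weight=1.0):
        # Unlearning: a negatively weighted kernel lowers the density near x.
        self.components.append(Component(-weight, x, self.kernel_var))
        self._compress()

    def density(self, x):
        # Sum all components, clip at zero, divide by the total positive mass.
        pos_mass = sum(c.weight for c in self.components if c.weight > 0)
        if pos_mass == 0:
            return 0.0
        raw = sum(c.weight * gauss(x, c.mean, c.var) for c in self.components)
        return max(raw, 0.0) / pos_mass

    def _compress(self):
        # Stand-in for the paper's compression: repeatedly merge the two closest
        # same-sign components by moment matching until the budget is met.
        while len(self.components) > self.max_components:
            pairs = [
                (abs(a.mean - b.mean), i, j)
                for i, a in enumerate(self.components)
                for j, b in enumerate(self.components)
                if i < j and (a.weight > 0) == (b.weight > 0)
            ]
            if not pairs:
                break
            _, i, j = min(pairs)
            a, b = self.components[i], self.components[j]
            w = a.weight + b.weight
            mean = (a.weight * a.mean + b.weight * b.mean) / w
            second = (a.weight * (a.var + a.mean ** 2)
                      + b.weight * (b.var + b.mean ** 2)) / w
            merged = Component(w, mean, second - mean ** 2)
            self.components = [c for k, c in enumerate(self.components)
                               if k not in (i, j)]
            self.components.append(merged)
```

For example, after adding the samples 0.0, 0.1, -0.1, 5.0, 5.2 and 4.9, calling unlearn(5.0, weight=2.0) lowers density(5.0) while the model never holds more than four components. Moment matching preserves the total weight, mean and variance of each merged pair, which makes it a common simple stand-in for mixture compression; the paper's own compression algorithm is not reproduced here.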
[3] S. Hongeng and J. Wyatt. Learning Causality and Intentional Actions, pages 27-46. LNAI: Towards Affordance-Based Robot Control. Springer, 2008.
Previous research has shown that human actions can be detected by motion patterns. However, labeling motion patterns is not sufficient in a cognitive system that requires reasoning about the agent's intentions, and how the environmental context affects the way an action is performed. In this paper, we develop a graphical model that captures how the movements that realize the action vary depending on the situations, and present statistical learning algorithms. Using object manipulation tasks, we illustrate how a system infers the agent's goals from visual observation and compare results with findings in psychological experiments.
[4] M. Kristan, D. Skocaj, and A. Leonardis. Online kernel density estimation for interactive learning. Image and Vision Computing, 2008.
In this paper we propose an online kernel density estimation method based on Gaussian kernels, which can be used in applications of online probability density estimation and online learning. Our approach generates a Gaussian mixture model of the observed data and allows online adaptation from both positive and negative examples. Adaptation from negative examples is realized by a novel concept of unlearning in mixture models. Low complexity of the mixtures is maintained through a novel compression algorithm. Unlike existing approaches, ours does not require fine-tuning of parameters for a specific application, does not assume a specific form of the target distribution, and imposes no temporal constraints on the observed data. The strength of the proposed approach is demonstrated with examples of online estimation of complex distributions, an example of unlearning, and an example of interactive learning of basic visual concepts.
[5] D. Skocaj, M. Kristan, and A. Leonardis. Continuous learning of simple visual concepts using incremental kernel density estimation. In International Conference on Computer Vision Theory and Applications, pages 598-604, Funchal, Madeira, Portugal, January 2008.
In this paper we propose a method for continuous learning of simple visual concepts. The method continuously associates words describing observed scenes with automatically extracted visual features. Since in our setting every sample is labelled with multiple concept labels and there are no negative examples, reconstructive representations of the incoming data are used. The associated features are modelled with incrementally built kernel density estimates of their probability distributions. The proposed approach is applied to the learning of object properties and spatial relations.
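A minimal sketch of the labelling setting described above, assuming that each concept label keeps its own Gaussian kernel density estimate over the full feature vector and that a concept is reported for a new observation when its density there exceeds a threshold. The class name ConceptKDE, the bandwidth and the threshold are hypothetical. The paper's selection of the features associated with each concept is not modelled, and storing every sample is only the simplest form of incremental KDE.

```python
import math
from collections import defaultdict


class ConceptKDE:
    """Hypothetical sketch: one incrementally grown KDE per concept label."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth        # assumed fixed kernel width
        self.samples = defaultdict(list)  # label -> list of feature vectors

    def update(self, features, labels):
        # Every sample carries several labels and there are no negative
        # examples, so the feature vector is added to each labelled model.
        for label in labels:
            self.samples[label].append(list(features))

    def density(self, label, features):
        # Isotropic Gaussian KDE evaluated at the query feature vector.
        pts = self.samples.get(label, [])
        if not pts:
            return 0.0
        h2 = self.bandwidth ** 2
        norm = (2 * math.pi * h2) ** (len(features) / 2)
        total = sum(
            math.exp(-sum((f - p) ** 2 for f, p in zip(features, pt)) / (2 * h2))
            for pt in pts
        )
        return total / (len(pts) * norm)

    def describe(self, features, threshold=1.0):
        # Report every known concept whose density exceeds the threshold.
        return sorted(label for label in self.samples
                      if self.density(label, features) > threshold)
```

For instance, after update([0.9, 0.2], ["red", "small"]), update([0.85, 0.8], ["red", "large"]) and update([0.1, 0.25], ["blue", "small"]), the query [0.88, 0.22] is described as red and small.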
[6] Michael Stark and Bernt Schiele. How good are local features for classes of geometric objects. In Eleventh IEEE International Conference on Computer Vision (ICCV), October 2007. Accepted.
Recent work in object categorization often uses local image descriptors such as SIFT to learn and detect object categories. Because such descriptors explicitly encode local appearance, they have shown impressive results on objects with sufficient local appearance statistics. However, many important object classes such as tools, cups and other man-made artifacts seem to require features that capture the respective shape and geometric layout of those object classes. Therefore, this paper compares, on a novel data collection of 10 geometric object classes, various shape-based features with more appearance-based descriptors such as SIFT. The analysis includes a direct comparison of feature statistics as well as the results within standard recognition frameworks. The results suggest that there are indeed differences between shape-based and more appearance-based features, but that those differences do not always conform to what one might expect.
[7] D. Skocaj, G. Berginc, B. Ridge, A. Štimec, M. Jogan, O. Vanek, A. Leonardis, M. Hutter, and N. Hewes. A system for continuous learning of visual concepts. In International Conference on Computer Vision Systems ICVS 2007, Bielefeld, Germany, March 2007.
We present an artificial cognitive system for learning visual concepts. It comprises vision, communication and manipulation subsystems, which provide visual input, enable verbal and non-verbal communication with a tutor and allow interaction with a given scene. The main goal is to learn associations between automatically extracted visual features and words that describe the scene in an open-ended, continuous manner. In particular, we address the problem of cross-modal learning of visual properties and spatial relations. We introduce and analyse several learning modes requiring different levels of tutor supervision.
[8] D. Skocaj, B. Ridge, G. Berginc, and A. Leonardis. A framework for continuous learning of simple visual concepts. In Computer Vision Winter Workshop 2007, pages 99-105, St. Lambrecht, Austria, February 2007.
We present a continuous learning framework for learning simple visual concepts and its implementation in an artificial cognitive system. The main goal is to learn associations between automatically extracted visual features and words that describe the scene in an open-ended, continuous manner. In particular, we address the problem of cross-modal learning of elementary visual properties and spatial relations; we show that the same learning mechanism can be used for both types of concepts. We introduce and analyse several learning modes requiring different levels of tutor supervision, ranging from a completely tutor-driven to a completely autonomous exploratory approach.
[9] Somboon Hongeng and Jeremy Wyatt. Learning causality and intention in human actions. In Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots (Humanoids'06). IEEE, December 2006.
Previous research has shown that human actions can be detected by motion patterns. However, labeling motion patterns is not sufficient in a cognitive system that requires reasoning about the agent's intentions, and how the environmental context affects the way an action is performed. In this paper, we develop a graphical model that captures how the movements that realize the action vary depending on the situations, and present statistical learning algorithms. Using object manipulation tasks, we illustrate how a system infers the agent's goals from visual observation and compare results with findings in psychological experiments.
[10] Sanja Fidler, Danijel Skocaj, and Aleš Leonardis. Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3):337-350, March 2006.
Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions, which often appear in visual data. Discriminative methods such as LDA and CCA, on the other hand, are better suited to classification and regression tasks but are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discriminative power of discriminative methods with the reconstruction property of reconstructive methods, which enables one to work on subsets of pixels in images and to efficiently detect and reject the outliers. The proposed approach is therefore capable of robust classification and regression with a high breakdown point. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms the standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.
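The key step in the abstract is working on subsets of pixels so that outliers can be detected and rejected. The NumPy sketch below illustrates that idea for the reconstructive part only, assuming an orthonormal basis U (for example, PCA eigenvectors) and a hypothesise-and-test loop: coefficients are fitted by least squares on random pixel subsets, pixels with a small reconstruction residual are counted as inliers, and the best hypothesis is refitted on its inliers. The function name robust_coefficients and the parameters n_hyp, subset, tol and seed are hypothetical choices, not the paper's.

```python
import numpy as np


def robust_coefficients(x, mean, U, n_hyp=30, subset=None, tol=0.1, seed=0):
    """Hypothetical sketch: estimate subspace coefficients of x from pixel subsets.

    x, mean: flattened images of length d; U: (d, k) matrix with orthonormal
    columns. Returns None if no hypothesis finds at least k inlier pixels.
    """
    rng = np.random.default_rng(seed)
    d, k = U.shape
    if subset is None:
        subset = 2 * k  # assumed: a small multiple of the subspace dimension
    xc = x - mean
    best_a, best_count = None, -1
    for _ in range(n_hyp):
        # Hypothesis: least-squares fit using only a random subset of pixels.
        idx = rng.choice(d, size=subset, replace=False)
        a, *_ = np.linalg.lstsq(U[idx], xc[idx], rcond=None)
        # Test: pixels that this hypothesis reconstructs well are inliers.
        inliers = np.abs(xc - U @ a) < tol
        count = int(inliers.sum())
        if count > best_count and count >= k:
            # Refit on all inliers, so the outlying pixels are rejected.
            best_a, *_ = np.linalg.lstsq(U[inliers], xc[inliers], rcond=None)
            best_count = count
    return best_a
```

In the combined scheme the abstract describes, robustly estimated coefficients like these would then be passed to the discriminative (for example LDA or CCA) projection; that step is omitted here.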
[11] B. Leibe, A. Leonardis, and B. Schiele. An implicit shape model for combined object categorization and segmentation. In M. Hebert, J. Ponce, C. Schmid, and A. Zisserman, editors, Towards Category-Level Object Recognition, LNCS. Springer, 2006. to appear.
We present a method for object categorization in real-world scenes. In line with the consensus in the field, we do not assume that a figure-ground segmentation is available prior to recognition. However, in contrast to most standard approaches for object class recognition, our approach automatically segments the object as a result of the categorization. This combination of recognition and segmentation into one process is made possible by our use of an Implicit Shape Model, which integrates both capabilities into a common probabilistic framework. This model can be thought of as a non-parametric approach that can easily handle configurations of large numbers of object parts. In addition to the recognition and segmentation result, it also generates a per-pixel confidence measure specifying the area that supports a hypothesis and how much it can be trusted. We use this confidence to derive a natural extension of the approach to handle multiple objects in a scene and resolve ambiguities between overlapping hypotheses with an MDL-based criterion. In addition, we present an extensive evaluation of our method on a standard dataset for car detection and compare its performance to existing methods from the literature. Our results show that the proposed method outperforms previously published methods while needing an order of magnitude fewer training examples. Finally, we present results for articulated objects, which show that the proposed method can categorize and segment unfamiliar objects in different articulations and with widely varying texture patterns, even under significant partial occlusion.
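The central idea of the Implicit Shape Model, matched codebook entries voting for possible object centres, can be sketched as a Hough-style accumulator. In this simplified Python version, assumed for illustration only, each codebook entry stores the centre offsets it was observed with during training, each detected feature splits one unit of vote equally among those offsets, and votes are binned on a coarse grid instead of being searched in a continuous space. Scale, the per-pixel segmentation and the MDL-based hypothesis selection described in the abstract are omitted, and the function name ism_vote and its arguments are hypothetical.

```python
from collections import defaultdict


def ism_vote(features, codebook, cell=10):
    """Hypothetical sketch of ISM-style voting for an object centre.

    features: list of (codeword_id, x, y) detected in the test image.
    codebook: codeword_id -> list of (dx, dy) offsets from the feature to the
              object centre, recorded during training.
    Returns the best-scoring grid cell (or None) and the vote accumulator.
    """
    acc = defaultdict(float)
    for word, x, y in features:
        offsets = codebook.get(word, [])  # unmatched features cast no votes
        for dx, dy in offsets:
            # Each stored occurrence receives an equal share of one vote.
            cell_xy = ((x + dx) // cell, (y + dy) // cell)
            acc[cell_xy] += 1.0 / len(offsets)
    best = max(acc, key=acc.get) if acc else None
    return best, dict(acc)
```

Splitting each feature's vote keeps ambiguous codewords, which were seen at many positions on the training objects, from dominating the accumulator.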



 

Last modified: 9.1.2009 16:48:51