CoSy: Cognitive Systems for Cognitive Assistants
 
 
 
[1] Mario Fritz and Bernt Schiele. Decomposition, discovery and detection of visual categories using topic models. In Proceedings of CVPR, June 2008.
We present a novel method for the discovery and detection of visual object categories based on decompositions using topic models. The approach is capable of learning a compact and low-dimensional representation for multiple visual categories from multiple viewpoints without labeling of the training instances. The learnt object components range from local structures over line segments to global silhouette-like descriptions. This representation can be used to discover object categories in a totally unsupervised fashion. Furthermore, we employ the representation as the basis for building a supervised multi-category detection system that makes efficient use of training examples and outperforms purely feature-based representations. The proposed speed-ups make the system scale to large databases. Experiments on three databases show that the approach improves the state of the art in unsupervised learning as well as supervised detection. In particular, we improve the state of the art on the challenging PASCAL'06 multi-class detection tasks for several categories.
[2] Michael Stark and Bernt Schiele. How good are local features for classes of geometric objects. In Eleventh IEEE International Conference on Computer Vision (ICCV), October 2007. Accepted.
Recent work in object categorization often uses local image descriptors such as SIFT to learn and detect object categories. As such descriptors explicitly code local appearance, they have shown impressive results on objects with sufficient local appearance statistics. However, many important object classes such as tools, cups and other man-made artifacts seem to require features that capture the respective shape and geometric layout of those object classes. Therefore, this paper compares, on a novel data collection of 10 geometric object classes, various shape-based features with more appearance-based descriptors such as SIFT. The analysis includes a direct comparison of feature statistics as well as the results within standard recognition frameworks. The results suggest that there are indeed differences between shape-based and more appearance-based features, but that those differences do not always conform with what one might expect.
[3] S. Fidler and A. Leonardis. Towards scalable representations of visual categories: Learning a hierarchy of parts. In CVPR'07, 2007.
This paper proposes a novel approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing (bottom-up), robust matching (top-down), and ideas of compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e. parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which consequently define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex, yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed by using specific categories, achieving a category representation with a small number of highly generalizable parts that gained their structural flexibility through composition within the hierarchy. Built in this way, the system can be extended efficiently and continuously with new categories by adding a small number of parts only in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories. Detection results confirm the effectiveness and robustness of the learned parts.
[4] K. Mikolajczyk, B. Leibe, and B. Schiele. Multiple object class detection with a generative model. In Proceedings of International Conference on Computer Vision and Pattern Recognition 2006, New York, USA, June 2006.
In this paper we propose an approach capable of simultaneous recognition and localization of multiple object classes using a generative model. We propose a novel hierarchical representation which allows us to represent individual images as well as various object classes in a single similarity-invariant model. The recognition method is based on a codebook representation where appearance clusters built from edge-based features are shared among several object classes. A probabilistic model based on Bayesian rules allows for reliable detection of various objects in the same image. The approach is very efficient due to a fast clustering and matching method capable of dealing with millions of high-dimensional features. The system shows excellent performance on several object categories over a wide range of scales, in-plane rotations, background clutter, and occlusion. The performance is comparable with state-of-the-art approaches dedicated to single object classes.
[5] S. Fidler, G. Berginc, and A. Leonardis. Hierarchical statistical learning of generic parts of object structure. In CVPR, pages 182-189, 2006.
With the growing interest in object categorization, various methods have emerged that perform well in this challenging task, yet are inherently limited to only a moderate number of object classes. In pursuit of a more general categorization system, this paper proposes a way to overcome the computational complexity encompassing the enormous number of different object categories by exploiting the statistical properties of the highly structured visual world. Our approach proposes a hierarchical acquisition of generic parts of object structure, varying from simple to more complex ones, which stem from the favorable statistics of natural images. The parts recovered in the individual layers of the hierarchy can be used in a top-down manner, resulting in a robust statistical engine that could be efficiently used within many of the current categorization systems. The proposed approach has been applied to large image datasets, yielding important statistical insights into the generic parts of object structure.
[6] M. Fritz, B. Leibe, B. Caputo, and B. Schiele. Integrating representative and discriminant models for object category detection. In Proceedings of International Conference on Computer Vision 2005, Beijing, China, October 2005.
Category detection is a lively area of research. While categorization algorithms tend to agree in using local descriptors, they differ in the choice of the classifier, with some using generative models and others discriminative approaches. This paper presents a method for object category detection which integrates a generative model with a discriminative classifier. For each object category, we generate an appearance codebook, which becomes a common vocabulary for the generative and discriminative methods. Given a query image, the generative part of the algorithm finds a set of hypotheses and estimates their support in location and scale. Then, the discriminative part verifies each hypothesis on the same codebook activations. The new algorithm exploits the strengths of both original methods, minimizing their weaknesses. Experiments on several databases show that our new approach performs better than its building blocks taken separately. Moreover, experiments on two challenging multi-scale databases show that our new algorithm outperforms previously reported results.
[7] K. Mikolajczyk, B. Leibe, and B. Schiele. Local features for object class recognition. In Proceedings of International Conference on Computer Vision 2005, Beijing, China, October 2005.
In this paper we compare the performance of local detectors and descriptors in the context of object class recognition. Recently, many detectors and descriptors have been evaluated in the context of matching as well as invariance to viewpoint changes [Mikolajczyk,IJCV04]. However, it is unclear if these results can be generalized to categorization problems, which require different properties of features. We evaluate 5 state-of-the-art scale-invariant region detectors and 5 descriptors. Local features are computed for 20 object classes and clustered using hierarchical agglomerative clustering. We measure the quality of appearance clusters and location distributions using entropy as well as precision. We also measure how the clusters generalize from the training set to novel test data. Our results indicate that extended SIFT descriptors [Mikolajczyk,TR04a] computed on Hessian-Laplace [Mikolajczyk,IJCV04] regions perform best. The second-best score is obtained by Salient regions [Kadir,IJCV01]. The results also show that these two detectors provide complementary features. The evaluation is validated with a recognition approach on a pedestrian database.
[8] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection by interleaving categorization and segmentation. International Journal of Computer Vision, 2005.
This paper presents a new method for visual object categorization, i.e. for recognizing previously unseen objects, localizing them in cluttered images, and assigning the correct category label. It considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to profit from each other and improve the combined performance. The core part of our work is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then used to again improve recognition by allowing the system to focus its efforts on object pixels and discard misleading influences from the background. Moreover, the information about where in the image a hypothesis draws its support from is used in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance even with training sets that are between one and two orders of magnitude smaller than those used in comparable systems.
[9] Bastian Leibe, Edgar Seemann, and Bernt Schiele. Pedestrian detection in crowded scenes. In CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1, pages 878-885, Washington, DC, USA, 2005. IEEE Computer Society.
In this paper, we address the problem of detecting pedestrians in crowded real-world scenes with severe overlaps. Our basic premise is that this problem is too difficult for any type of model or feature alone. Instead, we present a novel algorithm that integrates evidence in multiple iterations and from different sources. The core part of our method is the combination of local and global cues via a probabilistic top-down segmentation. Altogether, this approach allows us to examine and compare object hypotheses with high precision down to the pixel level. Qualitative and quantitative results on a large data set confirm that our method is able to reliably detect pedestrians in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.



 

Last modified: 9.1.2009 18:05:58