<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
<link>http://www.computer.org/tpami</link>
<description>The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) is published monthly. Its Editorial Board strives to publish papers that present important research results within PAMI's scope. These include statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, and specialized architectures for such processing.	</description>
	<language>en-us</language>
	<pubDate>Wed, 4 Jan 2012 11:00:01 GMT</pubDate>
	<image>
		<url>http://csdl.computer.org/common/images/logos/tpami.gif</url>
		<title>IEEE Computer Society</title>
		<description>List of recently published journal articles</description>
		<link>http://www.computer.org/tpami</link>
	</image>
  <item>
     <title>PrePrint: RASL: Robust Alignment by Sparse and Low-rank Decomposition for Linearly Correlated Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.282</link>
     <description>This paper studies the problem of simultaneously aligning a batch of linearly correlated images despite gross corruption (such as occlusion). Our method seeks an optimal set of image domain transformations such that the matrix of transformed images can be decomposed as the sum of a sparse matrix of errors and a low-rank matrix of recovered aligned images. We reduce this extremely challenging optimization problem to a sequence of convex programs that minimize the sum of l1-norm and nuclear norm of the two component matrices, which can be efficiently solved by fast and scalable convex optimization techniques. We verify the efficacy of the proposed robust alignment algorithm with extensive experiments on both controlled and uncontrolled real data, demonstrating higher accuracy and efficiency than existing methods over a wide range of realistic misalignments and corruptions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.282</guid>
  </item>
  <item>
     <title>PrePrint: Ensemble Segmentation Using Efficient Integer Linear Programming</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.280</link>
     <description>We present a method for combining several segmentations of an image into a single one that is, in some sense, the average segmentation, in order to achieve a more reliable and accurate segmentation result. The goal is to find a point in the "space of segmentations" which is close to all the individual segmentations. We present an algorithm for segmentation averaging. The image is first over-segmented into superpixels. Next, each segmentation is projected onto the superpixel map. An instance of the EM algorithm combined with integer linear programming is applied on the set of binary merging decisions of neighboring superpixels to obtain the average segmentation. Apart from segmentation averaging, the algorithm also reports the reliability of each segmentation. The performance of the proposed algorithm is demonstrated on manually annotated images from the Berkeley segmentation dataset and on the results of automatic segmentation algorithms.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.280</guid>
  </item>
  <item>
     <title>PrePrint: Pushing the Envelope of Modern Methods for Bundle Adjustment</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.256</link>
     <description>In this paper, we present results and experiments with several methods for bundle adjustment, producing the fastest bundle adjuster ever published in terms of computation and convergence. From a computational perspective, the fastest methods naturally handle the block-sparse pattern that arises in a reduced camera system. Adapting to the naturally arising block-sparsity allows the use of BLAS3, efficient memory handling, fast variable ordering, and customized sparse solving all simultaneously. We present two methods; one uses exact minimum degree ordering and block-based LDL solving, and the other uses block-based preconditioned conjugate gradients. Both methods are performed on the reduced camera system. We show experimentally that the adaptation to the natural block sparsity allows both of these methods to perform better than previous methods. Further improvements in convergence speed are achieved by the novel use of embedded point iterations. Embedded point iterations take place inside each camera update step yielding a greater cost decrease from each camera update step and, consequently, a lower minimum. This is especially true for points projecting far out on the flatter region of the robustifier. Intensive analyses from various angles demonstrate the improved performance of the presented bundler.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.256</guid>
  </item>
  <item>
     <title>PrePrint: Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.198</link>
     <description>Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically-inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatio-temporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labelling scenarios.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.198</guid>
  </item>
  <item>
     <title>PrePrint: A Probabilistic Approach to Pattern Matching in the Continuous Domain</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.284</link>
     <description>The goal of this paper is to solve the following basic problem: given discrete noisy samples from a continuous signal, compute the probability distribution of its distance from a fixed template. As opposed to the typical restoration problem, which considers a single optimal signal, the computation of the entire probability distribution necessitates integrating over the entire signal space. To achieve this, we apply path integration techniques. The problem is studied in one and two dimensions, and an accurate solution as well as an efficient approximation scheme are provided.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.284</guid>
  </item>
  <item>
     <title>PrePrint: Face Recognition using Sparse Approximated Nearest Points between Image Sets</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.283</link>
     <description>We propose an efficient and robust solution for image set classification. A joint representation of an image set is proposed which includes the image samples of the set and their affine hull model. The model accounts for unseen appearances in the form of affine combinations of sample images. To calculate the between-set distance, we introduce the Sparse Approximated Nearest Point (SANP). SANPs are the nearest points of two image sets such that each point can be sparsely approximated by the image samples of its respective set. This novel sparse formulation enforces sparsity on the sample coefficients and jointly optimizes the nearest points as well as their sparse approximations. Unlike standard sparse coding, the data to be sparsely approximated is not fixed. A convex formulation is proposed to find the optimal SANPs between two sets and the accelerated proximal gradient method is adapted to efficiently solve this optimization. We also derive the kernel extension of the SANP and propose an algorithm for dynamically tuning the RBF kernel parameter while matching each pair of image sets. Comprehensive experiments on the UCSD/Honda, CMU MoBo and Youtube Celebrities face datasets show that our method consistently outperforms the state-of-the-art.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.283</guid>
  </item>
  <item>
     <title>PrePrint: Learning Optimal Embedded Cascades</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.281</link>
     <description>The problem of automatic and optimal design of embedded object detector cascades is considered. Two main challenges are identified: optimization of the cascade configuration, and optimization of individual cascade stages, so as to achieve the best trade-off between classification accuracy and speed, under a detection rate constraint. Two novel boosting algorithms are proposed to address these problems. The first, RCBoost, formulates boosting as a constrained optimization problem, which is solved with a barrier penalty method. The constraint is the target detection rate, which is met at all iterations of the boosting process. This enables the design of embedded cascades of known configuration without extensive cross-validation or heuristics. The second, ECBoost, searches over cascade configurations, to achieve the optimal trade-off between classification risk and speed. The two algorithms are combined into an overall boosting procedure, RCECBoost, which optimizes both the cascade configuration and its stages under a detection rate constraint, in a fully automated manner. Extensive experiments in face, car, pedestrian, and panda detection show that the resulting detectors achieve an accuracy vs. speed trade-off superior to those of previous methods.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.281</guid>
  </item>
  <item>
     <title>PrePrint: Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.279</link>
     <description>Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, when general level and specific level classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level is much smaller than the probability computed based on some more general level, leading to conflicting predictions. Algorithms are derived to detect incongruent events from different types of hierarchies, different applications and a variety of data types. We present promising results for the detection of novel visual and audio objects, and new patterns of motion in video. We also discuss the detection of out-of-vocabulary words in speech recognition, and the detection of incongruent events in a multimodal audio-visual scenario.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.279</guid>
  </item>
  <item>
     <title>PrePrint: Multidimensional Scaling for Matching Low-resolution Face Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.278</link>
     <description>Face recognition performance degrades considerably when the input images are of low resolution, as is often the case for images taken by surveillance cameras or from a large distance. In this paper, we propose a novel approach for matching low resolution probe images with higher resolution gallery images, which are often available during enrollment, using multidimensional scaling. The ideal scenario is when both the probe and gallery images are of high enough resolution to discriminate across different subjects. The proposed method simultaneously embeds the low resolution probe images and the high resolution gallery images in a common space such that the distances between them in the transformed space approximate the distances had both the images been of high resolution. The two mappings are learned simultaneously from high resolution training images using an iterative majorization algorithm. Extensive evaluation on the Multi-PIE dataset illustrates the usefulness of the method. We show that the proposed approach improves the matching performance significantly as compared to performing matching in the low-resolution domain or using super-resolution techniques to obtain a higher-resolution test image prior to recognition. Experiments on low resolution surveillance images from the Surveillance Cameras Face Database further highlight the effectiveness of the approach.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.278</guid>
  </item>
  <item>
     <title>PrePrint: Rotationally Invariant Descriptors using Intensity Order Pooling</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.277</link>
     <description>This paper proposes a novel method for interest region description, which pools local features based on their intensity orders in multiple support regions. Pooling by intensity orders is not only invariant to rotation and monotonic intensity changes, but also encodes ordinal information into the descriptor. Two kinds of local features are used in this paper, one based on gradients and the other on intensities; hence two descriptors are obtained: MROGH and MRRID. Thanks to the intensity order pooling scheme, the two descriptors are rotation invariant without estimating a reference orientation, which appears to be a major error source for most existing methods, such as SIFT, SURF and DAISY. Promising experimental results on image matching and object recognition demonstrate the effectiveness of the proposed descriptors compared to state-of-the-art descriptors.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.277</guid>
  </item>
  <item>
     <title>PrePrint: Exploring Tiny Images: The Roles of Appearance and Contextual Information for Machine and Human Object Recognition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.276</link>
     <description>Typically, object recognition is performed based solely on the appearance of the object. However, relevant information also exists in the scene surrounding the object. In this paper, we explore the roles that appearance and contextual information play in object recognition. First, through machine experiments and human studies, we show that the importance of contextual information varies with the quality of the appearance information, such as an image's resolution. Our machine experiments explicitly model context between object categories through the use of relative location and relative scale, in addition to co-occurrence. With the use of our context model, our algorithm achieves state-of-the-art performance on the MSRC and Corel datasets. We perform recognition tests, for machines and human subjects, on low and high resolution images, which vary significantly in the amount of appearance information present, using just the object appearance information, the combination of appearance and context, as well as just context without object appearance information (blind recognition). We also explore the impact of the different sources of context (co-occurrence, relative-location and relative-scale). We find that the importance of different types of contextual information varies significantly across datasets.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.276</guid>
  </item>
  <item>
     <title>PrePrint: Gender and Ethnicity Specific Generic Elastic Models from a Single 2D Image for Novel 2D Pose Face Synthesis and Recognition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.275</link>
     <description>In this paper, we propose a novel method for generating a realistic 3D human face from a single 2D face image for the purpose of synthesizing new 2D face images at arbitrary poses using gender and ethnicity specific models. We employ the Generic Elastic Model (GEM) approach, which elastically deforms a generic 3D depth-map based on the sparse observations of an input face image in order to estimate the depth of the face image. Particularly, we show that gender and ethnicity specific GEMs can approximate the 3D shape of the input face image more accurately, achieving a better generalization of 3D face modeling and reconstruction compared to the original GEM approach. We qualitatively validate our method using real-world celebrity images downloaded from the web, by showing each reconstructed 3D shape generated from a single image and new synthesized poses of the same person at arbitrary angles. For quantitative comparisons, we compare our synthesized results against 3D scanned data and also perform face recognition using synthesized images generated from a single enrollment frontal image. We obtain promising results for handling pose and expression changes based on the proposed method.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.275</guid>
  </item>
  <item>
     <title>PrePrint: SAR Image Segmentation Based on Level Set Approach and G_A^0 Model</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.274</link>
     <description>This paper proposes an image segmentation method for synthetic aperture radar (SAR), exploring statistical properties of SAR data to characterize image regions. We consider GA0 distribution parameters for SAR image segmentation, combined with the level set framework. The GA0 distribution belongs to a class of G distributions that have been successfully used to model different regions in amplitude SAR images for data modelling purposes. Such a statistical data model is fundamental to deriving the energy functional to perform region mapping, which is input to our level set propagation numerical scheme that splits SAR images into homogeneous, heterogeneous and extremely heterogeneous regions. Moreover, we introduce an assessment procedure based on stochastic distance and the GA0 model to quantify the robustness and accuracy of our approach. Our results demonstrate the accuracy of the algorithms regarding experiments on synthetic and real SAR data.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.274</guid>
  </item>
  <item>
     <title>PrePrint: Cross-Domain Multi-Cue Fusion for Concept-Based Video Indexing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.273</link>
     <description>The success of query-by-concept, recently proposed to cater to video retrieval needs, depends on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between low-level features and high-level interpretation. This paper studies three issues with the aim to reduce such a gap: (1) how to explore cues beyond low-level features, (2) how to combine diverse cues to improve performance, and (3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudo-labels according to their initial scores so that relationships can be learned. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular concept detection benchmarks show that our framework is effective, achieving significant improvement over popular baselines.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.273</guid>
  </item>
  <item>
     <title>PrePrint: Context-Aware Saliency Detection</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.272</link>
     <description>We propose a new type of saliency -- context-aware saliency -- which aims at detecting the image regions that represent the scene. This definition differs from previous definitions whose goal is to either identify fixation points or detect the dominant object. In accordance with our saliency definition, we present a detection algorithm which is based on four principles observed in the psychological literature. The benefits of the proposed approach are evaluated in two applications where the context of the dominant objects is just as essential as the objects themselves. In image retargeting we demonstrate that using our saliency prevents distortions in the important regions. In summarization we show that our saliency helps to produce compact, appealing, and informative summaries.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.272</guid>
  </item>
  <item>
     <title>PrePrint: Detachable Object Detection: Segmentation and Depth Ordering from Short-baseline Video</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.271</link>
     <description>We describe an approach for segmenting an image into regions that correspond to surfaces in the scene that are surrounded by the medium. It integrates both appearance and motion statistics into a cost functional that is seeded with occluded regions and minimized by efficiently solving a linear programming problem. Where a short observation time is insufficient to determine whether the object is "detached," the results of the minimization can be used to seed a more costly optimization based on a longer sequence of video data. The result is an entirely unsupervised object detection and segmentation scheme that we test empirically to highlight the potential, as well as limitations, of our approach.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.271</guid>
  </item>
  <item>
     <title>PrePrint: Online Kernel Principal Component Analysis: A Reduced-order Model</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.270</link>
     <description>Kernel principal component analysis (kernel-PCA) is an elegant nonlinear extension of one of the most widely used data analysis and dimensionality reduction techniques, principal component analysis. In this paper, we propose an online algorithm for kernel-PCA. To this end, we examine a kernel-based version of Oja's rule, initially put forward to extract a linear principal axis. As with most kernel-based machines, the model order equals the number of available observations. To provide an online scheme, we propose to control the model order. We discuss theoretical results, such as an upper bound on the error of approximating the principal functions with the reduced-order model. We derive a recursive algorithm to discover the first principal axis, and extend it to multiple axes. Experimental results demonstrate the effectiveness of the proposed approach, both on a synthetic dataset and on images of handwritten digits, with comparison to classical kernel-PCA and iterative kernel-PCA.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.270</guid>
  </item>
  <item>
     <title>PrePrint: Unsupervised Learning of Categorical Segments in Image Collections</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.268</link>
     <description>Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular "bag of visual words" model for recognition. If applied to a collection of images, our framework can simultaneously discover the segments of each image, and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the 'parts' of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.268</guid>
  </item>
  <item>
     <title>PrePrint: Detecting Curves with Unknown Endpoints and Arbitrary Topology Using Minimal Paths</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.267</link>
     <description>Existing state-of-the-art minimal path techniques work well to extract simple open curves in images when both endpoints of the curve are given as user input or when one input is given and the total length of the curve is known in advance. Curves which branch require even further prior input from the user, namely each branch endpoint. In this work, we present a novel minimal path based algorithm which works on much more general curve topologies with far fewer demands on the user for initial input compared to prior minimal path based algorithms. The two key novelties and benefits of this new approach are that (1) it may be used to detect both open and closed curves, including more complex topologies containing both multiple branch points and multiple closed cycles, without requiring a priori knowledge about which of these types is to be extracted, and (2) it requires only a single input point which, in contrast to existing methods, is no longer constrained to be an end point of the desired curve but may in fact be any point along the desired curve (even an internal point). We perform quantitative evaluation of the algorithm on 48 images (44 pavement crack images, 1 catheter tube image, and 3 retinal images) against human supplied ground truth. The results demonstrate that the algorithm is indeed able to extract curve-like objects accurately from images with far less prior knowledge and less user interaction compared to existing state-of-the-art minimal path based image processing algorithms. In the future, the algorithm can be applied to other 2D curve-like objects and it can be extended to detect 3D curves.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.267</guid>
  </item>
  <item>
     <title>PrePrint: Combining Scale-Space and Similarity-Based Aspect Graphs for Fast 3D Object Recognition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.266</link>
     <description>This paper describes an approach for recognizing instances of a 3D object in a single camera image and for determining their 3D poses. A hierarchical model is generated solely based on the geometry information of a 3D CAD model of the object. The approach does not rely on texture or reflectance information of the object's surface, making it useful for a wide range of industrial and robotic applications, e.g., bin-picking. A hierarchical view-based approach that addresses typical problems of previous methods is applied: It handles true perspective, is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications, and is invariant to contrast changes. For the generation of this hierarchical model, a new model image generation technique by which scale-space effects can be taken into account is presented. The necessary object views are derived using a similarity-based aspect graph. The high robustness of an exhaustive search is combined with an efficient hierarchical search. The 3D pose is refined by minimizing geometric distances in the image, yielding a position accuracy of up to 0.12% with respect to the object distance, and an orientation accuracy of up to 0.35 degrees in our tests. Typical run-times are in the range of a few hundred milliseconds.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.266</guid>
  </item>
  <item>
     <title>PrePrint: Visual Event Recognition in Videos by Learning from Web Data</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.265</link>
     <description>We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distance between two video clips. Second, we propose a new cross-domain learning scheme referred to as Adaptive Multiple Kernel Learning (A-MKL) in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web video domain and consumer video domain). We train a set of SVM classifiers based on the combined training set from two domains by using multiple base kernels from different kernel types and parameters, which are fused with equal weights to obtain an average classifier. In A-MKL, for each event class we learn an adapted target classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions of two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.265</guid>
  </item>
  <item>
     <title>PrePrint: Handwritten Chinese Text Recognition by Integrating Multiple Contexts</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.264</link>
     <description>This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character over-segmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency, and meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved character-level accurate rate of 90.75 percent and correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.264</guid>
  </item>
  <item>
     <title>PrePrint: Fast Rotation Invariant 3D Feature Computation utilizing Efficient Local Neighborhood Operators</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.263</link>
     <description>We present a method for densely computing local rotation invariant image descriptors in volumetric images. The descriptors are based on a transformation to the harmonic domain, which we compute very efficiently via differential operators. We show that this fast voxel-wise computation is restricted to a family of basis functions that have certain differential relationships. Building upon this finding, we propose local descriptors based on the Gaussian Laguerre and spherical Gabor basis functions and show how the coefficients can be computed efficiently by recursive differentiation. We exemplarily demonstrate the effectiveness of such dense descriptors in a detection and classification task on biological 3D images. In a direct comparison to existing volumetric features, among them 3D-SIFT, our descriptors reveal superior performance.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.263</guid>
  </item>
  <item>
     <title>PrePrint: A Tangent Bundle Theory for Visual Curve Completion</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.262</link>
     <description>Visual curve completion is a fundamental perceptual mechanism that completes the missing parts (e.g., due to occlusion) between observed contour fragments. Previous research into the shape of completed curves has generally followed an "axiomatic" approach, where desired perceptual/geometrical properties are first defined as axioms, followed by mathematical investigation into curves that satisfy them. However, determining such desired properties psychophysically is difficult and researchers still debate what they should be in the first place. Instead, here we exploit the observation that curve completion is an early visual process to formalize the problem in the unit tangent bundle R^2 &#x00D7; S^1, which abstracts the primary visual cortex (V1) and facilitates basic principles from which perceptual properties are later derived rather than imposed. Exploring here the elementary principle of least action in V1, we show how the problem becomes one of finding minimum-length admissible curves in R^2 &#x00D7; S^1. We formalize the problem in variational terms, we analyze it theoretically, and we formulate practical algorithms for the reconstruction of these completed curves. We then explore their induced visual properties vis-a-vis popular perceptual axioms and show how our theory predicts many perceptual properties reported in the corresponding perceptual literature. Finally, we demonstrate a variety of curve completions and report comparisons to psychophysical data and other completion models.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.262</guid>
  </item>
  <item>
     <title>PrePrint: Edge Structure Preserving 3-D Image Denoising By Local Surface Approximation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.261</link>
     <description>In various applications, including magnetic resonance imaging (MRI) and functional MRI (fMRI), 3-D images are becoming increasingly popular. To improve the reliability of subsequent image analyses, 3-D image denoising is often a necessary pre-processing step, which is the focus of the current paper. In the literature, most existing image denoising procedures are for 2-D images. Their direct extensions to 3-D cases generally cannot handle 3-D images efficiently, because the structure of a typical 3-D image is substantially more complicated than that of a typical 2-D image. For instance, edge locations are surfaces in 3-D cases, which are much more challenging to handle than edge curves in 2-D cases. We propose a novel 3-D image denoising procedure in this paper, based on local approximation of the edge surfaces using a set of surface templates. An important property of this method is that it can preserve edges and major edge structures (e.g., intersections of two edge surfaces and pointed corners). Numerical studies show that it works well in various applications.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.261</guid>
  </item>
  <item>
     <title>PrePrint: Human Identification Using Temporal Information Preserving Gait Template</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.260</link>
     <description>Gait Energy Image (GEI) is an efficient template for human identification by gait. However, such a template loses temporal information in a gait sequence, which is critical to the performance of gait recognition. To address this issue, we develop a novel temporal template, named Chrono-Gait Image (CGI), in this paper. The proposed CGI template first extracts the contour in each gait frame, followed by encoding each of the gait contour images in the same gait sequence with a multi-channel mapping function and compositing them into a single CGI. To make the templates robust to complex surrounding environments, we also propose CGI-based real and synthetic temporal information preserving templates by using different gait periods and contour distortion techniques. Extensive experiments on three benchmark gait databases indicate that, compared with the recently published gait recognition approaches, our CGI-based temporal information preserving approach achieves competitive performance in gait recognition with robustness and efficiency.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.260</guid>
  </item>
  <item>
     <title>PrePrint: Reflection Symmetry Integrated Image Segmentation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.259</link>
     <description>This paper presents a new symmetry integrated region-based image segmentation method. The method is developed to obtain improved image segmentation by exploiting image symmetry. It is realized by constructing a symmetry token that can be flexibly embedded into segmentation cues. Interesting points are initially extracted from an image by the SIFT operator and they are further refined for detecting the global bilateral symmetry. A symmetry affinity matrix is then computed using the symmetry axis and it is used explicitly as a constraint in a region growing algorithm in order to refine the symmetry of the segmented regions. A multi-objective genetic search finds the segmentation result with the highest performance for both segmentation and symmetry, which is close to the global optimum. The method has been investigated experimentally in challenging natural images and images containing man-made objects. It is shown that the proposed method outperforms current segmentation methods both with and without exploiting symmetry. A thorough experimental analysis indicates that symmetry plays an important role as a segmentation cue, in conjunction with other attributes like color and texture.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.259</guid>
  </item>
  <item>
     <title>PrePrint: Learning Sparse Representations for Human Action Recognition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.253</link>
     <description>This paper explores the effectiveness of sparse representations obtained by learning an overcomplete basis (dictionary) in the context of action recognition in videos. Although this work concentrates on recognizing human movements -- physical actions as well as facial expressions -- the proposed approach is fairly general and can be used to address other classification problems. In order to model human actions, three overcomplete dictionary learning frameworks are investigated. An overcomplete dictionary is constructed using a set of spatio-temporal descriptors (extracted from the video sequences) in such a way that each descriptor is represented by some linear combination of a small number of dictionary elements. This leads to a more compact and richer representation of the video sequences compared to the existing methods that involve clustering and vector quantization. For each framework, a novel classification algorithm is proposed. Additionally, this work presents the idea of a new local spatio-temporal feature that is distinctive, scale invariant and fast to compute. The proposed approach repeatedly achieves state-of-the-art results on several public datasets containing various physical actions and facial expressions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.253</guid>
  </item>
  <item>
     <title>IEEE Transactions on Pattern Analysis and Machine Intelligence - February 2012 (Vol. 34, No. 2)</title>
     <link>http://opac.ieeecomputersociety.org/opac?year=2012&amp;volume=34&amp;issue=02&amp;acronym=tpami</link>
     <description>IEEE Transactions on Pattern Analysis and Machine Intelligence</description>
     <guid isPermaLink="true">http://www.computer.org/portal/site/tpami/</guid>
  </item>
  <item>
     <title>PrePrint: A Closed-Form Solution to Tensor Voting: Theory and Applications</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.250</link>
     <description>We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimension, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation. In addition, structure propagation can also be achieved using this closed-form solution. In this paper, we apply CFTV in two related contributions. First, we embed the structure-aware tensor into expectation maximization (EM) for optimizing a single linear structure to achieve efficient and robust parameter estimation. Specifically, our EMTV algorithm optimizes both the tensor and fitting parameters and does not require the random sampling consensus typically used in existing robust statistical techniques. Although EMTV is a theoretical contribution, we performed quantitative evaluation on its accuracy and robustness, showing that EMTV performs better than the original TV and other state-of-the-art techniques in fundamental matrix estimation for multiview stereo matching. Second, we demonstrate how CFTV and EMTV benefit multiview stereo reconstruction beyond matching. Our tensor based multiview stereo (TMVS) combines the complementary advantages of photoconsistency, visibility and geometric consistency enforcement in multiview stereo via the use of 3D structure-aware tensors, where CFTV provides a unified means to manipulate geometric information in the entire match-propagate-filter multiview stereo pipeline.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.250</guid>
  </item>
  <item>
     <title>PrePrint: Maximum Likelihood Estimation of Depth Maps using Photometric Stereo</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.249</link>
     <description>Photometric stereo and depth-map estimation provide a way to construct a depth map from image sets of an object under one viewpoint but with varying illumination directions. While estimating surface normals using the Lambertian model of reflectance is well-established, depth-map estimation methods are an ongoing field of research. Dealing with image noise is one such active topic. This paper introduces a maximum likelihood depth-map estimation technique using the zero-mean Gaussian model of image noise. The technique accounts for the propagation of noise through all steps of the reconstruction process. Based on this model, solving for maximum likelihood depth-map estimates involves executing an independent sequence of nonlinear regression estimates, one for each pixel, followed by a single large and sparse linear regression estimate. The linear system employs anisotropic weights, which arise naturally and differ in value from those in related work. Depth-map estimation remains efficient and fast, making the technique practical for realistic image sizes. Experiments using synthetic images demonstrate the technique's ability to robustly estimate depth maps under the noise model. Practical benefits of the method on challenging imaging scenarios are illustrated by experiments using the Extended Yale Face Database B and an extensive dataset of 500 reflected light microscopy image sets.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.249</guid>
  </item>
  <item>
     <title>PrePrint: Convergent Iterative Closest-Point Algorithm to Accomodate Anisotropic and Inhomogenous Localization Error</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.248</link>
     <description>The Iterative Closest Point (ICP) algorithm has become one of the most well-known methods for fine geometric alignment of 3D models that can be represented by point sets. It iteratively establishes point correspondences given the current alignment of the data and computes a rigid transform accordingly. From a statistical point of view, however, it implicitly assumes that the points are observed with isotropic Gaussian noise. In this paper, we show that this assumption may lead to errors and generalize the ICP such that it can account for anisotropic and inhomogeneous localization errors in the input data. We (1) provide a formal description of the algorithm, (2) extend it to registration of partially overlapping surfaces, (3) prove its convergence, (4) derive the required covariance matrices for a set of selected applications, and (5) present means for optimizing the run-time of the algorithm. An evaluation on publicly available surface meshes as well as on a set of meshes extracted from medical imaging data shows a dramatic increase in accuracy compared to the standard ICP, especially in the case of partial surface registration. As surface matching is a central component in various applications, the potential impact of the proposed method is extremely high.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.248</guid>
  </item>
  <item>
     <title>PrePrint: Multi-Stage Particle Windows for Fast and Accurate Object Detection</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.247</link>
     <description>The common paradigm employed for object detection is the sliding window (SW) search. This approach generates grid-distributed patches, at all possible positions and sizes, which are evaluated by a binary classifier. The trade-off between computational burden and detection accuracy is the critical point of sliding windows, and several methods have been proposed to speed up the search, such as adding complementary features. We propose a paradigm that differs from any previous approach, since it casts object detection into a statistical-based search using a Monte Carlo sampling for estimating the likelihood density function with Gaussian kernels. The estimation relies on a multi-stage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifiers. The method can be easily plugged into a Bayesian-recursive framework to exploit the temporal coherency of the target objects in video sequences. Several tests on pedestrian and face detection, both on images and videos, with different types of classifiers (cascade of boosted classifiers, soft cascades and SVM) and features (covariance matrices, Haar-like features, integral channel features and histogram of oriented gradients) demonstrate that the proposed method provides higher detection rates and accuracy as well as a lower computational burden than sliding window detection.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.247</guid>
  </item>
  <item>
     <title>PrePrint: Incremental Activity Modelling in Multiple Disjoint Cameras</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.246</link>
     <description>Activity modelling and unusual event detection in a network of cameras is challenging, particularly when the camera views do not overlap. We show that it is possible to detect unusual events in multiple disjoint cameras as context-incoherent patterns, through incremental learning of time delayed dependencies between distributed local activities observed within and across camera views. Specifically, we model multi-camera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different decomposed regions from different views and the directed links between nodes encoding their time delayed dependencies. To deal with visual context changes, we formulate a novel incremental learning method for modelling time delayed dependencies that change over time. We validate the effectiveness of the proposed approach using a synthetic dataset and videos captured from a camera network installed at a busy underground station.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.246</guid>
  </item>
  <item>
     <title>PrePrint: Differential Area Profiles: Decomposition Properties and Efficient Computation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.245</link>
     <description>Differential area profiles (DAPs) are point-based multi-scale descriptors used in pattern analysis and image segmentation. They are defined through sets of size-based connected morphological filters that constitute a joint area opening top-hat and area closing bottom-hat scale-space of the input image. The work presented in this paper explores the properties of this image decomposition through sets of area zones. An area zone defines a single plane of the DAP vector field and contains all the peak components of the input image, whose size is between the zone's attribute extrema. Area zones can be computed efficiently from hierarchical image representation structures, in a way similar to regular attribute filters. Operations on the DAP vector field can then be computed without the need of exporting it first, and an example with the leveling-like convex/concave segmentation scheme is given. This is referred to as the one-pass method and it is demonstrated on the Max-Tree structure. Its computational performance is tested and compared against conventional means for computing differential profiles, relying on iterative application of area openings and closings. Applications making use of the area zone decomposition are demonstrated in problems related to remote sensing and medical image analysis.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.245</guid>
  </item>
  <item>
     <title>PrePrint: M-Idempotent and Self-Dual Morphological Filters</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.244</link>
     <description>In this paper, a comprehensive analysis of m-idempotent and self-dual morphological operators is presented. Conditions for self-duality of morphological operators are obtained by studying the kernel of morphological centers in the general framework of spatially-variant mathematical morphology. Necessary and sufficient conditions for the idempotence of morphological operators are characterized in terms of their kernel representation. We further introduce the notion of m-idempotence by generalization of idempotent operators (i.e. operators that converge after m iterations) and extend our results to the representation of the kernel of m-idempotent morphological operators. We finally rely on the conditions on the kernel representation derived and establish methods for the construction of m-idempotent and self-dual morphological operators.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.244</guid>
  </item>
  <item>
     <title>PrePrint: Density-Based Multi-Feature Background Subtraction with Support Vector Machine</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.243</link>
     <description>Background modeling and subtraction is a natural technique for object detection in videos captured by a static camera, and also a critical preprocessing step in various high level computer vision applications. However, there have not been many studies concerning useful features and binary segmentation algorithms for this problem. We propose a pixel-wise background modeling and subtraction technique using multiple features, where generative and discriminative techniques are combined for classification. In our algorithm, color, gradient and Haar-like features are integrated to handle spatio-temporal variations for each pixel. A pixel-wise generative background model is obtained for each feature efficiently and effectively by Kernel Density Approximation (KDA). Background subtraction is performed in a discriminative manner using a Support Vector Machine (SVM) over background likelihood vectors for a set of features. The proposed algorithm is robust to shadow, illumination changes, and spatial variations of background. We compare the performance of the algorithm with other density-based methods using several different feature combinations and modeling techniques, both quantitatively and qualitatively.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.243</guid>
  </item>
  <item>
     <title>PrePrint: IntentSearch: Capturing User Intention for One-Click Internet Image Search</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.242</link>
     <description>Web-scale image search engines mostly rely on surrounding text features. It is difficult for them to interpret users' search intention only by query keywords and this leads to ambiguous and noisy search results. We propose a novel Internet image search approach. It only requires the user to click on one query image with minimum effort, and images from a pool retrieved by text-based search are re-ranked based on both visual and textual content. Our key contribution is to capture the users' search intention from this one-click query image in four steps. (1) The query image is categorized into one of the predefined adaptive weight categories. Inside each category, a specific weight schema is used to combine visual features adaptive to this kind of image to better re-rank the text-based search result. (2) Based on the visual content of the query image selected by the user, query keywords are expanded to capture user intention. (3) Expanded keywords are used to enlarge the image pool to contain more relevant images. (4) Expanded keywords are also used to expand the query image to multiple positive visual examples from which new query specific visual and textual similarity metrics are learned.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.242</guid>
  </item>
  <item>
     <title>PrePrint: Free Energy Score Spaces: Using Generative Information in Discriminative Classifiers</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.241</link>
     <description>A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths, but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as a "score space". Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers, or the discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account latent structure of the data at various levels, and can be trivially shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model, and the same factorization of the posterior. We also show that in several typical vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches to combining discriminating and generative models.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.241</guid>
  </item>
  <item>
     <title>PrePrint: UBoost: Boosting with the Universum</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.240</link>
     <description>It has been shown that the Universum data, which do not belong to either class of the classification problem of interest, may contain useful prior domain knowledge for training a classifier [1], [2]. In this work, we design a novel boosting algorithm that takes advantage of the available Universum data, hence the name UBoost. UBoost is a boosting implementation of Vapnik's alternative capacity concept to the large margin approach. In addition to the standard regularization term, UBoost also controls the learned model's capacity by maximizing the number of observed contradictions. Our experiments demonstrate that UBoost can deliver improved classification accuracy over standard boosting algorithms that use labeled data alone.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.240</guid>
  </item>
  <item>
     <title>PrePrint: Tracking-Learning-Detection</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.239</link>
     <description>This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of "experts": (i) P-expert estimates missed detections, and (ii) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.239</guid>
  </item>
  <item>
     <title>PrePrint: Bilinear Modelling via Augmented Lagrange Multipliers (BALM)</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.238</link>
     <description>This paper presents a unified approach to solve different bilinear factorization problems in computer vision in the presence of missing data in the measurements. The problem is formulated as a constrained optimization where one of the factors must lie on a specific manifold. To achieve this, we introduce an equivalent reformulation of the bilinear factorization problem that decouples the core bilinear aspect from the manifold specificity. We then tackle the resulting constrained optimization problem via Augmented Lagrange Multipliers. The strength and the novelty of our approach is that this framework can handle seamlessly different computer vision problems. The algorithm is such that only a projector onto the manifold constraint is needed. We present experiments and results for some popular factorization problems in computer vision such as rigid, non-rigid and articulated Structure from Motion; photometric stereo and 2D-3D non-rigid registration.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.238</guid>
  </item>
  <item>
     <title>PrePrint: Robust and Efficient Ridge-Based Palmprint Matching</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.237</link>
     <description>Most of the existing palmprint recognition systems are based on encoding and matching creases, which are not as reliable as ridges. Recently, several ridge-based palmprint matching algorithms have been proposed to fill the gap. However, palmprints differ from fingerprints in several aspects: 1) palmprints are much larger and thus contain a large number of minutiae, 2) palms are more deformable than fingertips, and 3) the quality and discrimination power of different regions in palmprints vary significantly. As a result, these matchers are unable to appropriately handle the distortion and noise despite heavy computational cost. Motivated by the matching strategies of human palmprint experts, we developed a novel palmprint recognition system. The main contributions are as follows: 1) statistics of major features in palmprints are quantitatively studied; 2) a segment-based matching and fusion algorithm is proposed to deal with the skin distortion and the varying discrimination power of different palmprint regions; and 3) to reduce the computational complexity, an orientation field based registration algorithm is designed for registering the palmprints into the same coordinate system before matching and a cascade filter is built to reject the non-mated gallery palmprints at an early stage. Experimental results show that the proposed matcher substantially outperforms the existing matchers in both matching accuracy and speed.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.237</guid>
  </item>
  <item>
     <title>PrePrint: Motion Detail Preserving Optical Flow Estimation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.236</link>
     <description>A common problem of optical flow estimation in the multi-scale variational framework is that fine motion structures cannot always be correctly estimated, especially for regions with significant and abrupt displacement variation. A novel extended coarse-to-fine (EC2F) refinement framework is introduced in this paper to address this issue, which reduces the reliance of flow estimates on their initial values propagated from the coarse level and enables recovering many motion details in each scale. The contribution of this paper also includes adaption of the objective function to handle outliers and development of a new optimization procedure. The effectiveness of our algorithm is borne out by the Middlebury optical flow benchmark and by experiments on challenging examples that involve large-displacement motion.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.236</guid>
  </item>
  <item>
     <title>PrePrint: Aggregating Local Image Descriptors into Compact Codes</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.235</link>
     <description>This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100 million image dataset takes about 250 ms on one processor core.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.235</guid>
  </item>
  <item>
     <title>PrePrint: HMM-based Lexicon-driven and Lexicon-free Word Recognition for Online Handwritten Indic Scripts</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.234</link>
     <description>Research on recognizing online handwritten words in Indic scripts is in its early stages compared to that for Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts -- Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data-driven and script-independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMM): lexicon-driven and lexicon-free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order, and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms the lexicon-free one as well as their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13% for Devanagari when the two recognizers are combined, and 91.8% for Tamil using the lexicon-driven technique.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.234</guid>
  </item>
  <item>
     <title>PrePrint: Elastic Geodesic Paths in Shape Space of Parametrized Surfaces</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.233</link>
     <description>This paper presents a novel Riemannian framework for shape analysis of parameterized surfaces. In particular, it provides efficient algorithms for computing geodesic paths which, in turn, are important for comparing, matching, and deforming surfaces. The novelty of this framework is that geodesics are invariant to the parameterizations of surfaces and other shape-preserving transformations of surfaces. The basic idea is to formulate a space of embedded surfaces (surfaces seen as embeddings of a unit sphere in R3) and impose a Riemannian metric on it in such a way that the re-parameterization group acts on this space by isometries. Under this framework, we solve two optimization problems. First, given any two surfaces at arbitrary rotations and parameterizations, we use a path-straightening approach to find a geodesic path between them under the chosen metric. Second, by modifying a technique presented in [24], we solve for the optimal rotation and parameterization (registration) between surfaces. Their combined solution provides an efficient mechanism for computing geodesic paths in shape spaces of parameterized surfaces. We illustrate these ideas using examples from shape analysis of anatomical structures and other general surfaces.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.233</guid>
  </item>
  <item>
     <title>PrePrint: Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.232</link>
     <description>Scene understanding includes many related sub-tasks, such as scene categorization, depth estimation, and object detection. Each of these sub-tasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the sub-tasks while requiring only a 'black-box' interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about which error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.232</guid>
  </item>
  <item>
     <title>PrePrint: Constrained Parametric Min-Cuts for Automatic Object Segmentation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.231</link>
     <description>We present a novel framework for creating and ranking plausible object hypotheses in an image using bottom-up generation processes and mid-level selection cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge about the properties of individual object classes, by solving a sequence of constrained parametric min-cut problems (CPMC) on a regular image grid. In a subsequent step, we learn to rank the corresponding segments by training a continuous model to predict their plausibility (putative overlap with ground truth) based on their mid-level region properties, then diversify the estimated overlap using maximum marginal relevance measures. We show that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC 2009 and 2010 datasets. It achieves the same average best segmentation covering on VOC2009 as the best-performing technique to date [1], 0.61, while using just the top 7 ranked segments instead of the full hierarchy in [1]. Our method achieves 0.78 average best covering using 154 segments. An extended version of the basic algorithm achieves 83% average per-class object recall, using 200 segments per image on the VOC2010 segmentation dataset.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.231</guid>
  </item>
  <item>
     <title>PrePrint: Polynomial Eigenvalue Solutions to Minimal Problems in Computer Vision</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.230</link>
     <description>We present a method for solving systems of polynomial equations appearing in computer vision. This method is based on polynomial eigenvalue solvers and is more straightforward and easier to implement than the state-of-the-art Gr&#246;bner basis method, since eigenvalue problems are well studied and easy to understand, and efficient and robust algorithms for solving them are available. We provide a characterization of problems that can be efficiently solved as polynomial eigenvalue problems and present a resultant-based method for transforming a system of polynomial equations to a polynomial eigenvalue problem. We propose techniques that can be used to reduce the size of the computed polynomial eigenvalue problems. To show the applicability of the proposed polynomial eigenvalue method, we present the polynomial eigenvalue solutions to several important minimal relative pose problems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.230</guid>
  </item>
   </channel>
</rss>
