<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
<link>http://www.computer.org/tpami</link>
<description>The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) is published monthly. Its Editorial Board strives to publish papers that present important research results within PAMI's scope. These include statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, and specialized architectures for such processing.</description>
	<language>en-us</language>
	<pubDate>Sat, 18 May 2013 10:00:06 GMT</pubDate>
	<image>
		<url>http://csdl.computer.org/common/images/logos/tpami.gif</url>
		<title>IEEE Computer Society</title>
		<description>List of recently published journal articles</description>
		<link>http://www.computer.org/tpami</link>
	</image>
  <item>
     <title>PrePrint: Learning with Box Kernels</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.73</link>
     <description>Supervised examples and prior knowledge on regions of the input space have been profitably integrated in kernel machines to improve the performance of classifiers in different real-world contexts. The proposed solutions, which rely on the unified supervision of points and sets, have been mostly based on specific optimization schemes in which, as usual, the kernel function operates on points only. In this paper, arguments from variational calculus are used to support the choice of a special class of kernels, referred to as box kernels, which emerges directly from the choice of the kernel function associated with a regularization operator. It is proven that there is no need to search for kernels to incorporate the structure deriving from the supervision of regions of the input space, since the optimal kernel arises as a consequence of the chosen regularization operator. Although most of the given results hold for sets, we focus attention on boxes, whose labeling is associated with their propositional description. Based on different assumptions, some representer theorems are given which dictate the structure of the solution in terms of box kernel expansion. Successful results are given for problems of medical diagnosis, image, and text categorization.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.73</guid>
  </item>
  <item>
     <title>PrePrint: Latent Dirichlet Allocation Models for Image Classification</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.69</link>
     <description>Two new extensions of latent Dirichlet allocation (LDA), denoted topic-supervised LDA and class-specific-simplex LDA (css-LDA), are proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, topic-supervised-LDA models are introduced, which replace the automated topic discovery of LDA with specified topics, identical to the classes of interest for classification. While this results in improvements in classification accuracy over existing LDA models, it compromises the ability of LDA to discover unanticipated structure of interest. This limitation is addressed by the introduction of css-LDA, an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. The css-LDA model is shown to combine the labeling strength of topic-supervision with the flexibility of topic-discovery. Its effectiveness is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform existing LDA based image classification approaches.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.69</guid>
  </item>
  <item>
     <title>PrePrint: Automatic Generation of Co-Embeddings from Relational Data with Adaptive Shaping</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.66</link>
     <description>In this paper, we study the co-embedding problem of how to map different types of patterns into one common low-dimensional space, given only the associations (relation values) between samples. We conduct a generic analysis to discover the commonalities between existing co-embedding algorithms and indirectly related approaches, and investigate possible factors controlling the shapes and distributions of the co-embeddings. The primary contribution of this work is a novel method for computing co-embeddings, termed the automatic co-embedding with adaptive shaping (ACAS) algorithm, based on an efficient transformation of the co-embedding problem. Its advantages include flexible model adaptation to the given data, an economical set of model variables leading to a parametric co-embedding formulation, and a robust model fitting criterion for model optimization based on a quantization procedure. The secondary contribution of this work is the introduction of a set of generic schemes for the qualitative analysis and quantitative assessment of the output of co-embedding algorithms, using existing labelled benchmark datasets. Experiments with synthetic and real-world datasets show that the proposed algorithm is very competitive compared to existing ones.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.66</guid>
  </item>
  <item>
     <title>PrePrint: Efficient Subframe Video Alignment using Short Descriptors</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.56</link>
     <description>This paper addresses the problem of video alignment. We present efficient approaches that allow for spatio-temporal alignment of two sequences. Unlike most related works, we consider independently moving cameras that capture a 3D scene at different times. The novelty of the proposed method lies in the adaptation and extension of an efficient information retrieval framework that casts the sequences as an image database and a set of query frames respectively. The efficient retrieval builds on the recently proposed quad descriptor. In this context, we define the 3D Vote Space (VS) by aggregating votes through a multi-querying (multiscale) scheme and we present two solutions based on VS entries: a causal solution that permits online synchronization and a global solution through multiscale dynamic programming. In addition, we extend the recently introduced ECC image-alignment algorithm to the temporal dimension, which allows for spatial registration and synchronization refinement with subframe accuracy. We investigate full-search and quantization methods for short descriptors and we compare the proposed schemes with the state-of-the-art. Experiments with real videos by moving or static cameras demonstrate the efficiency of the proposed method and verify its effectiveness with respect to spatio-temporal alignment accuracy.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.56</guid>
  </item>
  <item>
     <title>PrePrint: Symmetric Fast Marching Schemes for Better Numerical Isotropy</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.52</link>
     <description>Existing Fast Marching methods solve the Eikonal equation using a continuous (first order) model to estimate the accumulated cost, but a discontinuous (zero order) model for the traveling cost at each grid point. As a result, the estimate of the accumulated cost (calculated numerically) at a given point will vary based on the direction of the arriving front, introducing an anisotropy into the discrete algorithm even though the continuous PDE is itself isotropic. To remove this anisotropy, we propose two very different schemes. In the first model we utilize a continuous interpolation of the traveling cost, which is not biased by the direction of the propagating front. In the second model we upsample the traveling cost on a higher resolution grid to overcome the directional bias. We show the significance of removing the directional bias in the computation of the cost in some applications of the fast marching method, demonstrating that both methods make the discrete implementation more isotropic in accordance with the underlying continuous PDE. We also compare the accuracy and computation time of our proposed methods with the existing state-of-the-art fast marching techniques to demonstrate the superiority of our methods.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.52</guid>
  </item>
  <item>
     <title>PrePrint: Multi-View Face Detection and Registration Requiring Minimal Manual Intervention</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.37</link>
     <description>Most face recognition systems require faces to be detected and localized a priori. In this paper, an approach to simultaneously detect and localize multiple faces having arbitrary views and different scales is proposed. The main contribution of this paper is the introduction of a face constellation, which enables multi-view face detection and localization. In contrast to other multi-view approaches that require many manually labeled images for training, the proposed face constellation requires only a single reference image of a face containing two manually indicated reference points for initialization. Subsequent training face images from arbitrary views are automatically added to the constellation (registered to the reference image) based on finding the correspondences between distinctive local features. Thus, the key advantage of the proposed scheme is the minimal manual intervention required to train the face constellation. We also propose an approach to identify distinctive correspondence points between pairs of face images in the presence of a large amount of false matches. To detect and localize multiple faces with arbitrary views, we then propose a probabilistic classifier based formulation to evaluate whether a local feature cluster corresponds to a face.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.37</guid>
  </item>
  <item>
     <title>PrePrint: Stereo Seam Carving: A Geometrically Consistent Approach</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.46</link>
     <description>Image retargeting algorithms attempt to adapt the image content to the screen without distorting the important objects in the scene. Existing methods address retargeting of a single image. In this paper we propose a novel method for retargeting a pair of stereo images. Naively retargeting each image independently will distort the geometric structure and hence will impair the perception of the 3D structure of the scene. We show how to extend a single image seam carving to work on a pair of images. Our method minimizes the visual distortion in each of the images as well as the depth distortion. A key property of the proposed method is that it takes into account the visibility relations between pixels in the image pair (occluded and occluding pixels). As a result, our method guarantees, as we formally prove, that the retargeted pair is geometrically consistent with a feasible 3D scene, similar to the original one. Hence, the retargeted stereo pair can be viewed on a stereoscopic display or further processed by any computer vision algorithm. We demonstrate our method on a number of challenging indoor and outdoor stereo images.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.46</guid>
  </item>
  <item>
     <title>PrePrint: Multi-Exemplar Affinity Propagation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.28</link>
     <description>The Affinity Propagation (AP) clustering algorithm has received much attention in the past few years. AP is appealing because it is efficient, insensitive to initialization, and it produces clusters at a lower error rate than other exemplar-based methods. However, its single-exemplar model becomes inadequate when applied to model multi-subclasses in some situations such as scene analysis and character recognition. To remedy this deficiency, we have extended the single-exemplar model to a multi-exemplar one to create a new Multi-Exemplar Affinity Propagation (MEAP) algorithm. This new model determines automatically the number of exemplars in each cluster associated with a super exemplar to approximate the subclasses in the category. Solving the model is NP-hard and we tackle it with the max-sum belief propagation to produce neighborhood maximum clusters, with no need to specify beforehand the number of clusters, multi-exemplars, and super-exemplars. Also, utilizing the sparsity in the data, we are able to reduce substantially the computational time and storage. Experimental studies have shown MEAP's significant improvements over other algorithms on unsupervised image categorization and the clustering of handwritten digits.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.28</guid>
  </item>
  <item>
     <title>PrePrint: Learning AND-OR Templates for Object Recognition and Detection</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.35</link>
     <description>This paper presents a framework for unsupervised learning of a hierarchical reconfigurable image template, the AND-OR Template (AOT), for visual objects. The AOT includes: (1) hierarchical composition as &quot;AND&quot; nodes, (2) deformation and articulation of parts as geometric &quot;OR&quot; nodes, and (3) multiple ways of composition as structural &quot;OR&quot; nodes. The terminal nodes are hybrid image templates (HIT) that are fully generative to the pixels. We show that both the structures and parameters of the AOT model can be learned in an unsupervised way from images using an information projection principle. The learning algorithm consists of two steps: i) a recursive block pursuit procedure to learn the hierarchical dictionary of primitives, parts and objects, and ii) a graph compression procedure to minimize model structure for better generalizability. We investigate the factors that influence how well the learning algorithm can identify the underlying AOT. We also propose a number of ways to evaluate the performance of the learned AOTs through both synthesized examples and real-world images. Our model advances the state-of-the-art for object detection by improving the accuracy of template matching.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.35</guid>
  </item>
  <item>
     <title>PrePrint: Modeling Natural Images Using Gated MRFs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.29</link>
     <description>This paper describes a Markov Random Field for real-valued image modeling that has two sets of latent variables. One set is used to gate the interactions between all pairs of pixels while the second set determines the mean intensities of each pixel. This is a powerful model with a conditional distribution over the input that is Gaussian with both mean and covariance determined by the configuration of latent variables, which is unlike previous models that were restricted to use Gaussians with either a fixed mean or a diagonal covariance matrix. Thanks to the increased flexibility, this gated MRF can generate more realistic samples after training on an unconstrained distribution of high-resolution natural images. Furthermore, the latent variables of the model can be inferred efficiently and can be used as very effective descriptors in recognition tasks. Both generation and discrimination drastically improve as layers of binary latent variables are added to the model, yielding a hierarchical model called a Deep Belief Network.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.29</guid>
  </item>
  <item>
     <title>PrePrint: Efficient Methods for Overlapping Group Lasso</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.17</link>
     <description>The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. Our methods and theoretical results are then generalized to tackle the general overlapping group Lasso formulation based on the Lq norm. We further extend our algorithm to solve a non-convex overlapping group Lasso formulation based on the capped norm regularization, which reduces the estimation bias introduced by the convex penalty. Our empirical evaluations using both synthetic and real data demonstrate the efficiency of the proposed algorithm. Results also demonstrate the effectiveness of the non-convex formulation for overlapping group Lasso.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.17</guid>
  </item>
  <item>
     <title>PrePrint: Range Image Registration Using a Photometric Metric under Unknown Lighting</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.21</link>
     <description>Based on the spherical harmonics representation of image formation, we derive a new photometric metric for evaluating the correctness of a given rigid transformation aligning two overlapping range images captured under unknown, distant and general illumination. We estimate the surrounding illumination and albedo values of points of the two range images from the point correspondences induced by the input transformation. We then synthesize the color of both range images using albedo values transferred using the point correspondences to compute the photometric re-projection error. This allows us to accurately register two range images by finding the transformation that minimizes the photometric re-projection error. We also propose a practical method using the proposed photometric metric to register pairs of range images devoid of salient geometric features, captured under unknown lighting. Our method uses a hypothesize-and-test strategy to search for the transformation that minimizes our photometric metric. Transformation candidates are efficiently generated by employing the spherical representation of each range image. Experimental results using both synthetic and real data demonstrate the usefulness of the proposed metric.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.21</guid>
  </item>
  <item>
     <title>PrePrint: Joint Histogram Based Cost Aggregation for Stereo Matching</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.15</link>
     <description>This paper presents a novel method for performing efficient cost aggregation in stereo matching. The cost aggregation problem is re-formulated from a perspective of a histogram, giving us a potential to reduce the complexity of the cost aggregation in stereo matching significantly. Different from previous methods which have tried to reduce the complexity in terms of the size of an image and a matching window, our approach focuses on reducing the computational redundancy which exists among the search range, caused by a repeated filtering for all the hypotheses. Moreover, we also reduce the complexity of the window-based filtering through an efficient sampling scheme inside the matching window. The trade-off between accuracy and complexity is extensively investigated by varying the parameters used in the proposed method. Experimental results show that the proposed method provides high-quality disparity maps with a low complexity and outperforms existing local methods. This work also provides new insights into complexity-constrained stereo matching algorithm design.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.15</guid>
  </item>
  <item>
     <title>PrePrint: WESD - Weighted Spectral Distance for Measuring Shape Dissimilarity</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.275</link>
     <description>This article presents a new distance for measuring shape dissimilarity between objects. Recent publications introduced the use of eigenvalues of the Laplace operator as compact shape descriptors. Here, we revisit the eigenvalues to define a proper distance, called Weighted Spectral Distance (WESD), for quantifying shape dissimilarity. The definition of WESD is derived through analysing the heat-trace. This analysis provides the proposed distance an intuitive meaning and mathematically links it to the intrinsic geometry of objects. We analyse the resulting distance definition, present and prove its important theoretical properties. Some of these properties include: i) WESD is defined over the entire sequence of eigenvalues yet it is guaranteed to converge, ii) it is a pseudometric, iii) it is accurately approximated with a finite number of eigenvalues, and iv) it can be mapped to the [0,1) interval. Lastly, experiments conducted on synthetic and real objects are presented. These experiments highlight the practical benefits of WESD for applications in vision and medical image analysis.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.275</guid>
  </item>
  <item>
     <title>PrePrint: Towards a Theory of Statistical Tree-Shape Analysis</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.265</link>
     <description>In order to develop statistical methods for shapes with a tree-structure, we construct a shape space framework for tree-shapes and study metrics on the shape space. This shape space has singularities, which correspond to topological transitions in the represented trees. We study two closely related metrics on the shape space, TED and QED. QED is a quotient Euclidean distance arising naturally from the shape space formulation, while TED is the classical tree edit distance. Using Gromov's metric geometry we gain new insight into the geometries defined by TED and QED. We show that the new metric QED has nice geometric properties which are needed for statistical analysis: geodesics always exist, and are generically locally unique. Following this we can also show existence and generic local uniqueness of average trees for QED. TED, while having some algorithmic advantages, does not share these advantages. Along with the theoretical framework we provide experimental proof-of-concept results on synthetic data trees as well as small airway trees from pulmonary CT scans. This way, we illustrate that our framework has promising theoretical and qualitative properties necessary to build a theory of statistical tree-shape analysis.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.265</guid>
  </item>
  <item>
     <title>PrePrint: Calibration of Smooth Camera Models</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.258</link>
     <description>Generic imaging models can be used to represent any camera. Current generic models are discrete and define a mapping between each pixel in the image and a straight line in 3D space. This paper presents a modification of the generic camera model that allows the simplification of the calibration procedure. The only requirement is that the coordinates of the 3D projecting lines are related by functions that vary smoothly across space. Such a model is obtained by modifying the general imaging model using radial basis functions to interpolate image coordinates and 3D lines, thereby allowing both an increase in resolution (due to their continuous nature) and a more compact representation. Using this variation of the general imaging model we also develop a calibration procedure. This procedure only requires that a 3D point be matched to each pixel. In addition, not all the pixels need to be calibrated. As a result, the complexity of the procedure is significantly decreased. Normalization is applied to the coordinates of both image and 3D points, which increases the accuracy of the calibration. Results with both synthetic and real data sets show that the model and calibration procedure are easily applicable and provide accurate calibration results.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.258</guid>
  </item>
  <item>
     <title>PrePrint: Paired Regions for Shadow Detection and Removal</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.214</link>
     <description>In this paper, we address the problem of shadow detection and removal from single images of natural scenes. Different from traditional methods that explore pixel or edge information, we employ a region based approach. In addition to considering individual regions separately, we predict relative illumination conditions between segmented regions from their appearances and perform pairwise classification based on such information. Classification results are used to build a graph of segments and graph-cut is used to solve the labeling of shadow and non-shadow regions. Detection results are later refined by image matting, and the shadow free image is recovered by relighting each pixel based on our lighting model. We evaluate our method on the shadow detection dataset in [1]. In addition, we created a new dataset with shadow-free ground truth images, which provides a quantitative basis for evaluating shadow removal. We study the effectiveness of features for both unary and pairwise classification.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.214</guid>
  </item>
  <item>
     <title>PrePrint: On Differential Photometric Reconstruction for Unknown, Isotropic BRDFs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.217</link>
     <description>We present a comprehensive theory of photometric reconstruction from image derivatives, in the presence of a general, unknown isotropic BRDF. We derive precise topological classes up to which the surface may be determined and specify exact priors for a full geometric reconstruction. These results are the culmination of a series of fundamental observations. First, we exploit the linearity of differentiation to discover BRDF-independent photometric invariants. For the problem of shape from shading, we show that isocontours of constant magnitude of the gradient may be recovered. For the problem of photometric stereo, we derive a photometric flow that relates image derivatives to surface geometry, using just two measurements of spatial and temporal image derivatives from unknown light directions on a circle. The photometric flow is shown to determine the surface up to isocontours of constant magnitude of the surface gradient, as well as isocontours of constant depth. Further, we prove that specification of the surface normal at a single point completely determines the surface depth from these isocontours. Additionally, we propose practical algorithms that require initial or boundary information, but recover depth from lower order derivatives. Our theoretical results are illustrated with several examples on synthetic and real data.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.217</guid>
  </item>
  <item>
     <title>IEEE Transactions on Pattern Analysis and Machine Intelligence - </title>
     <link>http://opac.ieeecomputersociety.org/opac?year=2013&amp;volume=12&amp;issue=07&amp;acronym=tpami</link>
     <description>IEEE Transactions on Pattern Analysis and Machine Intelligence</description>
     <guid isPermaLink="true">http://www.computer.org/portal/site/tpami/</guid>
  </item>
  <item>
     <title>PrePrint: A Search-and-Validate Method for Face Identification from Single Line Drawings</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.82</link>
     <description>Several studies have been made in finding the faces of an object depicted in a line drawing, but the problem has not been completely solved. Although existing methods can find the correct faces in most cases, there is no mechanism to ascertain that they are indeed correct, leaving the human user to do so. This paper uses a two-stage approach &#x2013; find potential faces then validate their correctness &#x2013; to ensure that only correct faces are delivered ultimately. The face finding itself uses a double breadth-first search algorithm, which yields the shortest path, to find the potential faces. The basic premise is that the smallest faces found are more likely the correct ones. They serve as the &#x201C;seed&#x201D; potential faces, from which the algorithm proceeds to search for more faces. If the potential faces found satisfy the validation rules, then they are accepted as correct. Otherwise, the wrong potential faces are identified and removed, and new ones found in their place. The validation process is then repeated. The algorithm is fast and reliable, and can deal with planar-faced manifold and non-manifold objects. Our extensive tests show that the method can deal with most cases efficiently, including those that previous methods cannot solve.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.82</guid>
  </item>
  <item>
     <title>PrePrint: Groupwise Elastic Registration by a New Sparsity-Promoting Metric: Application to the Alignment of Cardiac Magnetic Resonance Perfusion Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.74</link>
     <description>This paper proposes a methodology for the joint alignment of a sequence of images based on a groupwise registration procedure by using a new family of metrics that exploit the expected sparseness of the temporal intensity curves corresponding to the aligned points. Therefore, this methodology is able to tackle the alignment of temporal sequences of images in which the represented phenomenon varies in time. Specifically, we have applied it to the correction of motion in contrast-enhanced first-pass perfusion cardiac magnetic resonance images. The time sequence is elastically registered as a whole by using the aforementioned family of multi-image metrics and jointly optimizing the parameters of the transformations involved. The proposed metrics are able to cope with dynamic changes in the intensity content of corresponding points in the sequence guided by the assumption that these changes allow for a sparse representation in a properly selected frame. Results have shown the statistically significant improvement in performance of the proposed metric with respect to previous groupwise registration metrics for the problem at hand, which is especially relevant in order to correct for elastic deformations.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.74</guid>
  </item>
  <item>
     <title>PrePrint: A Bag-of-Features Framework to Classify Time Series</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.72</link>
     <description>Time series classification is an important task with many challenging applications. A nearest-neighbor classifier with dynamic time warping (DTW) distance is a strong solution in this context. On the other hand, feature-based approaches have been proposed both as classifiers and to provide insight into the series, but these approaches have problems handling translations and dilations in local patterns. Considering these shortcomings, we present a framework to classify time series based on a bag-of-features representation (TSBF). Multiple subsequences selected from random locations and of random lengths are partitioned into shorter intervals to capture the local information. Consequently, features computed from these subsequences measure properties at different locations and dilations when viewed from the original series. This provides a feature-based approach that can handle warping (although differently from DTW). Moreover, a supervised learner (that handles mixed data types, different units, etc.) integrates location information into a compact codebook through class probability estimates. Additionally, relevant global features can easily supplement the codebook. TSBF is compared to nearest-neighbor classifiers and other alternatives (bag-of-words strategies, sparse spatial sample kernels, shapelets). Our experimental results show that TSBF provides better results than competitive methods on benchmark datasets from the UCR time series database.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.72</guid>
  </item>
  <item>
     <title>PrePrint: Mapping from Frame-Driven to Frame-Free Event-Driven Vision Systems by Low-Rate Rate-Coding and Coincidence Processing. Application to Feed Forward ConvNets</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.71</link>
     <description>Event-driven visual sensors have attracted interest from a number of different research communities. They provide visual information in quite a different way from conventional video systems consisting of sequences of still images rendered at &#x201C;frame rate&#x201D;. Event-driven vision sensors take inspiration from biology. A special type of Event-driven sensor is the so-called Dynamic-Vision-Sensor (DVS), where each pixel computes relative changes of light, or &#x201C;temporal contrast&#x201D;. Pixel events become available with microsecond delays with respect to &#x201C;reality&#x201D;. These events can be processed &#x201C;as they flow&#x201D; by a cascade of event (convolution) processors. As a result, input and output event flows are practically coincident, and objects can be recognized as soon as the sensor provides enough meaningful events. In this paper we present a methodology for mapping from a properly trained neural network in a conventional Frame-driven representation to an Event-driven representation. The method is illustrated by studying Event-driven Convolutional Neural Networks (ConvNet) trained to recognize rotating human silhouettes or high speed poker card symbols. The Event-driven ConvNet is fed with recordings obtained from a real DVS camera. The Event-driven ConvNet is simulated with a dedicated Event-driven simulator, and consists of a number of Event-driven processing modules, the characteristics of which are obtained from individually manufactured hardware modules.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.71</guid>
  </item>
  <item>
     <title>PrePrint: Monotonicity and Error Type Differentiability in Performance Measures for Target Detection and Tracking in Video</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.70</link>
     <description>There exists an abundance of systems and algorithms for multiple target detection and tracking in video, and many measures for evaluating the quality of their output have been proposed. The contribution of this paper lies in the following: first, it argues that such performance measures should have two fundamental properties - monotonicity and error type differentiability; second, it shows that the recently proposed measures do not have either of these properties and are thus less usable; third, it composes a set of simple measures, partly built on common practice, that does have these properties. The informativeness of the proposed set of performance measures is demonstrated through their application on face detection and tracking results.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.70</guid>
  </item>
  <item>
     <title>PrePrint: Pose-Robust Recognition of Low-Resolution Face Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.68</link>
     <description>Face images captured by surveillance cameras usually have poor resolution in addition to uncontrolled poses and illumination conditions, all of which adversely affect performance of face matching algorithms. In this paper, we develop a completely automatic, novel approach for matching surveillance quality facial images to high resolution images in frontal pose which are often available during enrollment. The proposed approach uses multidimensional scaling to simultaneously transform the features from the poor quality probe images and the high quality gallery images in such a manner that the distances between them approximate the distances had the probe images been captured in the same conditions as the gallery images. Tensor analysis is used for facial landmark localization in the low-resolution uncontrolled probe images for computing the features. Thorough evaluation on the Multi-PIE dataset [1] and comparisons with state-of-the-art super-resolution and classifier-based approaches are performed to illustrate the usefulness of the proposed approach. Experiments on surveillance imagery further signify the applicability of the framework. We also show the usefulness of the proposed approach for the application of tracking and recognition in surveillance videos.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.68</guid>
  </item>
  <item>
     <title>PrePrint: Minimum Near-Convex Shape Decomposition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.67</link>
     <description>Shape decomposition is a fundamental problem for part-based shape representation. We propose the Minimum Near-Convex Decomposition (MNCD) to decompose arbitrary shapes into the minimum number of &#x0022;near-convex&#x0022; parts. The near-convex shape decomposition is formulated as a discrete optimization problem by minimizing the number of non-intersecting cuts. Two perception rules are imposed as constraints on our objective function to improve the visual naturalness of the decomposition. With the degree of near-convexity as a user-specified parameter, our decomposition is robust to local distortions and shape deformation. The optimization can be efficiently solved via Binary Integer Linear Programming. Both theoretical analysis and experimental results show that our approach outperforms the state-of-the-art results without introducing redundant parts, and thus leads to robust shape representation.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.67</guid>
  </item>
  <item>
     <title>PrePrint: Temporal Localization of Actions with Actoms</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.65</link>
     <description>We address the problem of localizing actions, such as opening a door, in hours of challenging video data. We propose a model based on a sequence of atomic action units, termed &#x0022;actoms&#x0022;, that are semantically meaningful and characteristic for the action. Our Actom Sequence Model (ASM) represents an action as a sequence of histograms of actom-anchored visual features, which can be seen as a temporally structured extension of the bag-of-features. Training requires the annotation of actoms for action examples. At test time, actoms are localized automatically based on a non-parametric model of the distribution of actoms, which also acts as a prior on an action's temporal structure. We present experimental results on two recent benchmarks for action localization: &#x0022;Coffee and Cigarettes&#x0022; and the &#x0022;DLSBP&#x0022; dataset. We also adapt our approach to a classification-by-localization set-up and demonstrate its applicability on the challenging &#x0022;Hollywood 2&#x0022; dataset. We show that our ASM method outperforms the current state of the art in temporal action localization, as well as baselines that localize actions with a sliding window method.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.65</guid>
  </item>
  <item>
     <title>PrePrint: A Framework for Automatic Modeling from Pointcloud Data</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.64</link>
     <description>We propose a complete framework for the automatic modeling from pointcloud data. Initially, the pointcloud data is pre-processed into manageable datasets, which are then separated into clusters using a novel two-step, unsupervised clustering algorithm. The boundaries extracted for each cluster are then simplified and refined using a fast energy minimization process. Finally, 3D models are generated based on the roof outlines. The proposed framework has been extensively tested and the results are reported.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.64</guid>
  </item>
  <item>
     <title>PrePrint: FAIR: a Fast Algorithm for Document Image Restoration.</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.63</link>
     <description>We present in this paper the FAIR algorithm: a Fast Algorithm for document Image Restoration. This algorithm has been submitted to different contests where it showed good performance in comparison to the state of the art. In addition, this method is scale invariant and fast enough to be used in real-time applications. The method is based on a double-threshold edge detection approach which makes it possible to detect small details while remaining robust against noise. The performance of the proposition is evaluated on several types of degraded document images where considerable background noise or variation in contrast and illumination exist.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.63</guid>
  </item>
  <item>
     <title>PrePrint: Improved Object Categorization and Detection Using Comparative Object Similarity</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.58</link>
     <description>Due to the intrinsic long-tailed distribution of objects in the real world, we are unlikely to be able to train an object recognizer/detector with many visual examples for each category. We have to share visual knowledge between object categories to enable learning with few or no training examples. In this paper, we show that local object similarity information (statements that pairs of categories are similar or dissimilar) is a very useful cue to tie different categories to each other for effective knowledge transfer. The key insight: given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. To exploit this category-dependent similarity regularization, we develop a regularized kernel machine algorithm to train kernel classifiers for categories with few or no training examples. We also adapt the state-of-the-art object detector [10] to encode object similarity constraints. Our experiments on hundreds of categories from the Labelme dataset show that our regularized kernel classifiers can make significant improvement on object categorization. We also evaluate the improved object detector on the PASCAL VOC 2007 benchmark dataset.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.58</guid>
  </item>
  <item>
     <title>PrePrint: Two Cloud-Based Cues for Estimating Scene Structure and Camera Calibration</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.55</link>
     <description>We describe algorithms that use cloud shadows as a form of stochastically structured light to support 3D scene geometry estimation. Taking video captured from a static outdoor camera as input, we use the relationship of the time series of intensity values between pairs of pixels as the primary input to our algorithms. We describe two cues that relate the 3D distance between a pair of points to the pair of intensity time series. The first cue results from the fact that two pixels that are nearby in the world are more likely to be under a cloud at the same time than two distant points. We describe methods for using this cue to estimate focal length and scene structure. The second cue is based on the motion of cloud shadows across the scene; this cue results in a set of linear constraints on scene structure. These constraints have an inherent ambiguity, which we show how to overcome by combining the cloud motion cue with the spatial cue. We evaluate our method on several time lapses of real outdoor scenes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.55</guid>
  </item>
  <item>
     <title>PrePrint: Semi-Supervised Video Segmentation Using Tree Structured Graphical Models</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.54</link>
     <description>We present a novel patch-based probabilistic graphical model for semi-supervised video segmentation. At the heart of our model is a temporal tree structure which links patches in adjacent frames through the video sequence. This permits exact inference of pixel labels without resorting to traditional short time-window based video processing or instantaneous decision making. The input to our algorithm is one or more labelled key frames of a video sequence, and the output is a set of pixel-wise labels along with their confidences. We propose an efficient inference scheme that performs exact inference over the temporal tree, and optionally a per-frame label smoothing step using loopy BP, to estimate pixel-wise labels and their posteriors. These posteriors are used to learn pixel unaries by training a Random Decision Forest in a semi-supervised manner. These unaries are used in a second iteration of label inference to improve the segmentation quality. We demonstrate the efficacy of our proposed algorithm using several qualitative and quantitative tests on both foreground/background and multi-class video segmentation problems using publicly available and our own datasets.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.54</guid>
  </item>
  <item>
     <title>PrePrint: Facial Age Estimation by Learning from Label Distributions</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.51</link>
     <description>One of the main difficulties in facial age estimation is that the learning algorithms cannot expect sufficient and complete training data. Fortunately, faces at close ages look quite similar, since aging is a slow and smooth process. Inspired by this observation, instead of considering each face image as an instance with one label (age), this paper regards each face image as an instance associated with a label distribution. The label distribution covers a certain number of class labels, representing the degree to which each label describes the instance. In this way, one face image can contribute not only to the learning of its chronological age, but also to the learning of its adjacent ages. Two algorithms named IIS-LLD and CPNN are proposed to learn from such label distributions. Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, whether specially designed for age estimation or for general purpose.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.51</guid>
  </item>
  <item>
     <title>PrePrint: Markerless Motion Capture of Multiple Characters Using Multi-View Image Segmentation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.47</link>
     <description>Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting persons is a very challenging task, even in a multi-camera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. In order to address this task, we propose a framework that exploits multi-view image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied one-by-one to each individual, followed by surface estimation to capture detailed non-rigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multi-person motions, including dancing, wrestling, and hugging.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.47</guid>
  </item>
  <item>
     <title>PrePrint: 3D Face Recognition Under Expressions, Occlusions and Pose Variations</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.48</link>
     <description>We propose a novel geometric framework for analyzing 3D faces, with the specific goals of comparing, matching, and averaging their shapes. Here we represent facial surfaces by radial curves emanating from the nose tips and use elastic shape analysis of these curves to develop a Riemannian framework for analyzing shapes of full facial surfaces. This representation, along with the elastic Riemannian metric, seems natural for measuring facial deformations and is robust to challenges such as large facial expressions (especially those with open mouths), large pose variations, missing parts, and partial occlusions due to glasses, hair, etc. This framework is shown to be promising from both empirical and theoretical perspectives. In terms of the empirical evaluation, our results match or improve on the state-of-the-art methods on three prominent databases: FRGCv2, GavabDB, and Bosphorus, each posing a different type of challenge. From a theoretical perspective, this framework allows for formal statistical inferences, such as the estimation of missing facial parts using PCA on tangent spaces and computing average shapes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.48</guid>
  </item>
  <item>
     <title>PrePrint: A Coarse to Fine Minutiae-Based Latent Palmprint Matching</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.39</link>
     <description>With the availability of live-scan palmprint technology, high resolution palmprint recognition has started to receive significant attention in forensics and law enforcement. In forensic applications, latent palmprints provide critical evidence, as it is estimated that about 30 percent of the latents recovered at crime scenes are those of palms. Considering the large number of minutiae and large area of foreground region in full palmprints, novel strategies need to be developed for efficient and robust latent palmprint matching. In this paper, a coarse-to-fine matching strategy based on minutiae clustering and minutiae match propagation is designed specifically for palmprint matching. The proposed palmprint matching algorithm has been evaluated on a latent-to-full palmprint database consisting of 446 latents and 12,489 background full prints. The matching results show a rank-1 identification accuracy of 79.4%, which is significantly higher than the 60.8% identification accuracy of a state-of-the-art latent palmprint matching algorithm on the same latent database. The average computation time of our algorithm for a single latent-to-full match is about 141 ms for a genuine match and 50 ms for an impostor match, on a Windows XP desktop system with a 2.2 GHz CPU and 1.00 GB RAM.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.39</guid>
  </item>
  <item>
     <title>PrePrint: Calibration by Correlation using Metric Embedding from Non-Metric Similarities</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.34</link>
     <description>This paper presents a new intrinsic calibration method that allows us to calibrate a generic single-viewpoint camera. From the video sequence obtained while the camera undergoes random motion, we compute the pairwise time correlation of the luminance signal for the pixels. We show that the pairwise correlation of any pixel pair is a function of the distance between the pixel directions on the visual sphere. This leads to formalizing calibration as a problem of metric embedding from non-metric measurements: we want to find the disposition of pixels on the visual sphere from similarities that are an unknown function of the distances. This problem is a generalization of multidimensional scaling (MDS) that has so far resisted a comprehensive observability analysis and a generic solution. We show that the observability depends both on the local geometric properties and on the global topological properties of the target manifold. It follows that, in contrast to the Euclidean case, on the sphere we can recover the scale of the point distribution. We describe an algorithm that is robust across manifolds and can recover a metrically accurate solution when the metric information is observable. We demonstrate the performance of the algorithm for several cameras (pin-hole, fish-eye, omnidirectional).</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.34</guid>
  </item>
  <item>
     <title>PrePrint: Exhaustive Linearization for Robust Camera Pose and Focal Length Estimation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.36</link>
     <description>We propose a novel approach for the estimation of the pose and focal length of a camera from a set of 3D-to-2D point correspondences. Our method compares favorably to competing approaches in that it is more accurate than existing closed-form solutions, and both faster and more accurate than iterative ones. Our approach is inspired by the EPnP algorithm, a recent O(n) solution for the calibrated case. Yet, we show that considering the focal length as an additional unknown renders the linearization and relinearization techniques of the original approach no longer valid, especially with large amounts of noise. We present new methodologies to circumvent this limitation, termed exhaustive linearization and exhaustive relinearization, which perform a systematic exploration of the solution space in closed form. The method is evaluated on both real and synthetic data, and our results show that besides producing precise focal length estimation, the retrieved camera pose is almost as accurate as the one computed using the EPnP, which assumes a calibrated camera.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.36</guid>
  </item>
  <item>
     <title>PrePrint: Modeling Temporal Interactions with Interval Temporal Bayesian Networks for Complex Activity Recognition</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.33</link>
     <description>Complex activities typically consist of multiple primitive events happening in parallel or sequentially over a period of time. Understanding such activities requires not only recognizing each individual event but, more importantly, capturing their spatiotemporal dependencies over different time intervals. Most current graphical model-based approaches have several limitations. First, time-sliced graphical models such as Hidden Markov Models (HMMs) and Dynamic Bayesian Networks are typically based on points of time, and hence they can only capture three temporal relations: precedes, follows, and equals. Second, HMMs are probabilistic finite-state machines that grow exponentially as the number of parallel events increases. Third, other approaches such as syntactic and description-based methods, while rich in modeling temporal relationships, do not have the expressive power to capture uncertainties. To address these issues, we introduce the Interval Temporal Bayesian Network (ITBN), a novel graphical model that combines the Bayesian Network with the Interval Algebra (IA) to explicitly model the temporal dependencies over time intervals. Advanced machine learning methods are introduced to learn the ITBN model structure and parameters. Experimental results show that by reasoning with spatiotemporal dependencies the proposed model leads to significantly improved performance when modeling and recognizing complex activities involving both parallel and sequential events.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.33</guid>
  </item>
  <item>
     <title>PrePrint: Projective Multi-view Structure and Motion from Element-wise Factorization</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.20</link>
     <description>The Sturm-Triggs type iteration is a classic approach for solving the projective structure-from-motion (SfM) factorization problem, which iteratively solves the projective depths, scene structure and camera motions in an alternated fashion. Like many other iterative algorithms, the Sturm-Triggs iteration suffers from common drawbacks: it requires a good initialization, and the iteration may not converge or may converge only to a local minimum. In this paper, we formulate the projective SfM problem as a novel element-wise factorization (i.e., Hadamard factorization) problem, as opposed to the conventional matrix factorization. Thanks to this formulation, we are able to solve the projective depths, structure and camera motions simultaneously by convex optimization. To address the scalability issue, we adopt a continuation-based algorithm. Our method is a global method, in the sense that it is guaranteed to obtain a globally optimal solution up to the relaxation gap. Another advantage is that our method can handle challenging real-world situations such as missing data and outliers quite easily, and all in a natural and unified manner. Extensive experiments on both synthetic and real images show results comparable with the state-of-the-art methods.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.20</guid>
  </item>
  <item>
     <title>PrePrint: Localizing Parts of Faces Using a Consensus of Exemplars</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.23</link>
     <description>We present a novel approach to localizing parts in images of human faces. The approach combines the output of local detectors with a non-parametric set of global models for the part locations based on over one thousand hand-labeled exemplar images. By assuming that the global models generate the part locations as hidden variables, we derive a Bayesian objective function. This function is optimized using a consensus of models for these hidden variables. The resulting localizer handles a much wider range of expression, pose, lighting and occlusion than prior ones. We show excellent performance on real-world face datasets such as Labeled Faces in the Wild (LFW) and a new Labeled Face Parts in the Wild (LFPW), and show that our localizer achieves state-of-the-art performance on the less challenging BioID dataset.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.23</guid>
  </item>
  <item>
     <title>PrePrint: Fast Detection of Dense Subgraphs with Iterative Shrinking and Expansion</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.16</link>
     <description>In this paper, we propose an efficient algorithm to detect dense subgraphs of a weighted graph. The proposed algorithm, called Shrinking and Expansion Algorithm (SEA), iterates between two phases, namely, the expansion phase and the shrink phase, until convergence. For a current subgraph, the expansion phase adds the most related vertices based on the average affinity between each vertex and the subgraph. The shrink phase considers all pairwise relations in the current subgraph and filters out vertices whose average affinities to other vertices are smaller than the average affinity of the result subgraph. In both phases, SEA operates on small subgraphs, and thus it is very efficient. Significant dense subgraphs are robustly enumerated by running SEA from each vertex of the graph. We evaluate SEA on two different applications: solving correspondence problems and cluster analysis. Both theoretical analysis and experimental results show that SEA is very efficient and robust, especially when there is a large amount of noise in edge weights.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.16</guid>
  </item>
  <item>
     <title>PrePrint: Stacked Autoencoders for Unsupervised Feature Learning and Multiple Organ Detection in a Pilot Study Using 4D Patient Data</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.277</link>
     <description>Medical image analysis remains a challenging application area for artificial intelligence. When applying machine learning, obtaining ground-truth labels for supervised learning is more difficult than in many more common applications of machine learning. This is especially so for datasets with abnormalities, as tissue types and the shapes of the organs in these datasets differ widely. However, organ detection in such an abnormal dataset may have many promising potential real world applications such as automatic diagnosis, automated radiotherapy planning, and medical image retrieval, where new multi-modal medical images provide more information about the imaged tissues for diagnosis. Here we test the application of deep learning methods to organ identification in magnetic resonance medical images, with visual and temporal hierarchical features learnt to categorise object classes from an unlabelled multi-modal DCE-MRI dataset, so that only a weakly supervised training is required for a classifier. A probabilistic patch-based method was employed for multiple organ detection, with the features learnt from the deep learning model. This shows the potential of the deep learning model for application to medical images, despite the difficulty of obtaining libraries of correctly labelled training datasets, and despite the intrinsic abnormalities present in patient datasets.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.277</guid>
  </item>
  <item>
     <title>PrePrint: Scaling up Spike-and-Slab Models for Unsupervised Feature Learning</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.273</link>
     <description>We describe the use of two spike-and-slab models for modeling real-valued data, with an emphasis on their applications to object recognition. The first model, which we call spike-and-slab sparse coding (S3C), is a pre-existing model for which we introduce a faster approximate inference algorithm. We introduce a deep variant of S3C which we call the partially directed deep Boltzmann machine (PD-DBM) and extend our S3C inference algorithm for use on this model. We describe learning procedures for each. We demonstrate that our inference procedure for S3C enables scaling the model to unprecedentedly large problem sizes, and demonstrate that using S3C as a feature extractor results in very good object recognition performance, particularly when the number of labeled examples is low. We show that the PD-DBM generates better samples than its shallow counterpart, and that unlike DBMs or DBNs, the PD-DBM may be trained successfully without greedy layerwise training.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.273</guid>
  </item>
  <item>
     <title>PrePrint: Characterizing Humans on Riemannian Manifolds</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.263</link>
     <description>In surveillance applications, the head and body orientation of people is of primary importance for assessing many behavioural traits. Unfortunately, in this context people are often represented by a few noisy pixels, so that their characterization is difficult. We address this issue by proposing a computational framework based on an expressive descriptor, the covariance of features. Covariances have been employed for pedestrian detection, which is in effect a binary classification problem on Riemannian manifolds. In this paper, we show how to extend this approach to the multi-class case, presenting a novel descriptor, named Weighted ARray of COvariances (WARCO), especially suited for dealing with tiny image representations. The extension requires a novel differential geometry approach, in which covariances are projected on a unique tangent space, where standard machine learning techniques can be applied. In particular, we adopt the Campbell-Baker-Hausdorff expansion as a means to approximate on the tangent space the genuine (geodesic) distances on the manifold, in a very efficient way. We test our methodology on multiple benchmark datasets, and also propose new testing sets, obtaining convincing results in all cases.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.263</guid>
  </item>
  <item>
     <title>PrePrint: Articulated Human Detection with Flexible Mixtures-of-Parts</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.261</link>
     <description>We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, non-oriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: (1) they efficiently model articulation by sharing computation across similar warps, (2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and (3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree-structured, our models can be efficiently optimized with dynamic programming. We introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets, while being orders of magnitude faster.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.261</guid>
  </item>
  <item>
     <title>PrePrint: Efficient Human Pose Estimation from Single Depth Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.241</link>
     <description>We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image, without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features, and parallelizable decision forests, both approaches can run super-realtime on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.241</guid>
  </item>
  <item>
     <title>PrePrint: USAC: A Universal Framework for Random Sample Consensus</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.257</link>
     <description>A computational problem that arises frequently in computer vision is that of estimating the parameters of a model from data that has been contaminated by noise and outliers. More generally, any practical system that seeks to estimate quantities from noisy data measurements must have some means of dealing with data contamination. The Random Sample Consensus (RANSAC) algorithm is one of the most popular tools for robust estimation. Recent years have seen an explosion of activity in this area, leading to the development of a number of techniques that improve upon the efficiency and robustness of the basic algorithm. In this paper, we present a comprehensive overview of recent research in RANSAC-based robust estimation, by analyzing various approaches that have been explored over the years. We provide a common context for this analysis by introducing a new framework, which we call Universal RANSAC (USAC). USAC extends the hypothesize-and-verify structure of RANSAC to incorporate a number of important practical and computational considerations. In addition, we provide a general-purpose C++ software library that implements the USAC framework by leveraging state-of-the-art algorithms for the various modules. The implementation we provide can be used by researchers either as a stand-alone tool for robust estimation, or as a benchmark for evaluating new techniques.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.257</guid>
  </item>
  <item>
     <title>PrePrint: Image-Based Separation of Reflective and Fluorescent Components using Illumination Variant and Invariant Color</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.255</link>
     <description>Traditionally, researchers tend to exclude fluorescence from color appearance algorithms in computer vision and image processing because of its complexity. In reality, fluorescence is a very common phenomenon observed in many objects, from gems and corals, to different kinds of writing paper, and to our clothes. In this paper, we provide detailed theories of the fluorescence phenomenon. In particular, we show that the color appearance of fluorescence is unaffected by illumination, in which it differs from ordinary reflectance. Moreover, we show that the color appearance of objects with reflective and fluorescent components can be represented as a linear combination of the two components. A linear model allows us to separate the two components from images taken under unknown illuminants using independent component analysis (ICA). The effectiveness of the proposed method is demonstrated using digital images of various fluorescent objects.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.255</guid>
  </item>
  <item>
     <title>PrePrint: Keeping a Pan-Tilt-Zoom Camera Calibrated</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.250</link>
     <description>Pan-tilt-zoom (PTZ) cameras are pervasive in modern surveillance systems. However, we demonstrate that the (pan, tilt) coordinates reported by PTZ cameras become inaccurate after many hours of operation, endangering tracking and 3D localization algorithms that rely on the accuracy of such values. To solve this problem, we propose a complete model for a pan-tilt-zoom camera that explicitly reflects how focal length and lens distortion vary as a function of zoom scale. We show how the parameters of this model can be quickly and accurately estimated using a series of simple initialization steps followed by a nonlinear optimization. Our method requires only ten images to achieve accurate calibration results. Next, we show how the calibration parameters can be maintained using a one-shot dynamic correction process; this ensures that the camera returns the same field of view every time the user requests a given (pan, tilt, zoom), even after hundreds of hours of operation. The dynamic calibration algorithm is based on matching the current image against a stored feature library created at the time the PTZ camera is mounted. We evaluate the calibration and dynamic correction algorithms on both experimental and real-world datasets, demonstrating the effectiveness of the techniques.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.250</guid>
  </item>
   </channel>
</rss>