
  • Coarse-to-Fine Hand Detection Method Using Deep Neural Network

    A detection process is provided to identify one or more areas containing the hand or hands of one or more subjects in an image. The process can start by coarsely locating, with a coarse CNN, one or more segments in the image that contain portions of the subjects' hand(s). It can then combine these segments to obtain the one or more areas capturing the hand(s) of the subject(s) in the image. The combined area(s) can then be fed to a grid-based deep neural network to finely detect the area(s) in the image that contain only the captured hand(s) of the subject(s).
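
    The patent above describes a two-stage pipeline in which coarse segments are merged into candidate areas before a finer network re-examines them. Below is a minimal, hypothetical sketch of the merging step alone; the function names, the IoU-based merging rule, and the threshold are assumptions, not taken from the patent.

```python
# Hypothetical sketch of combining coarse hand segments into candidate areas.
# A real system would then crop these areas and run the fine detector on them.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_segments(boxes, thresh=0.1):
    """Greedily merge overlapping coarse segments into enclosing areas."""
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > thresh:
                    a, b = boxes[i], boxes.pop(j)
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    merged = True
                    break
            if merged:
                break
    return np.array(boxes)

coarse = np.array([[10, 10, 60, 60], [40, 30, 90, 80], [200, 200, 240, 250]])
print(merge_segments(coarse))  # two areas: one merged hand region, one isolated
```
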
  • Visual concept conjunction learning with recurrent neural networks

    Liang, Kongming   Chang, Hong   Shan, Shiguang   Chen, Xilin  

    Learning the conjunction of multiple visual concepts has practical significance in various real-world applications (e.g., multi-attribute image retrieval and visual relationship detection). In this paper, we propose the Concept Conjunction Recurrent Neural Network (CRNN) to tackle this problem. In our model, the visual concepts involved in a conjunction are mapped into hidden units and combined recurrently to generate a representation of the concept conjunction, which is then used to compute a concept conjunction classifier as the output. We also present an order-invariant version of the proposed method, based on an attention mechanism, that learns the tasks without a pre-defined concept order. To tackle concept conjunction learning across multiple semantic domains, we introduce a multiplicative framework that learns the joint representation. Experimental results on multi-attribute image retrieval and visual relationship detection show that our method achieves significantly better performance than other related methods on various datasets.
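
    As a rough illustration of the recurrent conjunction idea in the abstract above, the sketch below embeds each concept, combines the embeddings with a GRU, and maps the final state to the weights of a conjunction classifier. The GRU choice, the dot-product scoring, and all sizes and names are assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class ConjunctionClassifierGenerator(nn.Module):
    def __init__(self, n_concepts=100, emb_dim=64, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(n_concepts, emb_dim)
        self.rnn = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.to_weights = nn.Linear(emb_dim, feat_dim)

    def forward(self, concept_ids, image_feats):
        # concept_ids: (B, T) indices of the concepts in the conjunction
        # image_feats: (B, feat_dim) visual features of the query image
        h, _ = self.rnn(self.embed(concept_ids))
        w = self.to_weights(h[:, -1])          # one classifier per conjunction
        return (w * image_feats).sum(dim=1)    # score: image matches all concepts

gen = ConjunctionClassifierGenerator()
score = gen(torch.tensor([[3, 17]]), torch.randn(1, 512))  # e.g. "red" AND "car"
print(score.shape)  # torch.Size([1])
```
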
  • Deep Heterogeneous Hashing for Face Video Retrieval

    Qiao, Shishi   Wang, Ruiping   Shan, Shiguang   Chen, Xilin  

    Retrieving videos of a particular person via a hashing technique, with a face image as the query, has many important applications. While face images are typically represented as vectors in Euclidean space, characterizing face videos with robust set-modeling techniques (e.g., the covariance matrices exploited in this study, which reside on a Riemannian manifold) has recently shown appealing advantages. This, however, results in a thorny heterogeneous-space matching problem. Moreover, hashing with handcrafted features, as done in many existing works, is clearly inadequate for achieving desirable performance on this task. To address these problems, we present an end-to-end Deep Heterogeneous Hashing (DHH) method that integrates three stages (image feature learning, video modeling, and heterogeneous hashing) in a single framework to learn unified binary codes for both face images and videos. To tackle the key challenge of hashing on the manifold, a well-studied Riemannian kernel mapping is employed to project the data (i.e., covariance matrices) into Euclidean space, which makes it possible to embed the two heterogeneous representations into a common Hamming space where both intra-space discriminability and inter-space compatibility are considered. To perform network optimization, the gradient of the kernel mapping is derived via structured matrix backpropagation in a theoretically principled way. Experiments on three challenging datasets show that our method achieves quite competitive performance compared with existing hashing methods.
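
    The usual way to project SPD covariance matrices into Euclidean space is the log-Euclidean map, i.e. the matrix logarithm; whether DHH uses exactly this kernel mapping is an assumption here, but the sketch below shows the kind of flattening the abstract refers to.

```python
import numpy as np

def covariance_of_frames(frames, eps=1e-3):
    """frames: (n_frames, feat_dim) per-frame features of one face video."""
    c = np.cov(frames, rowvar=False)
    return c + eps * np.eye(c.shape[0])  # regularize so the matrix is SPD

def log_euclidean_map(spd):
    """Map an SPD matrix into Euclidean space via the matrix logarithm."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs * np.log(vals)) @ vecs.T  # U diag(log lambda) U^T

video = np.random.randn(30, 16)          # 30 frames, 16-dim features (toy sizes)
flat = log_euclidean_map(covariance_of_frames(video))
print(flat.shape)  # (16, 16) symmetric matrix, now safe to compare/hash linearly
```
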
  • Deformable face net for pose invariant face recognition

    He, Mingjie   Zhang, Jie   Shan, Shiguang   Kan, Meina   Chen, Xilin  

    Unconstrained face recognition remains a challenging task due to factors such as pose, expression, illumination, and partial occlusion. In particular, the most significant appearance variations stem from pose, which leads to severe performance degradation. In this paper, we propose a novel Deformable Face Net (DFN) to handle pose variations in face recognition. Its deformable convolution module attempts to simultaneously learn recognition-oriented face alignment and identity-preserving feature extraction. Since faces possess strong structure, a displacement consistency loss (DCL) is proposed as a regularization term that enforces the learned displacement fields for aligning faces to be locally consistent in both orientation and amplitude. Moreover, an identity consistency loss (ICL) and a pose-triplet loss (PTL) are designed to minimize the intra-class feature variation caused by different poses and to maximize the inter-class feature distance under the same pose. The proposed DFN can effectively handle pose-invariant face recognition (PIFR). Extensive experiments show that it outperforms state-of-the-art methods, especially on datasets with large poses.
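
    A guessed, simplified form of a displacement consistency regularizer, in the spirit of the DCL described above: neighboring displacement vectors are penalized for disagreeing, so the learned field stays locally consistent in orientation and amplitude. The exact loss in the paper may differ.

```python
import torch

def displacement_consistency(disp):
    # disp: (B, 2, H, W) per-pixel (dx, dy) offsets from a deformable module
    dh = disp[:, :, 1:, :] - disp[:, :, :-1, :]   # vertical neighbor differences
    dw = disp[:, :, :, 1:] - disp[:, :, :, :-1]   # horizontal neighbor differences
    return dh.pow(2).mean() + dw.pow(2).mean()

field = torch.randn(4, 2, 14, 14, requires_grad=True)
loss = displacement_consistency(field)
loss.backward()                 # the regularizer is differentiable end-to-end
print(float(loss))
```
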
  • Hierarchical Attention for Part-Aware Face Detection

    Wu, Shuzhe   Kan, Meina   Shan, Shiguang   Chen, Xilin  

    Expressive representations for characterizing face appearance are essential for accurate face detection. Due to different poses, scales, illumination, occlusion, etc., face appearances exhibit substantial variations, and the contents of each local region (facial part) vary from one face to another. Current detectors, however, particularly those based on convolutional neural networks, apply identical operations (e.g., convolution or pooling) to all local regions on each face for feature aggregation (in a generic sliding-window configuration) and treat all local features as equally effective for the detection task. In such methods, not only is each local feature suboptimal because region-wise distinctions are ignored, but the overall face representation is also semantically inconsistent. To address this issue, we design a hierarchical attention mechanism that allows adaptive exploration of local features. Given a face proposal, part-specific attention, modeled as learnable Gaussian kernels, searches for the proper positions and scales of local regions so as to extract consistent and informative features of facial parts. Face-specific attention, predicted with an LSTM, is then introduced to model the relations between the local parts and to adjust their contributions to the detection task. This hierarchical attention yields a part-aware face detector that forms more expressive and semantically consistent face representations. Extensive experiments on three challenging face detection datasets demonstrate the effectiveness of our hierarchical attention and compare it with state-of-the-art methods.
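
    A minimal sketch of part-specific attention as a learnable 2D Gaussian over a feature map, as the abstract describes: the kernel's center and scale determine where, and at what extent, a facial part is pooled. The parameterization below is an illustrative assumption, not the paper's code.

```python
import torch
import torch.nn as nn

class GaussianPartAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.center = nn.Parameter(torch.zeros(2))      # (x, y) in [-1, 1] coords
        self.log_sigma = nn.Parameter(torch.zeros(1))   # learnable scale

    def forward(self, fmap):
        # fmap: (B, C, H, W) features of one face proposal
        B, C, H, W = fmap.shape
        ys = torch.linspace(-1, 1, H).view(H, 1).expand(H, W)
        xs = torch.linspace(-1, 1, W).view(1, W).expand(H, W)
        d2 = (xs - self.center[0]) ** 2 + (ys - self.center[1]) ** 2
        attn = torch.exp(-d2 / (2 * self.log_sigma.exp() ** 2))  # (H, W) Gaussian
        attn = attn / attn.sum()
        return (fmap * attn).sum(dim=(2, 3))            # (B, C) part descriptor

part = GaussianPartAttention()
print(part(torch.randn(2, 256, 7, 7)).shape)  # torch.Size([2, 256])
```
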
  • Locality-constrained framework for face alignment

    Zhang, Jie   Zhao, Xiaowei   Kan, Meina   Shan, Shiguang   Chai, Xiujuan   Chen, Xilin  

    Although the conventional active appearance model (AAM) has achieved some success in face alignment, it still suffers from poor generalization when applied to unseen subjects and images. To deal with this problem, we first reformulate the original AAM as a sparsity-regularized AAM, which achieves more compact and better shape and appearance priors by selecting nearest neighbors as the bases of the shape and appearance models. To speed up the fitting procedure, the sparsity in the sparsity-regularized AAM is approximated by locality (i.e., K-nearest neighbors), inducing the locality-constrained active appearance model (LC-AAM). The LC-AAM solves a constrained AAM-like fitting problem with the K nearest neighbors as the bases of the shape and appearance models. To alleviate the adverse influence of inaccurate K-nearest-neighbor results, the locality constraint is further embedded in a discriminative fitting method, denoted LC-DFM, which finds better nearest neighbors by employing shape-indexed features and can also tolerate some inaccurate neighbors, benefiting from a regression model rather than the generative model of the AAM. Extensive experiments on several datasets demonstrate that our methods outperform the state of the art in both detection accuracy and generalization ability.
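
    The locality constraint can be illustrated with a toy reconstruction: only the K nearest training shapes serve as bases, and the input shape is reconstructed from them by least squares. This sketch shows the K-NN prior only, not the full LC-AAM fitting procedure; the sizes and names are assumptions.

```python
import numpy as np

def knn_reconstruct(x, train, k=5):
    """Reconstruct shape vector x from its k nearest training shapes."""
    d = np.linalg.norm(train - x, axis=1)
    nn_idx = np.argsort(d)[:k]
    basis = train[nn_idx]                          # (k, dim) local shape basis
    coef, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
    return basis.T @ coef, nn_idx

shapes = np.random.randn(200, 136)   # e.g. 68 landmarks (x, y) per training face
recon, used = knn_reconstruct(np.random.randn(136), shapes)
print(used, np.linalg.norm(recon))
```
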
  • AttGAN: Facial Attribute Editing by Only Changing What You Want

    He, Zhenliang   Zuo, Wangmeng   Kan, Meina   Shan, Shiguang   Chen, Xilin  

    Facial attribute editing aims to manipulate single or multiple attributes of a given face image, i.e., to generate a new face image with the desired attributes while preserving other details. Recently, generative adversarial networks (GANs) and encoder-decoder architectures have usually been combined to handle this task, with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of a given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such an attribute-independent constraint on the latent representation is excessive: it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth or distorted generation. Instead of imposing constraints on the latent representation, we propose to apply an attribute classification constraint to the generated image, which simply guarantees the correct change of the desired attributes, i.e., to change what you want. Meanwhile, reconstruction learning is introduced to preserve attribute-excluding details, in other words, to change only what you want. Besides, adversarial learning is employed for visually realistic editing. These three components cooperate to form an effective framework for high-quality facial attribute editing, referred to as AttGAN. Furthermore, the proposed method is extended to attribute style manipulation in an unsupervised manner. Experiments on two wild datasets, CelebA and LFW, show that the proposed method outperforms the state of the art in realistic attribute editing, with other facial details well preserved.
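
    The abstract describes three cooperating training signals; the sketch below composes them for the generator side, assuming an L1 reconstruction, a binary cross-entropy attribute constraint, and a WGAN-style adversarial term. The loss forms and weights are assumptions; the paper and its released code define the real ones.

```python
import torch
import torch.nn.functional as F

def attgan_generator_loss(x, x_edit, x_rec, cls_logits, disc_score, target_attrs,
                          w_cls=10.0, w_rec=100.0):
    # x: input image; x_edit: decoded with desired attributes;
    # x_rec: decoded with the original attributes; cls_logits: attribute
    # classifier output on x_edit; disc_score: discriminator output on x_edit.
    loss_cls = F.binary_cross_entropy_with_logits(cls_logits, target_attrs)
    loss_rec = F.l1_loss(x_rec, x)      # preserve attribute-excluding details
    loss_adv = -disc_score.mean()       # assumed WGAN-style generator term
    return loss_adv + w_cls * loss_cls + w_rec * loss_rec

x = torch.rand(2, 3, 64, 64)
target = torch.randint(0, 2, (2, 13)).float()        # 13 binary attributes
loss = attgan_generator_loss(x, x + 0.01, x, torch.randn(2, 13),
                             torch.randn(2), target)
print(float(loss))
```
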
  • Deep Supervised Hashing for Fast Image Retrieval

    Liu, Haomiao   Wang, Ruiping   Shan, Shiguang   Chen, Xilin  

    In this paper, we present a new hashing method to learn compact binary codes for highly efficient image retrieval on large-scale datasets. While complex variations in image appearance still pose a great challenge to reliable retrieval, in light of the recent progress of convolutional neural networks (CNNs) in learning robust image representations for various vision tasks, this paper proposes a novel Deep Supervised Hashing method to learn compact similarity-preserving binary codes for huge bodies of image data. Specifically, we devise a CNN architecture that takes pairs or triplets of images as training inputs and encourages the output for each image to approximate discrete values (e.g., +1/-1). To this end, the loss functions are designed to maximize the discriminability of the output space by encoding the supervised information from the input image pairs/triplets, while simultaneously regularizing the real-valued outputs to approximate the desired discrete values. For retrieval, new query images can be easily encoded by forward-propagating them through the network and quantizing the network outputs to binary codes. Extensive experiments on three large-scale datasets, CIFAR-10, NUS-WIDE, and SVHN, show the promising performance of our method compared with the state of the art.
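
    A sketch of the pairwise loss form commonly cited for DSH: similar pairs are pulled together, dissimilar pairs pushed beyond a margin, and a relaxation term nudges the real-valued outputs toward +1/-1 so that thresholding loses little. Treat the exact weighting below as an assumption; see the paper for the precise form.

```python
import torch

def dsh_pair_loss(b1, b2, similar, margin=4.0, alpha=0.01):
    # b1, b2: (B, k) real-valued network outputs; similar: (B,) 1 if same class
    d2 = (b1 - b2).pow(2).sum(dim=1)
    pull = similar * d2                                  # similar pairs: close
    push = (1 - similar) * torch.clamp(margin - d2, min=0)  # dissimilar: apart
    reg = (b1.abs() - 1).abs().sum(dim=1) + (b2.abs() - 1).abs().sum(dim=1)
    return (0.5 * (pull + push) + alpha * reg).mean()

b1, b2 = torch.randn(8, 12), torch.randn(8, 12)  # 12-bit codes, pre-binarization
y = torch.randint(0, 2, (8,)).float()
print(float(dsh_pair_loss(b1, b2, y)))
# at query time: codes = torch.sign(network_output)
```
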
  • Adaptive Metric Learning For Zero-Shot Recognition

    Jiang, Huajie   Wang, Ruiping   Shan, Shiguang   Chen, Xilin  

    Zero-shot learning (ZSL) has enjoyed great popularity in recent years due to its ability to recognize novel objects, where semantic information is exploited to build up relations among different categories. Traditional ZSL approaches usually focus on learning more robust visual-semantic embeddings among the seen classes and directly apply them to the unseen classes without considering whether they are suitable. It is well known that a domain gap exists between the seen and unseen classes. To tackle this problem, we propose a novel adaptive metric learning approach to measure the compatibility between visual samples and class semantics, where class similarities are utilized to adapt the visual-semantic embedding to the unseen classes. Extensive experiments on four benchmark ZSL datasets show the effectiveness of the proposed approach.
  • Deep Learning for Pattern Recognition

    Zhang, Zhaoxiang   Shan, Shiguang   Fang, Yi   Shao, Ling  

  • RGB-D Face Recognition via Deep Complementary and Common Feature Learning

    Zhang, Hao   Han, Hu   Cui, Jiyun   Shan, Shiguang   Chen, Xilin  

    RGB-D face recognition has attracted increasing attention in recent years because of its robustness in unconstrained environments. However, existing approaches either handle the individual modalities with completely separate pipelines or treat all modalities equally with the same pipeline; neither adequately considers the modality differences or exploits the modality correlations. We propose a novel approach for RGB-D face recognition that learns complementary features from the multiple modalities and common features between the different modalities. Specifically, we introduce a joint loss that takes activations from both modality-specific feature learning networks and enforces the features to be learned in a complementary way. We further extend this multi-modality matcher (e.g., RGB-D vs. RGB-D) to cross-modality scenarios (e.g., RGB vs. RGB-D) by learning a common feature transformation that maps different modalities into the same feature space. Experimental results on a number of public RGB-D face databases (EURECOM, VAP, IIIT-D, and BUAA) and a large RGB-D database we collected show the impressive performance of the proposed approach.
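
    As a rough sketch of the cross-modality extension described above, two small transforms (sizes assumed) map RGB and RGB-D features into one space in which matched identities are pulled together; the contrastive form is an illustrative assumption, not the paper's loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_rgb, feat_rgbd = 512, 768           # assumed per-modality feature sizes
to_common_rgb = nn.Linear(feat_rgb, 256)
to_common_rgbd = nn.Linear(feat_rgbd, 256)

def common_space_loss(f_rgb, f_rgbd, same_id, margin=1.0):
    a = F.normalize(to_common_rgb(f_rgb), dim=1)
    b = F.normalize(to_common_rgbd(f_rgbd), dim=1)
    d = (a - b).pow(2).sum(dim=1)
    # matched identities close, mismatched pushed beyond the margin
    return (same_id * d + (1 - same_id) * torch.clamp(margin - d, min=0)).mean()

loss = common_space_loss(torch.randn(4, feat_rgb), torch.randn(4, feat_rgbd),
                         torch.tensor([1., 0., 1., 0.]))
print(float(loss))
```
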
  • Improving 2D Face Recognition via Discriminative Face Depth Estimation

    Cui, Jiyun   Zhang, Hao   Han, Hu   Shan, Shiguang   Chen, Xilin  

    As face recognition progresses from constrained to unconstrained scenarios, new challenges such as large pose, poor illumination, and partial occlusion are encountered. While 3D or multi-modality RGB-D sensors help face recognition systems achieve robustness against these challenges, the requirement of new sensors limits their application scenarios. In this paper, we propose a discriminative face depth estimation approach to improve 2D face recognition accuracy in unconstrained scenarios. Our discriminative depth estimation method uses a cascaded FCN and CNN architecture, in which the FCN aims at recovering the depth from an RGB image and the CNN retains the separability of individual subjects. The estimated depth information is then used as a modality complementary to RGB in face recognition tasks. Experiments on two public datasets and a dataset we collected show that the proposed face recognition method using RGB and estimated depth information achieves better accuracy than using the RGB modality alone.
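
    A toy version of the cascade described above: a fully convolutional net regresses a depth map from RGB, and a classification CNN on the estimated depth keeps it discriminative for identity. Both networks are stand-ins; the paper's architectures are far larger, and the combined training loss shown in the comment is an assumption.

```python
import torch
import torch.nn as nn

depth_fcn = nn.Sequential(                    # RGB (B,3,H,W) -> depth (B,1,H,W)
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))
id_cnn = nn.Sequential(                       # estimated depth -> subject logits
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

rgb = torch.rand(2, 3, 64, 64)
depth_hat = depth_fcn(rgb)
logits = id_cnn(depth_hat)
# training would combine a depth regression loss with an identity loss, e.g.:
# loss = F.l1_loss(depth_hat, depth_gt) + F.cross_entropy(logits, subject_ids)
print(depth_hat.shape, logits.shape)
```
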
  • HeadNet: Pedestrian Head Detection Utilizing Body in Context

    Chen, Gang   Cai, Xufen   Han, Hu   Shan, Shiguang   Chen, Xilin  

    Pedestrian heads with arbitrary poses and sizes are prohibitively difficult to detect in many real-world applications. An appealing alternative is to utilize object detection technologies, which are becoming ever more mature and faster. However, general object detection technologies can hardly cope with complicated scenarios in which many heads are too small to detect. In this paper, we present a novel approach that learns a semantic connection between the pedestrian head and other body parts for head detection. Specifically, the proposed model, named HeadNet, is based on the PVANet backbone and introduces beneficial strategies including online hard example mining (OHEM), fine-grained feature maps, Rot Align, and Body in Context (BiC). Experiments demonstrate that our approach effectively utilizes the spatial semantics of the entire body and achieves promising performance for pedestrian head detection.