

  • Robust visual tracking with channel attention and focal loss

    Li, Dongdong   Wen, Gongjian   Kuai, Yangliu   Zhu, Lingxiao   Porikli, Fatih  

    Recently, the tracking community has embraced end-to-end feature representation learning for visual tracking. Previous works treat all feature channels and training samples equally during training, which ignores channel interdependencies and foreground-background data imbalance and thus limits tracking performance. To tackle these problems, we introduce channel attention and focal loss into the network design to enhance feature representation learning. Specifically, a Squeeze-and-Excitation (SE) block is coupled to each convolutional layer to generate channel attention. Channel attention reflects the channel-wise importance of each feature channel and is used for feature weighting in online tracking. To alleviate the foreground-background data imbalance, we propose a focal logistic loss by adding a modulating factor to the logistic loss, with two tunable focusing parameters. The focal logistic loss down-weights the loss assigned to easy examples in the background area. Both the SE block and the focal logistic loss are computationally lightweight and impose only a slight increase in model complexity. Extensive experiments are performed on three challenging tracking datasets: OTB100, UAV123, and TC128. Experimental results demonstrate that the enhanced tracker achieves significant performance improvement while running at a real-time frame rate of 66 fps.
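    Both ingredients are standard enough to sketch. Below is a minimal, hedged PyTorch illustration of a focal logistic loss (a logistic loss scaled by a modulating factor with two tunable focusing parameters, here named gamma and alpha as assumptions) and an SE block producing channel attention; the paper's exact parameterisation may differ.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def focal_logistic_loss(scores, labels, gamma=2.0, alpha=0.25):
        """Logistic loss with a focal modulating factor.

        scores: raw responses; labels in {+1, -1}.  gamma and alpha stand in
        for the paper's two tunable focusing parameters (illustrative values).
        """
        p_t = torch.sigmoid(labels * scores)        # confidence on the correct class
        logistic = F.softplus(-labels * scores)     # log(1 + exp(-y * s))
        return (alpha * (1.0 - p_t) ** gamma * logistic).mean()  # easy examples are down-weighted

    class SEBlock(nn.Module):
        """Squeeze-and-Excitation block producing per-channel attention weights."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                       # squeeze: global average pooling
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                  # excitation: weights in (0, 1)
            )
        def forward(self, x):
            return x * self.gate(x)                            # channel-wise feature reweighting
    ```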
  • Semi-Supervised Video Object Segmentation with Super-Trajectories

    Wang, Wenguan   Shen, Jianbing   Porikli, Fatih   Yang, Ruigang  

    We introduce a semi-supervised video segmentation approach based on an efficient video representation called a "super-trajectory". A super-trajectory corresponds to a group of compact point trajectories that exhibit consistent motion patterns, similar appearances, and close spatiotemporal relationships. We generate the compact trajectories using a probabilistic model, which enables effective handling of occlusions and drifts. To reliably group point trajectories, we adopt the density-peaks clustering algorithm, which captures rich spatiotemporal relations among trajectories during clustering. We incorporate two intuitive mechanisms, reverse tracking and object re-occurrence, to improve robustness and boost performance. Building on the proposed video representation, our segmentation method is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining frames. Our extensive experimental analyses on three challenging benchmarks demonstrate that our method is capable of extracting the target objects from complex backgrounds, and even re-identifying them after prolonged occlusions, producing high-quality video object segments. The code and results are available at: https://github.com/wenguanwang/SupertrajectorySeg.
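    As a rough illustration of the grouping step, here is a toy density-peaks clustering (in the style of Rodriguez and Laio) over per-trajectory descriptors in NumPy; the descriptor, kernel width, and the simplified assignment to the nearest peak are assumptions, not the paper's actual affinities.

    ```python
    import numpy as np

    def super_trajectories(desc, d_c=1.0, n_groups=5):
        """Toy density-peaks grouping of point-trajectory descriptors.

        desc: (N, D) array, e.g. mean position + mean velocity per trajectory.
        Returns an integer group id ("super-trajectory" id) per trajectory.
        """
        dist = np.linalg.norm(desc[:, None] - desc[None], axis=-1)   # pairwise distances
        rho = np.exp(-(dist / d_c) ** 2).sum(axis=1)                 # local density
        delta = np.empty(len(desc))
        for i in range(len(desc)):
            higher = np.where(rho > rho[i])[0]                       # points with higher density
            delta[i] = dist[i, higher].min() if len(higher) else dist[i].max()
        peaks = np.argsort(rho * delta)[-n_groups:]                  # centers: high rho and high delta
        return np.array([peaks[np.argmin(dist[i, peaks])] for i in range(len(desc))])
    ```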
  • Regularization of deep neural networks with spectral dropout

    Khan, Salman H.   Hayat, Munawar   Porikli, Fatih  

    The big breakthrough on the ImageNet challenge in 2012 was partially due to the 'Dropout' technique used to avoid overfitting. Here, we introduce a new approach called 'Spectral Dropout' to improve the generalization ability of deep neural networks. We cast the proposed approach in the form of regular Convolutional Neural Network (CNN) weight layers using a decorrelation transform with fixed basis functions. Our spectral dropout method prevents overfitting by eliminating weak and 'noisy' Fourier-domain coefficients of the neural network activations, leading to remarkably better results than current regularization methods. Furthermore, the proposed approach is very efficient due to the fixed basis functions used for the spectral transformation. In particular, compared to Dropout and Drop-Connect, our method significantly speeds up the network convergence rate during training (roughly 2x), with considerably higher neuron pruning rates (an increase of about 30%). We demonstrate that spectral dropout can also be used in conjunction with other regularization approaches, resulting in additional performance gains.
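    A hedged sketch of the core operation, assuming PyTorch and using a 2D FFT over the spatial dimensions as a stand-in for the paper's fixed decorrelation basis; the keep_ratio threshold is an illustrative choice.

    ```python
    import torch

    def spectral_dropout(x, keep_ratio=0.9):
        """Zero out weak spectral coefficients of a batch of activations.

        x: (B, C, H, W).  Activations are mapped to a fixed spectral basis,
        the weakest coefficients are discarded, and the result is mapped back.
        """
        coeffs = torch.fft.rfft2(x)                                    # fixed-basis transform
        mag = coeffs.abs()
        thresh = torch.quantile(mag.flatten(1), 1.0 - keep_ratio, dim=1)
        mask = mag >= thresh.view(-1, 1, 1, 1)                         # keep only strong coefficients
        kept = torch.where(mask, coeffs, torch.zeros_like(coeffs))
        return torch.fft.irfft2(kept, s=x.shape[-2:])                  # back to the spatial domain
    ```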
  • Image Deblurring with a Class-Specific Prior

    Anwar, Saeed   Huynh, Cong Phuoc   Porikli, Fatih  

    A fundamental problem in image deblurring is to reliably recover distinct spatial frequencies that have been suppressed by the blur kernel. To tackle this issue, existing image deblurring techniques often rely on generic image priors such as the sparsity of salient features, including image gradients and edges. However, these priors only help recover part of the frequency spectrum, such as the frequencies near the high end. Motivated by this, we pose the following specific questions: (i) Does any image class information offer an advantage over existing generic priors for image quality restoration? (ii) If a class-specific prior exists, how should it be encoded into a deblurring framework to recover attenuated image frequencies? Throughout this work, we devise a class-specific prior based on band-pass filter responses and incorporate it into a deblurring strategy. More specifically, we show that the subspace of band-pass filtered images and their intensity distributions serve as useful priors for recovering image frequencies that are difficult to recover with generic image priors. We demonstrate that our image deblurring framework, when equipped with the above priors, significantly outperforms many state-of-the-art methods using generic image priors or class-specific exemplars.
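    To make the prior construction concrete, here is an illustrative band-pass decomposition (a difference-of-Gaussians stand-in) in NumPy/SciPy; the filter bank and scales are assumptions, and the paper additionally models the subspace and intensity distributions of these responses.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def band_pass_responses(img, sigmas=(1, 2, 4, 8)):
        """Extract band-pass responses of a grayscale image at a few scales.

        Each response isolates one frequency band and would feed the
        class-specific prior (subspace + intensity statistics).
        """
        bands = []
        for lo, hi in zip(sigmas[:-1], sigmas[1:]):
            bands.append(gaussian_filter(img, lo) - gaussian_filter(img, hi))  # one frequency band
        return np.stack(bands)   # (n_bands, H, W)
    ```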
  • Real-Time Deep Tracking via Corrective Domain Adaptation

    Li, Hanxi   Wang, Xinyu   Shen, Fumin   Li, Yi   Porikli, Fatih   Wang, Mingwen  

    Visual tracking is one of the fundamental problems in computer vision. Recently, several deep-learning-based tracking algorithms have demonstrated record-breaking performance. However, due to the high complexity of neural networks, most deep trackers suffer from low tracking speed and are thus impractical in many real-world applications. Some recently proposed deep trackers with smaller network structures achieve high efficiency, but at the cost of a significant decrease in precision. In this paper, we propose to transfer deep features, originally learned for image classification, to the visual tracking domain. The domain adaptation is achieved via "grafted" auxiliary networks, which are trained by regressing the object location in tracking frames. This adaptation improves tracking performance significantly in both accuracy and efficiency. The resulting deep tracker runs in real time and achieves state-of-the-art accuracy in experiments on two widely adopted benchmarks with more than 100 test videos. Furthermore, the adaptation naturally introduces the objectness concept into visual tracking, which removes a long-standing target ambiguity in visual tracking tasks, and we demonstrate the empirical advantage of this better-defined task.
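    A minimal sketch of the adaptation idea, assuming PyTorch/torchvision: a frozen classification backbone (ResNet-18 here, purely as an assumption) with a small grafted head trained to regress the object's bounding box from tracking frames.

    ```python
    import torch.nn as nn
    import torchvision

    class GraftedTracker(nn.Module):
        """Frozen classification features + a grafted location-regression head."""
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.resnet18(weights=None)
            self.features = nn.Sequential(*list(backbone.children())[:-2])  # keep conv feature maps
            for p in self.features.parameters():
                p.requires_grad = False                                      # classification features stay fixed
            self.head = nn.Sequential(                                       # grafted auxiliary network
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(512, 256), nn.ReLU(),
                nn.Linear(256, 4),                                           # (x, y, w, h) of the target
            )
        def forward(self, frame):
            return self.head(self.features(frame))
    ```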
  • Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

    Naseer, Muzammal   Khan, Salman   Porikli, Fatih  

    With the availability of low-cost and compact 2.5/3D visual sensing devices, the computer vision community is experiencing a growing interest in visual scene understanding of indoor environments. This survey paper provides a comprehensive background to this research topic. We begin with a historical perspective, followed by popular 3D data representations and a comparative analysis of available datasets. Before delving into application-specific details, the survey provides a succinct introduction to the core technologies that underlie the reviewed methods. Afterwards, we review the developed techniques according to a taxonomy based on scene understanding tasks, covering holistic indoor scene understanding as well as subtasks such as scene classification, object detection, pose estimation, semantic segmentation, 3D reconstruction, saliency detection, physics-based reasoning, and affordance prediction. We then summarize the performance metrics used for evaluation in different tasks and provide a quantitative comparison among recent state-of-the-art techniques. We conclude this review with the current challenges and an outlook toward open research problems requiring further investigation.
  • Saliency Integration: An Arbitrator Model

    Xu, Yingyue   Hong, Xiaopeng   Porikli, Fatih   Liu, Xin   Chen, Jie   Zhao, Guoying  

    Saliency integration has attracted much attention for unifying saliency maps from multiple saliency models. Previous offline integration methods usually face two challenges: 1) if most of the candidate saliency models misjudge the saliency on an image, the integration result will lean heavily on those inferior candidate models; and 2) the unavailability of ground-truth saliency labels makes it difficult to estimate the expertise of each candidate model. To address these problems, in this paper, we propose an arbitrator model (AM) for saliency integration. First, we incorporate the consensus of multiple saliency models and external knowledge into a reference map to effectively rectify misleading results from the candidate models. Second, our quest for ways of estimating the expertise of the saliency models without ground-truth labels gives rise to two distinct online model-expertise estimation methods. Finally, we derive a Bayesian integration framework to reconcile saliency models of varying expertise with the reference map. To extensively evaluate the proposed AM model, we test 27 state-of-the-art saliency models, covering both traditional and deep learning ones, on various combinations over four datasets. The evaluation results show that the AM model improves the performance substantially compared to the existing state-of-the-art integration methods, regardless of the chosen candidate saliency models.
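    A toy illustration of reconciling candidate maps of varying expertise with a reference map, in NumPy; the log-linear fusion below is only a stand-in for the paper's Bayesian derivation, and all parameter names are assumptions.

    ```python
    import numpy as np

    def integrate_saliency(maps, expertise, reference, eps=1e-6):
        """Arbitrator-style fusion of candidate saliency maps.

        maps: (K, H, W) candidate saliency maps in [0, 1];
        expertise: (K,) estimated reliability of each model;
        reference: (H, W) consensus/reference map.
        """
        maps = np.clip(maps, eps, 1 - eps)
        ref = np.clip(reference, eps, 1 - eps)
        logit = np.log(maps / (1 - maps))                      # per-model evidence
        fused = (expertise[:, None, None] * logit).sum(0)      # expertise-weighted evidence
        fused += np.log(ref / (1 - ref))                       # reference map as extra evidence
        return 1.0 / (1.0 + np.exp(-fused))                    # back to a probability map
    ```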
  • Identity-Preserving Face Recovery from Stylized Portraits

    Shiri, Fatemeh   Yu, Xin   Porikli, Fatih   Hartley, Richard   Koniusz, Piotr  

    Given an artistic portrait, recovering the latent photorealistic face that preserves the subject's identity is challenging because the facial details are often distorted or fully lost in artistic portraits. We develop an Identity-preserving Face Recovery from Portraits method that utilizes a Style Removal network (SRN) and a Discriminative Network (DN). Our SRN, composed of an autoencoder with residual block-embedded skip connections, is designed to transfer feature maps of stylized images to the feature maps of the corresponding photorealistic faces. Owing to the Spatial Transformer Network, SRN automatically compensates for misalignments of stylized portraits to output aligned realistic face images. To ensure the identity preservation, we promote the recovered and ground truth faces to share similar visual features via a distance measure which compares features of recovered and ground truth faces extracted from a pre-trained FaceNet network. DN has multiple convolutional and fully-connected layers, and its role is to enforce recovered faces to be similar to authentic faces. Thus, we can recover high-quality photorealistic faces from unaligned portraits while preserving the identity of the face in an image. By conducting extensive evaluations on a large-scale synthesized dataset and a hand-drawn sketch dataset, we demonstrate that our method achieves superior face recovery and attains state-of-the-art results. In addition, our method can recover photorealistic faces from unseen stylized portraits, artistic paintings, and hand-drawn sketches.
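    The identity-preservation term is easy to illustrate. A minimal PyTorch sketch, where embed_net stands in for the pre-trained FaceNet feature extractor mentioned above (any frozen embedding module works for the example), and the specific distance (MSE here) is an assumption.

    ```python
    import torch
    import torch.nn.functional as F

    def identity_loss(recovered, ground_truth, embed_net):
        """Penalize identity drift between recovered and ground-truth faces."""
        with torch.no_grad():
            target = embed_net(ground_truth)     # no gradient through the reference embedding
        return F.mse_loss(embed_net(recovered), target)
    ```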
  • METHOD FOR SEGMENTING DATA

    A method segments n-dimensional data by first determining prior information from the data. A fidelity term is determined from the prior information, and the data are represented as a graph. A graph Laplacian is determined from the graph, and a Laplacian spectrum constraint is determined from the graph Laplacian. Then, an objective function is minimized according to the fidelity term and the Laplacian spectrum constraint to identify a segment of target points in the data.
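    A generic sketch of that pipeline (graph, then Laplacian, then a fidelity term plus a Laplacian-based constraint), assuming SciPy/scikit-learn; the quadratic objective and the k-nearest-neighbour graph are simplifications, not the patented formulation.

    ```python
    import numpy as np
    from scipy.sparse import identity
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import spsolve
    from sklearn.neighbors import kneighbors_graph

    def graph_segment(points, prior, lam=1.0, k=10):
        """Solve min_x ||x - prior||^2 + lam * x^T L x and threshold the result.

        points: (N, D) data; prior: (N,) prior likelihood of each point being target.
        """
        W = kneighbors_graph(points, k, mode='connectivity', include_self=False)
        W = 0.5 * (W + W.T)                                    # symmetric affinity graph
        L = laplacian(W, normed=True)                          # graph Laplacian
        A = identity(len(points), format='csr') + lam * L      # normal equations of the objective
        x = spsolve(A.tocsr(), prior)                          # relaxed segmentation scores
        return x > 0.5                                         # target / background labels
    ```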
  • Quadruplet Network With One-Shot Learning for Fast Visual Object Tracking

    Dong, Xingping   Shen, Jianbing   Wu, Dongming   Guo, Kan   Jin, Xiaogang   Porikli, Fatih  

    In the vein of discriminative one-shot learning, Siamese networks allow recognizing an object from a single exemplar with the same class label. However, they do not take advantage of the underlying structure of the data and the relationships among the multitude of samples, as they rely only on pairs of instances for training. In this paper, we propose a new quadruplet deep network to examine the potential connections among the training instances, aiming to achieve a more powerful representation. We design a shared network with four branches that receive a multi-tuple of instances as inputs and are connected by a novel loss function consisting of a pair loss and a triplet loss. According to the similarity metric, we select the most similar and the most dissimilar instances from each multi-tuple as the positive and negative inputs of the triplet loss. We show that this scheme improves the training performance. Furthermore, we introduce a new weight layer to automatically select suitable combination weights, which avoids the conflict between the triplet and pair losses that would otherwise degrade performance. We evaluate our quadruplet framework by model-free tracking-by-detection of objects from a single initial exemplar on several visual object tracking benchmarks. Our extensive experimental analysis demonstrates that our tracker achieves superior performance with a real-time processing speed of 78 frames/s. Our source code is available.
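    A hedged sketch of the combined objective in PyTorch: a triplet term over the selected most-similar/most-dissimilar instances plus a pairwise term, mixed by a learnable weight; the exact loss forms and the sigmoid reparameterisation of the weight are assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def quadruplet_loss(anchor, pos, neg, pair_scores, pair_labels, w, margin=1.0):
        """Combine a triplet loss and a pair loss with a learnable mixing weight w."""
        triplet = F.triplet_margin_loss(anchor, pos, neg, margin=margin)
        pair = F.binary_cross_entropy_with_logits(pair_scores, pair_labels)
        alpha = torch.sigmoid(w)                 # keep the combination weight in (0, 1)
        return alpha * triplet + (1.0 - alpha) * pair
    ```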
  • Learning Padless Correlation Filters for Boundary-Effect Free Tracking

    Li, Dongdong   Wen, Gongjian   Kuai, Yangliu   Porikli, Fatih  

    Recently, discriminative correlation filters (DCFs) have achieved enormous popularity in the tracking community due to their high accuracy and beyond-real-time speed. Among the DCF variants, spatially regularized discriminative correlation filters (SRDCFs) demonstrate excellent performance in suppressing the boundary effects induced by circularly shifted training samples. However, SRDCF has two drawbacks which may be bottlenecks for further performance improvement. First, SRDCF needs to construct an element-wise regularization weight map, which can lead to poor tracking performance without careful tuning. Second, SRDCF does not guarantee zero correlation filter values outside the target bounding box. These small but nonzero filter values away from the filter center hardly contribute to target localization but still induce boundary effects. To tackle these drawbacks, we revisit the standard SRDCF formulation and introduce padless correlation filters (PCFs), which completely remove boundary effects. Whereas SRDCF penalizes filter values with spatial regularization weights, PCF directly guarantees zero filter values outside the target bounding box with a binary mask. Experimental results on the OTB2013, OTB2015 and VOT2016 datasets demonstrate that PCF achieves real-time frame rates and favorable tracking performance compared with state-of-the-art trackers.
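    For intuition, here is a crude NumPy stand-in: a standard closed-form DCF solved in the Fourier domain and then projected onto a binary support mask that zeroes the filter outside the target box; the paper's actual formulation optimizes under this constraint rather than projecting afterwards.

    ```python
    import numpy as np

    def padless_filter(x, y, box, lam=1e-2):
        """Masked correlation filter update (illustration only).

        x: (H, W) training feature patch; y: (H, W) desired Gaussian response;
        box: (r0, r1, c0, c1) target bounding box inside the patch.
        """
        X, Y = np.fft.fft2(x), np.fft.fft2(y)
        H = np.conj(X) * Y / (np.conj(X) * X + lam)      # closed-form DCF solution
        h = np.real(np.fft.ifft2(H))                     # filter in the spatial domain
        mask = np.zeros_like(h)
        r0, r1, c0, c1 = box
        mask[r0:r1, c0:c1] = 1.0                         # zero filter values outside the target box
        return h * mask
    ```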
  • A Cascaded Convolutional Neural Network for Single Image Dehazing

    Li, Chongyi   Guo, Jichang   Porikli, Fatih   Fu, Huazhu   Pang, Yanwei  

    Images captured in outdoor scenes usually suffer from low contrast and limited visibility due to suspended atmospheric particles, which directly affects the quality of photographs. Although numerous image dehazing methods have been proposed, effective hazy image restoration remains a challenging problem. Existing learning-based methods usually predict the medium transmission with convolutional neural networks (CNNs), but ignore the key global atmospheric light. Different from previous learning-based methods, we propose a flexible cascaded CNN for single hazy image restoration, which considers the medium transmission and global atmospheric light jointly via two task-driven subnetworks. Specifically, the medium transmission estimation subnetwork is inspired by the densely connected CNN, while the global atmospheric light estimation subnetwork is a lightweight CNN. Moreover, the two subnetworks are cascaded by sharing common features. Finally, with the estimated model parameters, the haze-free image is obtained by inverting the atmospheric scattering model, which yields more accurate and effective restoration. Qualitative and quantitative experimental results on synthetic and real-world hazy images demonstrate that the proposed method effectively removes haze from such images and outperforms several state-of-the-art dehazing methods.
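    The final inversion step follows directly from the atmospheric scattering model I = J*t + A*(1 - t), giving J = (I - A)/t + A. A minimal NumPy sketch, with t_min as an assumed lower bound on the transmission:

    ```python
    import numpy as np

    def recover_haze_free(I, t, A, t_min=0.1):
        """Atmospheric scattering model inversion J = (I - A) / t + A.

        I: (H, W, 3) hazy image in [0, 1]; t: (H, W) medium transmission from
        the first subnetwork; A: (3,) global atmospheric light from the second.
        """
        t = np.clip(t, t_min, 1.0)[..., None]            # avoid division by near-zero transmission
        J = (I - A) / t + A
        return np.clip(J, 0.0, 1.0)                      # haze-free estimate
    ```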
  • Feature Mask Network for Person Re-identification

    Ding, Guodong   Khan, Salman   Tang, Zhenmin   Porikli, Fatih  

  • Video Saliency Detection via Sparsity-Based Reconstruction and Propagation

    Cong, Runmin   Lei, Jianjun   Fu, Huazhu   Porikli, Fatih   Huang, Qingming   Hou, Chunping  

    Video saliency detection aims to continuously discover the motion-related salient objects from the video sequences. Since it needs to consider the spatial and temporal constraints jointly, video saliency detection is more challenging than image saliency detection. In this paper, we propose a new method to detect the salient objects in video based on sparse reconstruction and propagation. With the assistance of novel static and motion priors, a single-frame saliency model is first designed to represent the spatial saliency in each individual frame via the sparsity-based reconstruction. Then, through a progressive sparsity-based propagation, the sequential correspondence in the temporal space is captured to produce the inter-frame saliency map. Finally, these two maps are incorporated into a global optimization model to achieve spatio-temporal smoothness and global consistency of the salient object in the whole video. The experiments on three large-scale video saliency datasets demonstrate that the proposed method outperforms the state-of-the-art algorithms both qualitatively and quantitatively.
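    As a rough illustration of the single-frame, sparsity-based reconstruction cue, here is a NumPy/scikit-learn sketch that scores each region by how poorly a background dictionary reconstructs it; the dictionary choice, the Lasso solver, and the normalization are assumptions, and the propagation and global optimization stages are omitted.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    def reconstruction_saliency(features, bg_index, alpha=0.01):
        """Sparse-reconstruction error of each region against a background dictionary.

        features: (N, D) region descriptors; bg_index: indices of assumed-background
        regions (e.g. along the frame border).  High error => high saliency.
        """
        D = features[bg_index].T                                     # dictionary, shape (D, M)
        sal = np.empty(len(features))
        for i, f in enumerate(features):
            coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=2000).fit(D, f)
            sal[i] = np.linalg.norm(f - D @ coder.coef_)             # reconstruction error
        return (sal - sal.min()) / (np.ptp(sal) + 1e-8)              # normalize to [0, 1]
    ```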