Combining local appearance and holistic view:
Dual-source deep neural networks for human pose estimation Xiaochuan Fan, Kang Zheng, Yuewei Lin, Song Wang CVPR 2015

Idea

Most of the previous works on human pose estimation are based on the two-layer part-based model. The first layer focuses on local (body) part appearance and the second layer imposes the contextual relations between local parts. These pose estimation methods using part-based models are usually sensitive to noise and the graphical model lacks expressiveness to model complex human poses. Furthermore, most of these methods search for each local part independently and the local appearance may not be sufficiently discriminative for identifying each local part reliably.

In this work, we propose a dual-source CNN (DS-CNN) based method for human pose estimation without using any explicit graphical model. This proposed method utilizes both the local part appearance in image patches (part patch) and the holistic view of the full human body (body patch) for more accurate human pose estimation. By adding a binary mask to the body patch, we novelly integrate global and local information to one input (body patch). By taking both the local patch and the body patch as inputs, the proposed DS-CNN performs a unified learning to achieve both joint detection, which determines whether an object proposal contains a body joint, and joint localization, which finds the exact location of the joint in the object proposal.


Paper

[PDF]


About Performance

In the paper, we evaluated our work using the PCP metric which considers a part prediction correct if its endpoints are closer to their ground-truth locations than half the length of the limb on average over the two endpoints. Moreover, multiple candidate predictions from models with different input dimensions were also used to match the ground-truth. However, some previous methods were evaluated using single candidate and stricter PCP. In order to compare our work with state-of-the-art methods fairly, we provide predictions and the comparison of strict PCP results and PDJ results using single candidate on the Leeds Sport Pose (LSP) dataset with Person-Centric (PC) annotations. Thank Pishchulin et al. for their evaluation framework.

Strict PCP (our predictions)

Method Torso Upper leg Lower leg Upper arm Forearm Head Full body
Wang&Li, CVPR'13 87.5 56.0 55.8 43.1 32.1 79.1 54.1
Pishchulin et al., ICCV'13 88.7 63.6 58.4 46.0 35.2 85.1 58.0
Tompson et al., NIPS'14 90.3 70.4 61.1 63.0 51.2 83.7 66.6
Chen&Yuille, NIPS'14 96.0 77.2 72.2 69.7 58.1 85.6 73.6
Ours 95.4 77.7 69.8 62.8 49.1 86.6 70.1

PDJ curve


PDJ value (normalized distance equal to 0.2) (our predictions)

Method Head Shoulder Elbow Wrist Hip Knee Ankle Full body
Wang&Li, CVPR'13 84.7 57.1 43.7 36.7 56.7 52.4 50.8 54.6
Pishchulin et al. 87.2 56.7 46.7 38.0 61.0 57.5 52.7 57.1
Tompson et al., NIPS'14 90.6 79.2 67.9 63.4 69.5 71.0 64.2 72.0
Chen&Yuille, NIPS'14 91.8 78.2 71.8 65.5 73.3 70.2 63.4 73.4
Ours 92.4 75.2 65.3 64.0 75.7 68.3 70.4 73.0

Code

[DS-CNN data layer] [test code] [trained model]


Must See Places In Paris