September 2021

tl;dr: Improved version of MonoLoco (MonoLoco++) and its application to social distancing.

Overall impression

This paper builds upon the previous work of MonoLoco.

The low-dimensional representation of humans gives it better generalization. It escapes the image domain and reduces the input dimensionality. This makes the skeleton-based network extremely fast to train (2 min on a single 1080Ti GPU card).

MonoLoco++ beats mono3D baselines (such as SMOKE). The two perform roughly the same on easy cases, but MonoLoco++ performs much better on medium/hard cases. MonoLoco++ also has higher recall than SMOKE (39% --> 70%).

Key ideas

  • Difference between MonoLoco++ and MonoLoco
    • Multi-task prediction that combines 3D localization, orientation and bbox. MonoLoco only predicts the 3D localization (distance and its uncertainty).
    • Use of spherical coordinate system (why use radial distance instead of depth?)
  • Social interaction and distancing. People tend to arrange themselves spontaneously in a specific configuration called F-formation.
    • o-space (free space between a group of people)
    • p-space (donut space enclosed by concentric rings covering people)
    • r-space (beyond the concentric rings)
    • An F-formation is detected when two people are within a certain distance of each other, their o-space is not intruded by anyone else, and both are looking inward toward the o-space.
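
The three F-formation conditions above can be sketched as a pairwise check on the ground plane. This is a minimal illustration, not the paper's implementation: the function names, the `(x, z, yaw)` tuple layout, the yaw convention (angle measured from the x axis), and the thresholds `max_dist` and `tol` are all assumptions made for the example.

```python
import math

def o_space_center(x, z, yaw, radius):
    """Point `radius` metres in front of a person at (x, z) facing direction `yaw`."""
    return (x + radius * math.cos(yaw), z + radius * math.sin(yaw))

def is_f_formation(p0, p1, others=(), max_dist=2.0, tol=0.5):
    """Check a pair of people, each given as (x, z, yaw), for an F-formation.

    max_dist and tol are illustrative thresholds, not values from the paper.
    """
    x0, z0, yaw0 = p0
    x1, z1, yaw1 = p1

    # 1. The two people must be within a certain distance of each other.
    dist = math.hypot(x1 - x0, z1 - z0)
    if dist > max_dist:
        return False

    # 2. Both must look inward: the o-space centre implied by each person's
    #    position and orientation should roughly coincide.
    radius = dist / 2.0
    c0 = o_space_center(x0, z0, yaw0, radius)
    c1 = o_space_center(x1, z1, yaw1, radius)
    if math.hypot(c1[0] - c0[0], c1[1] - c0[1]) > tol:
        return False

    # 3. The o-space must not be intruded by anyone else.
    cx, cz = (c0[0] + c1[0]) / 2.0, (c0[1] + c1[1]) / 2.0
    return all(math.hypot(x - cx, z - cz) > radius for x, z, _ in others)
```

For example, two people one metre apart and facing each other pass the check, while the same pair standing back to back, or with a third person between them, does not.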

Technical details

  • 3D localization of humans is an ill-posed problem: the task error grows roughly linearly with distance and is about 1 meter at 20 meters.
  • Laplacian loss (with aleatoric uncertainty) works roughly the same as L1 loss.
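
One way to see why the Laplacian loss behaves much like L1 is to write both out. The sketch below assumes the network outputs a distance `pred` and a log-spread `log_b` (this parameterization and the function names are assumptions for illustration). The negative log-likelihood of a Laplace distribution is `|target - pred| / b + log(2b)`, so for a fixed spread `b` it is just a scaled and shifted L1 loss; the extra output only adds an uncertainty estimate.

```python
import math

def l1_loss(pred, target):
    """Plain L1 loss on the predicted distance."""
    return abs(target - pred)

def laplace_nll(pred, log_b, target):
    """Negative log-likelihood of `target` under Laplace(pred, b), with b = exp(log_b).

    Predicting log_b instead of b keeps the spread positive without constraints.
    """
    return abs(target - pred) * math.exp(-log_b) + log_b + math.log(2.0)
```

With `log_b = 0` (that is, `b = 1`) the Laplacian loss equals the L1 loss plus the constant `log 2`. For a fixed error, the loss is minimized when `b` equals the absolute error, which is what lets the spread act as an aleatoric uncertainty estimate.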

Notes

  • Questions and notes on how to improve/revise the current work