AugurSense: Next Generation Multi-Camera Human Movement Analytics — Part 1

Imesha Sudasingha
6 min read · Jan 7, 2020

During the final year of my bachelor's degree at the University of Moratuwa, Sri Lanka, I worked on a novel research project with three of my colleagues (Madhawa Vidanapathirana, Pasindu Kanchana and Jayan Vidanapathirana) under the supervision of Dr. Indika Perera. The research was titled “Human Analytics Using Multiple Non-Invasive Video Feeds”, and we focused on:

A real-time method to analyze time synchronized video feeds obtained from multiple fixed cameras in a monitored environment to generate human movement analytics with respect to ground plane.

A map view generated by AugurSense by processing two time-synchronized videos from the PETS2009 tracking dataset. Paths taken by individuals are shown in the map: each circle represents a person's current location, and the small line drawn within the circle represents the head direction.

We have open sourced the outcomes of our research; the implementation is available at https://github.com/eduze/AugurSense. This article explains the motivation behind AugurSense (a.k.a. CRAMP), the outcomes of the research and the architecture of the implemented system. By the end of this article, you will have a high-level understanding of how AugurSense generates human movement analytics from multiple camera feeds.

Why Human Analytics on Video Feeds?

This was a little-explored area at the time (and still is), and the outcomes of this kind of research can be applied in many areas. For example, such a system can help a business better understand its customers and target specific groups (say, age groups or genders). Since most organizations already have surveillance systems installed, their existing video feeds can be used without additional infrastructure cost. These statistics can also feed into the design of marketing strategies. In short, this can be considered the first step towards “Google Analytics in real life”.

Possible applications of this research

As shown above, the applications of this kind of research are immense. We will discuss some of those applications, and maybe an implementation for measuring customer service quality, in a future article. For now, let's focus on our prototype implementation.

Analytics and Features

A heat map generated by an experiment run on the PETS2009 dataset.
  • Coordinate video feeds from a system of cameras to detect, identify and track the positions, postures and orientations of people observed by the camera network.
    - Human detection and noise filtering.
    - Human pose estimation, including head orientation and sitting/standing estimation.
    - Human position mapping to a top-down view (see the homography sketch after this list).
    - Human movement tracking with smoothing of results.
  • Aggregate analytical data from multiple cameras into a unified global view (a top-down view on the floor plan) and provide real-time and time-bound statistical maps.
    - Instantaneous view maps with:
      - Human position markers
      - Movement trail indication
      - Head direction annotation
      - Standing/sitting posture annotation
      - Full body image annotation
  • Zone-based analytics, including:
    - Average people count in a zone, classified as standing/sitting
    - Average time spent by a person in a zone
    - Breakdown of outbound traffic movement fractions to other zones
    - Breakdown of inbound traffic movements from other zones
  • Statistical heat maps
    - Human density heat maps
    - Stop point / velocity-bound heat maps
    - Traffic flow direction maps
  • Uniquely identify humans to generate accurate analytical data (short-term re-identification).
    - Identification of probable previous detections of a person.
    - Long-distance routes per person in the monitored environment.
  • Generating quantitative statistics based on analytics.
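
To make the position-mapping and density heat map items above concrete, here is a minimal Python sketch of the underlying idea: a homography projects a person's foot point from the camera image onto the floor plan, and the projected points are accumulated into a density grid. The calibration points, plan dimensions and sample detections below are placeholders; in AugurSense, the camera-to-floor-plan mapping is configured through the dashboard.

```python
import cv2
import numpy as np

# Four reference points in the camera image (pixels) and their known
# positions on the floor plan. These values are placeholders; in
# AugurSense the mapping is configured through the dashboard UI.
image_pts = np.float32([[120, 400], [520, 390], [610, 80], [60, 90]])
plan_pts = np.float32([[0, 0], [5, 0], [5, 10], [0, 10]])

# Homography mapping image coordinates onto the ground plane.
H, _ = cv2.findHomography(image_pts, plan_pts)

def to_floor_plan(foot_point):
    """Project a person's foot point (x, y) in the image to plan coordinates."""
    src = np.float32([[foot_point]])        # shape (1, 1, 2), as OpenCV expects
    return cv2.perspectiveTransform(src, H)[0, 0]

# Accumulate projected positions into a density heat map over the plan.
heatmap = np.zeros((10, 5))                 # one cell per plan unit (5 x 10 plan)
for foot in [(300, 380), (310, 375), (480, 360)]:   # sample detections
    x, y = to_floor_plan(foot)
    if 0 <= x < 5 and 0 <= y < 10:
        heatmap[int(y), int(x)] += 1
```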

The research findings are presented through AugurSense, a generic human analytics system that can later be extended for specific real-time human analytics requirements.

For years, companies have known how their web visitors behave thanks to Google Analytics. Our solution provides a platform that delivers similar analytics on physical visitors to shopping complexes and offices.

Publications

The following research publications were produced as a result of this project.

Architecture

Component organization in AugurSense

AugurSense consists of three major components:

Analytics Engine (CRAMP Accumulator)

This is the central server responsible for aggregating the raw data obtained by processing individual video feeds. For example, human locations from each video feed are aggregated into a global map within this module using human re-identification and tracking techniques.
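
Although the Accumulator itself is written in Java (more on that below), the core aggregation idea can be sketched in a few lines of Python: each camera reports ground-plane points, and a point that falls within a gating distance of an existing global track is assumed to belong to the same person. The threshold and data structures here are illustrative; the real module additionally applies appearance-based re-identification and smoothing.

```python
import math

GATE = 0.75   # metres; placeholder gating threshold

tracks = {}   # track_id -> last known (x, y) on the global map
next_id = 0

def update_global_map(detections):
    """detections: (x, y) ground-plane points reported by all cameras
    for one time slice."""
    global next_id
    for x, y in detections:
        # Find the closest existing track within the gating distance.
        best_id, best_d = None, GATE
        for tid, (tx, ty) in tracks.items():
            d = math.hypot(x - tx, y - ty)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:          # nothing nearby: start a new track
            best_id = next_id
            next_id += 1
        tracks[best_id] = (x, y)     # extend (or start) the track
```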

The dist/ and core/ directories in the project contain the source code for the Accumulator. It is written in Java on the Spring Framework, which lets us plug in different implementations for person re-identification, map processing and analytics storage.

Technologies used:

  • Java
  • Spring Framework
  • Tensorflow
  • Tensorflow Java
  • REST
  • MySQL
Since we are tracking people across cameras, we need a human re-identification technique. The following two implementations can be used out of the box:

  • Open Re-ID — implementation of our paper “Open Set Person Re-Identification Framework on Closed Set Re-Id Systems”
  • TriNet Re-ID
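
Both options follow the same pattern: a network embeds each person crop into a feature vector, and crops whose embeddings lie close together are treated as the same person. Below is a sketch of that matching step, with a toy colour-histogram “embedding” standing in for the actual TriNet/Open Re-ID model and an illustrative distance threshold:

```python
import cv2
import numpy as np

gallery = {}   # person_id -> embedding of the last accepted appearance

def embed(crop):
    """Toy stand-in for a re-identification network: a normalised colour
    histogram. A real deployment would use the TriNet or Open Re-ID model."""
    hist = cv2.calcHist([crop], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / (np.linalg.norm(hist) + 1e-9)

def reidentify(crop, threshold=0.6):
    """Return an existing person id if the crop matches the gallery,
    otherwise register and return a new id. The threshold is illustrative."""
    query = embed(crop)
    if gallery:
        ids = list(gallery)
        dists = [np.linalg.norm(query - gallery[i]) for i in ids]
        best = int(np.argmin(dists))
        if dists[best] < threshold:   # close enough: same person
            gallery[ids[best]] = query
            return ids[best]
    new_id = len(gallery)
    gallery[new_id] = query
    return new_id
```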

Camera Processing Unit (CRAMP Sense)

This component is responsible for processing video feeds and reporting the obtained instantaneous analytics to the central server (the Analytics Engine). One camera processing unit is deployed per computer, and depending on the machine's GPU power, it can process several video feeds in parallel. For example, an Nvidia GTX 965 GPU can handle up to 5 parallel feeds, assuming each feed is processed at 1 FPS.

Snapshots of a person tracked in the global map (generated after collecting person locations from multiple processed feeds). The person's track is shown in yellow.

The sense/ directory contains the source code for the camera processing unit.
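
In essence, each unit reads frames, throttles them to the target rate, runs a human detector, and reports the detections to the Accumulator. The sketch below illustrates that loop; the REST endpoint, payload shape and camera source are hypothetical, and OpenCV's stock HOG pedestrian detector stands in for OpenPose/YOLO:

```python
import time
import cv2
import requests

SERVER = "http://localhost:8085/api/v1/frames"   # hypothetical endpoint
CAMERA_ID = 1

# Stock OpenCV pedestrian detector standing in for OpenPose / YOLO.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return approximate (x, y) foot points of detected people."""
    boxes, _ = hog.detectMultiScale(frame)
    return [(x + w / 2, y + h) for (x, y, w, h) in boxes]

cap = cv2.VideoCapture("rtsp://camera-1/stream")  # or a video file path
last = 0.0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    now = time.time()
    if now - last < 1.0:        # throttle to roughly 1 FPS
        continue
    last = now
    requests.post(SERVER, json={
        "cameraId": CAMERA_ID,
        "timestamp": now,
        "detections": [{"x": x, "y": y} for x, y in detect_people(frame)],
    })
```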

Technologies used:

  • Python 3
  • OpenCV
  • OpenPose, the Tensorflow Object Detection API, YOLO v3, etc. for human detection (one of these must be chosen before running; the default is OpenPose). Using a pose estimation library is recommended to get analytics on posture (standing/sitting), head direction and accurate location tracking. Custom implementations can be written using other human detection/pose estimation frameworks; a keypoint-based sketch follows this list.
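
As the detection item above notes, posture and head direction can be derived from 2D pose keypoints. The following sketch illustrates one way to do that using the 17-point COCO keypoint layout; the geometric thresholds are placeholders, not AugurSense's actual logic:

```python
import numpy as np

# COCO 17-keypoint indices (0 nose, 3/4 ears, 5/6 shoulders,
# 11/12 hips, 13/14 knees).
NOSE, L_EAR, R_EAR = 0, 3, 4
L_SHOULDER, R_SHOULDER = 5, 6
L_HIP, R_HIP, L_KNEE, R_KNEE = 11, 12, 13, 14

def posture(kp):
    """kp: (17, 2) array of image coordinates, y growing downwards.
    When sitting, the knees rise towards hip level, so the vertical
    hip-to-knee distance shrinks relative to the torso length."""
    torso = np.linalg.norm(kp[[L_SHOULDER, R_SHOULDER]].mean(0) -
                           kp[[L_HIP, R_HIP]].mean(0))
    hip_to_knee = kp[[L_KNEE, R_KNEE]].mean(0)[1] - kp[[L_HIP, R_HIP]].mean(0)[1]
    return "sitting" if hip_to_knee < 0.5 * torso else "standing"

def head_direction(kp):
    """Coarse facing estimate: a nose centred between the ears suggests a
    frontal view; a nose shifted towards one ear suggests a turned head."""
    l, r, nose = kp[L_EAR][0], kp[R_EAR][0], kp[NOSE][0]
    offset = (nose - (l + r) / 2) / (abs(l - r) + 1e-9)
    if abs(offset) <= 0.25:
        return "frontal"
    return "turned right" if offset > 0 else "turned left"
```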

Dashboard

This is the web-based UI for viewing the analytics generated by processing the video feeds. The ngapp/ directory contains the Angular CLI project corresponding to the dashboard.

Technologies used:

  • Angular
  • TypeScript

The UI for configuring the mapping between camera views and the floor plan.

Thank you! To be continued…

Thank you for reading this article. I hope you now have an understanding of how AugurSense operates. A getting-started guide will be posted soon, explaining how to set up AugurSense with multiple CCTV cameras to generate human movement analytics for a desired location.

Until then, you can use the README in the GitHub repository to get the project up and running.

Originally published at http://loneidealist.wordpress.com on January 7, 2020.
