Contextual Note: Production of cleaned footfall estimates for safeguarded dataset

A probe request detected from a device does not correspond one-to-one with an individual, so the MAC addresses initially detected at each location must go through the cleaning and validation procedure detailed below.
a) Input: Hashed probe requests, summarised for every five-minute interval. The field-level metadata for these data are given in the table overleaf (section 1).
b) Count probe requests: We separately count the total number of probe requests with randomised and with non-randomised MAC addresses.
c) Count MAC addresses: We then remove repeated MAC addresses within each five-minute interval and count only the unique MAC addresses, separately for randomised and non-randomised probe requests. For example, 15 probe requests sent from the same MAC address within the same five-minute interval would be counted as 1.
d) Remove long dwellers: We then remove repeat detections of MAC addresses seen in consecutive intervals within a half-hour period. This prevents long dwellers from being counted repeatedly across different five-minute intervals and gives us the filtered counts. For example, a printer or the mobile devices of store employees are counted only once, even if they are present over multiple consecutive intervals. This is done separately for randomised and non-randomised probe requests.
e) Adjust local count: For each five-minute interval, we calculate the ratio of the filtered count to the corresponding total count for non-randomised probe requests and use this ratio to adjust the randomised count. We then add the filtered non-randomised count to the adjusted randomised count to arrive at the final estimated count.
f) Impute missing values: Finally, we fill all gaps in the data that are shorter than 30 minutes. We use Kalman smoothing on a structural time series model to impute the missing values from the observed ones. This imputation method is implemented in the R package imputeTS, documented at https://cran.r-project.org/web/packages/imputeTS/index.html
g) Output: The output is the cleaned footfall estimate for each five-minute interval at each location. The field-level metadata for this output are given in the table overleaf (section 2).
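Steps b) to e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the production implementation. The `ProbeRequest` record and all function names are hypothetical. Step d) is read as "drop a MAC address from an interval if it was also detected in any of the previous five intervals" (one half hour). In step e), the "total" is assumed to be the unique MAC count from step c), and the unique randomised count is scaled by the non-randomised filtered-to-total ratio.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRequest:
    """Hypothetical record: one hashed probe request."""
    interval: int      # index of the five-minute interval
    mac_hash: str      # hashed MAC address
    randomised: bool   # True if the MAC address is randomised


def unique_macs(requests, randomised):
    """Step c: unique MAC hashes per interval for one address type."""
    macs = defaultdict(set)
    for r in requests:
        if r.randomised == randomised:
            macs[r.interval].add(r.mac_hash)
    return macs


def filter_long_dwellers(macs, lookback=5):
    """Step d (assumed rule): drop a MAC from interval t if it was also
    detected in any of the previous `lookback` intervals, so a device that
    stays for several consecutive intervals is counted only once."""
    filtered = {}
    for t, current in macs.items():
        recent = set()
        for k in range(1, lookback + 1):
            recent |= macs.get(t - k, set())
        filtered[t] = current - recent
    return filtered


def estimate_footfall(requests, lookback=5):
    """Per-interval summary of steps b to e."""
    probes = defaultdict(lambda: [0, 0])  # [non-randomised, randomised]
    for r in requests:
        probes[r.interval][int(r.randomised)] += 1  # step b
    nonrand = unique_macs(requests, randomised=False)
    rand = unique_macs(requests, randomised=True)
    nonrand_kept = filter_long_dwellers(nonrand, lookback)
    rand_kept = filter_long_dwellers(rand, lookback)
    summary = {}
    for t in sorted(probes):
        total_nr = len(nonrand.get(t, set()))
        total_r = len(rand.get(t, set()))
        kept_nr = len(nonrand_kept.get(t, set()))
        # Step e (assumed reading): the share of non-randomised MACs that
        # survive filtering scales the unique randomised count. With no
        # non-randomised MACs in the interval, no adjustment is applied.
        ratio = kept_nr / total_nr if total_nr else 1.0
        summary[t] = {
            "probe_requests": tuple(probes[t]),
            "unique_macs": (total_nr, total_r),
            "filtered": (kept_nr, len(rand_kept.get(t, set()))),
            "estimate": kept_nr + ratio * total_r,
        }
    return summary
```

For example, a printer that sends three probe requests in one interval and is detected again in the next contributes 1 to the unique count of each interval, but only to the filtered count of the first.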
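The gap-limited imputation in step f) can be approximated with a local level model (a simple structural time series model), using a Kalman filter followed by a Rauch-Tung-Striebel smoother. This Python sketch is a hedged stand-in for the imputeTS implementation, not a reproduction of it. The variances `obs_var` and `level_var` are fixed by hand here, whereas imputeTS fits the model to the data. Gaps are assumed to be counted in missing five-minute intervals, so `max_gap=5` fills gaps of up to 25 minutes and leaves longer gaps missing. All function names are hypothetical.

```python
def kalman_smooth_local_level(y, obs_var=1.0, level_var=1.0):
    """Smoothed level of a local level model y_t = mu_t + eps_t,
    mu_t = mu_{t-1} + eta_t, with missing values given as None."""
    n = len(y)
    observed = [v for v in y if v is not None]
    if not observed:
        return [None] * n
    a_pred, p_pred = observed[0], 1e6  # near-diffuse start at first observation
    a_filt, p_filt = [0.0] * n, [0.0] * n
    for t in range(n):
        if y[t] is None:
            # No observation: the filtered state equals the prediction.
            a_filt[t], p_filt[t] = a_pred, p_pred
        else:
            gain = p_pred / (p_pred + obs_var)
            a_filt[t] = a_pred + gain * (y[t] - a_pred)
            p_filt[t] = (1.0 - gain) * p_pred
        # Random-walk transition to the next interval.
        a_pred, p_pred = a_filt[t], p_filt[t] + level_var
    # Rauch-Tung-Striebel backward pass.
    smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / (p_filt[t] + level_var)
        smooth[t] = a_filt[t] + c * (smooth[t + 1] - a_filt[t])
    return smooth


def impute_short_gaps(y, max_gap=5, obs_var=1.0, level_var=1.0):
    """Fill runs of at most `max_gap` missing intervals with the smoothed
    level; longer runs are left as None."""
    smooth = kalman_smooth_local_level(y, obs_var, level_var)
    out = list(y)
    t = 0
    while t < len(y):
        if y[t] is None:
            start = t
            while t < len(y) and y[t] is None:
                t += 1
            if t - start <= max_gap:
                out[start:t] = smooth[start:t]
        else:
            t += 1
    return out
```

Running it on `[10, 12, None, None, 16, 18]` fills the two missing intervals with values rising between the neighbouring observations, while a run of six missing intervals would be left as `None`.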
