
computer vision - Pedestrian counting algorithm


Currently I am developing a pedestrian counting project (using OpenCV + Qt on Linux). My idea for the approach is:



  1. Capture Frames

  2. Do Background Subtraction

  3. Clean up noise (erode, dilate)


  4. Find blobs (cvBlobsLib) - foreground objects

  5. For each blob, set an ROI and search for pedestrians within it (LBP cascade with detectMultiScale) for better performance (see the rough sketch after this list)

  6. For each detected pedestrian, do a nested upper-body search (not sure about this) for better reliability

  7. If the same pedestrian is found in consecutive frames (3-4 frames maybe), add that area to CamShift tracking and mark it as a pedestrian

  8. Exclude CamShift-tracked areas from blob detection in subsequent frames

  9. If a pedestrian crosses a counting line, increment the count
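
To make this concrete, here is a rough, untested sketch of how I imagine wiring up steps 1-5 with the OpenCV C++ API; the cascade file name, the morphology iterations and the minimum blob area are just placeholders I would still have to tune:

    // Rough sketch of steps 1-5; all parameter values are placeholders.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);                                     // step 1: capture frames
        auto bg = cv::createBackgroundSubtractorMOG2();              // step 2: background subtraction
        cv::CascadeClassifier lbp("my_lbp_pedestrian_cascade.xml");  // placeholder cascade file

        cv::Mat frame, fgmask, gray;
        while (cap.read(frame)) {
            bg->apply(frame, fgmask);
            cv::erode(fgmask, fgmask, cv::Mat(), cv::Point(-1, -1), 2);   // step 3: clean up noise
            cv::dilate(fgmask, fgmask, cv::Mat(), cv::Point(-1, -1), 2);

            std::vector<std::vector<cv::Point>> contours;            // step 4: blobs as contours
            cv::findContours(fgmask.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

            for (const auto& c : contours) {
                cv::Rect roi = cv::boundingRect(c);
                if (roi.area() < 800) continue;                      // drop tiny blobs (guessed threshold)

                cv::cvtColor(frame(roi), gray, cv::COLOR_BGR2GRAY);
                std::vector<cv::Rect> people;                        // step 5: LBP search inside the ROI
                lbp.detectMultiScale(gray, people);
                // steps 6-9 (upper-body check, CamShift tracking, line counting) would go here
            }
        }
        return 0;
    }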


I want to check whether I am on the right track. Do you have any suggestions on how to improve my approach? If somebody has worked on something similar, I would appreciate any useful tips, resources (and criticism) on this problem.



Answer



I can see a number of possible problems with this approach. I speak from my own experience improving a pedestrian counting system with a very similar approach, so I don't mean to be discouraging. On the contrary, I'd like to warn you about hurdles you may have to overcome in order to build an accurate and robust system.



Firstly, background subtraction assumes that objects of interest are always moving and that objects you aren't interested in counting remain completely still. That may well be the case in your scenario, but it is still a very limiting assumption. I've also found background subtraction to be very sensitive to changes in illumination (I agree with geometrikal).
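
If you do stick with background subtraction, at least be deliberate about its parameters; illumination changes and shadows are partly (only partly) addressable there. A minimal sketch with OpenCV's MOG2 subtractor, just as an illustration of the knobs involved (the history length, learning rate and thresholds shown are arbitrary, not a recommendation):

    // MOG2 with shadow detection; shadow pixels are labelled 127, foreground 255.
    #include <opencv2/opencv.hpp>

    void foreground_mask(cv::VideoCapture& cap) {
        auto mog2 = cv::createBackgroundSubtractorMOG2(500, 16.0, true);  // history, varThreshold, detectShadows
        cv::Mat frame, fgmask;
        while (cap.read(frame)) {
            mog2->apply(frame, fgmask, 0.005);   // small learning rate: the background model adapts slowly
            cv::threshold(fgmask, fgmask, 200, 255, cv::THRESH_BINARY);   // discard the shadow label
        }
    }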


Be wary of assuming that one blob = one person, even if you think your environment is well controlled. It happened way too often that blobs corresponding to people went undetected because they weren't moving or were too small, so they were deleted by erosion or by some thresholding criterion (and believe me, you don't want to fall into the "tune thresholds until everything works" trap. It doesn't work ;) ). It can also happen that a single blob corresponds to two people walking together, or to a single person carrying some sort of luggage. Or a dog. So don't make clever assumptions about blobs.
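
To make the trap concrete, the problematic step usually looks something like this (the area threshold here is hypothetical); a slow, distant or partially occluded person produces a small blob and is silently discarded:

    // Hypothetical blob filter: the hard-coded area is exactly the kind of
    // "tune until it works" threshold that silently drops real people.
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Rect> keep_big_blobs(const std::vector<std::vector<cv::Point>>& contours) {
        std::vector<cv::Rect> kept;
        for (const auto& c : contours) {
            cv::Rect box = cv::boundingRect(c);
            if (box.area() > 1500)        // too high: people vanish; too low: the noise comes back
                kept.push_back(box);
        }
        return kept;
    }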


Fortunately, since you mention that you are using LBPs for person detection, I think you are on the right track and avoiding the mistakes described in the paragraph above. I can't comment on the effectiveness of LBPs in particular, though. I've also read that HOG (histograms of oriented gradients) is a state-of-the-art method for people detection; see Histograms of Oriented Gradients for Human Detection.
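
For reference, OpenCV ships a default HOG-based people detector, so comparing it against your LBP cascade costs very little effort. A minimal sketch (default parameters, completely untuned):

    // Minimal HOG people detection using OpenCV's built-in pedestrian model.
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Rect> detect_people_hog(const cv::Mat& frame) {
        cv::HOGDescriptor hog;   // default 64x128 detection window
        hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
        std::vector<cv::Rect> found;
        hog.detectMultiScale(frame, found);   // defaults; expect some false positives on raw video
        return found;
    }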


My last gripe is related to using CamShift. It is based on color histograms, so by itself it works nicely when tracking a single object that is easy to distinguish by color, as long as the tracking window is big enough and there are no occlusions or abrupt changes. But as soon as you have to track multiple targets that may have very similar color descriptions and that move very close to one another, you simply can't do without an algorithm that somehow allows you to maintain multiple hypotheses. This may be a particle filter or a framework such as MCMCDA (Markov chain Monte Carlo data association; see Markov Chain Monte Carlo Data Association for Multiple-Target Tracking). My experience with using Meanshift alone when tracking multiple objects is everything that shouldn't happen in tracking: losing track, confusing targets, fixating on the background, etc. Read a bit about multiple-object tracking and data association problems; this might be at the heart of counting multiple people after all (I say "might be" because your goal is counting, not tracking, so I don't completely discard the possibility of some clever approach that counts without tracking...)
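
To see why this happens, keep in mind what CamShift actually works with: a hue-histogram back projection, nothing more. A minimal single-target sketch (the bin count, hue range and initial window are illustrative); two pedestrians in similarly colored clothing are essentially indistinguishable as far as this histogram is concerned:

    // Single-target CamShift: the tracker only ever sees this hue histogram.
    #include <opencv2/opencv.hpp>

    void track_one(cv::VideoCapture& cap, cv::Rect window) {
        cv::Mat frame, hsv, hist, backproj;
        int histSize = 16;                      // illustrative bin count
        float hueRange[] = {0, 180};
        const float* ranges[] = {hueRange};
        int channels[] = {0};

        cap.read(frame);                        // build the color model from the initial window
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::Mat roi(hsv, window);
        cv::calcHist(&roi, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
        cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

        while (cap.read(frame)) {               // track: back-project and shift the window
            cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
            cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
            cv::CamShift(backproj, window,
                         cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
        }
    }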


My last piece of advice is: there is only so much you can do with a given approach, and you will need fancier methods to achieve better performance (so I disagree with user36624 in this regard). This may mean replacing a piece of your algorithm with something more powerful, or changing the architecture altogether. Of course, you have to know which fancy methods are actually useful to you. Some publications attempt to solve the problem in a principled way, while others simply come up with an algorithm for a given data set, expect you to train a classifier that isn't really suited to the problem at hand, and still require you to adjust a few thresholds. People counting is ongoing research, so don't expect things to come easily. Do make an effort to learn things that are slightly beyond your current ability, and then do it again and again...


I acknowledge that I haven't offered any solutions and instead have only pointed out flaws in your approach (which all come from my own experience). For inspiration, I recommend you read some recent research, for example Stable Multi-Target Tracking in Real-Time Surveillance Video. Good luck!

