Thursday, March 1, 2018

signal analysis - Is deep learning killing image processing/computer vision?


I am planning to enroll in an MSc in Signal and Image Processing, or maybe Computer Vision (I have not decided yet), and this question emerged.


My concern is: since deep learning needs no hand-crafted feature extraction and almost no input pre-processing, is it killing image processing (or signal processing in general)?



I'm not an expert in deep learning, but it seems to work very well in recognition and classification tasks, taking images directly as input instead of a feature vector as other techniques do.


Is there any case in which a traditional feature extraction + classification approach would be better, making use of image processing techniques, or is that dying out because of deep learning?
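For concreteness, here is a minimal sketch of the kind of pipeline I mean (hand-crafted HOG features feeding a linear SVM; the dataset and parameters are purely illustrative):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from skimage.feature import hog

    # Traditional pipeline: hand-crafted features first, classifier second.
    digits = load_digits()  # small 8x8 grayscale digit images
    features = np.array([
        hog(img, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(2, 2))
        for img in digits.images
    ])
    X_tr, X_te, y_tr, y_te = train_test_split(features, digits.target, random_state=0)
    clf = LinearSVC(max_iter=5000).fit(X_tr, y_tr)
    print("test accuracy:", clf.score(X_te, y_te))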



Answer



At the top of this answer, you can see a section of updated links, where artificial intelligence, deep learning, or database-driven machine learning progressively step onto the grounds of traditional signal processing, image analysis, and computer vision. Below, variations on the original answer.


The short version: the successes of convolutional neural networks and deep learning look like a sort of Galilean revolution. From a practical point of view, classical signal processing or computer vision are dead... provided that you have enough labeled data, care little about evident classification failures (deep flaws), have infinite energy to run tests without thinking about the carbon footprint, and don't bother with rational explanations. For the others, this made us rethink everything we did before: feature extraction, optimization (cf. my colleague J.-C. Pesquet's work on Deep Neural Network Structures Solving Variational Inequalities), invariance, quantification, etc. And really interesting research is emerging from that, hopefully catching up with firmly grounded principles and similar performance.


Updated links:




From Natural Adversarial Examples (D. Hendrycks et al.), abstract:

We introduce natural adversarial examples -- real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call ImageNet-A. This dataset serves as a new way to measure classifier robustness. Like l_p adversarial examples, ImageNet-A examples successfully transfer to unseen or black-box classifiers. For example, on ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because ImageNet-A examples exploit deep flaws in current classifiers including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set.





Deep learning references "stepping" on standard signal/image processing can be found at the bottom. Michael Elad just wrote Deep, Deep Trouble: Deep Learning’s Impact on Image Processing, Mathematics, and Humanity (SIAM News, 2017/05), excerpt:




Then neural networks suddenly came back, and with a vengeance.



This tribune is of interest, as it shows a shift from traditional "image processing" (trying to model/understand the data) to a realm of sheer correctness, without so much insight.


This domain is evolving quite fast. This does not mean it evolves in some intentional or constant direction, neither right nor wrong. But this morning, I heard the following saying (or is it a joke?):



a bad algorithm with a huge set of data can do better than a smart algorithm with scarce data.



Here was my very short take: deep learning may provide state-of-the-art results, but one does not always understand why, and part of our job as scientists remains explaining why things work, what the content of a piece of data is, etc.


Deep learning requires (huge) well-tagged databases. Any time you do craftwork on single or singular images (i.e., without a huge database behind them), especially in places unlikely to yield "free user-based tagged images" (outside the set of "funny cats playing games and faces"), you can stick to traditional image processing for a while, and for profit. A recent tweet summarizes that:




(lots of) labeled data (with no missing vars) requirement is a deal breaker (& unnecessary) for many domains



If they are being killed (which I doubt on any short-term notice), they are not dead yet. So any skill you acquire in signal processing, image analysis, or computer vision will help you in the future. This is, for instance, discussed in the blog post Have We Forgotten about Geometry in Computer Vision? by Alex Kendall:



Deep learning has revolutionised computer vision. Today, there are not many problems where the best performing solution is not based on an end-to-end deep learning model. In particular, convolutional neural networks are popular as they tend to work fairly well out of the box. However, these models are largely big black-boxes. There are a lot of things we don’t understand about them.



A concrete example can be the following: given a couple of very dark (e.g., surveillance) images from the same location, evaluating whether one of them contains a specific change that should be detected is potentially a matter of traditional image processing, more than of deep learning (as of today).
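A minimal sketch of such a classical change-detection chain, assuming two registered grayscale frames (the file names, thresholds, and kernel sizes are illustrative, not prescriptive):

    import cv2
    import numpy as np

    # Hypothetical file names; any two registered grayscale frames will do.
    a = cv2.imread("frame_old.png", cv2.IMREAD_GRAYSCALE)
    b = cv2.imread("frame_new.png", cv2.IMREAD_GRAYSCALE)

    # Dark scenes: lift local contrast first (CLAHE), then denoise a little.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    a = cv2.GaussianBlur(clahe.apply(a), (5, 5), 0)
    b = cv2.GaussianBlur(clahe.apply(b), (5, 5), 0)

    # Pixel-wise change map, thresholded, then cleaned by a morphological opening.
    diff = cv2.absdiff(a, b)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

    changed = cv2.countNonZero(mask) > 50  # detection rule: enough changed pixels
    print("change detected:", changed)

No training data is needed; every step is inspectable, which is precisely the point being made.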


On the other side, as successful as deep learning is on a large scale, it can lead to misclassification of small sets of data, which might be harmless "on average" for some applications. Two images that differ only slightly to the human eye could be classified differently via DL, or random images could be assigned to a specific class. See for instance Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (Nguyen A., Yosinski J., Clune J., Proc. Computer Vision and Pattern Recognition, 2015), or Does Deep Learning Have Deep Flaws?, on adversarial negatives:



The network may misclassify an image after the researchers applied a certain imperceptible perturbation. The perturbations are found by adjusting the pixel values to maximize the prediction error.
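That pixel-adjustment idea is essentially gradient ascent on the loss. A minimal sketch in the style of the fast gradient sign method, assuming a differentiable PyTorch classifier `model`, input images `x` in [0, 1], and true labels `y` (all hypothetical names):

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps=0.03):
        """Adjust pixel values to maximize the prediction error (one signed step)."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # A small eps keeps the perturbation imperceptible to the human eye,
        # yet it often suffices to flip the predicted class.
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()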




With all due respect to "deep learning", think about "mass production responding to a registered, known, mass-validable or expected behaviour" versus "a singular piece of craft". Neither is better (yet) on a single index scale. Both may have to coexist for a while.


However, deep learning pervades many novel areas, as described in the references below.



Luckily, some folks are trying to find a mathematical rationale behind deep learning, an example of which are the scattering networks or transforms proposed by Stéphane Mallat and co-authors (see the ENS site for scattering): harmonic analysis and non-linear operators, Lipschitz functions, translation/rotation invariance, notions better suited to the average signal processing person. See for instance Understanding Deep Convolutional Networks.
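To give a flavour of the construction (and only a flavour; see the kymatio package for a faithful implementation), here is a toy first-order wavelet-modulus-average cascade in 1D, with made-up filter parameters:

    import numpy as np

    def gabor(n, xi, sigma):
        """Complex Gabor atom (modulated Gaussian), a crude wavelet stand-in."""
        t = np.arange(n) - n // 2
        return np.exp(1j * xi * t) * np.exp(-t**2 / (2 * sigma**2))

    def first_order_scattering(x, xis=(0.4, 0.8, 1.6), sigma=8.0, pool=16):
        """Wavelet -> modulus -> local average, one scattering layer.
        The modulus discards phase (local translation invariance); the
        averaging brings stability to small deformations."""
        feats = []
        for xi in xis:
            band = np.abs(np.convolve(x, gabor(65, xi, sigma), mode="same"))
            trimmed = band[: len(band) // pool * pool]
            feats.append(trimmed.reshape(-1, pool).mean(axis=1))
        return np.stack(feats)

The chain "linear filter, pointwise non-linearity, pooling" mirrors one convolutional layer, which is what makes the analogy with deep networks mathematically tractable.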

