■ AI-assisted poor quality picture detection & disk cleanup

Cameras built into modern mobile phones are very good, but not all the pictures you shoot are worth keeping. Bad lighting or ambient conditions, fingers obscuring the camera lens, or simply your kid playing with your phone and taking photos at random, lead to many poor quality pictures that just take up space on your phone's memory card. You can manually examine your photos one by one, deleting the bad ones, but that is very tedious when you have tons of pictures.

So when the other day the missus complained that her hard disk was full, I started thinking that there must be a program to automatically detect and clean up blurred or otherwise "bad" pictures (shaken, too dark or bright, out of focus etc.), but apparently there is no such solution yet (or I couldn't find it on Google). Immediately it occurred to me that there is a busine$$ opportunity for a picture cleanup program, as almost anyone can understand the need to get rid of poor quality photos.

I tried a few ideas over the past couple of weeks, and as it turns out automatic blur detection in pictures isn't an easy task. I tried to get the computer to teach itself how to discover low quality pictures using artificial intelligence algorithms. Using supervised learning, I presented the computer with a series of good and bad pictures, and let it figure out how to tell them apart. The end result isn't fit for commercialization yet, but it is good enough for a blog article <g>.

It is a small command line (console) tool that works on a picture (or a folder full of pictures) and assigns a quality index to each one. If this number is high (>100) then the photo is probably of low quality; the bigger the number, the higher the confidence of the prediction. It works by randomly sampling portions of the picture, so quality scores will vary slightly each time you use it. It isn't perfect: it will misclassify good pictures as blurred and vice versa, but it will flag suspect pictures quickly and effortlessly. If you agree with its recommendations you can remove (delete) the bad photos to free up space.

Click to download blur detection tool (40KB, build 1.001)

No installation required, just unzip and place the executable IMGCHK.EXE somewhere it can be easily found, e.g. in the C:\WINDOWS folder.

There are several ways you can use the blur detection tool. Open a console window (or use xplorer² console) and type:

IMGCHK "c:\path to\pic folder" > report.txt

This will examine all JPG pictures in that folder and save the assessment results in REPORT.TXT. You can then import this text file into Excel and sort by the blur number. Bigger numbers indicate a higher probability that the image is blurred. You can examine the top scorers in a file manager and decide whether to keep them or not.
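If you would rather skip Excel, the same sorting can be sketched in a few lines of Python. The layout of REPORT.TXT isn't described here, so this is only a sketch under an assumption: one picture per line, with the blur number as the last whitespace-separated token. The function name parse_report and the 100 cutoff default are illustrative, not part of the tool.

```python
def parse_report(lines, threshold=100.0):
    """Return (score, path) pairs scoring above `threshold`, worst first.

    ASSUMPTION: each useful line ends with the blur number, separated
    from the file path by whitespace. Lines that don't fit are skipped.
    """
    flagged = []
    for line in lines:
        parts = line.strip().rsplit(None, 1)  # split off the last token only
        if len(parts) != 2:
            continue
        path, score_text = parts
        try:
            score = float(score_text)
        except ValueError:
            continue  # header or other non-numeric line
        if score > threshold:
            flagged.append((score, path))
    flagged.sort(reverse=True)  # biggest blur number first
    return flagged


if __name__ == "__main__":
    with open("report.txt", encoding="utf-8", errors="replace") as report:
        for score, path in parse_report(report):
            print(f"{score:8.1f}  {path}")
```

Splitting from the right keeps paths with spaces intact, which matters for folders like "c:\path to\pic folder".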

If you have xplorer² ultimate edition, things are easier. You can set up the programmable column to call the blur detection tool and either see the quality number next to each file, or translate it into a bad/good indicator, defining the programmable column as such:

${Extension}=jpg & int(SYSTEM("imgchk.exe " + ${name})) > 100

This rule will check for a JPG file extension (picture) and will call IMGCHK for the current file, and if the returned number is greater than 100 it will show "1" (blurred), otherwise it will show 0 — for photos deemed good enough. Sorting by this rule will bunch together all low quality pictures, for further examination. Select the really bad ones and delete them. Thus you avoid going through all the pictures one by one, and concentrate on those that are flagged by the tool.

This programmable column can also be used as a search rule to find bad pictures in subfolders.

Figure 1. Finding low quality pictures using xplorer² programmable column

Please try it out and tell me if it works for you. It will make many mistakes, but as it doesn't delete any pictures automatically, it is safe to try. During my tests, the error rate was 18% (i.e. 18 out of 100 pictures deemed bad were actually good). Blur is hard to quantify; it is even an art form for some photographers, but some pictures are rubbish by any standards. The AI classifier was trained on mobile phone pictures. Proper digital cameras (without a phone) are usually idiot proof so you can't take any unfocused pictures even if you try!

xplorer² version 4.2 has a dedicated file detail column called Blur that incorporates this tool. It is much faster and more convenient to use, just select the Blur column using <ALT+K> keys and arrange the pictures by this column to see those that are badly blurred.

Under the hood: photo quality assessment using artificial intelligence

Humans can recognize a substandard photo immediately by just looking at it, but teaching a computer what is so bad about "blur" isn't straightforward. I have read about people analyzing pictures in detail (frequency response using the OpenCV computer vision library), and it is also an open research topic. I didn't want this to turn into a PhD research project, so instead of exact theories I went for the brute force artificial intelligence approach: show the computer samples of what is a "good" and what is a "bad" picture, and let it figure out the classification itself.

Without going into much detail, I randomly sample a fixed percentage (15%) of the picture in small squares, compute the standard deviation of color brightness in each square, then use these numbers as input for teaching the computer. It isn't rocket science: these numbers are simple hand-picked "features", playing the role that convolution filters play in deep learning. Mind you, the neural network I use is just a toy of 30x15x1 layers, trained with a simple backpropagation algorithm as in other AI projects I worked on. This not-so-deep neural network learnt to be 80% accurate on the training set of approximately 200 mobile phone pictures of mixed quality.
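The sampling step can be sketched roughly as follows. This is not the tool's code: the square size (8 pixels), the function name patch_stddevs and the use of a plain list of brightness rows are all assumptions made for illustration, and how the per-square numbers are condensed into the 30 network inputs isn't covered. The intuition is that blur smooths out local contrast, so squares from a blurred picture tend to have a lower brightness spread.

```python
import random
import statistics


def patch_stddevs(gray, patch=8, fraction=0.15, rng=None):
    """Brightness standard deviation of randomly placed squares.

    gray     -- picture as a list of rows of brightness values (0..255)
    patch    -- side of each sampled square in pixels (assumed value)
    fraction -- share of the picture area to sample (15% in the article)
    """
    rng = rng or random.Random()
    height, width = len(gray), len(gray[0])
    if height < patch or width < patch:
        raise ValueError("picture is smaller than one sampling square")
    # Enough squares for their combined area to be about `fraction` of the
    # picture. Squares may overlap, so the true coverage can be a bit lower.
    count = max(1, round(fraction * width * height / (patch * patch)))
    features = []
    for _ in range(count):
        top = rng.randrange(height - patch + 1)
        left = rng.randrange(width - patch + 1)
        pixels = [gray[y][x]
                  for y in range(top, top + patch)
                  for x in range(left, left + patch)]
        features.append(statistics.pstdev(pixels))
    return features
```

Because the squares are placed at random, two runs on the same picture give slightly different numbers, which is why the quality score varies a little between runs. Passing a seeded random.Random makes a run repeatable.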

Teaching a neural network is essentially solving a numerical optimization problem that minimizes the prediction errors of the network over the available good and bad sample pictures. But backpropagation comes with no mathematical guarantee of optimality, and never really finds the "best" network for the task. So for a change I also tried a different AI supervised training method called Support Vector Machine (SVM). SVM is designed exactly for identifying 2 categories, yes/no or good/blur, and it is mathematically more rigorous (optimal). As it turns out you don't need a QP solver or anything fancy to solve an SVM problem; using a method called simplified SMO it only takes about 200 lines of C++ code to solve it quickly!
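To give a flavor of simplified SMO, here is a compact Python sketch for the linear case. It is not the C++ code behind the tool. It follows the textbook simplified SMO recipe (pick a sample that violates the optimality conditions, pair it with a random second sample, optimize the two multipliers jointly), and the penalty C, the tolerance and the pass count are assumed values.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Simplified SMO for a linear SVM; labels in y must be +1 or -1.

    Returns (w, b) describing the separating hyperplane w.x + b = 0.
    """
    rng = random.Random(seed)
    n = len(X)
    alpha = [0.0] * n  # one Lagrange multiplier per training sample
    b = 0.0

    def f(x):  # current decision value for x
        return sum(alpha[k] * y[k] * dot(X[k], x) for k in range(n)) + b

    passes = 0  # consecutive sweeps in which nothing changed
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = f(X[i]) - y[i]
            # Only work on samples that violate the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or
                    (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            Ej = f(X[j]) - y[j]
            ai_old, aj_old = alpha[i], alpha[j]
            # Bounds that keep both multipliers inside [0, C].
            if y[i] != y[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            if lo == hi:
                continue
            eta = 2 * dot(X[i], X[j]) - dot(X[i], X[i]) - dot(X[j], X[j])
            if eta >= 0:
                continue
            aj = min(hi, max(lo, aj_old - y[j] * (Ei - Ej) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            b1 = (b - Ei - y[i] * (ai - ai_old) * dot(X[i], X[i])
                  - y[j] * (aj - aj_old) * dot(X[i], X[j]))
            b2 = (b - Ej - y[i] * (ai - ai_old) * dot(X[i], X[j])
                  - y[j] * (aj - aj_old) * dot(X[j], X[j]))
            alpha[i], alpha[j] = ai, aj
            if 0 < ai < C:
                b = b1
            elif 0 < aj < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    # For a linear kernel the multipliers collapse into one weight vector.
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n))
         for d in range(len(X[0]))]
    return w, b
```

Each row of X would be one picture's feature vector and y its good/bad label. The random choice of the second sample is what makes this the "simplified" variant: it is easy to write, but unlike full SMO it isn't guaranteed to reach the exact optimum on every data set.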

The IMGCHK tool (download link above) uses the trained linear SVM separating hyperplane to categorize blurred pictures, because the SVM proved to generalize better to unseen pictures and to identify their quality more reliably than the equivalent 30-input neural network. On a validation picture set (1000 photos) it was 18% wrong, so there is room for improvement, but clearly getting 82% of the verdicts right automatically isn't a bad start!
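Classifying with a linear SVM boils down to checking which side of the hyperplane a feature vector falls on, and a validation run is just counting mistakes on pictures the training never saw. The sketch below illustrates both; the names predict and evaluate, and the convention that +1 means "blurred", are assumptions, and how IMGCHK turns the hyperplane distance into its quality number is not shown. It reports the overall error rate separately from the share of flagged pictures that were actually good, since the two are easy to mix up.

```python
def predict(w, b, x):
    """+1 (assumed to mean blurred) if x is on the positive side of w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def evaluate(w, b, samples, labels):
    """Return (error_rate, false_alarm_rate) on a labelled validation set.

    error_rate       -- share of all pictures classified wrongly
    false_alarm_rate -- share of pictures flagged as blurred that were good
    """
    predicted = [predict(w, b, x) for x in samples]
    wrong = sum(p != t for p, t in zip(predicted, labels))
    flagged_truth = [t for p, t in zip(predicted, labels) if p == 1]
    false_alarms = sum(t == -1 for t in flagged_truth)
    error_rate = wrong / len(labels)
    false_alarm_rate = false_alarms / len(flagged_truth) if flagged_truth else 0.0
    return error_rate, false_alarm_rate
```

The key point is that the validation pictures must not overlap with the training pictures, otherwise the error rate says little about how the classifier copes with new photos.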

I also tried various SVM "kernels" but didn't observe any marked improvement in prediction quality. Kernels cause a 1000-fold increase in SVM complexity (memory requirements mainly).

Presently the tool misclassifies pictures with lots of sky and sea (or other large uniform color areas) as "bad". Why don't you try it on your mobile phone pictures and see how it goes? I would love to hear your feedback!