Imagine you could shrink your desktop computer to netbook size and take it to the cafeteria to do your surfing. Far fetched you say? But we see this kind of magic every day where our pictures, music and video shrink in size without any apparent loss of quality. Data compression algorithms underwrite the whole digital and communications revolution we live in, but how does it work? It isn't mathematics in the strict sense, it isn't numerical algorithm per se.
The idea is simple. Say you have to transmit a message in english. Instead of using the normal letters occupying 8 bits per character, you can compose your message so that more frequently occurring letters like A and E take fewer bits and rarer letters like X are transmitted with more bits. So the message AXE instead of using 3x8 = 24 bits could use something like 1+10+2 = 13 bits, saving almost half the original size. Another trick used is detecting repeating sequences of characters like ABABAB.. and sending just the first occurrence with a repetition count.
The idea behind compression may be simple but how do you make a real life algorithm that works on arbitrary content? Well you and I can't but some rocket scientists have come up with a variety of compression algorithms, that all your compression programs like winzip, winrar, 7-zip etc use. I have a PhD in engineering but can't get my head round even the most simple of them, never mind coming up with the idea in the first place!
Add compression to your program
Compression algorithms have been around for ages so when I started searching for some simple C/C++ implementation of a basic algorithm I thought I would find many, but it depends what you mean 'simple'. The majority of zip/unzip source code is based on zlib (e.g. codeproject article) with various DLL or C++ wrappers. If you want something just for Windows you can use the microsoft CAB format (example). Or for really good compression you can rip code off the open source 7-zip. But I wanted something simple that wouldn't add MBs of code and dependencies on my code.
After many hours drifting in google, I discovered a 20 year old article explaining the LZW variety of reversible compression with some simple straight C source code. But it used fixed-length (bit) encoding of code words, meaning you don't get good compression for both small and big files. So I combined it with a variable bit length LZW implementation and came up with a simple C++ class for compression and expansion of files and other data streams.
I confess I don't have the foggiest about the internals of the algorithms, but after a lot of testing on random files it appears to be working fine. Here is a sample VS6 project workspace with a test program demonstrating the use of my CLZWCompressFile class.
Download LZW compression source code (27 kb)
Obviously this isn't meant to replace your 7-zip, but if you want some lightweight compression class for your program that can squash files under 50% with just 500 lines of code (mostly comments and assertions btw), then look no further!
|© 2002—2010 Nikos Bozinis, all rights reserved