Why Some Files Do Not Compress Much and What You Can Do About It
You drag a 1 GB folder onto an icon called Compress and end up with a 990 MB archive. It feels like the algorithm failed. It did not; the folder was already close to its theoretical minimum size before you started. Understanding why is the difference between rage-quitting at the progress bar and getting an actually-smaller file out the other end.
The core idea: information density
Compression works by removing predictable patterns. A file with lots of repetition or structure compresses well; a file that already looks like random noise barely compresses at all. The technical term for that looks-like-noise property is high entropy.
Two short examples make this concrete. A 10 MB text file containing nothing but the letter A compresses down to a few kilobytes or less, depending on the algorithm; the compressor effectively records "10 million As" and stops. A 10 MB file of genuinely random bytes will hardly compress at all, because any reduction would require predicting bytes that, by definition, cannot be predicted.
Every real file sits somewhere between those extremes, and the more predictable the file's bytes are, the more a compressor can shrink it.
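To put numbers on the two extremes, here is a minimal Python sketch. It assumes zlib (the DEFLATE method that ZIP normally uses) as the compressor and a simple byte-frequency estimate of entropy; the function names are made up for this illustration.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Byte-frequency entropy: 0.0 for one repeated byte, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at its highest level."""
    return len(zlib.compress(data, 9)) / len(data)


size = 1_000_000  # 1 MB keeps the demo fast; the example above uses 10 MB
repetitive = b"A" * size
random_bytes = os.urandom(size)

for label, blob in (("all 'A'", repetitive), ("random", random_bytes)):
    print(f"{label:8s} entropy={shannon_entropy_bits_per_byte(blob):.3f} bits/byte "
          f"ratio={deflate_ratio(blob):.4f}")
```

The repetitive buffer should report zero entropy and a ratio close to zero, while the random buffer sits near 8 bits per byte and a ratio of about 1. Treat the byte-frequency figure as a rough guide only: it ignores ordering, so a file can score high on it and still compress well if its bytes repeat in long patterns.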
Why already-compressed files barely shrink
Most modern media files are already compressed by their own internal algorithm:
- JPEG already runs the DCT and Huffman or arithmetic coding on every image.
- MP3, AAC, Opus, and Ogg Vorbis use psychoacoustic models to strip inaudible data and then code the remainder.
- MP4, MKV, and WebM use video codecs (H.264, H.265, VP9, AV1) that already squeeze the data hard.
- DOCX, XLSX, PPTX, and ODT are themselves ZIP archives; the .docx extension hides a ZIP file inside.
- ZIP, 7Z, and RAR archives are normally already compressed, unless they were created in store-only mode.
When you put these inside an outer ZIP, the outer compressor sees high-entropy data and gives up. The compressed result is typically within 1 to 3 percent of the original size.
The double-compression myth
People sometimes try to fix this by compressing twice: say, ZIP then 7Z. It does not help. After the first compression pass the data is close to its entropy limit, so a second pass usually just adds metadata overhead and may actually grow the file by a few bytes. The right move is to compress once with the best algorithm for that data, not to chain multiple passes.
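You can check this yourself with a short Python sketch. It assumes zlib as a stand-in for ZIP's DEFLATE and the standard-library lzma module as a stand-in for 7Z; the sample text is generated from a made-up vocabulary purely for illustration.

```python
import lzma
import random
import zlib

rng = random.Random(0)  # fixed seed so the sample text is reproducible
letters = "abcdefghijklmnopqrstuvwxyz"
# Hypothetical vocabulary of 200 made-up words, 4 to 10 letters each.
vocabulary = [
    "".join(rng.choice(letters) for _ in range(rng.randint(4, 10)))
    for _ in range(200)
]
original = " ".join(rng.choice(vocabulary) for _ in range(200_000)).encode("ascii")

first_pass = zlib.compress(original, 9)   # DEFLATE, the method ZIP normally uses
second_pass = lzma.compress(first_pass)   # LZMA, the family 7Z uses by default

print("original:   ", len(original))
print("first pass: ", len(first_pass))
print("second pass:", len(second_pass))
```

On data like this, nearly all of the saving should come from the first pass, with the second pass landing within a small margin of the first-pass size. Exact numbers depend on the input and the library versions.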
Why encrypted files do not compress
A good encryption algorithm is designed to make ciphertext indistinguishable from random data. If it failed at that, it would leak information. So encrypted data is, by construction, maximally high-entropy and impossible to compress meaningfully.
If you ever need to both compress and encrypt, do it in this order: compress first, then encrypt. In the reverse order the compressor only ever sees ciphertext, which it cannot shrink. Most archive tools (7-Zip, WinRAR) apply the correct order automatically when you set both a compression method and a password.
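To see why the order matters, here is a rough Python sketch. The toy_stream_cipher function is a hypothetical stand-in that XORs the data with a SHA-256 counter keystream; it is not real encryption and is used only because its output looks random, which is the property that matters here. The key and sample text are also made up.

```python
import hashlib
import zlib


def toy_stream_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of data, so surplus keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, keystream))


plaintext = b"The quick brown fox jumps over the lazy dog.\n" * 5_000
key = b"hypothetical demo key"

compress_then_encrypt = toy_stream_cipher(zlib.compress(plaintext, 9), key)
encrypt_then_compress = zlib.compress(toy_stream_cipher(plaintext, key), 9)

print("original:             ", len(plaintext))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

Compressing first should leave a tiny output, because the compressor still sees the repetitive plaintext. Encrypting first hands the compressor noise, so the result stays at roughly the original size.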
What about lossless versus lossy compression?
Two broad approaches to making files smaller:
- Lossless: every byte of the original is recoverable. ZIP, 7Z, RAR, PNG, FLAC, and lossless WebP all fall in this category.
- Lossy: discards data deemed unimportant for the use case. JPEG, MP3, MP4 video codecs, and lossy WebP all fall here.
When a file resists lossless compression, switching to a lossy alternative can be the right answer at the cost of quality. A 50 MB BMP image will lose almost no visible quality when re-encoded as a JPEG at quality 85 and might land at 1 to 2 MB. That is not ZIP working better; it is recognising that lossy compression is the appropriate tool for that image.
Practical things you can do
Compress the right layer
If your folder contains a 500 MB video, ZIP will not help. Re-encode the video at a more aggressive bitrate and you will get a real reduction.
Switch the file format itself
- JPEG photos to WebP or AVIF for the web (see JPG vs PNG vs WebP).
- BMP or TIFF photos to JPEG, WebP, or AVIF.
- WAV audio to FLAC for lossless, MP3 or OPUS for lossy.
- Uncompressed PDFs to a PDF optimiser (see how to compress a PDF to 1MB).
Strip what you do not need
- Delete unused files from the folder before zipping.
- Remove image metadata (EXIF, location, thumbnails) if it is not needed.
- Empty caches, build artefacts, and log files.
Use a higher-quality algorithm
For folders dominated by source code, text logs, or many small files, switching from ZIP to 7Z (LZMA2) often saves 30 to 60 percent. The catch is compatibility; see ZIP vs 7Z vs RAR.
Split, then compress only what compresses
If your folder is half text and half MP4, separate them. Compress the text folder aggressively. Leave the MP4 alone or re-encode it directly.
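If you would still rather ship a single archive, one option is to store the already-compressed files as-is and spend CPU time only on the rest. The sketch below uses Python's zipfile module; the ALREADY_COMPRESSED extension list and the zip_folder_selectively helper are assumptions for this example, and the demo files are throwaway stand-ins.

```python
import os
import tempfile
import zipfile
from pathlib import Path

# Assumed list of extensions whose contents are already compressed; extend as needed.
ALREADY_COMPRESSED = {".mp4", ".mkv", ".webm", ".jpg", ".jpeg", ".mp3", ".zip", ".7z", ".docx"}


def zip_folder_selectively(folder: Path, archive: Path) -> None:
    """Deflate compressible files; store already-compressed ones unchanged."""
    with zipfile.ZipFile(archive, "w") as zf:
        for path in sorted(folder.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in ALREADY_COMPRESSED:
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            arcname = path.relative_to(folder).as_posix()
            zf.write(path, arcname=arcname, compress_type=method)


# Small demo: random bytes stand in for a video file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "project"
    folder.mkdir()
    (folder / "notes.txt").write_bytes(b"same line again\n" * 10_000)
    (folder / "clip.mp4").write_bytes(os.urandom(50_000))
    archive = Path(tmp) / "project.zip"  # kept outside the folder being zipped
    zip_folder_selectively(folder, archive)
    with zipfile.ZipFile(archive) as zf:
        report = {
            info.filename: (info.compress_type, info.file_size, info.compress_size)
            for info in zf.infolist()
        }

for name, (method, size, packed) in report.items():
    print(name, "stored" if method == zipfile.ZIP_STORED else "deflated", size, packed)
```

Storing rather than deflating the media does not make the archive smaller; it just avoids wasting time on data that will not shrink. For a real reduction on the video half, re-encoding is still the tool to reach for.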
Worked example
You have a 1.2 GB project folder: 800 MB of high-resolution PNG mockups, 300 MB of MP4 demo videos, 100 MB of code and text files. Re-export the PNG mockups as WebP at quality 90 and the 800 MB collapses to roughly 220 MB with no visible degradation. Re-encode the MP4s at a lower bitrate suitable for sharing and 300 MB drops to roughly 110 MB. 7-Zip the code and text files and 100 MB compresses to roughly 18 MB.
Total: 1.2 GB to 348 MB. A single outer ZIP of the original folder would have been around 1.18 GB. The work was done at the right layer.
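The arithmetic behind that total, as a quick Python check using only the figures from the example:

```python
# Sizes in MB, taken from the worked example.
before_mb = {"png_mockups": 800, "mp4_videos": 300, "code_and_text": 100}
after_mb = {"png_mockups": 220, "mp4_videos": 110, "code_and_text": 18}

total_before = sum(before_mb.values())
total_after = sum(after_mb.values())
saving = 1 - total_after / total_before

print(f"{total_before} MB -> {total_after} MB ({saving:.0%} smaller)")
# prints: 1200 MB -> 348 MB (71% smaller)
```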
FAQ
Will renaming the file extension change compression behaviour?
No. The extension is just a label. The underlying bytes determine entropy. Renaming .mp4 to .txt will not make it compress better.
Why does my .docx barely shrink when zipped?
Because .docx is already a ZIP archive in disguise. Modern Office formats wrap XML and embedded media in a ZIP container. The outer compression you add finds little to do.
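A short Python sketch makes the disguise visible. It builds a stand-in archive in memory with part names borrowed from the Office layout; this is an assumption for illustration, not a valid Word document, and the body text is generated from a made-up vocabulary.

```python
import io
import random
import zipfile
import zlib

rng = random.Random(0)  # fixed seed so the sample text is reproducible
letters = "abcdefghijklmnopqrstuvwxyz"
words = [
    "".join(rng.choice(letters) for _ in range(rng.randint(4, 10)))
    for _ in range(200)
]
body = " ".join(rng.choice(words) for _ in range(100_000))

# Stand-in with a .docx-like layout (NOT a valid Word file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", f"<document><p>{body}</p></document>")
fake_docx = buffer.getvalue()

print(fake_docx[:4])                              # ZIP signature bytes: PK\x03\x04
print(zipfile.is_zipfile(io.BytesIO(fake_docx)))  # the container is a ZIP archive

outer = zlib.compress(fake_docx, 9)               # the "outer ZIP" pass
print(len(fake_docx), len(outer))
```

You can run the same zipfile.is_zipfile check against a real .docx path on disk. The outer pass should leave the size almost unchanged, because the XML inside was already deflated when the container was written.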
Can I tell in advance how much a file will compress?
You can estimate. Plain text usually shrinks by 60 to 80 percent; databases and logs often by 70 to 90 percent; high-resolution photos by a few percent; already-compressed media by essentially nothing. Eyeballing the file types in your folder is usually enough.
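When eyeballing is not enough, you can compress a sample and extrapolate. The sketch below is one way to do that in Python, assuming zlib as the compressor; estimate_savings and its sample_size default are hypothetical names and values chosen for this example.

```python
import os
import zlib


def estimate_savings(data: bytes, sample_size: int = 1_000_000) -> float:
    """Fraction saved when zlib compresses the first sample_size bytes of data."""
    sample = data[:sample_size]
    if not sample:
        return 0.0
    compressed = zlib.compress(sample, 6)
    # Clamp at zero: incompressible input can come out slightly larger.
    return max(0.0, 1 - len(compressed) / len(sample))


log_like = b"level=INFO msg=request served status=200\n" * 20_000
noise = os.urandom(500_000)  # stands in for already-compressed media

print(f"log-like text: {estimate_savings(log_like):.0%} saved")
print(f"random bytes:  {estimate_savings(noise):.0%} saved")
```

For a file on disk, read the first chunk with open(path, "rb").read(sample_size) and pass it in. The start of a file is not always representative (headers and embedded thumbnails can skew it), so treat the result as a rough guide rather than a promise.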
Is there a compressor that handles media better?
For each specific media type, yes: image-specific (JPEG, WebP, AVIF), audio-specific (MP3, Opus, FLAC), and video-specific (H.264, H.265, AV1) tools shrink data far better than a general-purpose archiver because they understand the data's structure.
If I keep compressing forever, will the file eventually become a single byte?
No. Lossless compression has a lower bound set by the data's information content, and once a file reaches it, further passes stop helping. A truly random file cannot be compressed at all; most real files have far more structure than that, which is why they compress in the first place.