[AI] Why is my PDF so large compared to the original?
mohammadwaseemk at gmail.com
Thu Sep 30 14:42:20 EDT 2010
Summary: It's possible that a PDF created from a document may be larger, perhaps much larger, than the original. I'll look at a few reasons why this might be.
PDF (Portable Document Format) files are a common and popular way to distribute documents. Their primary "feature" is simply that they look pretty much the same on just about any computer.
And of course a PDF file typically mimics the layout and feel of a printed document, only displayed electronically.
Why might it be larger than the word processing document, or whatever other file, it was created from? I can think of a few possibilities.
Compress me Once
Adobe's PDF-creation software has compression options that control how aggressively (or not) it compresses images in the PDF documents it creates.
That makes sense: an uncompressed image can be large, and a good compression algorithm can significantly reduce the space needed to represent it, even more so if you're willing to trade off some image quality.
A potential problem, however, is that attempting to compress something that's already efficiently compressed can make it larger.
It's possible that if a document contains a large number of images, perhaps ".jpg" photos, which are by definition already compressed, the process of creating the PDF might actually make those photographs somewhat larger. From what you say, that might well be the issue you're facing.
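A quick way to see this effect without any PDF tools is the Python sketch below, which uses the standard `zlib` module (DEFLATE, the same algorithm family behind PDF's "Flate" compression). Random bytes stand in for an already well-compressed .jpg; that substitution is an assumption for illustration, not a description of what any particular PDF creator does internally.

```python
import os
import zlib


def deflate_size(data: bytes) -> int:
    """Size in bytes after one zlib (DEFLATE) pass at maximum compression."""
    return len(zlib.compress(data, 9))


# Repetitive plain text: plenty of redundancy for the compressor to remove.
text = b"It was a dark and stormy night. " * 3000

# Random bytes as a stand-in for an already well-compressed JPEG (assumption):
# there is no redundancy left, so DEFLATE falls back to storing the data
# and still adds its own block headers and checksum.
already_compressed = os.urandom(100_000)

for label, data in [("plain text", text), ("'already compressed'", already_compressed)]:
    before, after = len(data), deflate_size(data)
    print(f"{label:>22}: {before:>7} -> {after:>7} bytes")
```

Expect the text to shrink dramatically and the random data to come out slightly larger than it went in, since DEFLATE has nothing left to remove and still adds its own overhead.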
Recommendation: check and experiment with the compression settings of your PDF creation utility.
Fonts: Here, but not There
Fonts and typefaces can be fairly confusing. We're all familiar with nearly ubiquitous fonts like Times New Roman, Arial, and even (dare I say it?) Comic Sans.
But what happens if you use a font in your document that most people don't have? When you print it, it looks great, because printing happens on your computer, where the font is present. On someone else's machine, where the font isn't installed, things might look quite different. Use an obscure font, open the document on a machine that lacks it, and you'll see what I mean.
PDF attempts to solve this problem by embedding fonts within the document. My belief is that it embeds only non-standard fonts, those that can't be assumed to be on most machines, though the rules may be more complex than that. (The PDF specification has long defined 14 standard fonts, variants of Times, Helvetica and Courier plus Symbol and ZapfDingbats, that viewers are expected to supply without embedding.)
As a test, I created a small Microsoft Word document consisting of two sentences, 25 words total, all in the default font Times New Roman. Changing one word in the document to the font "Algerian" took the generated PDF from around 2,000 bytes to over 10,000.
Recommendation: examine your font usage, and see if you can reduce the number of non-standard fonts in your document.
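One rough way to examine font usage is to scan the PDF's raw bytes for font names. The sketch below assumes a PDF whose font dictionaries are stored uncompressed: `/BaseFont` names each font, and `/FontFile`, `/FontFile2` or `/FontFile3` entries point at embedded font programs. Many newer PDFs pack these dictionaries into compressed object streams, in which case this heuristic will find little or nothing, and a real PDF library is more reliable. The `font_report` helper is a hypothetical name.

```python
import re
import sys

# Font dictionaries name their font with /BaseFont; embedded font programs
# are referenced by /FontFile, /FontFile2 (TrueType) or /FontFile3 keys.
BASEFONT = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>(){}%]+)")
FONTFILE = re.compile(rb"/FontFile[23]?\b")


def font_report(pdf_bytes: bytes) -> tuple[set[str], int]:
    """Return (font names referenced, number of embedded font programs).

    Rough heuristic: only sees dictionaries stored uncompressed in the file.
    """
    names = {m.group(1).decode("latin-1") for m in BASEFONT.finditer(pdf_bytes)}
    embedded = len(FONTFILE.findall(pdf_bytes))
    return names, embedded


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        names, embedded = font_report(f.read())
    print("Fonts referenced:", ", ".join(sorted(names)) or "(none found)")
    print("Embedded font programs found:", embedded)
```

Run it as `python fonts.py document.pdf`. A name with a six-letter prefix such as `ABCDEF+Algerian` usually indicates that a subset of that font has been embedded.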
Size Doesn't Matter (or So They Say)
PDF is relatively efficient, but creating a small file actually isn't its primary goal. That, as its name implies, is to be a Portable Document - one that looks pretty much the same everywhere, and one that can be viewed on a wide variety of machines. If achieving that goal means the file gets bigger, then so be it.
One of the apparent design decisions in the format is that a lot of information in the document is stored as "plain text", which presumably is easier for that "wide variety of machines" to understand.
If you ever open a .pdf file in Notepad, or just use the "type" command on it at the Windows Command Prompt, you'll see a lot of plain text: text you can read and make some sense of, even if what it says is obscure.
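To put a rough number on how much of a file is readable text, a sketch like the following counts printable ASCII bytes, roughly what Notepad or "type" would show as legible characters. Treating "printable ASCII" as "plain text" is a simplifying assumption, and many PDFs also compress their page content streams, so a real file will usually be a mix of readable structure and binary data. The function name is illustrative.

```python
import string
import sys

# Byte values a plain-text viewer would show as readable characters
# (letters, digits, punctuation and whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_fraction(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII; 0.0 for empty input."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(f"{printable_fraction(data):.0%} of {len(data)} bytes are printable text")
    # Show the first few lines, roughly what "type file.pdf" would display.
    for line in data.splitlines()[:5]:
        print(line.decode("latin-1"))
```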
Now, plain text isn't the most space-efficient way to store information. If you want proof, grab a large plain text document and zip it. I'll use the Project Gutenberg copy of Tolstoy's War and Peace as an example. The plain text version of this famously long book weighs in at a little over 3 megabytes. Compressed with 7-Zip, the result is less than one third the size of the original. That smaller version contains exactly the same information, albeit in a form you can't read directly. All you need to do is decompress it to recover an exact copy of the original.
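The War and Peace experiment can be reproduced on any text file with Python's standard library. This sketch compares `zlib` (DEFLATE, the method ordinary .zip files typically use) with `lzma` (the algorithm family behind 7-Zip's .7z format, here in its .xz container), and checks that decompression returns exactly the original bytes. Exact ratios will vary with the file and the tool; the function names are illustrative.

```python
import lzma
import sys
import zlib


def compression_ratios(data: bytes) -> dict[str, float]:
    """Compressed size as a fraction of the original, for two lossless methods."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return {
        "deflate": len(zlib.compress(data, 9)) / len(data),  # as in .zip
        "lzma": len(lzma.compress(data)) / len(data),        # as in .xz / .7z
    }


def round_trips(data: bytes) -> bool:
    """True if both methods decompress back to exactly the original bytes."""
    return (zlib.decompress(zlib.compress(data, 9)) == data
            and lzma.decompress(lzma.compress(data)) == data)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python ratio.py war_and_peace.txt
    with open(sys.argv[1], "rb") as f:
        text = f.read()
    for method, ratio in compression_ratios(text).items():
        print(f"{method}: {ratio:.1%} of the original size")
    print("exact round trip:", round_trips(text))
```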
Recommendation: try zipping your PDF. Yes, per the earlier point, you might be re-compressing already compressed (or even doubly compressed) pictures, but it's worth experimenting with. For a text-heavy document, zipping the file for distribution might make a fair amount of sense.
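Zipping a file for distribution can also be scripted. This minimal sketch uses Python's `zipfile` module with DEFLATE compression and reports the before and after sizes; the `zip_for_distribution` helper and the `<name>.zip` output naming are assumptions for illustration.

```python
import os
import sys
import zipfile


def zip_for_distribution(path: str) -> tuple[int, int]:
    """Write <path>.zip holding the file, DEFLATE-compressed; return (original, zipped) sizes."""
    zip_path = path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        # Store just the file name inside the archive, not the full path.
        zf.write(path, arcname=os.path.basename(path))
    return os.path.getsize(path), os.path.getsize(zip_path)


if __name__ == "__main__" and len(sys.argv) > 1:
    original, zipped = zip_for_distribution(sys.argv[1])
    print(f"{original} -> {zipped} bytes")
    if zipped >= original:
        print("No gain: the PDF's contents were probably already compressed.")
```

If the zipped size is barely smaller (or even larger), the PDF's contents were most likely compressed already, and zipping buys you little.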
There's Probably More
I've probably just scratched the surface of reasons a PDF file might end up larger than its original. The big takeaway, from my perspective, is that small size isn't a primary goal for PDF, and as a result some of the things it needs to do may well increase the size of the result.
And zipping the file is always a quick and easy thing to try, often with good results.