[AI] (Tech Dose of the day) PDF

vishnu ramchandani vishnuhappy at yahoo.com
Fri Feb 8 07:42:52 EST 2008


PDF
Contributor: Swapna KM, an employee of MphasiS
Software Services
What is PDF?
The Portable Document Format (PDF) is the file format
created by Adobe Systems in 1993 for document
exchange. PDF is a fixed-layout document format used for
representing two-dimensional documents in a manner
independent of the application software, hardware, and
operating system. Each PDF file encapsulates a
complete description of a 2-D document (and, with
Acrobat 3-D, embedded 3-D documents) that includes the
text, fonts, images, and 2-D vector graphics that
compose the document.
PDF is an open standard: in 2007 Adobe released the full
PDF 1.7 specification to ISO, where it is being
standardized as ISO 32000.
Why PDF?
1. Multiplatform - Viewable and printable on any
platform — Macintosh, Microsoft® Windows®, UNIX®, and
many mobile platforms. 
2. Extensible - More than 1,800 vendors worldwide
offer PDF-based solutions including creation, plug-in,
consulting, training, and support tools. 
3. Trusted and reliable - More than 200 million PDF
documents on the web today serve as evidence of the
number of organizations that rely on Adobe PDF to
capture information. 
4. Maintain information integrity - Adobe PDF files
look exactly like original documents and preserve
source file information — text, drawings, 3D,
full-color graphics, photos, and even business logic —
regardless of the application used to create them. 
5. Keep information secure - Digitally sign or
password-protect Adobe PDF documents created with
Adobe Acrobat® 8 or Adobe LiveCycle™ software. 
6. Searchable - Leverage full-text search features to
locate words, bookmarks, and data fields in documents.
7. Accessible - Adobe PDF documents work with
assistive technology to help make information
accessible to people with disabilities.
Technical Overview:
File structure:-
A PDF file consists primarily of objects, of which
there are eight types:
1. Boolean values (representing true or false)
2. Numbers
3. Strings
4. Names
5. Arrays (ordered collections of objects)
6. Dictionaries (collections of objects indexed by
names)
7. Streams (usually containing large amounts of data)
8. The null object
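The eight object types map naturally onto the values of a general-purpose language. The following is a minimal Python sketch, not taken from any PDF library: the `Name` class and the `to_pdf` function are hypothetical helpers, and only direct objects are written (streams, which need a dictionary plus raw data, are left out).

```python
class Name(str):
    """Hypothetical marker type for PDF names such as /Type."""


def to_pdf(value) -> str:
    """Serialize a Python value as a PDF direct object (sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, Name):  # test Name before str: Name is a str subclass
        return "/" + value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF numbers have no exponent form, so print fixed-point and trim zeros.
        return f"{value:.6f}".rstrip("0").rstrip(".")
    if isinstance(value, str):
        # Literal string: backslash and parentheses must be escaped.
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(to_pdf(v) for v in value) + "]"
    if isinstance(value, dict):
        # Dictionary keys are always names.
        return "<< " + " ".join(f"/{k} {to_pdf(v)}" for k, v in value.items()) + " >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For example, `to_pdf({"Type": Name("Example"), "Values": [1, 2.5, True, None, "a(b)"]})` yields `<< /Type /Example /Values [1 2.5 true null (a\(b\))] >>`.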
Objects may be either direct (embedded in another
object) or indirect. Indirect objects are numbered
with an object number and a generation number. An
index table called the xref table gives the byte
offset of each indirect object from the start of the
file. This design allows for efficient random access
to the objects in the file, and also allows for small
changes to be made without rewriting the entire file
(incremental update).
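To make the xref mechanism concrete, here is a minimal Python sketch that writes three indirect objects, records each one's byte offset, emits the xref table, and then uses that table for random access. The function names are hypothetical, and the reader side assumes a single xref subsection starting at object 0, which is what this writer produces; real files can have several subsections and incremental updates.

```python
def build_pdf(bodies: list[bytes]) -> bytes:
    """bodies[i] becomes indirect object i + 1 (generation 0); object 1 is the catalog."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" from the start of the file
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_start = len(out)
    out += f"xref\n0 {len(bodies) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for offset in offsets:
        # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, "n", 2-byte EOL.
        out += f"{offset:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(bodies) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_start}\n%%EOF\n".encode()
    return bytes(out)


def object_offset(pdf: bytes, number: int) -> int:
    """Random access: follow startxref to the table, then read the entry for `number`."""
    xref_start = int(pdf.rsplit(b"startxref", 1)[1].split()[0])
    lines = pdf[xref_start:].split(b"\n")
    entry = lines[2 + number]  # skip the "xref" and "0 N" header lines
    return int(entry[:10])


catalog = b"<< /Type /Catalog /Pages 2 0 R >>"
pages = b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>"
page = b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>"
pdf = build_pdf([catalog, pages, page])
```

Because each entry has a fixed width, a reader can jump straight to any object without scanning the file, and an incremental update only needs to append new objects plus a new xref section.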
Beginning with PDF version 1.5, indirect objects may
also be located in special streams known as object
streams. This technique reduces the size of files that
have large numbers of small indirect objects and is
especially useful for Tagged PDF.
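An object stream's decoded data begins with pairs of integers (object number, offset), where each offset is measured from the byte given by the stream's /First entry; the objects follow without their `obj`/`endobj` keywords. The sketch below, using a hypothetical `build_object_stream` helper, packs a few small objects this way. It leaves the data uncompressed for readability (real writers normally add /Filter /FlateDecode), and it does not build the cross-reference stream that PDF 1.5 readers use to locate objects stored inside object streams.

```python
def build_object_stream(objects: dict[int, bytes]) -> bytes:
    """Pack small non-stream objects (number -> body) into one /ObjStm stream."""
    pairs, body = [], bytearray()
    for number, obj in objects.items():
        pairs.append(f"{number} {len(body)}")  # offset is relative to /First
        body += obj + b"\n"                    # no "obj"/"endobj" keywords inside
    header = (" ".join(pairs) + "\n").encode()
    data = header + bytes(body)
    dictionary = (f"<< /Type /ObjStm /N {len(objects)} /First {len(header)} "
                  f"/Length {len(data)} >>").encode()
    return dictionary + b"\nstream\n" + data + b"\nendstream"


packed = build_object_stream({10: b"<< /A 1 >>", 11: b"[1 2 3]"})
```

Two small objects now share one stream, and its dictionary and keywords, instead of each carrying its own `obj`/`endobj` wrapper and xref entry; once compressed, that saving adds up for files with many small objects.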
There are two layouts for PDF files: non-linear (not
“optimized”) and linear (“optimized”). Non-linear PDF
files consume less disk space than their linear
counterparts, though they are slower to access because
portions of the data required to assemble pages of the
document are scattered throughout the PDF file. Linear
PDF files (also called “optimized” or “web optimized”
PDF files) are written to disk in a linear (as in page
order) fashion, so a Web browser plugin can start
displaying the first page before the rest of the file
has been downloaded. PDF files may be optimized using
Adobe Acrobat software or pdfopt, which is part of GPL
Ghostscript.
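A linearized file starts with a linearization parameter dictionary that lies entirely within the first 1024 bytes and contains /Linearized together with /L, the file length at optimization time. The heuristic below (the name `looks_linearized` is hypothetical) checks only those two facts. It does not validate the hint tables, and its regular expression assumes /L appears in the first 1024 bytes only inside that dictionary.

```python
import re


def looks_linearized(pdf: bytes) -> bool:
    """Heuristic check for a linearized ("web optimized") PDF."""
    head = pdf[:1024]  # the linearization dictionary must fit in the first 1024 bytes
    if not head.startswith(b"%PDF-") or b"/Linearized" not in head:
        return False
    # /L records the file length when the file was optimized; a mismatch means
    # the file was changed afterwards (e.g. by an incremental update), so the
    # page-order layout can no longer be trusted.
    match = re.search(rb"/L\s+(\d+)", head)
    return match is not None and int(match.group(1)) == len(pdf)
```

The length comparison matters because an incremental update appends data to the end of the file, which silently breaks the page-order layout that made the file "optimized".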
Imaging model:-
The basic design of how graphics are represented in
PDF is very similar to that of PostScript, except for
the use of transparency.
PDF graphics use a device-independent Cartesian
coordinate system to describe the surface of a page. A
PDF page description can use a matrix to scale,
rotate, or skew graphical elements. A key concept in
PDF is that of the graphics state, which is a
collection of graphical parameters that may be
changed, saved, and restored by a page description.
PDF has (as of version 1.6) 24 graphics state
properties, of which some of the most important are:
1. The current transformation matrix (CTM), which
determines the coordinate system
2. The clipping path
3. The color space
4. The alpha constant, which is a key component of
transparency
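A PDF matrix is written as six numbers [a b c d e f], standing for the 3×3 matrix with rows [a b 0], [c d 0], [e f 1]. A point maps as x' = a·x + c·y + e and y' = b·x + d·y + f, and the `cm` operator concatenates a matrix M onto the CTM as CTM' = M × CTM. The Python sketch below models just that arithmetic; `Matrix`, `translate`, `scale`, `rotate` and `concat` are hypothetical names, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Matrix:
    """PDF matrix [a b c d e f], i.e. rows [a b 0], [c d 0], [e f 1]."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def __matmul__(self, other: "Matrix") -> "Matrix":
        # Ordinary 3x3 product self × other, keeping only the six variable entries.
        return Matrix(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
            self.e * other.a + self.f * other.c + other.e,
            self.e * other.b + self.f * other.d + other.f,
        )

    def apply(self, x: float, y: float) -> tuple[float, float]:
        # Row vector [x y 1] times the matrix.
        return (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)


IDENTITY = Matrix(1, 0, 0, 1, 0, 0)


def translate(tx: float, ty: float) -> Matrix:
    return Matrix(1, 0, 0, 1, tx, ty)


def scale(sx: float, sy: float) -> Matrix:
    return Matrix(sx, 0, 0, sy, 0, 0)


def rotate(degrees: float) -> Matrix:
    t = math.radians(degrees)  # counterclockwise rotation
    return Matrix(math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 0, 0)


def concat(ctm: Matrix, m: Matrix) -> Matrix:
    """What the `cm` operator does: CTM' = m × CTM."""
    return m @ ctm


# Equivalent to "1 0 0 1 100 200 cm" followed by "2 0 0 2 0 0 cm".
ctm = concat(concat(IDENTITY, translate(100, 200)), scale(2, 2))
print(ctm.apply(1, 1))  # (102, 202)
```

Because each `cm` premultiplies, the most recently concatenated transformation is applied to user-space coordinates first: the point is scaled by 2 and then translated by (100, 200).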
Vector graphics:-
Vector graphics in PDF, as in PostScript, are
constructed with paths. Paths are usually composed of
lines and cubic Bézier curves, but can also be
constructed from the outlines of text. Unlike
PostScript, PDF does not allow a single path to mix
text outlines with lines and curves. Paths can be
stroked, filled, or used for clipping. Strokes and
fills can use any color set in the graphics state,
including patterns.
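In a page content stream, paths are built with the operators `m` (move to), `l` (line to), `c` (cubic Bézier curve to) and `h` (close subpath), and then painted with `S` (stroke) or `f` (fill), or turned into a clipping path with `W n`. The Python helpers below, `num` and `path`, are hypothetical and simply emit those operators as text.

```python
def num(v: float) -> str:
    # PDF numbers have no exponent form; print fixed-point and trim zeros.
    return f"{v:.4f}".rstrip("0").rstrip(".")


def path(start, segments, paint="S", close=True) -> str:
    """start: (x, y). Each segment is an end point (x, y) for a straight line,
    or three points (control 1, control 2, end) for a cubic Bézier curve."""
    ops = [f"{num(start[0])} {num(start[1])} m"]
    for seg in segments:
        if len(seg) == 3:  # ((x1, y1), (x2, y2), (x3, y3))
            ops.append(" ".join(f"{num(x)} {num(y)}" for x, y in seg) + " c")
        else:              # (x, y)
            ops.append(f"{num(seg[0])} {num(seg[1])} l")
    if close:
        ops.append("h")
    ops.append(paint)  # "S" stroke, "f" fill, "W n" clip without painting
    return "\n".join(ops)


triangle = path((0, 0), [(100, 0), (50, 80)], paint="f")
wave = path((0, 0), [((25, 40), (75, -40), (100, 0))], paint="S", close=False)
```

Here `triangle` is a filled closed path made of two lines plus the closing segment, and `wave` is a single stroked Bézier curve.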
Raster images:-
Raster images in PDF (called Image XObjects) are
represented by dictionaries with an associated stream.
The dictionary describes properties of the image, and
the stream contains the image data.
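As a small illustration, the sketch below builds an Image XObject for an uncompressed 8-bit DeviceRGB image and the content-stream operators that draw it. An image is mapped onto the unit square of user space, so a `cm` inside a `q`/`Q` pair sets its size and position before `Do` paints it. The helper names and the resource name `Im1` are hypothetical; the image would also need to be listed under /XObject in the page's resources.

```python
def image_xobject(width: int, height: int, rgb: bytes) -> bytes:
    """Uncompressed 8-bit DeviceRGB image: 3 bytes per pixel, top row first."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB samples")
    dictionary = (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
                  f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Length {len(rgb)} >>")
    return dictionary.encode() + b"\nstream\n" + rgb + b"\nendstream"


def draw_image(name: str, x: float, y: float, w: float, h: float) -> str:
    # The image fills the unit square, so scale it to w x h and move it to (x, y);
    # q/Q save and restore the graphics state around the change to the CTM.
    return f"q {w} 0 0 {h} {x} {y} cm /{name} Do Q"


pixels = bytes([255, 0, 0,   0, 255, 0,       # red, green
                0, 0, 255,   255, 255, 255])  # blue, white
image = image_xobject(2, 2, pixels)
ops = draw_image("Im1", 72, 600, 144, 144)
```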
Text:-
Text in PDF is represented by text elements in page
content streams. A text element specifies that
characters should be drawn at certain positions. The
characters are specified using the encoding of a
selected font resource.
Fonts:-
A font object in PDF is a description of a digital
typeface. It may either describe the characteristics
of a typeface, or it may include an embedded font
file.
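A text element sits between the `BT` and `ET` operators: `Tf` selects a font resource and size, `Td` moves to a position, and `Tj` shows a string. The font resource name (here `/F1`) is looked up in the page's /Font resources, which point to a font dictionary. The sketch assumes Helvetica, one of the standard fonts viewers have traditionally supplied without embedding; the object number 5, the name `F1`, and the helper functions are hypothetical, and only ASCII text is handled.

```python
def pdf_string(text: str) -> str:
    text.encode("ascii")  # this sketch only handles ASCII text (raises otherwise)
    # Literal string: backslash and parentheses must be escaped.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def text_element(font: str, size: float, x: float, y: float, text: str) -> str:
    """One text element: select font and size, move to (x, y), show the string."""
    return f"BT /{font} {size} Tf {x} {y} Td {pdf_string(text)} Tj ET"


# Hypothetical object 5: a font dictionary describing (not embedding) Helvetica,
# made available to the page's content stream under the resource name /F1.
FONT_OBJECT = "5 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj"
PAGE_RESOURCES = "<< /Font << /F1 5 0 R >> >>"

line = text_element("F1", 12, 72, 712, "Hello (PDF)")
```

The string operand holds character codes, not glyph names; which glyphs they select depends on the font's encoding.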
Encodings:-
Within text strings, characters are shown using
character codes (integers) that map to glyphs in the
current font using an encoding. There are a number of
built-in encodings, including WinAnsi, MacRoman, and a
large number of encodings for East Asian languages.
(Although the WinAnsi and MacRoman encodings are
derived from the character sets historically used by
the Windows and Macintosh operating systems, fonts
using these encodings work equally well on any
platform.) The encoding mechanisms in PDF were designed
for Type 1 fonts, and the rules for applying them to
TrueType fonts are complex.
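The same glyph can have different character codes under different encodings. The sketch uses Python's built-in `cp1252` and `mac_roman` codecs as stand-ins for PDF's WinAnsiEncoding and MacRomanEncoding. That is an assumption: they agree for common letters such as "é" but are not guaranteed to match the PDF tables in every code position. `char_codes` is a hypothetical helper.

```python
def char_codes(text: str, codec: str) -> list[int]:
    """Character codes a content stream would contain for `text` (approximation)."""
    return list(text.encode(codec))


win_codes = char_codes("café", "cp1252")     # [99, 97, 102, 233]
mac_codes = char_codes("café", "mac_roman")  # [99, 97, 102, 142]

# Reading WinAnsi codes through the MacRoman table selects a different glyph
# for code 233, which is why the font's /Encoding entry matters.
misread = bytes(win_codes).decode("mac_roman")
```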
Transparency:-
The original imaging model of PDF was, like
PostScript's, opaque: each object drawn on the page
completely replaced anything previously marked in the
same location. In PDF 1.4 the imaging model was
extended to allow transparency. When transparency is
used, new objects interact with previously marked
objects to produce blending effects. The addition of
transparency to PDF was done by means of new
extensions that were designed to be ignored in
products written to the PDF 1.3 and earlier
specifications. As a result, files that use a small
amount of transparency might view acceptably in older
viewers, but files making extensive use of
transparency could view completely wrong in an older
viewer without warning.
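For a single color component over an opaque backdrop, the PDF 1.4 compositing rule reduces to result = (1 − α)·Cb + α·B(Cb, Cs), where Cb is the backdrop, Cs the source, α the constant alpha, and B the blend function (Normal: B = Cs; Multiply: B = Cb·Cs). In a file, α and the blend mode come from graphics state parameter dictionary entries such as /CA, /ca and /BM. The sketch below (hypothetical `blend` and `composite` helpers) ignores soft masks, transparency groups and backdrop alpha.

```python
def blend(cb: float, cs: float, mode: str) -> float:
    """Separable blend function B(cb, cs) for one color component in [0, 1]."""
    if mode == "Normal":
        return cs
    if mode == "Multiply":
        return cb * cs
    raise ValueError(f"blend mode not covered by this sketch: {mode}")


def composite(backdrop, source, alpha, mode="Normal"):
    """Paint `source` over an opaque `backdrop` (tuples of components in [0, 1])
    with constant alpha: result = (1 - alpha) * cb + alpha * B(cb, cs)."""
    return tuple((1 - alpha) * cb + alpha * blend(cb, cs, mode)
                 for cb, cs in zip(backdrop, source))


white, red = (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)
half_red = composite(white, red, 0.5)  # (1.0, 0.5, 0.5): pink
opaque = composite(white, red, 1.0)    # (1.0, 0.0, 0.0): the opaque model
```

With α = 1 and the Normal blend mode the formula reduces to the original opaque model, which is roughly what a viewer that ignores the alpha setting shows. The gap between `half_red` and `opaque` is the kind of difference an older viewer can display without warning.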
Interactive elements:-
PDF files may contain interactive elements such as
annotations and form fields.
Logical structure and accessibility:-
A PDF may contain structure information to enable
better text extraction and accessibility.
Security and signatures:-
A PDF file may be encrypted for security, or digitally
signed for authentication.
Subsets:-
Proper subsets of PDF have been, or are being,
standardized under ISO for several constituencies:
1. PDF/X for the printing and graphic arts as ISO
15930 (working in ISO TC130)
2. PDF/A for archiving in corporate, government,
library, and similar environments as ISO 19005 (work
done in ISO TC171)
3. PDF/E for exchange of engineering drawings (work
done in ISO TC171)
4. PDF/UA for universally accessible PDF files
A PDF/H variant (PDF for Healthcare) is being
developed.[11] However, it may consist more of a set
of "best practices" than of a specific format or
subset.
Further References 
Wikipedia : 
http://en.wikipedia.org/wiki/Portable_Document_Format
Other Links : 
http://www.adobe.com/products/acrobat/adobepdf.html  






More information about the AccessIndia mailing list