Class PdfReader

    • Field Detail

      • xref

        protected int[] xref
      • newXrefType

        protected boolean newXrefType
      • pageRefs

        protected com.lowagie.text.pdf.PdfReader.PageRefs pageRefs
      • acroFormParsed

        protected boolean acroFormParsed
      • encrypted

        protected boolean encrypted
      • rebuilt

        protected boolean rebuilt
      • freeXref

        protected int freeXref
      • tampered

        protected boolean tampered
      • lastXref

        protected int lastXref
      • eofPos

        protected int eofPos
      • pdfVersion

        protected char pdfVersion
      • password

        protected byte[] password
      • certificateKey

        protected Key certificateKey
      • certificateKeyProvider

        protected String certificateKeyProvider
      • sharedStreams

        protected boolean sharedStreams
      • consolidateNamedDestinations

        protected boolean consolidateNamedDestinations
      • remoteToLocalNamedDestinations

        protected boolean remoteToLocalNamedDestinations
      • rValue

        protected int rValue
      • pValue

        protected int pValue
    • Constructor Detail

      • PdfReader

        protected PdfReader()
      • PdfReader

        public PdfReader​(String filename)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(String filename,
                         byte[] ownerPassword)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        ownerPassword - the password to read the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn,
                         byte[] ownerPassword)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        ownerPassword - the password to read the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(String filename,
                         Certificate certificate,
                         Key certificateKey,
                         String certificateKeyProvider)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        certificate - the certificate to read the document
        certificateKey - the private key of the certificate
        certificateKeyProvider - the security provider for certificateKey
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(URL url)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        url - the URL of the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(URL url,
                         byte[] ownerPassword)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        url - the URL of the document
        ownerPassword - the password to read the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(InputStream is,
                         byte[] ownerPassword)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        ownerPassword - the password to read the document
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(InputStream is)
                  throws IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(RandomAccessFileOrArray raf,
                         byte[] ownerPassword)
                  throws IOException
        Reads and parses a PDF document. Unlike the other constructors, only the xref is read into memory. The reader is said to be working in "partial" mode, as only parts of the PDF are read as needed. The PDF is left open but may be closed at any time with PdfReader.close(); it is reopened automatically when needed.
        Parameters:
        raf - the document location
        ownerPassword - the password or null for no password
        Throws:
        IOException - on error
      • PdfReader

        public PdfReader​(PdfReader reader)
        Creates an independent duplicate.
        Parameters:
        reader - the PdfReader to duplicate
    • Method Detail

      • getSafeFile

        public RandomAccessFileOrArray getSafeFile()
        Gets a new file instance of the original PDF document.
        Returns:
        a new file instance of the original PDF document
      • getPdfReaderInstance

        protected com.lowagie.text.pdf.PdfReaderInstance getPdfReaderInstance​(PdfWriter writer)
      • getNumberOfPages

        public int getNumberOfPages()
        Gets the number of pages in the document.
        Returns:
        the number of pages in the document
      • getCatalog

        public PdfDictionary getCatalog()
        Returns the document's catalog. This dictionary is not a copy; any changes made to it will be reflected in the catalog.
        Returns:
        the document's catalog
      • getAcroForm

        public PRAcroForm getAcroForm()
        Returns the document's acroform, if it has one.
        Returns:
        the document's acroform
      • getPageRotation

        public int getPageRotation​(int index)
        Gets the page rotation. This value can be 0, 90, 180 or 270.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page rotation
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(int index)
        Gets the page size, taking rotation into account. This is a Rectangle with the value of the /MediaBox and the /Rotate key.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        a Rectangle
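        As a rough illustration of what "taking rotation into account" means, the sketch below swaps width and height for quarter-turn rotations. The class and method names are hypothetical, and the swap rule is an assumption about typical /Rotate handling, not the library's code.

```java
class RotatedPageSize {
    /** Returns {width, height} of a page after applying a /Rotate value (assumed multiple of 90). */
    static float[] rotatedSize(float width, float height, int rotation) {
        // Normalize to 0..359 so that values such as -90 or 450 also work.
        int r = ((rotation % 360) + 360) % 360;
        if (r == 90 || r == 270) {
            // A quarter turn exchanges the horizontal and vertical extents.
            return new float[] {height, width};
        }
        return new float[] {width, height};
    }

    public static void main(String[] args) {
        float[] size = rotatedSize(595f, 842f, 90);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```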
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(PdfDictionary page)
        Gets the rotated page size from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the rotated page or null when the page does not exist
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(int index,
                                                 String boxName)
        Gets the page size, taking rotation into account. This is a Rectangle with the value of an arbitrary box and the /Rotate key.
        Parameters:
        index - the page number. The first page is 1
        boxName - the name of the box to rotate. Allowed names are: "crop", "trim", "art", "bleed" and "media".
        Returns:
        a Rectangle or null if the page does not exist
      • getPageSize

        public Rectangle getPageSize​(int index)
        Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page size
      • getPageSize

        public Rectangle getPageSize​(PdfDictionary page)
        Gets the page size from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the page size
      • getCropBox

        public Rectangle getCropBox​(int index)
        Gets the crop box without taking rotation into account. This is the value of the /CropBox key. The crop box is the part of the document to be displayed or printed. It usually is the same as the media box but may be smaller. If the page doesn't have a crop box the page size will be returned.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the crop box
      • getBoxSize

        public Rectangle getBoxSize​(int index,
                                    String boxName)
        Gets the box size. Allowed names are: "crop", "trim", "art", "bleed" and "media".
        Parameters:
        index - the page number. The first page is 1
        boxName - the box name
        Returns:
        the box rectangle or null
      • getInfo

        public Map<String,​String> getInfo()
        Returns the content of the document information dictionary as a Map of String keys to String values.
        Returns:
        content of the document information dictionary
      • getNormalizedRectangle

        public static Rectangle getNormalizedRectangle​(PdfArray box)
        Normalizes a Rectangle so that llx and lly are smaller than urx and ury.
        Parameters:
        box - the original rectangle
        Returns:
        a normalized Rectangle
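        The normalization described above can be sketched with plain floats. The class below is hypothetical and works on four raw coordinates rather than on PdfArray or Rectangle, so it only illustrates the ordering rule.

```java
class NormalizedBox {
    /** Reorders {x1, y1, x2, y2} so the result is {llx, lly, urx, ury}. */
    static float[] normalize(float x1, float y1, float x2, float y2) {
        return new float[] {
            Math.min(x1, x2), Math.min(y1, y2), // lower-left corner
            Math.max(x1, x2), Math.max(y1, y2)  // upper-right corner
        };
    }

    public static void main(String[] args) {
        // A box whose corners were written upper-right first.
        float[] box = normalize(612f, 792f, 0f, 0f);
        System.out.println(java.util.Arrays.toString(box));
    }
}
```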
      • getPdfObjectRelease

        public static PdfObject getPdfObjectRelease​(PdfObject obj)
        Parameters:
        obj - an object of PdfObject
        Returns:
        a PdfObject
      • convertPdfNull

        public static PdfObject convertPdfNull​(PdfObject obj)
        If the given object is an instance of PdfNull, null is returned; otherwise the provided object is returned unchanged.
        Parameters:
        obj - object to convert
        Returns:
        provided object or null
      • getPdfObject

        public static PdfObject getPdfObject​(PdfObject obj)
        Reads a PdfObject resolving an indirect reference if needed.
        Parameters:
        obj - the PdfObject to read
        Returns:
        the resolved PdfObject
      • getPdfObjectRelease

        public static PdfObject getPdfObjectRelease​(PdfObject obj,
                                                    PdfObject parent)
        Reads a PdfObject resolving an indirect reference if needed. If the reader was opened in partial mode the object will be released to save memory.
        Parameters:
        obj - the PdfObject to read
        parent - parent object
        Returns:
        a PdfObject
      • getPdfObject

        public static PdfObject getPdfObject​(PdfObject obj,
                                             PdfObject parent)
        Parameters:
        obj - the PdfObject to read
        parent - parent object
        Returns:
        a PdfObject
      • getPdfObjectRelease

        public PdfObject getPdfObjectRelease​(int idx)
        Parameters:
        idx - index
        Returns:
        a PdfObject
      • getPdfObject

        public PdfObject getPdfObject​(int idx)
        Parameters:
        idx - index
        Returns:
        a PdfObject
      • resetLastXrefPartial

        public void resetLastXrefPartial()
      • releaseLastXrefPartial

        public void releaseLastXrefPartial()
      • releaseLastXrefPartial

        public static void releaseLastXrefPartial​(PdfObject obj)
        Parameters:
        obj - an object of PdfObject
      • dumpPerc

        public double dumpPerc()
        Returns:
        the percentage of the cross reference table that has been read
      • killIndirect

        public static PdfObject killIndirect​(PdfObject obj)
        Eliminates the reference to the object, freeing the memory used by it and clearing the xref entry.
        Parameters:
        obj - the object. If it's an indirect reference it will be eliminated
        Returns:
        the object or the already erased dereferenced object
      • FlateDecode

        public static byte[] FlateDecode​(byte[] in)
        Decodes a stream that has the FlateDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
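        FlateDecode data in PDF uses the zlib/deflate format, so the idea can be sketched with java.util.zip alone. The class below is a hypothetical stand-alone round trip, not the library's implementation, and it does not handle predictors.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateSketch {
    /** Compresses data into a zlib stream, as a PDF producer would for /FlateDecode. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Expands a zlib stream; fails on truncated or malformed input. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Malformed stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] restored = inflate(deflate(content));
        System.out.println(new String(restored, StandardCharsets.US_ASCII));
    }
}
```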
      • decodePredictor

        public static byte[] decodePredictor​(byte[] in,
                                             PdfObject dicPar)
        Parameters:
        in - the input data
        dicPar - an object of PdfObject
        Returns:
        a byte array
      • FlateDecode

        public static byte[] FlateDecode​(byte[] in,
                                         boolean strict)
        A helper to FlateDecode.
        Parameters:
        in - the input data
        strict - true to read a correct stream; false to try to read a corrupted stream
        Returns:
        the decoded data
      • ASCIIHexDecode

        public static byte[] ASCIIHexDecode​(byte[] in)
        Decodes a stream that has the ASCIIHexDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
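        For readers unfamiliar with the filter, here is a minimal stand-alone sketch of ASCIIHex decoding following the rules of the PDF specification: whitespace is skipped, '>' ends the data, and a trailing odd digit is treated as if followed by 0. The class name is hypothetical and the code is not taken from the library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AsciiHexSketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 when there is none
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '>') {
                break; // end-of-data marker
            }
            if (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32) {
                continue; // PDF whitespace is ignored
            }
            int digit = Character.digit(ch, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Illegal character in ASCIIHex data: " + ch);
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd digit count: assume a trailing 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = decode("48 65 6C6c 6F>".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(data, StandardCharsets.US_ASCII));
    }
}
```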
      • ASCII85Decode

        public static byte[] ASCII85Decode​(byte[] in)
        Decodes a stream that has the ASCII85Decode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
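        The following is a minimal sketch of ASCII85 decoding as the PDF specification describes it: five characters in the range '!' to 'u' encode four bytes, 'z' stands for four zero bytes, and "~>" ends the data. The class name is hypothetical, and the library's own decoder may differ in how it treats malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class Ascii85Sketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long group = 0; // base-85 accumulator; long avoids int overflow
        int count = 0;  // characters collected in the current group
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '~') {
                break; // start of the "~>" end-of-data marker
            }
            if (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32) {
                continue; // PDF whitespace is ignored
            }
            if (ch == 'z' && count == 0) {
                writeBytes(out, 0L, 4); // shorthand for four zero bytes
                continue;
            }
            if (ch < '!' || ch > 'u') {
                throw new IllegalArgumentException("Illegal character in ASCII85 data: " + ch);
            }
            group = group * 85 + (ch - '!');
            if (++count == 5) {
                writeBytes(out, group, 4);
                group = 0;
                count = 0;
            }
        }
        if (count == 1) {
            throw new IllegalArgumentException("Final group has a single character");
        }
        if (count > 1) {
            // Pad the partial group with 'u' (value 84) and keep count - 1 bytes.
            for (int k = count; k < 5; k++) {
                group = group * 85 + 84;
            }
            writeBytes(out, group, count - 1);
        }
        return out.toByteArray();
    }

    /** Writes the n most significant bytes of a 32-bit value. */
    private static void writeBytes(ByteArrayOutputStream out, long value, int n) {
        for (int k = 0; k < n; k++) {
            out.write((int) (value >> (24 - 8 * k)) & 0xff);
        }
    }

    public static void main(String[] args) {
        byte[] data = decode("87cURD]i,\"Ebo80~>".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(data, StandardCharsets.US_ASCII));
    }
}
```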
      • LZWDecode

        public static byte[] LZWDecode​(byte[] in)
        Decodes a stream that has the LZWDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
      • isRebuilt

        public boolean isRebuilt()
        Checks if the document had errors and was rebuilt.
        Returns:
        true if rebuilt.
      • getPageN

        public PdfDictionary getPageN​(int pageNum)
        Gets the dictionary that represents a page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page dictionary or null when the page does not exist
      • getPageNRelease

        public PdfDictionary getPageNRelease​(int pageNum)
        Parameters:
        pageNum - page number
        Returns:
        a Dictionary object
      • releasePage

        public void releasePage​(int pageNum)
        Parameters:
        pageNum - page number
      • resetReleasePage

        public void resetReleasePage()
      • getPageOrigRef

        public PRIndirectReference getPageOrigRef​(int pageNum)
        Gets the page reference to this page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page reference
      • getPageContent

        public byte[] getPageContent​(int pageNum,
                                     RandomAccessFileOrArray file)
                              throws IOException
        Gets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        file - the location of the PDF document
        Returns:
        the content
        Throws:
        IOException - on error
      • getPageContent

        public byte[] getPageContent​(int pageNum)
                              throws IOException
        Gets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the content
        Throws:
        IOException - on error
      • killXref

        protected void killXref​(PdfObject obj)
      • setPageContent

        public void setPageContent​(int pageNum,
                                   byte[] content)
        Sets the contents of the page.
        Parameters:
        content - the new page content
        pageNum - the page number. 1 is the first
      • setPageContent

        public void setPageContent​(int pageNum,
                                   byte[] content,
                                   int compressionLevel)
        Sets the contents of the page.
        Parameters:
        content - the new page content
        pageNum - the page number. 1 is the first
        compressionLevel - compression level
        Since:
        2.1.3 (the method already existed without param compressionLevel)
      • getStreamBytes

        public static byte[] getStreamBytes​(PRStream stream,
                                            RandomAccessFileOrArray file)
                                     throws IOException
        Gets the content from a stream, applying the required filters.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        IOException - on error
      • getStreamBytes

        public static byte[] getStreamBytes​(PRStream stream)
                                     throws IOException
        Gets the content from a stream, applying the required filters.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        IOException - on error
      • getStreamBytesRaw

        public static byte[] getStreamBytesRaw​(PRStream stream,
                                               RandomAccessFileOrArray file)
                                        throws IOException
        Gets the content from a stream as it is, without applying any filter.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        IOException - on error
      • getStreamBytesRaw

        public static byte[] getStreamBytesRaw​(PRStream stream)
                                        throws IOException
        Gets the content from a stream as it is, without applying any filter.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        IOException - on error
      • eliminateSharedStreams

        public void eliminateSharedStreams()
        Eliminates shared streams if they exist.
      • isTampered

        public boolean isTampered()
        Checks if the document was changed.
        Returns:
        true if the document was changed, false otherwise
      • setTampered

        public void setTampered​(boolean tampered)
        Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
        Parameters:
        tampered - the tampered state
      • getMetadata

        public byte[] getMetadata()
                           throws IOException
        Gets the XML metadata.
        Returns:
        the XML metadata
        Throws:
        IOException - on error
      • getLastXref

        public int getLastXref()
        Gets the byte address of the last xref table.
        Returns:
        the byte address of the last xref table
      • getXrefSize

        public int getXrefSize()
        Gets the number of xref objects.
        Returns:
        the number of xref objects
      • getEofPos

        public int getEofPos()
        Gets the byte address of the %%EOF marker.
        Returns:
        the byte address of the %%EOF marker
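        To show where such byte addresses come from, the sketch below scans a PDF byte array backwards for the last %%EOF marker and reads the number that follows the last startxref keyword. The class is hypothetical; whether the library reports exactly these offsets (for example the start versus the end of the marker) is not confirmed here.

```java
import java.nio.charset.StandardCharsets;

class TrailerScan {
    /** Returns the offset of the last occurrence of marker, or -1 if absent. */
    static int lastIndexOf(byte[] data, String marker) {
        byte[] m = marker.getBytes(StandardCharsets.US_ASCII);
        for (int i = data.length - m.length; i >= 0; i--) {
            int j = 0;
            while (j < m.length && data[i + j] == m[j]) {
                j++;
            }
            if (j == m.length) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the decimal offset written after the last "startxref", or -1. */
    static long startXrefOffset(byte[] data) {
        int pos = lastIndexOf(data, "startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        // Skip the whitespace between the keyword and the number.
        while (i < data.length
                && (data[i] == ' ' || data[i] == '\t' || data[i] == '\r' || data[i] == '\n')) {
            i++;
        }
        long value = 0;
        boolean anyDigit = false;
        while (i < data.length && data[i] >= '0' && data[i] <= '9') {
            value = value * 10 + (data[i] - '0');
            anyDigit = true;
            i++;
        }
        return anyDigit ? value : -1;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\nstartxref\n9\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("startxref -> " + startXrefOffset(pdf));
        System.out.println("%%EOF at  -> " + lastIndexOf(pdf, "%%EOF"));
    }
}
```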
      • getPdfVersion

        public char getPdfVersion()
        Gets the PDF version. Only the last version char is returned. For example version 1.4 is returned as '4'.
        Returns:
        the PDF version
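        Because only the last character is returned, callers usually rebuild the full version string themselves. The sketch below assumes a "%PDF-1.x" header; the class and method names are hypothetical.

```java
class PdfVersionSketch {
    /** Extracts the minor version char from a header such as "%PDF-1.4". */
    static char minorVersion(String header) {
        if (!header.startsWith("%PDF-1.") || header.length() < 8) {
            throw new IllegalArgumentException("Not a %PDF-1.x header: " + header);
        }
        return header.charAt(7);
    }

    /** Rebuilds the full version string from the single version char. */
    static String fullVersion(char minor) {
        return "1." + minor;
    }

    public static void main(String[] args) {
        char minor = minorVersion("%PDF-1.4");
        System.out.println(fullVersion(minor));
    }
}
```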
      • isEncrypted

        public boolean isEncrypted()
        Returns true if the PDF is encrypted.
        Returns:
        true if the PDF is encrypted
      • isOwnerPasswordUsed

        public boolean isOwnerPasswordUsed()
        Returns true if the owner password has been used to open the document.
        Returns:
        true if the owner password has been used to open the document.
      • getPermissions

        public int getPermissions()
        Gets the encryption permissions. It can be used directly in PdfWriter.setEncryption().
        Returns:
        the encryption permissions
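        The returned int is a bit field. The sketch below tests a few of the /P flag bits defined by the PDF specification; the constant names are local to this example and are not the library's own constants.

```java
class PermissionBits {
    // Bit values of the /P entry as laid out in the PDF specification.
    static final int PRINT = 1 << 2;              // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;    // bit 4
    static final int COPY = 1 << 4;               // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5; // bit 6

    /** Returns true when every bit of flag is set in permissions. */
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        int permissions = -44; // example value with the print and copy bits set
        System.out.println("print:  " + isAllowed(permissions, PRINT));
        System.out.println("modify: " + isAllowed(permissions, MODIFY_CONTENTS));
        System.out.println("copy:   " + isAllowed(permissions, COPY));
    }
}
```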
      • setPermissions

        public void setPermissions​(int permissionValue)
      • is128Key

        public boolean is128Key()
        Returns true if the PDF has a 128 bit key encryption.
        Returns:
        true if the PDF has a 128 bit key encryption
      • getTrailer

        public PdfDictionary getTrailer()
        Gets the trailer dictionary.
        Returns:
        the trailer dictionary
      • shuffleSubsetNames

        public int shuffleSubsetNames()
        Finds all the font subsets and changes the prefixes to some random values.
        Returns:
        the number of font subsets altered
      • createFakeFontSubsets

        public int createFakeFontSubsets()
        Finds all the fonts not subset but embedded and marks them as subset.
        Returns:
        the number of fonts altered
      • getNamedDestination

        public HashMap<Object,​PdfObject> getNamedDestination()
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        all the named destinations
      • getNamedDestination

        public HashMap<Object,​PdfObject> getNamedDestination​(boolean keepNames)
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        all the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromNames

        public HashMap getNamedDestinationFromNames()
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • getNamedDestinationFromNames

        public HashMap<Object,​PdfObject> getNamedDestinationFromNames​(boolean keepNames)
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromStrings

        public HashMap getNamedDestinationFromStrings()
        Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • removeFields

        public void removeFields()
        Removes all the fields from the document.
      • removeAnnotations

        public void removeAnnotations()
        Removes all the annotations and fields from the document.
      • makeRemoteNamedDestinationsLocal

        public void makeRemoteNamedDestinationsLocal()
        Replaces remote named links with local destinations that have the same name.
        Since:
        5.0
      • consolidateNamedDestinations

        public void consolidateNamedDestinations()
        Replaces all the local named links with the actual destinations.
      • removeUnusedNode

        protected void removeUnusedNode​(PdfObject obj,
                                        boolean[] hits)
      • removeUnusedObjects

        public int removeUnusedObjects()
        Removes all the unreachable objects.
        Returns:
        the number of indirect objects removed
      • getAcroFields

        public AcroFields getAcroFields()
        Gets a read-only version of AcroFields.
        Returns:
        a read-only version of AcroFields
      • getJavaScript

        public String getJavaScript()
                             throws IOException
        Gets the global document JavaScript.
        Returns:
        the global document JavaScript
        Throws:
        IOException - on error
      • selectPages

        public void selectPages​(String ranges)
        Selects the pages to keep in the document. The pages are described as ranges. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
        Parameters:
        ranges - the comma separated ranges as described in SequenceList
      • selectPages

        public void selectPages​(List<Integer> pagesToKeep)
        Selects the pages to keep in the document. The pages are described as a List of Integer. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
        Parameters:
        pagesToKeep - the pages to keep in the document
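        Since page repetitions are not allowed, a caller may want to clean a page list before passing it in. The helper below is a hypothetical pre-processing step, not part of the library: it keeps the first occurrence of each page in the requested order and drops numbers outside 1..numberOfPages. How the library itself treats duplicates or out-of-range values is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PageSelectionSketch {
    /** Returns the requested pages without repeats or out-of-range numbers, order preserved. */
    static List<Integer> sanitize(List<Integer> pagesToKeep, int numberOfPages) {
        Set<Integer> seen = new LinkedHashSet<>(); // remembers insertion order
        for (Integer page : pagesToKeep) {
            if (page != null && page >= 1 && page <= numberOfPages) {
                seen.add(page); // a repeated page is silently ignored
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Integer> requested = Arrays.asList(3, 1, 3, 7, 2);
        System.out.println(sanitize(requested, 5));
    }
}
```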
      • getSimpleViewerPreferences

        public int getSimpleViewerPreferences()
        Returns a bitset representing the PageMode and PageLayout viewer preferences. Doesn't return any information about the ViewerPreferences dictionary.
        Returns:
        an int that contains the Viewer Preferences.
      • isAppendable

        public boolean isAppendable()
        Getter for property appendable.
        Returns:
        Value of property appendable.
      • setAppendable

        public void setAppendable​(boolean appendable)
        Setter for property appendable.
        Parameters:
        appendable - New value of property appendable.
      • isNewXrefType

        public boolean isNewXrefType()
        Getter for property newXrefType.
        Returns:
        Value of property newXrefType.
      • getFileLength

        public int getFileLength()
        Getter for property fileLength.
        Returns:
        Value of property fileLength.
      • isHybridXref

        public boolean isHybridXref()
        Getter for property hybridXref.
        Returns:
        Value of property hybridXref.
      • removeUsageRights

        public void removeUsageRights()
        Removes any usage rights that this PDF may have. Only Adobe can grant usage rights and any PDF modification with iText will invalidate them. Invalidated usage rights may confuse Acrobat and it's advisable to remove them altogether.
      • getCertificationLevel

        public int getCertificationLevel()
        Gets the certification level for this document. The return values can be PdfSignatureAppearance.NOT_CERTIFIED, PdfSignatureAppearance.CERTIFIED_NO_CHANGES_ALLOWED, PdfSignatureAppearance.CERTIFIED_FORM_FILLING and PdfSignatureAppearance.CERTIFIED_FORM_FILLING_AND_ANNOTATIONS.

        No signature validation is made; use the methods available for that in AcroFields.

        Returns:
        the certification level for this document
      • isModificationlowedWithoutOwnerPassword

        public boolean isModificationlowedWithoutOwnerPassword()
        Checks if an encrypted document may be modified if the owner password was not supplied. If the document is not encrypted, the setting has no effect.
        Returns:
        true if the document may be modified even if the owner password was not supplied, false otherwise
      • setModificationAllowedWithoutOwnerPassword

        public void setModificationAllowedWithoutOwnerPassword​(boolean modificationAllowedWithoutOwnerPassword)
        Sets whether the document (if encrypted) may be modified even if the owner password was not supplied. If this is set to false, an exception will be thrown when attempting to access an encrypted Document for which the owner password was not supplied.
        Parameters:
        modificationAllowedWithoutOwnerPassword - the modificationAllowedWithoutOwnerPassword state.
      • isOpenedWithFullPermissions

        public final boolean isOpenedWithFullPermissions()
        Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
        Returns:
        true if the document was opened with the owner password or if it's not encrypted or the modificationAllowedWithoutOwnerPassword flag is set, false otherwise.
      • getCryptoMode

        public int getCryptoMode()
      • isMetadataEncrypted

        public boolean isMetadataEncrypted()
      • computeUserPassword

        public byte[] computeUserPassword()
      • getDocumentId

        public byte[] getDocumentId()
        Returns a permanent document identifier extracted from the trailer /ID entry, when present.
        Returns:
        byte array representing the document permanent identifier