Package com.tom_roush.pdfbox.util
Class PDFHighlighter
java.lang.Object
  com.tom_roush.pdfbox.contentstream.PDFStreamEngine
    com.tom_roush.pdfbox.text.PDFTextStripper
      com.tom_roush.pdfbox.util.PDFHighlighter
public class PDFHighlighter extends PDFTextStripper
Highlighting of words in a PDF document with an XML file.
See Also:
Adobe Highlight File Format
Field Summary
-
Fields inherited from class com.tom_roush.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
-
-
Constructor Summary
Constructor | Description
PDFHighlighter() | Default constructor.
-
Method Summary
Modifier and Type | Method | Description
protected void | endPage(PDPage pdPage) | End a page.
void | generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) | Generate an XML highlight string based on the PDF.
void | generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) | Generate an XML highlight string based on the PDF.
static void | main(String[] args) | Command line application.
protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) | This method was originally written by Ben Litchfield for PDFStreamEngine.
protected void | showText(byte[] string) | Process text from the PDF Stream.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from class com.tom_roush.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Methods inherited from class com.tom_roush.pdfbox.text.PDFTextStripper
endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
Constructor Detail
-
PDFHighlighter
public PDFHighlighter() throws IOException
Default constructor.
Throws:
IOException - If there is an error constructing this class.
-
-
Method Detail
-
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) throws IOException
Generate an XML highlight string based on the PDF.
Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The resulting output xml file.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.
-
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) throws IOException
Generate an XML highlight string based on the PDF.
Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The resulting output xml file.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.
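The two overloads above differ only in taking one word or several, and both write their result to a caller-supplied Writer. The sketch below illustrates that contract using only the JDK, so it runs without the library. It is not the PDFHighlighter implementation: the class HighlightSketch, its method names, the `<Highlight>`/`<loc pg pos len>` layout, the literal case-insensitive matching, and the idea that the single-word form delegates to the array form are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not part of PdfBox-Android. */
final class HighlightSketch {

    /** Single-word form: wraps the word in a one-element array (assumed relationship). */
    static void writeHighlights(String pageText, int pageIndex, String word, Writer xmlOutput)
            throws IOException {
        writeHighlights(pageText, pageIndex, new String[] { word }, xmlOutput);
    }

    /** Writes one location entry per occurrence of each word in the page text. */
    static void writeHighlights(String pageText, int pageIndex, String[] words, Writer xmlOutput)
            throws IOException {
        xmlOutput.write("<Highlight>\n");
        for (String word : words) {
            // Pattern.quote makes the word match literally; ignoring case is a choice of this sketch.
            Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(pageText);
            while (matcher.find()) {
                int length = matcher.end() - matcher.start();
                // Assumed entry layout: page index, character offset, match length.
                xmlOutput.write("  <loc pg=" + pageIndex + " pos=" + matcher.start()
                        + " len=" + length + ">\n");
            }
        }
        xmlOutput.write("</Highlight>\n");
        xmlOutput.flush();
    }

    /** Convenience wrapper that collects the output in memory instead of a file. */
    static String toXml(String pageText, int pageIndex, String... words) {
        StringWriter buffer = new StringWriter();
        try {
            writeHighlights(pageText, pageIndex, words, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml("PDF text with pdf words", 0, "pdf", "words"));
    }
}
```

Because the output target is a plain Writer, the same call can fill a StringWriter for inspection or a FileWriter for a highlight file on disk. For the sample text, the sketch emits three entries: two for "pdf" (offsets 0 and 14) and one for "words" (offset 18).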
-
endPage
protected void endPage(PDPage pdPage) throws IOException
End a page. Default implementation is to do nothing. Subclasses may provide additional information.
Overrides:
endPage in class PDFTextStripper
Parameters:
pdPage - The page we are about to process.
Throws:
IOException - If there is any error writing to the stream.
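The description above says the default endPage does nothing and that subclasses may add behavior. A minimal JDK-only sketch of that hook pattern follows. PageWalkerSketch and PageLengthRecorder are hypothetical classes invented for this example; they do not mirror the internals of PDFTextStripper or PDFHighlighter, and pages are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class standing in for a page-by-page text processor. */
class PageWalkerSketch {

    /** Walks the pages in order and calls the endPage hook after each one. */
    final void processPages(List<String> pages) {
        for (int i = 0; i < pages.size(); i++) {
            handleText(pages.get(i));
            endPage(i);
        }
    }

    /** Receives the text of the current page; ignored by default. */
    protected void handleText(String text) {
    }

    /** Hook called once a page is finished; does nothing by default. */
    protected void endPage(int pageIndex) {
    }
}

/** Hypothetical subclass that buffers page text and summarizes it at each page end. */
class PageLengthRecorder extends PageWalkerSketch {

    private final StringBuilder buffer = new StringBuilder();
    final List<String> summary = new ArrayList<>();

    @Override
    protected void handleText(String text) {
        buffer.append(text);
    }

    @Override
    protected void endPage(int pageIndex) {
        summary.add("page " + pageIndex + ": " + buffer.length() + " chars");
        buffer.setLength(0); // reset so the next page starts with an empty buffer
    }

    public static void main(String[] args) {
        PageLengthRecorder recorder = new PageLengthRecorder();
        recorder.processPages(Arrays.asList("abc", "hello"));
        System.out.println(recorder.summary);
    }
}
```

Doing per-page work in the hook keeps the page-walking loop in the base class unchanged, while the subclass decides what to do with the text gathered for each page.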
-
main
public static void main(String[] args) throws IOException
Command line application.
Parameters:
args - The command line arguments to the application.
Throws:
IOException - If there is an error generating the highlight file.
-
showText
protected void showText(byte[] string) throws IOException
Description copied from class: PDFStreamEngine
Process text from the PDF Stream. You should override this method if you want to perform an action when encoded text is being processed.
Overrides:
showText in class PDFStreamEngine
Parameters:
string - the encoded text
Throws:
IOException - if there is an error processing the string
-
showGlyph
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
This method was originally written by Ben Litchfield for PDFStreamEngine.
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed