Package com.tom_roush.pdfbox.text
Class PDFTextStripperByArea
java.lang.Object
  com.tom_roush.pdfbox.contentstream.PDFStreamEngine
    com.tom_roush.pdfbox.text.PDFTextStripper
      com.tom_roush.pdfbox.text.PDFTextStripperByArea
-
public class PDFTextStripperByArea extends PDFTextStripper
This will extract text from a specified region in the PDF.
-
-
Field Summary
-
Fields inherited from class com.tom_roush.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
-
-
Constructor Summary
Constructor | Description
PDFTextStripperByArea() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
void | addRegion(String regionName, android.graphics.RectF rect) | Add a new region to group text by.
void | extractRegions(PDPage page) | Process the page to extract the region text.
List<String> | getRegions() | Get the list of regions that have been set up.
String | getTextForRegion(String regionName) | Get the text for the region. This should be called after extractRegions().
protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page.
void | setShouldSeparateByBeads(boolean aShouldSeparateByBeads) | This method does nothing in this derived class, because beads and regions are incompatible.
protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) | This method was originally written by Ben Litchfield for PDFStreamEngine.
protected void | showText(byte[] string) | Process text from the PDF Stream.
protected void | writePage() | This will print the processed page text to the output stream.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from class com.tom_roush.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Methods inherited from class com.tom_roush.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
-
-
-
-
Constructor Detail
-
PDFTextStripperByArea
public PDFTextStripperByArea() throws IOException
Constructor.
Throws:
IOException - If there is an error loading properties.
-
-
Method Detail
-
setShouldSeparateByBeads
public void setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
This method does nothing in this derived class, because beads and regions are incompatible. Beads are ignored when stripping by area.
Overrides:
setShouldSeparateByBeads in class PDFTextStripper
Parameters:
aShouldSeparateByBeads - The new grouping of beads.
-
addRegion
public void addRegion(String regionName, android.graphics.RectF rect)
Add a new region to group text by.
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
-
getRegions
public List<String> getRegions()
Get the list of regions that have been set up.
Returns:
A list of java.lang.String objects to identify the region names.
-
getTextForRegion
public String getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().
Parameters:
regionName - The name of the region to get the text from.
Returns:
The text that was identified in that region.
-
extractRegions
public void extractRegions(PDPage page) throws IOException
Process the page to extract the region text.
Parameters:
page - The page to extract the regions from.
Throws:
IOException - If there is an error while extracting text.
-
processTextPosition
protected void processTextPosition(TextPosition text)
This will process a TextPosition object and add the text to the list of characters on a page. It takes care of overlapping text.
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.
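The region-related methods above can be pictured with a small, self-contained model. This is only a sketch of the general idea (named rectangles, with each piece of text appended to every region whose rectangle contains its position). It is not the library's implementation. The class RegionGroupingSketch, its nested Rect type, and the simplified (x, y, unicode) signature of processTextPosition are all hypothetical stand-ins, and the containment rule is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of grouping text by named regions; not PdfBox-Android code.
final class RegionGroupingSketch {

    // Hypothetical axis-aligned rectangle (left, top, right, bottom).
    static final class Rect {
        final float left;
        final float top;
        final float right;
        final float bottom;

        Rect(float left, float top, float right, float bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        // Assumed rule: left/top edges are inclusive, right/bottom edges exclusive.
        boolean contains(float x, float y) {
            return x >= left && x < right && y >= top && y < bottom;
        }
    }

    // Insertion-ordered so getRegions() lists names in the order they were added.
    private final Map<String, Rect> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    // Register a named rectangle and start an empty text buffer for it.
    void addRegion(String regionName, Rect rect) {
        regions.put(regionName, rect);
        regionText.put(regionName, new StringBuilder());
    }

    // Names of all regions registered so far.
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Simplified stand-in for a TextPosition: append the text to each region containing (x, y).
    void processTextPosition(float x, float y, String unicode) {
        for (Map.Entry<String, Rect> entry : regions.entrySet()) {
            if (entry.getValue().contains(x, y)) {
                regionText.get(entry.getKey()).append(unicode);
            }
        }
    }

    // Text collected for a region; in this sketch, null if the name was never added.
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? null : text.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        sketch.addRegion("header", new Rect(0f, 0f, 600f, 100f));
        sketch.addRegion("body", new Rect(0f, 100f, 600f, 700f));

        sketch.processTextPosition(50f, 40f, "T");   // falls inside "header"
        sketch.processTextPosition(60f, 40f, "i");   // falls inside "header"
        sketch.processTextPosition(50f, 300f, "x");  // falls inside "body"

        System.out.println(sketch.getTextForRegion("header"));
        System.out.println(sketch.getTextForRegion("body"));
    }
}
```

With the real class, the corresponding calls documented on this page are addRegion(regionName, rect) for each area of interest, then extractRegions(page), then getTextForRegion(regionName). The sketch feeds positions in by hand only because it has no PDF page to process.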
-
writePage
protected void writePage() throws IOException
This will print the processed page text to the output stream.
Overrides:
writePage in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.
-
showText
protected void showText(byte[] string) throws IOException
Description copied from class: PDFStreamEngine
Process text from the PDF Stream. You should override this method if you want to perform an action when encoded text is being processed.
Overrides:
showText in class PDFStreamEngine
Parameters:
string - the encoded text
Throws:
IOException - if there is an error processing the string
-
showGlyph
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
This method was originally written by Ben Litchfield for PDFStreamEngine.
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed
-
-