public class PDFText2HTML
extends org.apache.pdfbox.text.PDFTextStripper
| Constructor and Description |
|---|
| PDFText2HTML() - Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| protected float | computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) |
| protected void | endArticle() - Write out the article separator. |
| void | endDocument(org.apache.pdfbox.pdmodel.PDDocument document) |
| protected String | getTitle() - This method will attempt to guess the title of the document using either the document properties or the first lines of text. |
| protected void | showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4) |
| protected void | startArticle(boolean isLTR) - Write out the article separator (div tag) with proper text direction information. |
| protected void | startDocument(org.apache.pdfbox.pdmodel.PDDocument document) |
| protected void | writeHeader() - Deprecated. |
| protected void | writeParagraphEnd() - Writes the paragraph end "</p>" to the output. |
| protected void | writeString(String chars) - Write a string to the output stream and escape some HTML characters. |
| protected void | writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions) - Write a string to the output stream, maintain font state, and escape some HTML characters. |
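The writeString(String chars) summary says the method escapes some HTML characters before writing. The sketch below illustrates that idea using only the standard library. The class name HtmlEscapeSketch, the method escape, and the exact set of escaped characters are assumptions made for illustration; this page does not list which characters PDFText2HTML actually escapes.

```java
public class HtmlEscapeSketch {

    /**
     * Hypothetical helper: replaces the four characters that HTML would
     * otherwise interpret as markup with their named entities.
     * The character set is an assumption, not taken from PDFText2HTML.
     */
    public static String escape(String chars) {
        StringBuilder out = new StringBuilder(chars.length());
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                default:
                    // Everything else is copied through unchanged.
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: if (a &lt; b &amp;&amp; b &gt; c) print &quot;ok&quot;
        System.out.println(escape("if (a < b && b > c) print \"ok\""));
    }
}
```

Escaping the ampersand in the same pass as the other characters avoids double-escaping entities that the method itself has just produced.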
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper:
endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphSeparator, writeParagraphStart, writeText, writeWordSeparator

Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine:
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

public PDFText2HTML()
throws IOException
Throws: IOException - If there is an error during initialization.

@Deprecated protected void writeHeader() throws IOException
Deprecated. Use startDocument(PDDocument).
Throws: IOException - If there is a problem writing out the header to the document.

protected void startDocument(org.apache.pdfbox.pdmodel.PDDocument document)
throws IOException
Overrides: startDocument in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

public void endDocument(org.apache.pdfbox.pdmodel.PDDocument document)
throws IOException
Overrides: endDocument in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

protected String getTitle()
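
getTitle() is described as guessing the document title from either the document properties or the first lines of text. A minimal sketch of that fallback order follows. It is stdlib-only and deliberately simplified: the class TitleGuessSketch, the method guessTitle, and both of its parameters are hypothetical, and the real method may apply further heuristics that this page does not describe.

```java
import java.util.Arrays;
import java.util.List;

public class TitleGuessSketch {

    /**
     * Hypothetical helper: prefers a title taken from the document
     * properties and falls back to the first non-blank line of text.
     * Returns an empty string when neither source yields a title.
     */
    public static String guessTitle(String propertiesTitle, List<String> firstLines) {
        if (propertiesTitle != null && !propertiesTitle.trim().isEmpty()) {
            return propertiesTitle.trim();
        }
        for (String line : firstLines) {
            if (line != null && !line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // No title in the properties, so the first non-blank line is used.
        System.out.println(guessTitle(null, Arrays.asList("", "  Annual Report  ", "Page 1")));
        // A title in the properties takes precedence over the text.
        System.out.println(guessTitle("Quarterly Summary", Arrays.asList("Annual Report")));
    }
}
```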
protected void startArticle(boolean isLTR)
throws IOException
Overrides: startArticle in class org.apache.pdfbox.text.PDFTextStripper
Parameters: isLTR - true if direction of text is left to right
Throws: IOException - If there is an error writing to the stream.

protected void endArticle()
throws IOException
Overrides: endArticle in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException - If there is an error writing to the stream.

protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions) throws IOException
Overrides: writeString in class org.apache.pdfbox.text.PDFTextStripper
Parameters: text - The text to write to the stream.
textPositions - the corresponding text positions
Throws: IOException - If there is an error writing to the stream.

protected void writeString(String chars) throws IOException
Overrides: writeString in class org.apache.pdfbox.text.PDFTextStripper
Parameters: chars - String to be written to the stream
Throws: IOException - If there is an error writing to the stream.

protected void writeParagraphEnd()
throws IOException
Overrides: writeParagraphEnd in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

protected void showGlyph(org.apache.pdfbox.util.Matrix arg0,
org.apache.pdfbox.pdmodel.font.PDFont arg1,
int arg2,
String arg3,
org.apache.pdfbox.util.Vector arg4)
throws IOException
Overrides: showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
Throws: IOException

protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
throws IOException
Throws: IOException

Copyright © 2002–2021 The Apache Software Foundation. All rights reserved.