Package org.docx4j.convert.in.xhtml
Class XHTMLImporterImpl
java.lang.Object
org.docx4j.convert.in.xhtml.XHTMLImporterImpl
- All Implemented Interfaces:
XHTMLImporter
Convert XHTML + CSS to WordML content. Can convert an entire document,
or a fragment consisting of one or more block level objects.
Your XHTML must be well formed XML!
For usage examples, please see org.docx4j.samples/XHTMLImportFragment
and XHTMLImportDocument.
For best results, be sure to include src/main/resources on your classpath.
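Since the importer requires well formed XML, it can help to check the markup before handing it over. The following is a minimal sketch using only the JDK's XML parser; the class WellFormedCheck and its method are hypothetical helpers, not part of docx4j, and the sketch assumes the input has no DOCTYPE that would trigger external DTD lookups.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not docx4j API): reports whether markup parses as XML.
final class WellFormedCheck {

    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            // Any parse failure means the input is not well formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<div><p>Hello</p></div>"));
        System.out.println(isWellFormed("<div><p>Hello</div>"));
    }
}
```

Typical HTML habits such as an unclosed <br> fail this check, which is usually the first thing to rule out when an import is rejected.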
Includes support for:
- paragraph and run formatting
- tables
- images
- lists (ordered, unordered)
People complain that flying-saucer is slow
(due to DTD-related network lookups).
See http://stackoverflow.com/questions/5431646/is-there-any-way-improve-the-performance-of-flyingsaucer
Looking at FSEntityResolver, the problem is that there
is no longer a resources/schema directory which can be put on
the classpath. Once this problem is fixed, things work better.
TODO:
- insert, delete
- space-before, space-after unrecognized CSS property
- Since:
- 2.8
- Author:
- jharrop
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type, Field, Description
- static org.slf4j.Logger log
- protected org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage
-
Constructor Summary
Constructors
Constructor, Description
- XHTMLImporterImpl(org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage)
-
Method Summary
Modifier and Type, Method, Description
- static void addFontMapping(String cssFontFamily, String font)
- static void addFontMapping(String cssFontFamily, org.docx4j.wml.RFonts rFonts)
  Map a font family, for example "Century Gothic" in: font-family:"Century Gothic", Helvetica, Arial, sans-serif; to a w:rFonts object, for example: <w:rFonts w:ascii="Arial Black" w:hAnsi="Arial Black"/> Assuming style font-family:"Century Gothic", Helvetica, Arial, sans-serif; the first font family for which there is a mapping is the one which will be used.
- List<Object> convert(File file, String baseUrl)
  Convert the well formed XHTML contained in file to a list of WML objects.
- List<Object> convert(InputStream is, String baseUrl)
- List<Object> convert(Reader reader, String baseUrl)
- List<Object> convert(String content, String baseUrl)
  Convert the well formed XHTML contained in the string to a list of WML objects.
- convert
  Convert the well formed XHTML found at the specified URI to a list of WML objects.
- List<Object> convert(Node node, String baseUrl)
- List<Object> convert(InputSource is, String baseUrl)
  Convert the well formed XHTML from the specified SAX InputSource
- List<Object> convertMHT(InputStream is, String baseUrl)
- protected int getAncestorIndentation(com.openhtmltopdf.layout.Styleable styleable)
- getBookmarkIdLast
- getCascadedProperties(com.openhtmltopdf.css.style.CalculatedStyle cs)
- protected LinkedList<org.docx4j.wml.ContentAccessor> getContentContextStack
- static short getLengthPrimitiveType(com.openhtmltopdf.css.style.FSDerivedValue val)
- protected ListHelper getListHelper
- protected int getLocalIndentation(com.openhtmltopdf.layout.Styleable styleable)
  Inside a list item, get the contribution of any div.
- static Templates getMathXSLT
- protected org.docx4j.wml.PPr getPPr(com.openhtmltopdf.render.BlockBox blockBox, Map<String, com.openhtmltopdf.css.parser.PropertyValue> cssMap)
- getRenderer
- getSequenceCounters
  Get the current numbers of SEQ fields, used in image captions.
- protected org.docx4j.wml.Style getStyleByIdOrName(String... parameters)
  If one parameter is passed then search style by id (1st parameter), if style by id is not found then search style by name (also 1st parameter).
- protected FormattingOption getTableFormatting
- protected TableHelper getTableHelper
- protected boolean isBidi
- protected void populatePPr(org.docx4j.wml.PPr pPr, com.openhtmltopdf.layout.Styleable blockBox, Map<String, com.openhtmltopdf.css.parser.PropertyValue> cssMap)
- void setBookmarkIdNext
- void setBookmarkNamePrefix(String bookmarkNamePrefix)
  The prefix (if any) to be added to bookmark names generated during this run.
- static void setCssWhiteList(Set<String> cssWhiteList)
  Deprecated.
- void setDivHandler(DivHandler divHandler)
- void setHyperlinkStyle(String hyperlinkStyleID)
  Configure how the importer styles hyperlinks. If hyperlinkStyleId is set to null, hyperlinks are styled using just the CSS.
- void setMaxWidth(int maxWidth, String tableStyle)
  Set the maximum width available (in twips); useful for scaling bare images if they are to go in a table cell.
- void setParagraphFormatting(FormattingOption paragraphFormatting)
- void setRenderer(DocxRenderer renderer)
- void setRunFormatting(FormattingOption runFormatting)
- void setSequenceCounters(Map<String, Integer> sequenceCounters)
  Set the last used numbers of SEQ fields, used in image captions.
- void setTableFormatting(FormattingOption tableFormatting)
- void setXHTMLImageHandler(XHTMLImageHandler xHTMLImageHandler)
  If you have your own implementation of the XHTMLImageHandler interface which you'd like to use.
-
Field Details
-
log
public static org.slf4j.Logger log
-
wordMLPackage
protected org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage
-
-
Constructor Details
-
XHTMLImporterImpl
public XHTMLImporterImpl(org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage)
-
-
Method Details
-
getMathXSLT
-
setHyperlinkStyle
Configure how the importer styles hyperlinks. If hyperlinkStyleId is set to null, hyperlinks are styled using just the CSS. This is the default behavior. If hyperlinkStyleId is set to "someWordHyperlinkStyleName", that style is used. The default Word hyperlink style name is "Hyperlink". It is currently your responsibility to define that style in your styles definition part.
- Specified by:
setHyperlinkStyle in interface XHTMLImporter
- Parameters:
hyperlinkStyleID - The style to use for hyperlinks (e.g. Hyperlink)
-
setXHTMLImageHandler
Set your own implementation of the XHTMLImageHandler interface, if you have one you'd like to use.
-
setMaxWidth
Description copied from interface: XHTMLImporter
Set the maximum width available (in twips); useful for scaling bare images if they are to go in a table cell.
Also set the table style if images are really to go in a table cell (needed to remove table style margins from final width).
- Specified by:
setMaxWidth in interface XHTMLImporter
- Parameters:
tableStyle - can be null
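Because maxWidth is expressed in twips, a little unit arithmetic is usually needed before calling setMaxWidth. The sketch below only illustrates that arithmetic (1 twip is 1/20 of a point, so 1440 twips per inch); TwipsSketch and its methods are hypothetical, and how docx4j itself scales images is not shown here.

```java
// Hypothetical helper (not docx4j API): unit arithmetic for a width limit in twips.
final class TwipsSketch {

    // 1 twip = 1/20 point and there are 72 points per inch.
    static final int TWIPS_PER_INCH = 1440;

    /** Convert a pixel width to twips for a given resolution in dots per inch. */
    static long pixelsToTwips(int pixels, int dpi) {
        return Math.round(pixels * (double) TWIPS_PER_INCH / dpi);
    }

    /** Scale factor (never above 1.0) that makes an image fit within maxWidthTwips. */
    static double scaleToFit(long imageWidthTwips, long maxWidthTwips) {
        if (imageWidthTwips <= maxWidthTwips) {
            return 1.0; // already fits, leave the image alone
        }
        return (double) maxWidthTwips / imageWidthTwips;
    }

    public static void main(String[] args) {
        long imageWidth = pixelsToTwips(800, 96);
        System.out.println("image width in twips: " + imageWidth);
        System.out.println("scale for a 6000 twip cell: " + scaleToFit(imageWidth, 6000));
    }
}
```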
-
setDivHandler
-
getListHelper
-
getTableHelper
-
getRenderer
- Returns:
- the renderer
-
setRenderer
- Parameters:
renderer - the renderer to set
-
addFontMapping
Map a font family, for example "Century Gothic" in:
font-family:"Century Gothic", Helvetica, Arial, sans-serif;
to a w:rFonts object, for example:
<w:rFonts w:ascii="Arial Black" w:hAnsi="Arial Black"/>
Assuming style font-family:"Century Gothic", Helvetica, Arial, sans-serif; the first font family for which there is a mapping is the one which will be used.
xhtml-renderer's CSSName defaults font-family to serif.
It is your responsibility to ensure a suitable font is available on the target system (or embedded in the docx package). If we (eventually) support CSS @font-face, docx4j could do that for you (at least for font formats we can convert to something embeddable).
You should set these up once, for all your subsequent imports, since some data is cached and currently won't get updated if you add fonts later.
- Since:
- 3.0
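The "first font family for which there is a mapping" rule can be sketched with plain collections. This is an illustration of the selection rule only, not docx4j's implementation: FontMappingSketch is hypothetical, it maps to a plain font name rather than an RFonts object, and the case-insensitive matching and naive comma splitting are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not docx4j code) of "first mapped font family wins".
final class FontMappingSketch {

    private final Map<String, String> mappings = new HashMap<>();

    void addFontMapping(String cssFontFamily, String font) {
        mappings.put(normalise(cssFontFamily), font);
    }

    /** Walk the font-family list left to right and return the first mapped font. */
    Optional<String> resolve(String fontFamilyValue) {
        for (String family : fontFamilyValue.split(",")) {
            String font = mappings.get(normalise(family));
            if (font != null) {
                return Optional.of(font);
            }
        }
        return Optional.empty();
    }

    // Assumption: names are compared without surrounding quotes and ignoring case.
    private static String normalise(String family) {
        String s = family.trim();
        boolean quoted = s.length() >= 2
                && (s.charAt(0) == '"' || s.charAt(0) == '\'')
                && s.charAt(s.length() - 1) == s.charAt(0);
        if (quoted) {
            s = s.substring(1, s.length() - 1);
        }
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        FontMappingSketch sketch = new FontMappingSketch();
        sketch.addFontMapping("Century Gothic", "Arial Black");
        System.out.println(sketch.resolve("\"Century Gothic\", Helvetica, Arial, sans-serif"));
    }
}
```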
-
addFontMapping
-
setRunFormatting
- Specified by:
setRunFormatting in interface XHTMLImporter
- Parameters:
runFormatting - the runFormatting to set
-
setParagraphFormatting
- Specified by:
setParagraphFormatting in interface XHTMLImporter
- Parameters:
paragraphFormatting - the paragraphFormatting to set
-
setTableFormatting
- Specified by:
setTableFormatting in interface XHTMLImporter
- Parameters:
tableFormatting - the tableFormatting to set
-
getTableFormatting
-
setCssWhiteList
Deprecated.
If the CSS white list is non-null, a CSS property will only be honoured if it is on the list. Useful where suitable default values aren't being provided via
- Parameters:
cssWhiteList - the cssWhiteList to set
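The white list rule described above (a null list honours everything, a non-null list honours only listed properties) can be sketched as a simple filter. CssWhiteListSketch is a hypothetical illustration over plain string maps and does not reflect how docx4j stores CSS properties; note also that the real method is marked deprecated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not docx4j code) of the white list rule.
final class CssWhiteListSketch {

    /** Return only the declared properties that the white list allows. */
    static Map<String, String> honoured(Map<String, String> declared, Set<String> whiteList) {
        if (whiteList == null) {
            // No white list configured: every property is honoured.
            return new LinkedHashMap<>(declared);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : declared.entrySet()) {
            if (whiteList.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```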
-
getBookmarkIdLast
- Specified by:
getBookmarkIdLast in interface XHTMLImporter
-
setBookmarkIdNext
-
convert
public List<Object> convert(File file, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
Convert the well formed XHTML contained in file to a list of WML objects.
- Specified by:
convert in interface XHTMLImporter
- Parameters:
file -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
IOException
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
public List<Object> convert(InputSource is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
Convert the well formed XHTML from the specified SAX InputSource
- Specified by:
convert in interface XHTMLImporter
- Parameters:
is -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
IOException
org.docx4j.openpackaging.exceptions.Docx4JException
-
convertMHT
public List<Object> convertMHT(InputStream is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
- Throws:
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
public List<Object> convert(InputStream is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
- Specified by:
convert in interface XHTMLImporter
- Parameters:
is -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
IOException
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
public List<Object> convert(Node node, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
- Specified by:
convert in interface XHTMLImporter
- Parameters:
node -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
IOException
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
public List<Object> convert(Reader reader, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
- Specified by:
convert in interface XHTMLImporter
- Parameters:
reader -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
IOException
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
Convert the well formed XHTML found at the specified URI to a list of WML objects.
- Specified by:
convert in interface XHTMLImporter
- Parameters:
url -
wordMLPackage -
- Returns:
- Throws:
org.docx4j.openpackaging.exceptions.Docx4JException
-
convert
public List<Object> convert(String content, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
Convert the well formed XHTML contained in the string to a list of WML objects.
- Specified by:
convert in interface XHTMLImporter
- Parameters:
content -
baseUrl -
wordMLPackage -
- Returns:
- Throws:
org.docx4j.openpackaging.exceptions.Docx4JException
-
getCascadedProperties
-
getLengthPrimitiveType
public static short getLengthPrimitiveType(com.openhtmltopdf.css.style.FSDerivedValue val)
-
getContentContextStack
-
getSequenceCounters
Get the current numbers of SEQ fields, used in image captions. Typically you'd use this if you are importing multiple times into a single docx (as OpenDoPE does, for example).
- Specified by:
getSequenceCounters in interface XHTMLImporter
- Parameters:
sequenceCounters -
-
setSequenceCounters
Set the last used numbers of SEQ fields, used in image captions. Key is sequence name. The default is "Figure", but you can also use others (matching value of @sequence).
- Specified by:
setSequenceCounters in interface XHTMLImporter
- Parameters:
sequenceCounters -
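The get/set pair is meant for carrying caption numbering from one import to the next. The sketch below models that hand-over with a stand-in class; SequenceCounterSketch and nextNumber are hypothetical, and the assumption that the next caption number is "last used + 1" follows from the description above rather than from the docx4j source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in (not docx4j code) for an importer that numbers captions.
final class SequenceCounterSketch {

    // Key is the sequence name (for example "Figure"), value is the last number used.
    private Map<String, Integer> sequenceCounters = new HashMap<>();

    Map<String, Integer> getSequenceCounters() {
        return sequenceCounters;
    }

    void setSequenceCounters(Map<String, Integer> sequenceCounters) {
        this.sequenceCounters = new HashMap<>(sequenceCounters);
    }

    /** Number for the next caption in the given sequence: last used + 1. */
    int nextNumber(String sequenceName) {
        return sequenceCounters.merge(sequenceName, 1, Integer::sum);
    }

    public static void main(String[] args) {
        SequenceCounterSketch first = new SequenceCounterSketch();
        first.nextNumber("Figure");
        first.nextNumber("Figure");

        // A second import continues where the first one stopped.
        SequenceCounterSketch second = new SequenceCounterSketch();
        second.setSequenceCounters(first.getSequenceCounters());
        System.out.println("next figure number: " + second.nextNumber("Figure"));
    }
}
```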
-
getPPr
-
isBidi
-
populatePPr
-
getStyleByIdOrName
If one parameter is passed, search for the style by id (1st parameter); if no style with that id is found, search by name (also the 1st parameter).
If two parameters are passed, search by id (1st parameter); if no style with that id is found, search by name (2nd parameter).
Any further parameters are ignored.
-
getLocalIndentation
protected int getLocalIndentation(com.openhtmltopdf.layout.Styleable styleable)
Inside a list item, get the contribution of any div.
-
getAncestorIndentation
protected int getAncestorIndentation(com.openhtmltopdf.layout.Styleable styleable)
-
setBookmarkNamePrefix
The prefix (if any) to be added to bookmark names generated during this run. Useful for preventing name collisions, when importing multiple fragments into a single docx.
- Parameters:
bookmarkNamePrefix -
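To see why a prefix prevents collisions, consider two fragments that both contain an anchor with the same name. The sketch below is a hypothetical model, not docx4j code: it assumes the prefix is simply prepended to each generated bookmark name, which the description suggests but does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not docx4j code) of per-import bookmark name prefixes.
final class BookmarkPrefixSketch {

    /** Assumption: the generated bookmark name is the prefix followed by the anchor name. */
    static List<String> bookmarkNames(String prefix, List<String> anchors) {
        List<String> names = new ArrayList<>();
        for (String anchor : anchors) {
            names.add(prefix == null ? anchor : prefix + anchor);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("intro", "summary");

        // Two fragments with identical anchors, imported with different prefixes.
        Set<String> all = new HashSet<>();
        all.addAll(bookmarkNames("frag1_", anchors));
        all.addAll(bookmarkNames("frag2_", anchors));
        System.out.println("distinct bookmark names: " + all.size());
    }
}
```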
-