Package org.opencms.search.extractors

Contains a generic, low-level framework for extracting plain text content from various popular file formats.


Interface Summary
I_CmsExtractionResult The result of a document text extraction.
I_CmsTextExtractor Allows extraction of the indexable "plain" text, plus optional meta information, from a document in a given binary input format.
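The two interfaces above divide the work so that an extractor turns a binary input into a result object, and the result carries the plain text plus an optional map of meta information. The following minimal Java sketch illustrates that split under stated assumptions. The types ExtractionSketch, ExtractionResult, TextExtractor, SimpleExtractionResult and PlainTextExtractor are hypothetical, simplified stand-ins invented for this example; they do not reproduce the actual OpenCms signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ExtractionSketch {

    /** Hypothetical, simplified stand-in for I_CmsExtractionResult (not the real API). */
    interface ExtractionResult {
        /** The extracted, indexable plain text. */
        String getContent();

        /** Optional meta information keyed by name; may be empty. */
        Map<String, String> getMetaInfo();
    }

    /** Hypothetical, simplified stand-in for I_CmsTextExtractor (not the real API). */
    interface TextExtractor {
        ExtractionResult extractText(InputStream in, String encoding) throws IOException;
    }

    /** Immutable result holder, similar in role to CmsExtractionResult. */
    static final class SimpleExtractionResult implements ExtractionResult {
        private final String content;
        private final Map<String, String> metaInfo;

        SimpleExtractionResult(String content, Map<String, String> metaInfo) {
            this.content = content;
            // Defensive, read-only copy so callers cannot modify the result.
            this.metaInfo = Collections.unmodifiableMap(new HashMap<>(metaInfo));
        }

        @Override
        public String getContent() { return content; }

        @Override
        public Map<String, String> getMetaInfo() { return metaInfo; }
    }

    /** Hypothetical trivial extractor: for plain text, the trimmed input is the indexable text. */
    static final class PlainTextExtractor implements TextExtractor {
        @Override
        public ExtractionResult extractText(InputStream in, String encoding) throws IOException {
            // Decode the raw bytes with the caller-supplied encoding (Java 9+ for readAllBytes).
            String text = new String(in.readAllBytes(), encoding).trim();
            Map<String, String> meta = new HashMap<>();
            meta.put("length", String.valueOf(text.length())); // example meta entry only
            return new SimpleExtractionResult(text, meta);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "  Hello OpenCms  ".getBytes(StandardCharsets.UTF_8);
        ExtractionResult result =
            new PlainTextExtractor().extractText(new ByteArrayInputStream(doc), "UTF-8");
        System.out.println(result.getContent());                // Hello OpenCms
        System.out.println(result.getMetaInfo().get("length")); // 13
    }
}
```

Separating the result type from the extractor lets format-specific extractors attach whatever meta information they can recover (title, author, and so on) without changing the caller's view of the plain text.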

Class Summary
A_CmsTextExtractor Base utility class that allows extraction of the indexable "plain" text from a document in a given format.
CmsExtractionResult The result of a document text extraction.
CmsExtractorHtml Extracts the text from an HTML document.
CmsExtractorMsOfficeOLE2 Extracts text data from a VFS resource that is an OLE 2 MS Office document.
CmsExtractorMsOfficeOOXML Extracts text data from a VFS resource that is an OOXML MS Office document.
CmsExtractorOpenOffice Extracts the text from OpenOffice documents (.ods, .odf).
CmsExtractorPdf Extracts the text from a PDF document.
CmsExtractorRtf Extracts the text from an RTF document.
Messages Convenience class to access the localized messages of this OpenCms package.
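The split between the abstract base class and the format-specific extractor classes can be sketched as follows. This is a hypothetical illustration, not the OpenCms implementation: AbstractTextExtractor, NaiveHtmlExtractor, PlainTextExtractor and the suffix lookup forName are invented for the example, and the regex-based tag stripping is a deliberately simple substitute for real HTML parsing (it is not how CmsExtractorHtml works). The base class requires one stream-based method and derives convenience overloads from it; a lookup by file suffix then picks the extractor for a resource name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtractorDispatchSketch {

    /** Hypothetical base class in the spirit of A_CmsTextExtractor (not the real API). */
    abstract static class AbstractTextExtractor {
        static final String DEFAULT_ENCODING = "UTF-8";

        /** The single method a concrete extractor must implement. */
        abstract String extractText(InputStream in, String encoding) throws IOException;

        /** Convenience overload: raw bytes with an explicit encoding. */
        String extractText(byte[] content, String encoding) throws IOException {
            return extractText(new ByteArrayInputStream(content), encoding);
        }

        /** Convenience overload: raw bytes in the default encoding. */
        String extractText(byte[] content) throws IOException {
            return extractText(content, DEFAULT_ENCODING);
        }
    }

    /** Hypothetical naive HTML extractor using regexes; NOT how CmsExtractorHtml works. */
    static final class NaiveHtmlExtractor extends AbstractTextExtractor {
        @Override
        String extractText(InputStream in, String encoding) throws IOException {
            String html = new String(in.readAllBytes(), encoding);
            return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // drop script/style bodies
                .replaceAll("(?s)<[^>]*>", " ")                          // replace every tag with a space
                .replaceAll("\\s+", " ")                                 // collapse runs of whitespace
                .trim();
        }
    }

    /** Hypothetical plain text extractor: input passes through apart from trimming. */
    static final class PlainTextExtractor extends AbstractTextExtractor {
        @Override
        String extractText(InputStream in, String encoding) throws IOException {
            return new String(in.readAllBytes(), encoding).trim();
        }
    }

    /** Hypothetical registry mapping lower-case file suffixes to extractors. */
    private static final Map<String, AbstractTextExtractor> BY_SUFFIX = new HashMap<>();
    static {
        AbstractTextExtractor html = new NaiveHtmlExtractor();
        BY_SUFFIX.put("html", html);
        BY_SUFFIX.put("htm", html);
        BY_SUFFIX.put("txt", new PlainTextExtractor());
    }

    /** Returns the extractor for a resource name, or null if no suffix matches. */
    static AbstractTextExtractor forName(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return BY_SUFFIX.get(resourceName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html><head><style>p{}</style></head><body><p>Hello <b>OpenCms</b></p></body></html>"
            .getBytes(StandardCharsets.UTF_8);
        AbstractTextExtractor extractor = forName("/sites/default/index.HTML");
        System.out.println(extractor.extractText(page)); // Hello OpenCms
    }
}
```

Keeping a single abstract method and deriving the overloads in the base class means each new format only has to handle the stream-based case; a production HTML extractor would use a real parser rather than regular expressions.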

Package org.opencms.search.extractors Description

Contains a generic, low-level framework for extracting plain text content from various popular file formats.

Since:
6.0.0