org.opencms.search.documents
Interface I_CmsSearchExtractor

All Known Subinterfaces:
I_CmsDocumentFactory
All Known Implementing Classes:
A_CmsVfsDocument, CmsDocumentContainerPage, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsOfficeOLE2, CmsDocumentMsOfficeOOXML, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage, CmsGalleryDocumentXmlContent, CmsGalleryDocumentXmlPage, CmsSolrDocumentContainerPage, CmsSolrDocumentXmlContent

public interface I_CmsSearchExtractor

Defines a text extractor for the integrated search engine.

A search extractor extracts indexable plain text from a resource in the OpenCms VFS. The text may come from the resource content, for example a PDF file, or from the resource properties, for example the Title, Keywords and Description properties.
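
As an illustration of the idea described above, the following is a minimal sketch of an extractor that combines property values and resource content into a single block of indexable plain text. It does not use the real OpenCms API: SimpleResource, SimpleResult, SimpleExtractor and PlainTextExtractor are hypothetical stand-ins for CmsResource, I_CmsExtractionResult, I_CmsSearchExtractor and an implementing class. It also assumes the content is UTF-8 plain text, and it omits the cms and index parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: none of these types belong to OpenCms.
class ExtractorSketch {

    /** Hypothetical stand-in for CmsResource: raw content plus properties. */
    static final class SimpleResource {
        final byte[] content;
        final Map<String, String> properties;

        SimpleResource(byte[] content, Map<String, String> properties) {
            this.content = content;
            this.properties = properties;
        }
    }

    /** Hypothetical stand-in for I_CmsExtractionResult. */
    static final class SimpleResult {
        final String text;

        SimpleResult(String text) {
            this.text = text;
        }
    }

    /** Simplified extractor contract (assumption: no cms or index argument). */
    interface SimpleExtractor {
        SimpleResult extractContent(SimpleResource resource) throws Exception;
    }

    /** Joins selected property values with the content decoded as UTF-8. */
    static final class PlainTextExtractor implements SimpleExtractor {
        private static final String[] INDEXED_PROPERTIES = {"Title", "Keywords", "Description"};

        @Override
        public SimpleResult extractContent(SimpleResource resource) {
            StringBuilder text = new StringBuilder();
            // Properties first, one per line, skipping missing or empty values.
            for (String name : INDEXED_PROPERTIES) {
                String value = resource.properties.get(name);
                if (value != null && !value.isEmpty()) {
                    text.append(value).append('\n');
                }
            }
            // Then the resource body, assumed here to be UTF-8 plain text.
            text.append(new String(resource.content, StandardCharsets.UTF_8));
            return new SimpleResult(text.toString());
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("Title", "Release notes");
        properties.put("Keywords", "search, index");
        SimpleResource resource = new SimpleResource(
            "Body text of the document.".getBytes(StandardCharsets.UTF_8), properties);

        SimpleResult result = new PlainTextExtractor().extractContent(resource);
        System.out.println(result.text);
    }
}
```

A real implementation would instead receive the CmsObject, CmsResource and CmsSearchIndex arguments listed in the method detail and would select a parsing strategy based on the resource file type, as the implementing classes listed above suggest.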

Since:
6.0.0

Method Summary
 I_CmsExtractionResult extractContent(CmsObject cms, CmsResource resource, CmsSearchIndex index)
          Extracts the content of a given index resource according to the resource file type and the configuration of the given index.
 

Method Detail

extractContent

I_CmsExtractionResult extractContent(CmsObject cms,
                                     CmsResource resource,
                                     CmsSearchIndex index)
                                     throws CmsException
Extracts the content of a given index resource according to the resource file type and the configuration of the given index.

Parameters:
cms - the OpenCms user context (CmsObject) used to access the VFS
resource - the resource to extract the content from
index - the index to extract the content for
Returns:
the extracted content of the resource
Throws:
CmsException - if the content extraction fails