C D E G H I M N O P R S U X
All Classes All Packages
C
- contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
D
- DataURIScheme - Class in org.apache.tika.parser.html
- DataURISchemeParseException - Exception in org.apache.tika.parser.html
- DataURISchemeParseException(String) - Constructor for exception org.apache.tika.parser.html.DataURISchemeParseException
- DataURISchemeUtil - Class in org.apache.tika.parser.html
-
Not thread safe.
- DataURISchemeUtil() - Constructor for class org.apache.tika.parser.html.DataURISchemeUtil
- DefaultHtmlMapper - Class in org.apache.tika.parser.html
-
The default HTML mapping rules in Tika.
- DefaultHtmlMapper() - Constructor for class org.apache.tika.parser.html.DefaultHtmlMapper
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
E
- equals(Object) - Method in class org.apache.tika.parser.html.DataURIScheme
- extract(String) - Method in class org.apache.tika.parser.html.DataURISchemeUtil
-
Extracts DataURISchemes from free text, as in JavaScript.
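To illustrate what extracting data URIs from free text involves, here is a minimal standalone sketch that uses only the JDK. It is not Tika's implementation: the class name DataUriSketch, the simplified regular expression, and the asText() helper are assumptions made for this example. The accessor names mirror getMediaType(), isBase64() and getInputStream() from the DataURIScheme entries in this index. URIs that carry media type parameters (such as ;charset=) are not matched, and percent-decoding is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; this is NOT Tika's DataURIScheme.
final class DataUriSketch {
    // Simplified pattern (an assumption): data:[<media type>][;base64],<data>
    private static final Pattern DATA_URI = Pattern.compile(
            "data:([\\w.+\\-]+/[\\w.+\\-]+)?(;base64)?,([A-Za-z0-9+/=%._~\\-]*)");

    private final String mediaType; // null when the URI omits the media type
    private final boolean base64;
    private final byte[] data;

    private DataUriSketch(String mediaType, boolean base64, byte[] data) {
        this.mediaType = mediaType;
        this.base64 = base64;
        this.data = data;
    }

    String getMediaType() { return mediaType; }

    boolean isBase64() { return base64; }

    InputStream getInputStream() { return new ByteArrayInputStream(data); }

    // Convenience helper for this sketch only: payload bytes decoded as UTF-8.
    String asText() { return new String(data, StandardCharsets.UTF_8); }

    // Scans free text (for example script source) and collects every match.
    static List<DataUriSketch> extract(String text) {
        List<DataUriSketch> found = new ArrayList<>();
        Matcher m = DATA_URI.matcher(text);
        while (m.find()) {
            boolean isBase64 = m.group(2) != null;
            String payload = m.group(3);
            byte[] bytes;
            if (isBase64) {
                try {
                    bytes = Base64.getDecoder().decode(payload);
                } catch (IllegalArgumentException e) {
                    continue; // skip payloads that are not valid Base64
                }
            } else {
                // Kept as raw ASCII; percent-decoding is omitted here.
                bytes = payload.getBytes(StandardCharsets.US_ASCII);
            }
            found.add(new DataUriSketch(m.group(1), isBase64, bytes));
        }
        return found;
    }
}
```

Calling DataUriSketch.extract on a script fragment such as var img = 'data:image/png;base64,SGVsbG8='; yields one entry whose media type is image/png and whose payload decodes to the five bytes of "Hello".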
G
- getInputStream() - Method in class org.apache.tika.parser.html.DataURIScheme
- getMarkLimit() - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
- getMarkLimit() - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
- getMediaType() - Method in class org.apache.tika.parser.html.DataURIScheme
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
H
- hashCode() - Method in class org.apache.tika.parser.html.DataURIScheme
- HtmlEncodingDetector - Class in org.apache.tika.parser.html
-
Character encoding detector for determining the character encoding of an HTML document based on the potential charset parameter found in a Content-Type http-equiv meta tag somewhere near the beginning.
- HtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.HtmlEncodingDetector
- HtmlMapper - Interface in org.apache.tika.parser.html
-
HTML mapper used to make incoming HTML documents easier to handle by Tika clients.
- HtmlParser - Class in org.apache.tika.parser.html
-
HTML parser.
- HtmlParser() - Constructor for class org.apache.tika.parser.html.HtmlParser
- HtmlParser(EncodingDetector) - Constructor for class org.apache.tika.parser.html.HtmlParser
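The HtmlEncodingDetector entry above describes looking for a charset parameter in a meta tag near the start of the document, and the getMarkLimit()/setMarkLimit(int) entries bound how far into the stream that search reads. The following standalone sketch shows one way such a detector could work with plain JDK classes. It is not Tika's code: the class name MetaCharsetSketch, the regular expression, the 8192-byte default, and the choice to wrap IOException in UncheckedIOException are all assumptions of this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT Tika's HtmlEncodingDetector.
final class MetaCharsetSketch {
    // Assumed, simplified pattern: a charset=... token inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:\\-]+)",
            Pattern.CASE_INSENSITIVE);

    private int markLimit = 8192; // assumed default, chosen for this sketch

    int getMarkLimit() { return markLimit; }

    void setMarkLimit(int markLimit) { this.markLimit = markLimit; }

    // Returns the declared charset, or null if none is found within the
    // first markLimit bytes. The stream is reset to where it started.
    Charset detect(InputStream input) {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[markLimit];
        int n = 0;
        try {
            input.mark(markLimit);
            try {
                while (n < head.length) {
                    int r = input.read(head, n, head.length - n);
                    if (r < 0) {
                        break; // end of stream before the limit
                    }
                    n += r;
                }
            } finally {
                input.reset(); // leave the stream untouched for the parser
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives.
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(prefix);
        while (m.find()) {
            try {
                if (Charset.isSupported(m.group(1))) {
                    return Charset.forName(m.group(1));
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: ignore it and keep looking.
            }
        }
        return null;
    }
}
```

Lowering the mark limit trades accuracy for less buffering: a declaration that sits beyond the limit is simply not seen, and the sketch returns null.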
I
- IdentityHtmlMapper - Class in org.apache.tika.parser.html
-
Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
- IdentityHtmlMapper() - Constructor for class org.apache.tika.parser.html.IdentityHtmlMapper
- INSTANCE - Static variable in class org.apache.tika.parser.html.DefaultHtmlMapper
- INSTANCE - Static variable in class org.apache.tika.parser.html.IdentityHtmlMapper
- isBase64() - Method in class org.apache.tika.parser.html.DataURIScheme
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
- isDiscardElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
- isExtractScripts() - Method in class org.apache.tika.parser.html.HtmlParser
M
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
-
Normalizes an attribute name.
- mapSafeAttribute(String, String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Maps "safe" HTML attribute names to semantic XHTML equivalents.
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
- mapSafeElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Maps "safe" HTML element names to semantic XHTML equivalents.
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
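The three mapper methods listed in this index (mapSafeElement, mapSafeAttribute, isDiscardElement) together decide which elements and attributes reach the parse output. The sketch below shows how such a contract might be implemented. It declares its own local interface, MapperSketch, so that it compiles without Tika on the classpath. The return types, the null-means-drop convention, and the element and attribute sets are assumptions of this example and are not taken from the index.

```java
import java.util.Locale;
import java.util.Set;

// Local stand-in mirroring the three method signatures listed in this index.
// Return types and null conventions are assumptions of this sketch.
interface MapperSketch {
    String mapSafeElement(String name);   // null: element is not passed through
    boolean isDiscardElement(String name); // true: drop the element and its content
    String mapSafeAttribute(String elementName, String attributeName); // null: drop
}

// Hypothetical mapper: keeps a handful of elements, lower-cases their names,
// discards script and style content, and keeps only href on anchors.
final class MinimalMapperSketch implements MapperSketch {
    private static final Set<String> SAFE_ELEMENTS = Set.of("h1", "h2", "p", "a", "b");
    private static final Set<String> DISCARD_ELEMENTS = Set.of("script", "style");

    @Override
    public String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return SAFE_ELEMENTS.contains(lower) ? lower : null;
    }

    @Override
    public boolean isDiscardElement(String name) {
        return DISCARD_ELEMENTS.contains(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public String mapSafeAttribute(String elementName, String attributeName) {
        String element = elementName.toLowerCase(Locale.ROOT);
        String attribute = attributeName.toLowerCase(Locale.ROOT);
        return "a".equals(element) && "href".equals(attribute) ? attribute : null;
    }
}
```

With this mapper, mapSafeElement("P") returns "p", mapSafeElement("blink") returns null, and isDiscardElement("SCRIPT") returns true. Keeping these rules in a separate mapper object rather than in overridden parser methods matches the direction of the deprecation notes above.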
N
- newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- NotImplementedException(String) - Constructor for exception org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset.NotImplementedException
O
- org.apache.tika.parser.html - package org.apache.tika.parser.html
- org.apache.tika.parser.html.charsetdetector - package org.apache.tika.parser.html.charsetdetector
- org.apache.tika.parser.html.charsetdetector.charsets - package org.apache.tika.parser.html.charsetdetector.charsets
P
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
- parse(String) - Method in class org.apache.tika.parser.html.DataURISchemeUtil
R
- ReplacementCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
-
An implementation of the standard "replacement" charset defined by the W3C.
- ReplacementCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
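For readers unfamiliar with the "replacement" charset, the sketch below shows how the contains(Charset), newDecoder() and newEncoder() methods listed in this index could fit together in a java.nio.charset.Charset subclass. It follows the usual description of the replacement decoder, in which any non-empty input yields a single U+FFFD and the rest is swallowed. This is not Tika's ReplacementCharset: the class name, the charset name x-replacement-sketch, and the decision to throw UnsupportedOperationException from newEncoder() are assumptions of this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical illustration; this is NOT Tika's ReplacementCharset.
final class ReplacementCharsetSketch extends Charset {

    ReplacementCharsetSketch() {
        super("x-replacement-sketch", null); // assumed name, no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs.equals(this); // contains only itself
    }

    @Override
    public boolean canEncode() {
        return false; // decoding only in this sketch
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("this sketch cannot encode");
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1f, 1f) {
            private boolean replacementEmitted;

            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                if (in.hasRemaining() && !replacementEmitted) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    out.put('\uFFFD'); // emitted once per decoding run
                    replacementEmitted = true;
                }
                in.position(in.limit()); // swallow all remaining input
                return CoderResult.UNDERFLOW;
            }

            @Override
            protected void implReset() {
                replacementEmitted = false;
            }
        };
    }
}
```

Decoding any non-empty byte sequence with this charset, for example via Charset.decode(ByteBuffer), produces a string of exactly one U+FFFD character, while empty input produces an empty string.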
S
- setExtractScripts(boolean) - Method in class org.apache.tika.parser.html.HtmlParser
-
Whether or not to extract contents in script entities.
- setMarkLimit(int) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
-
How far into the stream to read for charset detection.
- setMarkLimit(int) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
-
How far into the stream to read for charset detection.
- StandardHtmlEncodingDetector - Class in org.apache.tika.parser.html.charsetdetector
-
An encoding detector that tries to respect the spirit of the HTML spec part 12.2.3 "The input byte stream", or at least the part that is compatible with the implementation of Tika.
- StandardHtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
U
- UNSPECIFIED_MEDIA_TYPE - Static variable in class org.apache.tika.parser.html.DataURISchemeUtil
X
- XUserDefinedCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
- XUserDefinedCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- XUserDefinedCharset.NotImplementedException - Exception in org.apache.tika.parser.html.charsetdetector.charsets