Class WARCParser

java.lang.Object
org.apache.tika.parser.warc.WARCParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class WARCParser extends Object implements org.apache.tika.parser.Parser
Uses jwarc to parse WARC and ARC files.
  • Field Details

    • WARC_PREFIX

      public static String WARC_PREFIX
    • WARC_HTTP_PREFIX

      public static String WARC_HTTP_PREFIX
    • WARC_HTTP_STATUS

      public static String WARC_HTTP_STATUS
    • WARC_HTTP_STATUS_REASON

      public static String WARC_HTTP_STATUS_REASON
  • Constructor Details

    • WARCParser

      public WARCParser()
  • Method Details

    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
      Specified by:
      getSupportedTypes in interface org.apache.tika.parser.Parser
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Specified by:
      parse in interface org.apache.tika.parser.Parser
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException
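
Since parse reads the archive from an InputStream, a small in-memory record can stand in for a real file when experimenting. The following is a minimal sketch, using only the JDK, that assembles a single WARC/1.0 response record around a tiny HTTP response. The class name MinimalWarcRecord, the method build, the fixed WARC-Date and the example URI are all hypothetical and chosen for illustration. How WARCParser reports each header of such a record is not stated on this page.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper (not part of Tika): builds one WARC/1.0 "response" record in memory.
class MinimalWarcRecord {

    /** Wraps a plain-text body in an HTTP response, then in a single WARC record. */
    static byte[] build(String targetUri, String body) {
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);

        // The record block of a response record is the raw HTTP response.
        String httpHead = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Content-Length: " + bodyBytes.length + "\r\n"
                + "\r\n";
        byte[] httpHeadBytes = httpHead.getBytes(StandardCharsets.US_ASCII);
        int blockLength = httpHeadBytes.length + bodyBytes.length;

        // WARC header: version line, named fields, then an empty line.
        String warcHead = "WARC/1.0\r\n"
                + "WARC-Type: response\r\n"
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">\r\n"
                + "WARC-Date: 2020-01-01T00:00:00Z\r\n" // fixed date, illustration only
                + "WARC-Target-URI: " + targetUri + "\r\n"
                + "Content-Type: application/http; msgtype=response\r\n"
                + "Content-Length: " + blockLength + "\r\n"
                + "\r\n";
        byte[] warcHeadBytes = warcHead.getBytes(StandardCharsets.UTF_8);

        // Every WARC record is terminated by two CRLF pairs.
        byte[] trailer = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(warcHeadBytes, 0, warcHeadBytes.length);
        out.write(httpHeadBytes, 0, httpHeadBytes.length);
        out.write(bodyBytes, 0, bodyBytes.length);
        out.write(trailer, 0, trailer.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] record = build("http://example.com/", "hello");
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

The returned bytes can be wrapped in a java.io.ByteArrayInputStream and passed as the stream argument of parse, together with a ContentHandler, a Metadata and a ParseContext, as in the signature above.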