public final class UnicodeDecode extends PrimitiveOp
The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order.
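The `row_splits` tensor indicates where the codepoints for each input string begin and end within the `char_values` tensor. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus:

- `char_values[row_splits[i]+j]` is the Unicode codepoint for the `j`th character in the `i`th string (in row-major order).
- `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string (in row-major order).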
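
To make the `char_values` / `row_splits` layout concrete, here is a minimal plain-Java sketch. It does not call TensorFlow: the class `RowSplitsSketch` and its `row` helper are hypothetical names introduced only for illustration, and the input is assumed to be a list of already-decoded Java strings rather than encoded bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics the flattened output layout, not the TensorFlow op itself.
final class RowSplitsSketch {
    final int[] charValues;  // all codepoints, one string after another
    final long[] rowSplits;  // rowSplits[i]..rowSplits[i+1] delimits string i

    RowSplitsSketch(List<String> strings) {
        List<Integer> values = new ArrayList<>();
        rowSplits = new long[strings.size() + 1];
        for (int i = 0; i < strings.size(); i++) {
            // codePoints() yields one int per Unicode codepoint (surrogate pairs are combined).
            strings.get(i).codePoints().forEach(cp -> values.add(cp));
            rowSplits[i + 1] = values.size();
        }
        charValues = values.stream().mapToInt(Integer::intValue).toArray();
    }

    // Codepoints of the i-th string: the slice [rowSplits[i] : rowSplits[i+1]].
    int[] row(int i) {
        return Arrays.copyOfRange(charValues, (int) rowSplits[i], (int) rowSplits[i + 1]);
    }

    public static void main(String[] args) {
        RowSplitsSketch r = new RowSplitsSketch(Arrays.asList("hi", "", "\uD83D\uDE00!"));
        System.out.println(Arrays.toString(r.charValues)); // [104, 105, 128512, 33]
        System.out.println(Arrays.toString(r.rowSplits));  // [0, 2, 2, 4]
        System.out.println(Arrays.toString(r.row(2)));     // [128512, 33]
    }
}
```

The empty string contributes no codepoints, so its two neighbouring `rowSplits` entries are equal.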
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `UnicodeDecode.Options`: optional attributes for `UnicodeDecode` |

Fields inherited from PrimitiveOp: `operation`

| Modifier and Type | Method and Description |
|---|---|
| `Output<Integer>` | `charValues()`: A 1D int32 Tensor containing the decoded codepoints. |
| `static UnicodeDecode` | `create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecode.Options... options)`: Factory method that adds a new UnicodeDecode operation to the graph and returns a class wrapping it. |
| `static UnicodeDecode.Options` | `errors(String errors)` |
| `static UnicodeDecode.Options` | `replaceControlCharacters(Boolean replaceControlCharacters)` |
| `static UnicodeDecode.Options` | `replacementChar(Long replacementChar)` |
| `Output<Long>` | `rowSplits()`: A 1D int64 Tensor containing the row splits. |

Methods inherited from PrimitiveOp: `equals`, `hashCode`, `toString`

`public static UnicodeDecode create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecode.Options... options)`

- `scope` - current graph scope
- `input` - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
- `inputEncoding` - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
- `options` - carries optional attribute values

`public static UnicodeDecode.Options errors(String errors)`

- `errors` - Error handling policy when invalid formatting is found in the input. A value of 'strict' causes the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' causes the operation to skip any invalid formatting in the input and produce no corresponding output character.

`public static UnicodeDecode.Options replacementChar(Long replacementChar)`

- `replacementChar` - The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the default Unicode replacement character, U+FFFD (decimal 65533).

`public static UnicodeDecode.Options replaceControlCharacters(Boolean replaceControlCharacters)`

- `replaceControlCharacters` - Whether to replace the C0 control characters (0x00-0x1F) with the `replacement_char`. Default is false.

Copyright © 2015–2019. All rights reserved.
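
The three optional attributes documented above (`errors`, `replacementChar`, `replaceControlCharacters`) can be approximated in plain Java with `java.nio.charset.CharsetDecoder`. This is only an analogy, not the TensorFlow implementation, which relies on ICU converters: the class `ErrorPolicySketch` and method `decodeUtf8` are hypothetical, the input is assumed to be UTF-8, `IllegalArgumentException` stands in for the InvalidArgument error, and the JDK decoder may group invalid bytes differently from ICU.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: a JDK-based analogue of the three error policies.
final class ErrorPolicySketch {
    static int[] decodeUtf8(byte[] bytes, String errors, int replacementChar,
                            boolean replaceControlCharacters) {
        CodingErrorAction action;
        switch (errors) {
            case "strict":  action = CodingErrorAction.REPORT;  break; // fail on bad input
            case "replace": action = CodingErrorAction.REPLACE; break; // substitute replacementChar
            case "ignore":  action = CodingErrorAction.IGNORE;  break; // drop bad input silently
            default: throw new IllegalArgumentException("unknown errors policy: " + errors);
        }
        // Limitation of this sketch: the JDK UTF-8 decoder accepts only a one-char
        // replacement, so replacementChar must lie in the Basic Multilingual Plane.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .replaceWith(new String(Character.toChars(replacementChar)));
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString().codePoints()
                    // Optionally map C0 control characters (0x00-0x1F) to replacementChar.
                    .map(cp -> (replaceControlCharacters && cp <= 0x1F) ? replacementChar : cp)
                    .toArray();
        } catch (CharacterCodingException e) {
            // Stand-in for the InvalidArgument error raised under 'strict'.
            throw new IllegalArgumentException("invalid input formatting", e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // 0xFF can never appear in valid UTF-8
        System.out.println(Arrays.toString(decodeUtf8(bad, "replace", 0xFFFD, false))); // [97, 65533, 98]
        System.out.println(Arrays.toString(decodeUtf8(bad, "ignore", 0xFFFD, false)));  // [97, 98]
    }
}
```

Under `"strict"`, the same input makes `decodeUtf8` throw instead of returning codepoints.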