public final class UnicodeDecodeWithOffsets extends PrimitiveOp
The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order. Similarly, the character start byte offsets are returned using a single vector `char_to_byte_starts`, with strings expanded in row-major order.
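The `row_splits` tensor indicates where the codepoints and start offsets for each input string begin and end within the `char_values` and `char_to_byte_starts` tensors. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus `char_values[row_splits[i]+j]` is the codepoint of the `j`th character in the `i`th string, `char_to_byte_starts[row_splits[i]+j]` is the start byte offset of that character, and `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string.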
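The slicing rule can be illustrated without TensorFlow at all. The following is a minimal sketch in plain Java; `RowSplitsDemo`, `sliceForString`, and `charCount` are hypothetical names that are not part of this API, and the arrays in `main` are made-up values standing in for the contents of the output tensors.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the TensorFlow API. It only shows how
// row_splits indexes into a flattened output such as char_values.
final class RowSplitsDemo {

    // Returns values[rowSplits[i] : rowSplits[i + 1]], the entries for the i-th string.
    static int[] sliceForString(int[] values, long[] rowSplits, int i) {
        int begin = Math.toIntExact(rowSplits[i]);
        int end = Math.toIntExact(rowSplits[i + 1]);
        return Arrays.copyOfRange(values, begin, end);
    }

    // Number of characters decoded from the i-th string.
    static long charCount(long[] rowSplits, int i) {
        return rowSplits[i + 1] - rowSplits[i];
    }

    public static void main(String[] args) {
        // Made-up flattened outputs for the two input strings "hi" and "abc".
        int[] charValues = {'h', 'i', 'a', 'b', 'c'};
        long[] rowSplits = {0, 2, 5};
        for (int i = 0; i + 1 < rowSplits.length; i++) {
            System.out.println(i + ": "
                    + Arrays.toString(sliceForString(charValues, rowSplits, i)));
        }
    }
}
```

The same slice bounds apply to `char_to_byte_starts`, since both vectors are expanded in the same row-major order.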
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `UnicodeDecodeWithOffsets.Options` Optional attributes for `UnicodeDecodeWithOffsets` |

Fields inherited from class PrimitiveOp: operation

| Modifier and Type | Method and Description |
|---|---|
| `Output<Long>` | `charToByteStarts()` A 1D int64 Tensor containing the byte index in the input string where each character in `char_values` starts. |
| `Output<Integer>` | `charValues()` A 1D int32 Tensor containing the decoded codepoints. |
| `static UnicodeDecodeWithOffsets` | `create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecodeWithOffsets.Options... options)` Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation in the graph. |
| `static UnicodeDecodeWithOffsets.Options` | `errors(String errors)` |
| `static UnicodeDecodeWithOffsets.Options` | `replaceControlCharacters(Boolean replaceControlCharacters)` |
| `static UnicodeDecodeWithOffsets.Options` | `replacementChar(Long replacementChar)` |
| `Output<Long>` | `rowSplits()` A 1D int64 Tensor containing the row splits. |
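To make the three outputs concrete, here is a rough emulation in plain Java of what `charValues()`, `charToByteStarts()`, and `rowSplits()` would hold for UTF-8 input. It is a sketch under assumptions, not the operation's implementation: `DecodeWithOffsetsSketch` and its members are hypothetical names, the input is assumed to be well-formed text encoded as UTF-8, and byte offsets are counted from the start of each input string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation, not the TensorFlow kernel. Assumes well-formed
// strings whose encoded form is UTF-8.
final class DecodeWithOffsetsSketch {
    final List<Integer> charValues = new ArrayList<>();    // decoded codepoints
    final List<Long> charToByteStarts = new ArrayList<>(); // byte offset within each string
    final List<Long> rowSplits = new ArrayList<>();        // slice boundaries per string

    // Number of bytes a codepoint occupies in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    static DecodeWithOffsetsSketch decode(List<String> inputs) {
        DecodeWithOffsetsSketch out = new DecodeWithOffsetsSketch();
        out.rowSplits.add(0L);
        for (String s : inputs) {
            long byteOffset = 0; // restarts for every input string
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);
                out.charValues.add(cp);
                out.charToByteStarts.add(byteOffset);
                byteOffset += utf8Length(cp);
                i += Character.charCount(cp); // skip both halves of a surrogate pair
            }
            out.rowSplits.add((long) out.charValues.size());
        }
        return out;
    }
}
```

For the inputs `"a\u00e9"` and `"\uD83D\uDE00b"`, this sketch yields codepoints `[97, 233, 128512, 98]`, byte starts `[0, 1, 0, 4]`, and row splits `[0, 2, 4]`.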
Methods inherited from class PrimitiveOp: equals, hashCode, toString

`public static UnicodeDecodeWithOffsets create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecodeWithOffsets.Options... options)`

scope - current graph scope

input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.

inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.

options - carries optional attribute values

`public static UnicodeDecodeWithOffsets.Options errors(String errors)`

errors - Error handling policy when there is invalid formatting found in the input. The value of 'strict' will cause the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) will cause the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' will cause the operation to skip any invalid formatting in the input and produce no corresponding output character.

`public static UnicodeDecodeWithOffsets.Options replacementChar(Long replacementChar)`

replacementChar - The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (0xFFFD, decimal 65533).

`public static UnicodeDecodeWithOffsets.Options replaceControlCharacters(Boolean replaceControlCharacters)`

replaceControlCharacters - Whether to replace the C0 control characters (U+0000 to U+001F) with the `replacement_char`. Default is false.

Copyright © 2015–2019. All rights reserved.
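The three `errors` policies have a close analogue in the JDK's own `CharsetDecoder`, which may help build intuition for how 'strict', 'replace', and 'ignore' differ. The sketch below is an analogy only: it does not call TensorFlow, `ErrorPolicySketch` is a hypothetical name, and mapping the strict failure to `IllegalArgumentException` is an assumption made here to mirror the InvalidArgument error described for `errors`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical analogy built on the JDK decoder, not the TensorFlow operation.
final class ErrorPolicySketch {

    // Decodes UTF-8 bytes under a policy named like the `errors` attribute.
    static String decode(byte[] bytes, String errors) {
        CodingErrorAction action;
        switch (errors) {
            case "strict":  action = CodingErrorAction.REPORT;  break; // fail on bad input
            case "replace": action = CodingErrorAction.REPLACE; break; // substitute U+FFFD
            case "ignore":  action = CodingErrorAction.IGNORE;  break; // drop bad bytes
            default: throw new IllegalArgumentException("unknown policy: " + errors);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Assumption: stands in for the InvalidArgument error of 'strict'.
            throw new IllegalArgumentException("invalid input formatting", e);
        }
    }
}
```

With the bytes `{'a', 0xFF, 'b'}`, where `0xFF` is never valid in UTF-8, 'replace' produces `"a\uFFFDb"`, 'ignore' produces `"ab"`, and 'strict' throws.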