final class UTF8String extends Comparable[UTF8String] with Externalizable with KryoSerializable with Cloneable

A UTF-8 String for internal Spark use.

A String encoded in UTF-8 as an Array[Byte], which can be used for comparison and search. See http://en.wikipedia.org/wiki/UTF-8 for details.

Note: This class is not designed for general use cases and should not be used outside SQL.
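
One reason a UTF-8 byte array is convenient for comparison is that comparing the encoded bytes as unsigned values gives the same order as comparing Unicode code points. The sketch below illustrates this with the JDK only; the class and method names are hypothetical, and it is not Spark's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8ByteOrderSketch {
    // Hypothetical helper: orders two strings by their UTF-8 bytes, read as unsigned values.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(x, y); // requires Java 9 or later
    }

    public static void main(String[] args) {
        System.out.println(compareUtf8("apple", "banana") < 0); // true
        // U+00E9 encodes as 0xC3 0xA9; a signed byte comparison would sort it before "z".
        System.out.println(compareUtf8("z", "\u00e9") < 0);     // true
    }
}
```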

Linear Supertypes
Cloneable, KryoSerializable, Externalizable, Serializable, Comparable[UTF8String], AnyRef, Any

Instance Constructors

  1. new UTF8String()
  2. new UTF8String(base: Any, offset: Long, numBytes: Int)
    Attributes
    protected[types]

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): UTF8String
    Definition Classes
    UTF8String → AnyRef
  6. def compare(other: UTF8String): Int
  7. def compareTo(other: UTF8String): Int
    Definition Classes
    UTF8String → Comparable
  8. def contains(substring: UTF8String): Boolean

    Returns whether this contains substring or not.

  9. def copy(): UTF8String
  10. def endsWith(suffix: UTF8String): Boolean
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(other: Any): Boolean
    Definition Classes
    UTF8String → AnyRef → Any
  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def findInSet(match: UTF8String): Int
  15. def getBaseObject(): AnyRef
  16. def getBaseOffset(): Long
  17. def getByteBuffer(): ByteBuffer

    Returns a ByteBuffer wrapping the base object if it is a byte array, or a copy of the data if the base object is not a byte array.

    Unlike getBytes, this will not create a copy of the array if this is a slice.
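
The difference between wrapping a slice and copying it can be seen with plain JDK classes. This is only an illustration of the two strategies described above, with hypothetical helper names; it does not use UTF8String itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SliceViewSketch {
    // Wraps a slice of the backing array; no bytes are copied.
    static ByteBuffer view(byte[] backing, int offset, int length) {
        return ByteBuffer.wrap(backing, offset, length);
    }

    // Copies the slice into a fresh array, as a getBytes-style accessor might.
    static byte[] copy(byte[] backing, int offset, int length) {
        return Arrays.copyOfRange(backing, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] backing = "hello world".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = view(backing, 6, 5); // the "world" slice
        byte[] bytes = copy(backing, 6, 5);
        backing[6] = 'W';                     // mutate the shared array
        System.out.println((char) buf.get(buf.position())); // W (the view sees the change)
        System.out.println((char) bytes[0]);                // w (the copy does not)
    }
}
```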

  18. def getBytes(): Array[Byte]

    Returns the underlying bytes; the result is a copy if the bytes are part of another array.

  19. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. def getPrefix(): Long

    Returns a 64-bit integer that can be used as a prefix in sorting.
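
A sort prefix of this kind can be built by packing the leading bytes of the string into a long, so that most comparisons are settled by one integer comparison. The following is a minimal sketch under that assumption; the packing order and the class name are illustrative and may differ from what getPrefix actually does.

```java
import java.nio.charset.StandardCharsets;

final class SortPrefixSketch {
    // Packs up to the first 8 bytes into a long, most significant byte first,
    // padding shorter inputs with zero bytes on the right.
    static long prefix(byte[] utf8) {
        long p = 0L;
        for (int i = 0; i < 8; i++) {
            p <<= 8;
            if (i < utf8.length) {
                p |= (utf8[i] & 0xFFL);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        long a = prefix("apple".getBytes(StandardCharsets.UTF_8));
        long b = prefix("banana".getBytes(StandardCharsets.UTF_8));
        // Prefixes must be compared as unsigned values, because a leading byte >= 0x80 sets the sign bit.
        System.out.println(Long.compareUnsigned(a, b) < 0); // true
    }
}
```

Strings that share their first 8 bytes produce equal prefixes, so a full comparison is still needed to break ties.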

  21. def hashCode(): Int
    Definition Classes
    UTF8String → AnyRef → Any
  22. def indexOf(v: UTF8String, start: Int): Int

    Returns the position of the first occurrence of v in this string, searching from the specified position (0-based index).

    v

    the string to search for

    start

    the position in this string at which to start searching

    returns

    the position of the first occurrence of v, or -1 if it is not found.

  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. def levenshteinDistance(other: UTF8String): Int

    Levenshtein distance is a metric for measuring the distance between two strings. The distance is defined as the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one string into the other.
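
The definition above corresponds to a standard dynamic-programming recurrence. Here is a minimal sketch over Java Strings (compared code point by code point) using two rolling rows; the class name is hypothetical and this is not the UTF8String implementation.

```java
final class LevenshteinSketch {
    static int distance(String a, String b) {
        int[] s = a.codePoints().toArray();
        int[] t = b.codePoints().toArray();
        int[] prev = new int[t.length + 1];
        int[] curr = new int[t.length + 1];
        // Turning the empty prefix into t[0..j) takes j insertions.
        for (int j = 0; j <= t.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length; i++) {
            curr[0] = i; // turning s[0..i) into the empty string takes i deletions
            for (int j = 1; j <= t.length; j++) {
                int substitution = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```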

  25. def lpad(len: Int, pad: UTF8String): UTF8String

    Returns this string, left-padded with pad to a length of len. For example: ('hi', 5, '??') => '???hi' and ('hi', 1, '??') => 'h'.
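
The two examples show that the result is always len characters long: the pad is repeated as needed, and the string is truncated when it is already longer than len. A sketch of that behavior on Java Strings follows; the class name is hypothetical, and returning the input unchanged for an empty pad is an assumption made here.

```java
final class PadSketch {
    static String lpad(String str, int len, String pad) {
        int n = str.codePointCount(0, str.length());
        if (len <= n) {
            // Already long enough: keep only the first len code points.
            return str.substring(0, str.offsetByCodePoints(0, Math.max(len, 0)));
        }
        if (pad.isEmpty()) {
            return str; // assumption: nothing to pad with, so return the input
        }
        int[] padPoints = pad.codePoints().toArray();
        StringBuilder sb = new StringBuilder();
        // Cycle through the pad until the missing code points are filled.
        for (int i = 0; i < len - n; i++) {
            sb.appendCodePoint(padPoints[i % padPoints.length]);
        }
        return sb.append(str).toString();
    }

    public static void main(String[] args) {
        System.out.println(lpad("hi", 5, "??")); // ???hi
        System.out.println(lpad("hi", 1, "??")); // h
    }
}
```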

  26. def matchAt(s: UTF8String, pos: Int): Boolean
  27. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  28. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  30. def numBytes(): Int

    Returns the number of bytes.

  31. def numChars(): Int

    Returns the number of code points in this string.
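
The byte count and the code point count differ whenever the string contains non-ASCII characters. For well-formed UTF-8, code points can be counted by skipping continuation bytes (those of the form 10xxxxxx). The sketch below shows that technique with a hypothetical class name; it is not necessarily how numChars is implemented.

```java
import java.nio.charset.StandardCharsets;

final class CodePointCountSketch {
    // Counts code points in well-formed UTF-8 by counting every byte that is
    // not a continuation byte (bit pattern 10xxxxxx).
    static int numChars(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);    // 6
        System.out.println(numChars(bytes)); // 5
    }
}
```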

  32. def read(kryo: Kryo, in: Input): Unit
    Definition Classes
    UTF8String → KryoSerializable
  33. def readExternal(in: ObjectInput): Unit
    Definition Classes
    UTF8String → Externalizable
  34. def repeat(times: Int): UTF8String
  35. def replace(search: UTF8String, replace: UTF8String): UTF8String
  36. def reverse(): UTF8String
  37. def rpad(len: Int, pad: UTF8String): UTF8String

    Returns this string, right-padded with pad to a length of len. For example: ('hi', 5, '??') => 'hi???' and ('hi', 1, '??') => 'h'.

  38. def soundex(): UTF8String

    Encodes a string into a Soundex value. Soundex is an encoding used to relate similar names, but can also be used as a general-purpose scheme to find words with similar phonemes. See https://en.wikipedia.org/wiki/Soundex
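
For readers unfamiliar with the encoding, here is a minimal sketch of the American Soundex rules from the linked article: keep the first letter, map the remaining consonants to digits, collapse repeated codes, and pad to four characters. The class name is hypothetical, and the handling of empty input, non-letters and non-ASCII input is an assumption of this sketch, so it may not match UTF8String.soundex in those cases.

```java
final class SoundexSketch {
    // Codes for A..Z. '0' marks vowels (and Y), which separate equal codes;
    // '7' marks H and W, which are skipped without separating equal codes.
    private static final String CODES = "01230127022455012623017202";

    static String soundex(String s) {
        if (s.isEmpty()) {
            return s; // assumption: empty input is returned unchanged
        }
        char first = Character.toUpperCase(s.charAt(0));
        if (first < 'A' || first > 'Z') {
            return s; // assumption: input not starting with an ASCII letter is returned unchanged
        }
        StringBuilder out = new StringBuilder().append(first);
        char last = CODES.charAt(first - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(s.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // assumption: other characters are ignored
            }
            char code = CODES.charAt(c - 'A');
            if (code == '7') {
                continue;
            }
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code;
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));   // R163
        System.out.println(soundex("Ashcraft")); // A261
    }
}
```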

  39. def split(pattern: UTF8String, limit: Int): Array[UTF8String]
  40. def startsWith(prefix: UTF8String): Boolean
  41. def subStringIndex(delim: UTF8String, count: Int): UTF8String

    Returns the substring of this string before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. subStringIndex performs a case-sensitive match when searching for delim.
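
The positive and negative cases are easiest to see side by side. The following sketch expresses the described behavior on Java Strings; the class name is hypothetical, and returning an empty string for a zero count or an empty delimiter, and the whole string when there are fewer delimiters than requested, are assumptions made here.

```java
final class SubStringIndexSketch {
    static String subStringIndex(String str, String delim, int count) {
        if (delim.isEmpty() || count == 0) {
            return ""; // assumption: nothing to split on
        }
        if (count > 0) {
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // assumption: fewer than count delimiters
                }
            }
            return str.substring(0, idx);
        }
        long remaining = -(long) count; // widen first so Integer.MIN_VALUE does not overflow
        int idx = str.length();
        for (long i = 0; i < remaining; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str; // assumption: fewer than |count| delimiters
            }
        }
        return str.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        System.out.println(subStringIndex("www.apache.org", ".", 2));  // www.apache
        System.out.println(subStringIndex("www.apache.org", ".", -2)); // apache.org
    }
}
```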

  42. def substring(start: Int, until: Int): UTF8String

    Returns a substring of this.

    start

    the position of first code point

    until

    the position after last code point, exclusive.
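
Because start and until are positions in code points rather than bytes or UTF-16 units, a multi-byte character counts as one position. A sketch of the same indexing on a Java String follows; the class name is hypothetical, and clamping out-of-range arguments is an assumption of this sketch.

```java
final class CodePointSubstringSketch {
    // Returns the code points in [start, until), clamping both bounds to the string.
    static String substring(String s, int start, int until) {
        int n = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, n));
        int to = Math.max(from, Math.min(until, n));
        int begin = s.offsetByCodePoints(0, from);
        int end = s.offsetByCodePoints(begin, to - from);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00bc"; // 'a', U+1F600, 'b', 'c': 4 code points, 5 UTF-16 units
        System.out.println(substring(s, 1, 3).equals("\uD83D\uDE00b")); // true
    }
}
```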

  43. def substringSQL(pos: Int, length: Int): UTF8String
  44. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  45. def toByte(intWrapper: IntWrapper): Boolean
  46. def toByteExact(): Byte
  47. def toInt(intWrapper: IntWrapper): Boolean

    Parses this UTF8String (trimmed if needed) to an int.

    Note that this method accumulates the result as a negative number and converts it to a positive number at the end if the string does not start with '-'. This is because the minimum value has a larger magnitude than the maximum value, e.g. Integer.MAX_VALUE is '2147483647' and Integer.MIN_VALUE is '-2147483648'.

    This code is mostly copied from LazyInt.parseInt in Hive.

    Note that this method is almost the same as toLong, but it is left duplicated for performance reasons, as Hive does.

    intWrapper

    If a valid int was parsed from this UTF8String, its value is set in intWrapper

    returns

    true if the parsing was successful, otherwise false
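
The negative-accumulation trick can be shown in isolation. The sketch below parses an optional sign followed by decimal digits and reports success through a boolean, writing the value into a holder. IntHolder is a hypothetical stand-in for IntWrapper, and trimming and any other input forms that toInt may accept are deliberately left out.

```java
final class NegativeAccumulationSketch {
    // Hypothetical stand-in for a result holder such as IntWrapper.
    static final class IntHolder {
        int value;
    }

    static boolean parseInt(String s, IntHolder out) {
        int i = 0;
        int end = s.length();
        if (end == 0) {
            return false;
        }
        boolean negative = s.charAt(0) == '-';
        if (negative || s.charAt(0) == '+') {
            i++;
            if (i == end) {
                return false; // a sign with no digits
            }
        }
        final int radix = 10;
        final int stopValue = Integer.MIN_VALUE / radix;
        int result = 0;
        for (; i < end; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                return false;
            }
            if (result < stopValue) {
                return false; // multiplying by radix would overflow
            }
            result = result * radix - digit; // accumulate as a negative number
            if (result > 0) {
                return false; // the subtraction wrapped around
            }
        }
        if (!negative) {
            if (result == Integer.MIN_VALUE) {
                return false; // +2147483648 does not fit in an int
            }
            result = -result;
        }
        out.value = result;
        return true;
    }

    public static void main(String[] args) {
        IntHolder h = new IntHolder();
        System.out.println(parseInt("-2147483648", h) + " " + h.value); // true -2147483648
        System.out.println(parseInt("2147483648", h));                  // false
    }
}
```

Accumulating in the negative range lets Integer.MIN_VALUE be parsed without a special case, since its magnitude cannot be represented as a positive int.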

  48. def toIntExact(): Int

    Parses this UTF8String (trimmed if needed) to an int. This method is used when ANSI mode is enabled.

    returns

    the int value if the string contains a valid numeric value; otherwise a NumberFormatException is thrown.

  49. def toLong(toLongResult: LongWrapper): Boolean

    Parses this UTF8String (trimmed if needed) to a long.

    Note that this method accumulates the result as a negative number and converts it to a positive number at the end if the string does not start with '-'. This is because the minimum value has a larger magnitude than the maximum value, e.g. Long.MAX_VALUE is '9223372036854775807' and Long.MIN_VALUE is '-9223372036854775808'.

    This code is mostly copied from LazyLong.parseLong in Hive.

    toLongResult

    If a valid long was parsed from this UTF8String, its value is set in toLongResult

    returns

    true if the parsing was successful, otherwise false

  50. def toLongExact(): Long

    Parses this UTF8String (trimmed if needed) to a long. This method is used when ANSI mode is enabled.

    returns

    the long value if the string contains a valid numeric value; otherwise a NumberFormatException is thrown.

  51. def toLowerCase(): UTF8String

    Returns the lowercase form of this string.

  52. def toShort(intWrapper: IntWrapper): Boolean
  53. def toShortExact(): Short
  54. def toString(): String
    Definition Classes
    UTF8String → AnyRef → Any
  55. def toTitleCase(): UTF8String

    Returns the title-case form of this string, which can be used as a title.

  56. def toUpperCase(): UTF8String

    Returns the uppercase form of this string.

  57. def translate(dict: Map[Character, Character]): UTF8String
  58. def trim(trimString: UTF8String): UTF8String

    Trims instances of the given trim string from both ends of this string.

    trimString

    the trim character string

    returns

    this string with no occurrences of the trim string at the start or end, or null if trimString is null

  59. def trim(): UTF8String

    Trims space characters (ASCII 32) from both ends of this string.

    returns

    this string with no spaces at the start or end

  60. def trimAll(): UTF8String

    Trims whitespace characters (<= ASCII 32) from both ends of this string.

    Note that this method is the same as Java's String#trim, and differs from UTF8String#trim(), which removes only spaces (= ASCII 32) from both ends.

    returns

    A UTF8String whose value is this UTF8String with any leading and trailing whitespace removed, or this UTF8String if it has no leading or trailing whitespace.
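
The distinction between the two trimming rules can be reproduced with plain Java Strings. In the sketch below, trimSpaces is a hypothetical helper that removes only U+0020, as trim() is described to do, while String#trim stands in for the trimAll behavior.

```java
final class TrimSketch {
    // Removes only space characters (U+0020) from both ends.
    static String trimSpaces(String s) {
        int begin = 0;
        int end = s.length();
        while (begin < end && s.charAt(begin) == ' ') {
            begin++;
        }
        while (end > begin && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = " \thi\n ";
        System.out.println(trimSpaces(s).equals("\thi\n")); // true: tab and newline survive
        System.out.println(s.trim().equals("hi"));          // true: everything <= U+0020 is removed
    }
}
```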

  61. def trimLeft(trimString: UTF8String): UTF8String

    Trims instances of the given trim string from the start of this string.

    trimString

    the trim character string

    returns

    this string with no occurrences of the trim string at the start, or null if trimString is null

  62. def trimLeft(): UTF8String

    Trims space characters (ASCII 32) from the start of this string.

    returns

    this string with no spaces at the start

  63. def trimRight(trimString: UTF8String): UTF8String

    Trims instances of the given trim string from the end of this string.

    trimString

    the trim character string

    returns

    this string with no occurrences of the trim string at the end, or null if trimString is null

  64. def trimRight(): UTF8String

    Trims space characters (ASCII 32) from the end of this string.

    returns

    this string with no spaces at the end

  65. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  66. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  67. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  68. def write(kryo: Kryo, out: Output): Unit
    Definition Classes
    UTF8String → KryoSerializable
  69. def writeExternal(out: ObjectOutput): Unit
    Definition Classes
    UTF8String → Externalizable
  70. def writeTo(out: OutputStream): Unit
  71. def writeTo(buffer: ByteBuffer): Unit
  72. def writeToMemory(target: Any, targetOffset: Long): Unit

    Writes the content of this string into a memory address, identified by an object and an offset. The target memory must already be allocated and have enough space to hold all the bytes in this string.
