org.allenai.scienceparse.pipeline

Bucketizers


object Bucketizers

Helper functions copied from the pipeline code. They are kept here to anticipate how well the pipeline will work with the output of science-parse.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val concatChar: String

  7. def cutoffFilter(b: String, cutoffOption: Option[Int], highFreqs: Map[String, Int]): Boolean

  8. val defaultAllowTruncated: Boolean

  9. val defaultNameCutoffThreshold: Int

  10. val defaultNameNgramLength: Int

  11. val defaultTitleCutoffThreshold: Int

  12. val defaultTitleNgramLength: Int

  13. val defaultUpto: Int

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. val highFreqNameNgramStream: InputStream

  20. lazy val highFreqNameNgrams: Map[String, Int]

  21. val highFreqTitleNgramStream: InputStream

    This file contains 225 high-frequency n-grams from title prefixes. "High" means that the S2 * Dblp bucket size exceeded 1M as of early September 2015. The n-gram lengths are 2, 3, 4, and 5.

  22. lazy val highFreqTitleNgrams: Map[String, Int]

  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. def loadHighFreqs(is: InputStream): Map[String, Int]

  25. def nameNgrams(name: String): Iterator[String]

  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def ngramAux(chunks: Array[String], n: Int, cutoffOption: Option[Int], allowTruncated: Boolean, highFreqs: Map[String, Int], upto: Int): Iterator[String]

  28. def ngrams(text: String, n: Int, cutoffOption: Option[Int], allowTruncated: Boolean = defaultAllowTruncated, highFreqs: Map[String, Int] = highFreqTitleNgrams, upto: Int = defaultUpto): Iterator[String]

    Returns an iterator over the n-grams of the text. If cutoffOption is specified, more words are added to an n-gram until its frequency is lower than the cutoff value. If allowTruncated is set to true, n-grams shorter than n are accepted. For example, if the text is "local backbones" and n = 3, the n-gram "local_backbones" is generated.

  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. def simple3TitlePrefix(text: String): List[String]

    This is used in V1.

  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def tailNgrams(text: String, n: Int, cutoffOption: Option[Int], allowTruncated: Boolean = defaultAllowTruncated, highFreqs: Map[String, Int] = highFreqTitleNgrams, upto: Int = defaultUpto): Iterator[String]

  34. def titleNgrams(title: String, upto: Int, allowTruncated: Boolean = defaultAllowTruncated): Iterator[String]

  35. def titleTailNgrams(title: String, upto: Int = 1, allowTruncated: Boolean = defaultAllowTruncated): Iterator[String]

  36. def toBucket(s: String): String

  37. def toBucket(words: Iterable[String]): String

  38. def toString(): String

    Definition Classes
    AnyRef → Any
  39. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. def words(text: String, maxCount: Int = 40): Array[String]

    Returns the array of tokens for the given input, limited to at most maxCount tokens.
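
The tokenization and truncated n-gram behavior described for words and ngrams can be illustrated with a minimal sketch. This is not the Bucketizers implementation. The object name BucketizerSketch, the lowercase and split-on-non-alphanumerics tokenizer, the sliding-window strategy, and the "_" joiner are all assumptions. The joiner is inferred only from the "local_backbones" example, since the value of concatChar is not shown on this page. The cutoff, high-frequency, and upto logic is left out entirely.

```scala
// Hypothetical sketch; names and behavior are assumptions, not the real Bucketizers code.
object BucketizerSketch {
  // Assumed joiner, inferred from the "local_backbones" example in the docs.
  val joiner: String = "_"

  // Assumed tokenizer: lowercase, split on runs of non-alphanumeric characters,
  // drop empty tokens, and keep at most maxCount tokens.
  def words(text: String, maxCount: Int = 40): Array[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).take(maxCount)

  // Join tokens into a single bucket key.
  def toBucket(ws: Iterable[String]): String = ws.mkString(joiner)

  // Sliding n-grams over the tokens (no cutoff handling).
  // If the text has fewer than n tokens, a single shorter n-gram is
  // produced only when allowTruncated is true.
  def ngrams(text: String, n: Int, allowTruncated: Boolean): Iterator[String] = {
    require(n >= 1, "n must be at least 1")
    val chunks = words(text)
    if (chunks.isEmpty) Iterator.empty
    else if (chunks.length < n) {
      if (allowTruncated) Iterator(toBucket(chunks.toSeq)) else Iterator.empty
    } else chunks.sliding(n).map(w => toBucket(w.toSeq))
  }
}
```

Under these assumptions, BucketizerSketch.ngrams("local backbones", 3, allowTruncated = true) yields the single n-gram "local_backbones", matching the documented example, while the same call with allowTruncated = false yields nothing.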
