Class Stemmer

java.lang.Object
org.apache.lucene.analysis.hunspell.Stemmer

final class Stemmer extends Object
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It follows the original hunspell algorithm, including recursive suffix stripping.
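Recursive suffix stripping can be illustrated without Lucene. The following is a toy sketch, not this class's implementation: `ToyStemmer`, `SuffixRule`, the rule table, and the depth limit are all hypothetical, and real hunspell rules also carry conditions, flags, and continuation classes that are left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: every name here is hypothetical, not part of Lucene.
class ToyStemmer {
    // A suffix rule: remove `suffix` from the end of the word, then append `strip`.
    static final class SuffixRule {
        final String suffix;
        final String strip;

        SuffixRule(String suffix, String strip) {
            this.suffix = suffix;
            this.strip = strip;
        }
    }

    private final Set<String> roots;
    private final List<SuffixRule> rules;
    private final int maxDepth;

    ToyStemmer(Set<String> roots, List<SuffixRule> rules, int maxDepth) {
        this.roots = roots;
        this.rules = rules;
        this.maxDepth = maxDepth;
    }

    // Returns every known root reachable by removing up to maxDepth suffixes.
    List<String> stems(String word) {
        Set<String> found = new LinkedHashSet<>();
        collect(word, 0, found);
        return new ArrayList<>(found);
    }

    private void collect(String word, int depth, Set<String> found) {
        if (roots.contains(word)) {
            found.add(word);
        }
        if (depth >= maxDepth) {
            return; // stop recursing once the depth limit is reached
        }
        for (SuffixRule rule : rules) {
            if (word.length() > rule.suffix.length() && word.endsWith(rule.suffix)) {
                String base = word.substring(0, word.length() - rule.suffix.length());
                // Recurse on the restored form so a second suffix can be removed.
                collect(base + rule.strip, depth + 1, found);
            }
        }
    }

    public static void main(String[] args) {
        ToyStemmer stemmer = new ToyStemmer(
            Set.of("work", "hope"),
            List.of(new SuffixRule("s", ""), new SuffixRule("ing", ""), new SuffixRule("ing", "e")),
            2);
        System.out.println(stemmer.stems("workings")); // prints [work]
    }
}
```

In this sketch, "workings" loses "s" and then "ing" in two recursive steps, while "hoping" reaches "hope" through the rule that restores the stripped "e".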
  • Field Details

    • dictionary

      private final Dictionary dictionary
    • formStep

      private final int formStep
  • Constructor Details

    • Stemmer

      public Stemmer(Dictionary dictionary)
      Constructs a new Stemmer which will use the provided Dictionary to create its stems.
      Parameters:
      dictionary - Dictionary that will be used to create the stems
  • Method Details

    • stem

      public List<CharsRef> stem(String word)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      Returns:
      List of stems for the word
    • stem

      public List<CharsRef> stem(char[] word, int length)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      Returns:
      List of stems for the word
    • varyCase

      boolean varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor)
    • caseOf

      WordCase caseOf(char[] word, int length)
      Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
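One plausible way to classify a word into these three categories is sketched below. This is an assumption about the behavior, not the method's actual source: the `CaseSketch` class and its `WordCaseSketch` enum are hypothetical stand-ins that only reuse the constant names from the description above.

```java
// Hypothetical sketch: not Lucene's WordCase or Stemmer.caseOf.
class CaseSketch {
    enum WordCaseSketch { EXACT_CASE, TITLE_CASE, UPPER_CASE }

    static WordCaseSketch caseOf(char[] word, int length) {
        // Words that are empty or do not start with an uppercase letter are kept as-is.
        if (length == 0 || !Character.isUpperCase(word[0])) {
            return WordCaseSketch.EXACT_CASE;
        }
        boolean seenUpper = false;
        boolean seenLower = false;
        // Inspect everything after the first character.
        for (int i = 1; i < length; i++) {
            boolean upper = Character.isUpperCase(word[i]);
            seenUpper |= upper;
            seenLower |= !upper;
        }
        if (!seenLower) {
            return WordCaseSketch.UPPER_CASE; // e.g. "HELLO"
        }
        if (!seenUpper) {
            return WordCaseSketch.TITLE_CASE; // e.g. "Hello"
        }
        return WordCaseSketch.EXACT_CASE;     // mixed case, e.g. "HeLLo"
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "Hello", "HELLO", "HeLLo"}) {
            System.out.println(s + " -> " + caseOf(s.toCharArray(), s.length()));
        }
    }
}
```

Note that under this sketch a single uppercase letter such as "A" counts as UPPER_CASE, because no lowercase character follows it.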
    • caseFoldTitle

      private char[] caseFoldTitle(char[] word, int length)
      Folds the titlecase variant of the word into titleBuffer.
    • caseFoldLower

      private char[] caseFoldLower(char[] word, int length)
      Folds the lowercase variant of the (title-cased) word into lowerBuffer.
    • capitalizeAfterApostrophe

      private static char[] capitalizeAfterApostrophe(char[] word, int length)
    • varySharpS

      private boolean varySharpS(char[] word, int length, Stemmer.CaseVariationProcessor processor)
    • doStem

      boolean doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor)
    • uniqueStems

      public List<CharsRef> uniqueStems(char[] word, int length)
      Find the unique stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      Returns:
      List of stems for the word
    • stemException

      private String stemException(int morphDataId)
    • newStem

      private CharsRef newStem(CharsRef stem, int morphDataId)
    • stem

      private boolean stem(char[] word, int offset, int length, WordContext context, int previous, char prevFlag, int prefixId, int recursionDepth, boolean doPrefix, boolean previousWasPrefix, Stemmer.RootProcessor processor)
      Generates a list of stems for the provided word.
      Parameters:
      word - Word to generate the stems for
      previous - previous affix that was removed (so we don't remove the same one twice)
      prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
      prefixId - ID of the innermost removed prefix, so that when removing a suffix, it's also checked against the word
      recursionDepth - current recursion depth
      doPrefix - true if we should remove prefixes
      previousWasPrefix - true if the previous removal was a prefix. A suffix removed after a prefix does not need continuation requirements, but two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse.
      Returns:
      whether the processing should be continued
    • stripAffix

      private char[] stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
      Returns:
      null if the affix conditions aren't met; a reference to the same char[] if the affix has no strip data and can thus be simply removed; or a new char[] containing the word after affix removal
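The three possible return values can be pictured with a simplified, suffix-only sketch. This is not the method's implementation: `StripSketch`, `stripSuffix`, and the `strip` and `requiredEnding` parameters are hypothetical, and the real method looks up strip data and conditions from the dictionary by affix ID.

```java
// Hypothetical sketch of the null / same-array / new-array contract.
class StripSketch {
    // strip: text to re-append after removing the suffix ("" if none).
    // requiredEnding: condition the restored word must end with ("" for no condition).
    static char[] stripSuffix(char[] word, int length, int affixLen, String strip, String requiredEnding) {
        int stemLen = length - affixLen;
        if (stemLen < 0) {
            return null; // affix is longer than the word
        }
        if (strip.isEmpty()) {
            // Nothing to restore: the caller can reuse the same array
            // and simply treat the first stemLen chars as the stem.
            return endsWith(word, stemLen, requiredEnding) ? word : null;
        }
        // Strip data present: build a new array holding stem + strip.
        char[] restored = new char[stemLen + strip.length()];
        System.arraycopy(word, 0, restored, 0, stemLen);
        strip.getChars(0, strip.length(), restored, stemLen);
        return endsWith(restored, restored.length, requiredEnding) ? restored : null;
    }

    private static boolean endsWith(char[] buf, int len, String ending) {
        if (ending.length() > len) {
            return false;
        }
        for (int i = 0; i < ending.length(); i++) {
            if (buf[len - ending.length() + i] != ending.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] walked = "walked".toCharArray();
        System.out.println(stripSuffix(walked, 6, 2, "", "") == walked); // same array reused
        System.out.println(new String(stripSuffix("hoping".toCharArray(), 6, 3, "e", "e")));
        System.out.println(stripSuffix("hoping".toCharArray(), 6, 3, "", "e")); // condition fails
    }
}
```

Returning the original array when there is nothing to restore avoids an allocation on the common path, which is presumably why the contract distinguishes the two non-null cases.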
    • isAffixCompatible

      private boolean isAffixCompatible(int affix, char prevFlag, int recursionDepth, boolean isPrefix, boolean previousWasPrefix, WordContext context)
    • applyAffix

      private boolean applyAffix(char[] strippedWord, int offset, int length, WordContext context, int affix, int previousAffix, int prefixId, int recursionDepth, boolean prefix, Stemmer.RootProcessor processor)
      Applies the affix rule to the given word, producing a list of stems if any are found.
      Parameters:
      strippedWord - Char array containing the word with the affix removed and the strip added
      offset - where the word actually starts in the array
      length - the length of the stripped word
      affix - HunspellAffix representing the affix rule itself
      prefixId - ID of a prefix that was already stripped. In that case we can't simply recurse and check the suffix unless both are compatible, so the dictionary form must be checked against both before it is added as a stem.
      recursionDepth - current recursion depth
      prefix - true if we are removing a prefix (false if it's a suffix)
      Returns:
      whether the processing should be continued
    • isRootCompatibleWithContext

      private boolean isRootCompatibleWithContext(WordContext context, int lastAffix, int entryId)
    • callProcessor

      private boolean callProcessor(char[] word, int offset, int length, Stemmer.RootProcessor processor, IntsRef forms, int i)
    • needsAnotherAffix

      private boolean needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId)
    • isFlagAppendedByAffix

      private boolean isFlagAppendedByAffix(int affixId, char flag)