Class WANDScorer


final class WANDScorer extends Scorer
This implements the WAND (Weak AND) algorithm for dynamic pruning described in "Efficient Query Evaluation using a Two-Level Retrieval Process" by Broder, Carmel, Herscovici, Soffer and Zien. Enhanced with techniques described in "Faster Top-k Document Retrieval Using Block-Max Indexes" by Ding and Suel. For scoreMode == ScoreMode.TOP_SCORES, this scorer maintains a feedback loop with the collector in order to know at any time the minimum score that is required in order for a hit to be competitive.

The implementation supports both minCompetitiveScore by enforce that ∑ max_score >= minCompetitiveScore, and minShouldMatch by enforcing freq >= minShouldMatch. It keeps sub scorers in 3 different places: - tail: a heap that contains scorers that are behind the desired doc ID. These scorers are ordered by cost so that we can advance the least costly ones first. - lead: a linked list of scorer that are positioned on the desired doc ID - head: a heap that contains scorers which are beyond the desired doc ID, ordered by doc ID in order to move quickly to the next candidate.

When scoreMode == ScoreMode.TOP_SCORES, it leverages the max score from each scorer in order to know when it may call DocIdSetIterator.advance(int) rather than DocIdSetIterator.nextDoc() to move to the next competitive hit. When scoreMode != ScoreMode.TOP_SCORES, block-max scoring related logic is skipped. Finding the next match consists of first setting the desired doc ID to the least entry in 'head', and then advance 'tail' until there is a match, by meeting the configured freq >= minShouldMatch and / or ∑ max_score >= minCompetitiveScore requirements.

  • Field Details

    • FLOAT_MANTISSA_BITS

      static final int FLOAT_MANTISSA_BITS
      See Also:
    • MAX_SCALED_SCORE

      private static final long MAX_SCALED_SCORE
      See Also:
    • scalingFactor

      private final int scalingFactor
    • minCompetitiveScore

      private long minCompetitiveScore
    • lead

    • doc

      int doc
    • leadMaxScore

      long leadMaxScore
    • tail

      final DisiWrapper[] tail
    • tailMaxScore

      long tailMaxScore
    • tailSize

      int tailSize
    • cost

      final long cost
    • maxScorePropagator

      final MaxScoreSumPropagator maxScorePropagator
    • upTo

      int upTo
    • minShouldMatch

      final int minShouldMatch
    • freq

      int freq
    • scoreMode

      final ScoreMode scoreMode
  • Constructor Details

  • Method Details

    • scalingFactor

      static int scalingFactor(float f)
      Return a scaling factor for the given float so that f x 2^scalingFactor would be in [2^23, 2^24[. Special cases:
          scalingFactor(0) = scalingFactor(MIN_VALUE) + 1
          scalingFactor(+Infty) = scalingFactor(MAX_VALUE) - 1
       
    • scaleMaxScore

      static long scaleMaxScore(float maxScore, int scalingFactor)
      Scale max scores in an unsigned integer to avoid overflows (only the lower 32 bits of the long are used) as well as floating-point arithmetic errors. Those are rounded up in order to make sure we do not miss any matches.
    • scaleMinScore

      private static long scaleMinScore(float minScore, int scalingFactor)
      Scale min competitive scores the same way as max scores but this time by rounding down in order to make sure that we do not miss any matches.
    • ensureConsistent

      private boolean ensureConsistent()
    • setMinCompetitiveScore

      public void setMinCompetitiveScore(float minScore) throws IOException
      Description copied from class: Scorable
      Optional method: Tell the scorer that its iterator may safely ignore all documents whose score is less than the given minScore. This is a no-op by default.

      This method may only be called from collectors that use ScoreMode.TOP_SCORES, and successive calls may only set increasing values of minScore.

      Overrides:
      setMinCompetitiveScore in class Scorable
      Throws:
      IOException
    • getChildren

      public final Collection<Scorable.ChildScorable> getChildren() throws IOException
      Description copied from class: Scorable
      Returns child sub-scorers positioned on the current document
      Overrides:
      getChildren in class Scorable
      Throws:
      IOException
    • iterator

      public DocIdSetIterator iterator()
      Description copied from class: Scorer
      Return a DocIdSetIterator over matching documents.

      The returned iterator will either be positioned on -1 if no documents have been scored yet, DocIdSetIterator.NO_MORE_DOCS if all documents have been scored already, or the last document id that has been scored otherwise.

      The returned iterator is a view: calling this method several times will return iterators that have the same state.

      Specified by:
      iterator in class Scorer
    • twoPhaseIterator

      public TwoPhaseIterator twoPhaseIterator()
      Description copied from class: Scorer
      Optional method: Return a TwoPhaseIterator view of this Scorer. A return value of null indicates that two-phase iteration is not supported.

      Note that the returned TwoPhaseIterator's approximation must advance synchronously with the Scorer.iterator(): advancing the approximation must advance the iterator and vice-versa.

      Implementing this method is typically useful on Scorers that have a high per-document overhead in order to confirm matches.

      The default implementation returns null.

      Overrides:
      twoPhaseIterator in class Scorer
    • addLead

      private void addLead(DisiWrapper lead)
      Add a disi to the linked list of leads.
    • pushBackLeads

      private void pushBackLeads(int target) throws IOException
      Move disis that are in 'lead' back to the tail.
      Throws:
      IOException
    • advanceHead

      private void advanceHead(int target) throws IOException
      Make sure all disis in 'head' are on or after 'target'.
      Throws:
      IOException
    • advanceTail

      private void advanceTail(DisiWrapper disi) throws IOException
      Throws:
      IOException
    • advanceTail

      private void advanceTail() throws IOException
      Pop the entry from the 'tail' that has the greatest score contribution, advance it to the current doc and then add it to 'lead' or 'head' depending on whether it matches.
      Throws:
      IOException
    • updateMaxScores

      private void updateMaxScores(int target) throws IOException
      Throws:
      IOException
    • updateMaxScoresIfNecessary

      private void updateMaxScoresIfNecessary(int target) throws IOException
      Update upTo and maximum scores of sub scorers so that upTo is greater than or equal to the next candidate after target, i.e. the top of `head`.
      Throws:
      IOException
    • moveToNextCandidate

      private void moveToNextCandidate(int target) throws IOException
      Set 'doc' to the next potential match, and move all disis of 'head' that are on this doc into 'lead'.
      Throws:
      IOException
    • doNextCompetitiveCandidate

      private int doNextCompetitiveCandidate() throws IOException
      Move iterators to the tail until there is a potential match.
      Throws:
      IOException
    • advanceAllTail

      private void advanceAllTail() throws IOException
      Advance all entries from the tail to know about all matches on the current doc.
      Throws:
      IOException
    • score

      public float score() throws IOException
      Description copied from class: Scorable
      Returns the score of the current document matching the query.
      Specified by:
      score in class Scorable
      Throws:
      IOException
    • advanceShallow

      public int advanceShallow(int target) throws IOException
      Description copied from class: Scorer
      Advance to the block of documents that contains target in order to get scoring information about this block. This method is implicitly called by DocIdSetIterator.advance(int) and DocIdSetIterator.nextDoc() on the returned doc ID. Calling this method doesn't modify the current DocIdSetIterator.docID(). It returns a number that is greater than or equal to all documents contained in the current block, but less than any doc IDS of the next block. target must be >= Scorable.docID() as well as all targets that have been passed to Scorer.advanceShallow(int) so far.
      Overrides:
      advanceShallow in class Scorer
      Throws:
      IOException
    • getMaxScore

      public float getMaxScore(int upTo) throws IOException
      Description copied from class: Scorer
      Return the maximum score that documents between the last target that this iterator was shallow-advanced to included and upTo included.
      Specified by:
      getMaxScore in class Scorer
      Throws:
      IOException
    • docID

      public int docID()
      Description copied from class: Scorable
      Returns the doc ID that is currently being scored.
      Specified by:
      docID in class Scorable
    • insertTailWithOverFlow

      private DisiWrapper insertTailWithOverFlow(DisiWrapper s)
      Insert an entry in 'tail' and evict the least-costly scorer if full.
    • addTail

      private void addTail(DisiWrapper s)
      Add an entry to 'tail'. Fails if over capacity.
    • popTail

      private DisiWrapper popTail()
      Pop the least-costly scorer from 'tail'.
    • upHeapMaxScore

      private static void upHeapMaxScore(DisiWrapper[] heap, int i)
      Heap helpers
    • downHeapMaxScore

      private static void downHeapMaxScore(DisiWrapper[] heap, int size)
    • greaterMaxScore

      private static boolean greaterMaxScore(DisiWrapper w1, DisiWrapper w2)
      In the tail, we want to get first entries that produce the maximum scores and in case of ties (eg. constant-score queries), those that have the least cost so that they are likely to advance further.