Class Util

java.lang.Object
org.apache.lucene.util.fst.Util

public final class Util extends Object
Static helper methods.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    Represents a path in TopNSearcher.
    static final class 
    Holds a single input (IntsRef) + output, returned by shortestPaths().
    private static class 
    Compares first by the provided comparator, and then tie breaks by path.input.
    static class 
    Utility class to find top N shortest paths from start point(s).
    static final class 
    Holds the results for a top N search using Util.TopNSearcher
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
     
  • Method Summary

    Modifier and Type
    Method
    Description
    (package private) static <T> int
    binarySearch(FST<T> fst, FST.Arc<T> arc, int targetLabel)
    Perform a binary search of Arcs encoded as a packed array
    private static void
    emitDotState(Writer out, String name, String shape, String color, String label)
    Emit a single state in the dot language.
    static <T> T
    get(FST<T> fst, BytesRef input)
    Looks up the output for this input, or null if the input is not accepted
    static <T> T
    get(FST<T> fst, IntsRef input)
    Looks up the output for this input, or null if the input is not accepted.
    private static String
    printableLabel(int label)
    Ensures an arc's label is indeed printable (dot uses US-ASCII).
    static <T> FST.Arc<T>
    readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
    Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise return null.
    static <T> Util.TopResults<T>
    shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString)
    Starting from node, find the top N min cost completions to a final node.
    static BytesRef
    Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
    static <T> void
    toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates)
    Dumps an FST to a GraphViz's dot language description for visualization.
    static IntsRef
    Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
    static IntsRef
    Just maps each UTF16 unit (char) to the ints in an IntsRef.
    static IntsRef
    toUTF32(char[] s, int offset, int length, IntsRefBuilder scratch)
    Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
    static IntsRef
    Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Util

      private Util()
  • Method Details

    • get

      public static <T> T get(FST<T> fst, IntsRef input) throws IOException
      Looks up the output for this input, or null if the input is not accepted.
      Throws:
      IOException
    • get

      public static <T> T get(FST<T> fst, BytesRef input) throws IOException
      Looks up the output for this input, or null if the input is not accepted
      Throws:
      IOException
    • shortestPaths

      public static <T> Util.TopResults<T> shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString) throws IOException
      Starting from node, find the top N min cost completions to a final node.
      Throws:
      IOException
    • toDot

      public static <T> void toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates) throws IOException
      Dumps an FST to a GraphViz's dot language description for visualization. Example of use:
       PrintWriter pw = new PrintWriter("out.dot");
       Util.toDot(fst, pw, true, true);
       pw.close();
       
      and then, from command line:
       dot -Tpng -o out.png out.dot
       

      Note: larger FSTs (a few thousand nodes) won't even render, don't bother.

      Parameters:
      sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
      labelStates - If true states will have labels equal to their offsets in their binary format. Expands the graph considerably.
      Throws:
      IOException
      See Also:
    • emitDotState

      private static void emitDotState(Writer out, String name, String shape, String color, String label) throws IOException
      Emit a single state in the dot language.
      Throws:
      IOException
    • printableLabel

      private static String printableLabel(int label)
      Ensures an arc's label is indeed printable (dot uses US-ASCII).
    • toUTF16

      public static IntsRef toUTF16(CharSequence s, IntsRefBuilder scratch)
      Just maps each UTF16 unit (char) to the ints in an IntsRef.
    • toUTF32

      public static IntsRef toUTF32(CharSequence s, IntsRefBuilder scratch)
      Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
    • toUTF32

      public static IntsRef toUTF32(char[] s, int offset, int length, IntsRefBuilder scratch)
      Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
    • toIntsRef

      public static IntsRef toIntsRef(BytesRef input, IntsRefBuilder scratch)
      Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
    • toBytesRef

      public static BytesRef toBytesRef(IntsRef input, BytesRefBuilder scratch)
      Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
    • readCeilArc

      public static <T> FST.Arc<T> readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise return null.
      Parameters:
      label - the label to ceil on
      fst - the fst to operate on
      follow - the arc to follow reading the label from
      arc - the arc to read into in place
      in - the fst's FST.BytesReader
      Throws:
      IOException
    • binarySearch

      static <T> int binarySearch(FST<T> fst, FST.Arc<T> arc, int targetLabel) throws IOException
      Perform a binary search of Arcs encoded as a packed array
      Type Parameters:
      T - the output type of the FST
      Parameters:
      fst - the FST from which to read
      arc - the starting arc; sibling arcs greater than this will be searched. Usually the first arc in the array.
      targetLabel - the label to search for
      Returns:
      the index of the Arc having the target label, or if no Arc has the matching label, -1 - idx), where idx is the index of the Arc with the next highest label, or the total number of arcs if the target label exceeds the maximum.
      Throws:
      IOException - when the FST reader does