Package org.apache.lucene.index
Class TermsHashPerField
java.lang.Object
org.apache.lucene.index.TermsHashPerField
- All Implemented Interfaces:
Comparable<TermsHashPerField>
- Direct Known Subclasses:
FreqProxTermsWriterPerField, TermVectorsConsumerPerField
This class stores streams of information per term without knowing the size of the stream ahead of time. Each stream typically encodes one level of information, such as term frequency per document or term proximity. Internally this class allocates a linked list of slices that can be read by a ByteSliceReader for each term. Terms are first deduplicated in a BytesRefHash; once this is done, internal data structures point to the current offset of each stream that can be written to.
-
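For illustration, a minimal sketch of a hypothetical two-stream consumer. This is not actual Lucene code: the class and field names are invented, only the package-private API documented on this page is used, and real subclasses such as FreqProxTermsWriterPerField additionally maintain per-term state in a ParallelPostingsArray.
package org.apache.lucene.index; // TermsHashPerField and its collaborators are package-private

import java.io.IOException;
import org.apache.lucene.util.ByteBlockPool;
import org.apache.lucene.util.Counter;
import org.apache.lucene.util.IntBlockPool;

// Hypothetical consumer with two streams per term:
// stream 0 records the documents a term appears in, stream 1 records token positions.
abstract class TwoStreamSketchPerField extends TermsHashPerField {
  int position; // position of the token currently being inverted (kept up to date by the caller; simplified)

  TwoStreamSketchPerField(IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool,
                          Counter bytesUsed, String fieldName, IndexOptions indexOptions) {
    // streamCount = 2: every term gets two independently growing byte streams in bytePool
    super(2, intPool, bytePool, termBytePool, bytesUsed, null, fieldName, indexOptions);
  }

  @Override
  void newTerm(int termID, int docID) throws IOException {
    writeVInt(0, docID);    // first occurrence ever: start the doc stream (real consumers write deltas and freqs)
    writeVInt(1, position); // and the proximity stream
  }

  @Override
  void addTerm(int termID, int docID) throws IOException {
    writeVInt(1, position); // later occurrences keep appending; each stream grows slice by slice
  }

  // createPostingsArray(int) and newPostingsArray() are left abstract here; see the method details below.
}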
Nested Class Summary
Nested Classes
private static final class
-
Field Summary
Fields
(package private) final ByteBlockPool bytePool
private final BytesRefHash bytesHash
private boolean doNextCall
private final String fieldName
private static final int HASH_INIT_SIZE
(package private) final IndexOptions indexOptions
private final IntBlockPool intPool
private int lastDocID
private final TermsHashPerField nextPerField
(package private) ParallelPostingsArray postingsArray
private int[] sortedTermIDs
private int streamAddressOffset
private final int streamCount
private int[] termStreamAddressBuffer
-
Constructor Summary
Constructors
TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool, Counter bytesUsed, TermsHashPerField nextPerField, String fieldName, IndexOptions indexOptions)
streamCount: how many streams this field stores per term.
-
Method Summary
private void add(int textStart, int docID)
(package private) void add - Called once per inverted token.
(package private) abstract void addTerm(int termID, int docID) - Called when a previously seen term is seen again.
private boolean assertDocId(int docId)
final int compareTo(TermsHashPerField other)
(package private) abstract ParallelPostingsArray createPostingsArray(int size) - Creates a new postings array of the specified size.
(package private) void finish() - Finish adding all instances of this field to the current document.
(package private) final String getFieldName()
(package private) final TermsHashPerField getNextPerField()
(package private) final int getNumTerms()
(package private) final int[] getSortedTermIDs() - Returns the sorted term IDs.
(package private) final void initReader(ByteSliceReader reader, int termID, int stream)
private void initStreamSlices(int termID, int docID) - Called when we first encounter a new term.
(package private) abstract void newPostingsArray() - Called when the postings array is initialized or resized.
(package private) abstract void newTerm(int termID, int docID) - Called when a term is seen for the first time.
private int positionStreamSlice(int termID, int docID)
(package private) final void reinitHash()
(package private) void reset()
(package private) final void sortTerms() - Collapse the hash table and sort in-place; also sets this.sortedTermIDs to the results. This method must not be called twice unless reset() or reinitHash() was called.
(package private) boolean start(IndexableField field, boolean first) - Start adding a new field instance; first is true if this is the first time this field name was seen in the document.
(package private) final void writeByte(int stream, byte b)
(package private) final void writeBytes(int stream, byte[] b, int offset, int len)
(package private) final void writeVInt(int stream, int i)
-
Field Details
-
HASH_INIT_SIZE
private static final int HASH_INIT_SIZE -
-
nextPerField
private final TermsHashPerField nextPerField -
intPool
private final IntBlockPool intPool -
bytePool
final ByteBlockPool bytePool -
termStreamAddressBuffer
private int[] termStreamAddressBuffer -
streamAddressOffset
private int streamAddressOffset -
streamCount
private final int streamCount -
fieldName
private final String fieldName -
indexOptions
final IndexOptions indexOptions -
bytesHash
private final BytesRefHash bytesHash -
postingsArray
ParallelPostingsArray postingsArray -
lastDocID
private int lastDocID -
sortedTermIDs
private int[] sortedTermIDs -
doNextCall
private boolean doNextCall
-
-
Constructor Details
-
TermsHashPerField
TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool, Counter bytesUsed, TermsHashPerField nextPerField, String fieldName, IndexOptions indexOptions)
streamCount: how many streams this field stores per term. E.g. doc(+freq) is one stream, prox+offset is a second.
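As a concrete reading of that parameter: a consumer only needs the second stream once positions are indexed. A hypothetical helper (not part of this API) that restates the doc(+freq) vs. prox(+offset) split:
import org.apache.lucene.index.IndexOptions;

final class StreamCountHelper {
  // doc(+freq) always fits in one stream; positions (and offsets) go into a second one
  static int streamCountFor(IndexOptions options) {
    return options.compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS) >= 0 ? 2 : 1;
  }
}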
-
-
Method Details
-
reset
void reset() -
initReader
final void initReader(ByteSliceReader reader, int termID, int stream) -
sortTerms
final void sortTerms()
Collapse the hash table and sort in-place; also sets this.sortedTermIDs to the results. This method must not be called twice unless reset() or reinitHash() was called. -
getSortedTermIDs
final int[] getSortedTermIDs()
Returns the sorted term IDs. sortTerms() must be called before. -
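A sketch of how a flush-time reader iterates the sorted terms and replays one stream (hypothetical helper class; it assumes the package-private ByteSliceReader with its DataInput-style readVInt() and its eof() check):
package org.apache.lucene.index; // ByteSliceReader and TermsHashPerField are package-private

import java.io.IOException;

final class StreamDumpSketch {
  // Replay stream 0 of every term in sorted term order.
  static void dumpStreamZero(TermsHashPerField perField) throws IOException {
    perField.sortTerms();                               // must be called before getSortedTermIDs()
    int[] sortedTermIDs = perField.getSortedTermIDs();
    ByteSliceReader reader = new ByteSliceReader();
    for (int i = 0; i < perField.getNumTerms(); i++) {
      perField.initReader(reader, sortedTermIDs[i], 0); // position the reader on stream 0 of this term
      while (!reader.eof()) {
        int value = reader.readVInt();                  // the stream holds whatever vInts the consumer wrote
        // ... interpret value (doc delta, freq, position, ...) according to the consumer's format
      }
    }
  }
}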
reinitHash
final void reinitHash() -
add
private void add(int textStart, int docID)
- Throws:
IOException
-
initStreamSlices
private void initStreamSlices(int termID, int docID)
Called when we first encounter a new term. We must allocate slices to store the postings (vInt-compressed doc/freq/prox), and also the int pointers to where (in our ByteBlockPool storage) the postings for this term begin.
- Throws:
IOException
-
assertDocId
private boolean assertDocId(int docId) -
add
Called once per inverted token. This is the primary entry point (for first TermsHash); postings use this API.
- Throws:
IOException
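In outline, the flow is: deduplicate the incoming term bytes, dispatch to the new-term or seen-again path, then forward to the chained field. A simplified sketch of that flow written against the fields documented above; termBytes and docID stand for the method's arguments, and assertions and slice bookkeeping are omitted:
int termID = bytesHash.add(termBytes);              // BytesRefHash deduplicates the term's bytes
if (termID >= 0) {
  // first time this field sees the term: allocate its stream slices, which ends in newTerm(termID, docID)
  initStreamSlices(termID, docID);
} else {
  // already seen: BytesRefHash returned -(existingTermID + 1); reposition on the existing slices,
  // which ends in addTerm(termID, docID)
  termID = positionStreamSlice(termID, docID);
}
if (doNextCall) {
  // forward to the chained per-field (e.g. term vectors), keyed by the interned text start
  nextPerField.add(postingsArray.textStarts[termID], docID);
}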
-
positionStreamSlice
private int positionStreamSlice(int termID, int docID)
- Throws:
IOException
-
writeByte
final void writeByte(int stream, byte b) -
writeBytes
final void writeBytes(int stream, byte[] b, int offset, int len) -
writeVInt
final void writeVInt(int stream, int i) -
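The streams are written with Lucene's usual variable-length int format: seven payload bits per byte, with the high bit as a continuation flag. A sketch of writeVInt expressed in terms of writeByte, consistent with that format:
final void writeVInt(int stream, int i) {
  assert stream < streamCount;                     // each vInt goes to exactly one of this field's streams
  while ((i & ~0x7F) != 0) {
    writeByte(stream, (byte) ((i & 0x7f) | 0x80)); // low 7 bits, continuation bit set
    i >>>= 7;
  }
  writeByte(stream, (byte) i);                     // last byte, continuation bit clear
}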
getNextPerField
final TermsHashPerField getNextPerField() -
getFieldName
final String getFieldName() -
compareTo
public final int compareTo(TermsHashPerField other)
- Specified by:
compareTo in interface Comparable<TermsHashPerField>
-
finish
void finish()
Finish adding all instances of this field to the current document.
- Throws:
IOException
-
getNumTerms
final int getNumTerms() -
start
boolean start(IndexableField field, boolean first)
Start adding a new field instance; first is true if this is the first time this field name was seen in the document. -
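Putting the lifecycle together, the indexing chain drives a field roughly as sketched below. This is a hypothetical driver: analysis is elided, and the exact signature of the per-token, BytesRef-based add(...) entry point documented above is assumed.
package org.apache.lucene.index; // TermsHashPerField is package-private

import java.io.IOException;
import java.util.List;
import org.apache.lucene.util.BytesRef;

final class FieldInversionSketch {
  // occurrences: the instances of one field within a single document, with their already-analyzed tokens.
  static void invert(TermsHashPerField perField, List<IndexableField> occurrences,
                     List<List<BytesRef>> tokensPerOccurrence, int docID) throws IOException {
    boolean first = true;
    for (int i = 0; i < occurrences.size(); i++) {
      perField.start(occurrences.get(i), first);   // first is true only for the first occurrence in this document
      for (BytesRef termBytes : tokensPerOccurrence.get(i)) {
        perField.add(termBytes, docID);            // once per inverted token (signature assumed)
      }
      first = false;
    }
    perField.finish();                             // all instances of this field are done for this document
  }
}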
newTerm
abstract void newTerm(int termID, int docID)
Called when a term is seen for the first time.
- Throws:
IOException
-
addTerm
abstract void addTerm(int termID, int docID)
Called when a previously seen term is seen again.
- Throws:
IOException
-
newPostingsArray
abstract void newPostingsArray()
Called when the postings array is initialized or resized. -
createPostingsArray
abstract ParallelPostingsArray createPostingsArray(int size)
Creates a new postings array of the specified size.
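These two hooks are how a subclass attaches its own per-term parallel state. A hedged sketch (class and field names hypothetical; ParallelPostingsArray's single-int constructor and its bytesPerPosting() hook are assumed from the Lucene sources):
// Inside a hypothetical TermsHashPerField subclass (same package):
static class ExamplePostingsArray extends ParallelPostingsArray {
  final int[] lastSeenDocIDs;                       // extra per-term column, parallel to textStarts/byteStarts

  ExamplePostingsArray(int size) {
    super(size);
    lastSeenDocIDs = new int[size];
  }

  @Override
  int bytesPerPosting() {
    return super.bytesPerPosting() + Integer.BYTES; // so RAM accounting covers the extra column
  }
  // a full implementation also overrides newInstance(int) and copyTo(...) so the column survives resizing
}

private ExamplePostingsArray examplePostings;       // typed view of the shared postingsArray field

@Override
ParallelPostingsArray createPostingsArray(int size) {
  return new ExamplePostingsArray(size);            // called whenever a (larger) array is needed
}

@Override
void newPostingsArray() {
  examplePostings = (ExamplePostingsArray) postingsArray; // re-cache after the base class (re)assigns it
}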
-