Class ArrayCountingBloomFilter

  • All Implemented Interfaces:
    BitMapExtractor, BloomFilter<CountingBloomFilter>, CellExtractor, CountingBloomFilter, IndexExtractor

    public final class ArrayCountingBloomFilter
    extends java.lang.Object
    implements CountingBloomFilter
    A counting Bloom filter using an int array to track cells for each enabled bit.

    Any operation that results in negative counts or integer overflow of counts will mark this filter as invalid. This transition is not reversible. The operation is completed in full, no exception is raised and the state is set to invalid. This allows the cells for the filter immediately prior to the operation that created the invalid state to be recovered. See the documentation in isValid() for details.

    All the operations in the filter assume the cells are currently valid, for example cardinality or contains operations. Behavior of an invalid filter is undefined. It will no longer function identically to a standard Bloom filter that is the merge of all the Bloom filters that have been added to and not later subtracted from the counting Bloom filter.

    The maximum supported number of items that can be stored in the filter is limited by the maximum array size combined with the Shape. For example an implementation using a Shape with a false-positive probability of 1e-6 and Integer.MAX_VALUE bits can reversibly store approximately 75 million items using 20 hash functions per item with a memory consumption of approximately 8 GB.

    Since:
    4.5.0-M1
    See Also:
    Shape, CellExtractor
    • Constructor Detail

      • ArrayCountingBloomFilter

        public ArrayCountingBloomFilter​(Shape shape)
        Constructs an empty counting Bloom filter with the specified shape.
        Parameters:
        shape - the shape of the filter
    • Method Detail

      • asIndexArray

        public int[] asIndexArray()
        Description copied from interface: IndexExtractor
        Return a copy of the IndexExtractor data as an int array.

        Indices ordering and uniqueness is not guaranteed.

        The default implementation of this method creates an array and populates it. Implementations that have access to an index array should consider returning a copy of that array if possible.

        Specified by:
        asIndexArray in interface IndexExtractor
        Returns:
        An int array of the data.
      • cardinality

        public int cardinality()
        Description copied from interface: BloomFilter
        Gets the cardinality (number of enabled bits) of this Bloom filter.

        This is also known as the Hamming value or Hamming number.

        Specified by:
        cardinality in interface BloomFilter<CountingBloomFilter>
        Returns:
        the cardinality of this filter
      • contains

        public boolean contains​(BitMapExtractor bitMapExtractor)
        Description copied from interface: BloomFilter
        Returns true if this filter contains the bits specified in the bit maps produced by the bitMapExtractor.
        Specified by:
        contains in interface BloomFilter<CountingBloomFilter>
        Parameters:
        bitMapExtractor - the BitMapExtractor to provide the bit maps.
        Returns:
        true if this filter is enabled for all bits specified by the bit maps
      • contains

        public boolean contains​(IndexExtractor indexExtractor)
        Description copied from interface: BloomFilter
        Returns true if this filter contains the indices specified IndexExtractor.

        Specifically this returns true if this filter is enabled for all bit indexes identified by the IndexExtractor.

        Specified by:
        contains in interface BloomFilter<CountingBloomFilter>
        Parameters:
        indexExtractor - the IndexExtractor to provide the indexes
        Returns:
        true if this filter is enabled for all bits specified by the IndexExtractor
      • getMaxCell

        public int getMaxCell()
        Description copied from interface: CountingBloomFilter
        Gets the maximum allowable value for a cell count in this Counting filter.
        Specified by:
        getMaxCell in interface CountingBloomFilter
        Returns:
        the maximum allowable value for a cell count in this Counting filter.
      • getMaxInsert

        public int getMaxInsert​(CellExtractor cellExtractor)
        Description copied from interface: CountingBloomFilter
        Determines the maximum number of times the Cell Extractor could have been added.
        Specified by:
        getMaxInsert in interface CountingBloomFilter
        Parameters:
        cellExtractor - the extractor of cells.
        Returns:
        the maximum number of times the CellExtractor could have been inserted.
      • isValid

        public boolean isValid()
        Returns true if the internal state is valid.

        This flag is a warning that an addition or subtraction of cells from this filter resulted in an invalid cell for one or more indexes. For example this may occur if a cell for an index was set to negative following a subtraction operation, or overflows the value specified by getMaxCell() following an addition operation.

        A counting Bloom filter that has an invalid state is no longer ensured to function identically to a standard Bloom filter instance that is the merge of all the Bloom filters that have been added to and not later subtracted from this counting Bloom filter.

        Note: The change to an invalid state may or may not be reversible. Implementations are expected to document their policy on recovery from an addition or removal operation that generated an invalid state.

        Implementation note

        The state transition to invalid is permanent.

        This implementation does not correct negative cells to zero or integer overflow cells to Integer.MAX_VALUE. Thus the operation that generated invalid cells can be reversed by using the complement of the original operation with the same Bloom filter. This will restore the cells to the state prior to the invalid operation. Cells can then be extracted using processCells(CellPredicate).

        Specified by:
        isValid in interface CountingBloomFilter
        Returns:
        true if the state is valid
      • processBitMaps

        public boolean processBitMaps​(java.util.function.LongPredicate consumer)
        Description copied from interface: BitMapExtractor
        Each bit map is passed to the predicate in order. The predicate is applied to each bit map value, if the predicate returns false the execution is stopped, false is returned, and no further bit maps are processed.

        If the extractor is empty this method will return true.

        Any exceptions thrown by the action are relayed to the caller.

        Specified by:
        processBitMaps in interface BitMapExtractor
        Parameters:
        consumer - the function to execute
        Returns:
        true if all bit maps returned true, false otherwise.
      • processCells

        public boolean processCells​(CellExtractor.CellPredicate consumer)
        Description copied from interface: CellExtractor
        Performs the given action for each cell where the cell count is non-zero.

        Some Bloom filter implementations use a count rather than a bit flag. The term Cell is used to refer to these counts.

        Any exceptions thrown by the action are relayed to the caller. The consumer is applied to each cell. If the consumer returns false the execution is stopped, false is returned, and no further pairs are processed.

        Specified by:
        processCells in interface CellExtractor
        Parameters:
        consumer - the action to be performed for each non-zero cell.
        Returns:
        true if all cells return true from consumer, false otherwise.
      • processIndices

        public boolean processIndices​(java.util.function.IntPredicate consumer)
        Description copied from interface: IndexExtractor
        Each index is passed to the predicate. The predicate is applied to each index value, if the predicate returns false the execution is stopped, false is returned, and no further indices are processed.

        Any exceptions thrown by the action are relayed to the caller.

        Indices ordering and uniqueness is not guaranteed.

        Specified by:
        processIndices in interface CellExtractor
        Specified by:
        processIndices in interface IndexExtractor
        Parameters:
        consumer - the action to be performed for each non-zero bit index.
        Returns:
        true if all indexes return true from consumer, false otherwise.