T - the type of instances that the BloomFilter accepts@Beta public final class BloomFilter<T> extends Object implements Predicate<T>, Serializable
A Bloom filter for instances of T. A Bloom filter offers an approximate containment test with one-sided error: if it claims that an element is contained in it, this might be in error, but if it claims that an element is not contained in it, then this is definitely true.
If you are unfamiliar with Bloom filters, this nice tutorial may help you understand how they work.
The false positive probability (FPP) of a bloom filter is defined as the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
Bloom filters are serializable. They also support a more compact serial representation via the writeTo(java.io.OutputStream) and readFrom(java.io.InputStream, com.diffplug.common.hash.Funnel<T>) methods. Both serialized forms will continue to be supported by future versions of this library. However, serial forms generated by newer versions of the code may not be readable by older versions of the code (e.g., a serialized bloom filter generated today may not be readable by a binary that was compiled 6 months ago).
| Modifier and Type | Method and Description |
|---|---|
BloomFilter<T> |
copy()
Creates a new
BloomFilter that’s a copy of this instance. |
static <T> BloomFilter<T> |
create(Funnel<? super T> funnel,
int expectedInsertions)
Creates a
BloomFilter with the expected number of insertions and a default expected false positive probability of 3%. |
static <T> BloomFilter<T> |
create(Funnel<? super T> funnel,
int expectedInsertions,
double fpp)
Creates a
BloomFilter with the expected number of insertions and expected false positive probability. |
static <T> BloomFilter<T> |
create(Funnel<? super T> funnel,
long expectedInsertions)
Creates a
BloomFilter with the expected number of insertions and a default expected false positive probability of 3%. |
static <T> BloomFilter<T> |
create(Funnel<? super T> funnel,
long expectedInsertions,
double fpp)
Creates a
BloomFilter with the expected number of insertions and expected false positive probability. |
boolean |
equals(Object object) |
double |
expectedFpp()
Returns the probability that mightContain(Object) will erroneously return
true for an object that has not actually been put in the BloomFilter. |
int |
hashCode() |
boolean |
isCompatible(BloomFilter<T> that)
Determines whether a given bloom filter is compatible with this bloom filter.
|
boolean |
mightContain(T object)
Returns
true if the element might have been put in this Bloom filter, false if this is definitely not the case. |
boolean |
put(T object)
Puts an element into this
BloomFilter. |
void |
putAll(BloomFilter<T> that)
Combines this bloom filter with another bloom filter by performing a bitwise OR of the underlying data.
|
static <T> BloomFilter<T> |
readFrom(InputStream in,
Funnel<T> funnel)
Reads a byte stream, which was written by writeTo(OutputStream), into a
BloomFilter<T>. |
boolean |
test(T input)
Deprecated.
Provided only to satisfy the
Predicate interface; use mightContain(T) instead. |
void |
writeTo(OutputStream out)
Writes this
BloomFilter to an output stream, with a custom format (not Java serialization). |
@CheckReturnValue public BloomFilter<T> copy()
Creates a new BloomFilter that’s a copy of this instance. The new instance is equal to this instance but shares no mutable state.
@CheckReturnValue public boolean mightContain(T object)
Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case.
@Deprecated @CheckReturnValue public boolean test(T input)
Predicate interface; use mightContain(T) instead.public boolean put(T object)
Puts an element into this BloomFilter. Ensures that subsequent invocations of mightContain(Object) with the same element will always return true.
object has been added to the filter. If the bits haven’t changed, this might be the first time object has been added to the filter. Note that put(t) always returns the opposite result to what mightContain(t) would have returned at the time it is called."void return type})@CheckReturnValue public double expectedFpp()
Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
Ideally, this number should be close to the fpp parameter passed in create(Funnel, int, double), or smaller. If it is significantly higher, it is usually the case that too many elements (more than expected) have been put in the BloomFilter, degenerating it.
@CheckReturnValue public boolean isCompatible(BloomFilter<T> that)
Determines whether a given bloom filter is compatible with this bloom filter. For two bloom filters to be compatible, they must:
that - The bloom filter to check for compatibility.public void putAll(BloomFilter<T> that)
Combines this bloom filter with another bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the bloom filters are appropriately sized to avoid saturating them.
that - The bloom filter to combine this bloom filter with. It is not mutated.IllegalArgumentException - if isCompatible(that) == false@CheckReturnValue public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
Creates a BloomFilter with the expected number of insertions and expected false positive probability.
Note that overflowing a BloomFilter with significantly more elements than specified, will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter<T> will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
funnel - the funnel of T’s that the constructed BloomFilter<T> will useexpectedInsertions - the number of expected insertions to the constructed BloomFilter<T>; must be positivefpp - the desired false positive probability (must be positive and less than 1.0)BloomFilter@CheckReturnValue public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp)
Creates a BloomFilter with the expected number of insertions and expected false positive probability.
Note that overflowing a BloomFilter with significantly more elements than specified, will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter<T> will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
funnel - the funnel of T’s that the constructed BloomFilter<T> will useexpectedInsertions - the number of expected insertions to the constructed BloomFilter<T>; must be positivefpp - the desired false positive probability (must be positive and less than 1.0)BloomFilter@CheckReturnValue public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions)
Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
Note that overflowing a BloomFilter with significantly more elements than specified, will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter<T> will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
funnel - the funnel of T’s that the constructed BloomFilter<T> will useexpectedInsertions - the number of expected insertions to the constructed BloomFilter<T>; must be positiveBloomFilter@CheckReturnValue public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions)
Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
Note that overflowing a BloomFilter with significantly more elements than specified, will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter<T> will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
funnel - the funnel of T’s that the constructed BloomFilter<T> will useexpectedInsertions - the number of expected insertions to the constructed BloomFilter<T>; must be positiveBloomFilterpublic void writeTo(OutputStream out) throws IOException
Writes this BloomFilter to an output stream, with a custom format (not Java serialization). This has been measured to save at least 400 bytes compared to regular serialization.
Use readFrom(InputStream, Funnel) to reconstruct the written BloomFilter.
IOException@CheckReturnValue public static <T> BloomFilter<T> readFrom(InputStream in, Funnel<T> funnel) throws IOException
Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter<T>.
The Funnel to be used is not encoded in the stream, so it must be provided here. Warning: the funnel provided must behave identically to the one used to populate the original Bloom filter!
IOException - if the InputStream throws an IOException, or if its data does not appear to be a BloomFilter serialized using the writeTo(OutputStream) method.