• 15 Posts
  • 51 Comments
Joined 1 year ago
Cake day: June 27th, 2023

  • Moisture could be a problem. I found this manufacturer FAQ:

    Energizer Non-Rechargeable Batteries: Frequently Asked Questions

    Is it a good idea to store batteries in a refrigerator or freezer?

    No, storage in a refrigerator or freezer is not required or recommended for batteries produced today. Cold temperature storage can in fact harm batteries if condensation results in corroded contacts or label or seal damage due to extreme temperature storage.

  • I can see why this data structure might be abused and/or chosen for an inappropriate use case, since it seems to offer a lot of value for the tiny amount of space required.

    if you need to know a key is definitely in that space. You still have to perform the lookup.

    This is a good description. I think the name “filter” is appropriate for their best use cases, when you want to remove members of some other set if they are probably members of the bloom filter set, and can accept that you might remove some extras due to false positives.

    Problems like that come up from time to time.
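A rough sketch of that "filter" use case, in case it helps. The `BloomFilter` class, its bit and hash counts, and the example addresses are all made up for illustration and not taken from any particular library. The point is only that every real member gets removed, while an occasional non-member may get removed too.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter; the sizes are arbitrary assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably a member"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def remove_probable_members(candidates, bloom):
    """Drop every candidate the filter reports as a probable member."""
    return [c for c in candidates if c not in bloom]


blocked = BloomFilter()
for address in ["spam@example.com", "junk@example.com"]:
    blocked.add(address)

incoming = ["alice@example.com", "spam@example.com", "bob@example.com"]
kept = remove_probable_members(incoming, blocked)
```

`kept` never contains a blocked address, but it might be missing a legitimate one if that address happens to collide on every bit. That is the trade-off you accept for the small memory footprint.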



  • I haven’t used them in Spark directly but here’s how they are used for computing sparse joins in a similar data processing framework:

    Let’s say you want to join some data “tables” A and B. When B has many more unique keys than are present in A, computing “A inner join B” would require lots of shuffling of B, including those extra keys.

    Knowing this, you can add a step before the join to compute a bloom filter of the keys in A, then apply the filter to B. Now the join from A to B-filtered only considers relevant keys from B, hopefully now with much less total computation than the original join.
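Here is a minimal single-machine sketch of that pre-filter step. It is not the API of Spark or any other framework: the helper names (`build_bloom`, `might_contain`, `inner_join`), the bit and hash counts, and the toy tables are all assumptions for illustration. In a real distributed job the small filter would be shipped to the workers holding B so that B can be trimmed before the shuffle.

```python
import hashlib
from collections import defaultdict

NUM_BITS = 4096   # assumed filter size, chosen arbitrarily
NUM_HASHES = 3    # assumed number of hash functions


def bit_positions(key):
    # Derive NUM_HASHES bit positions by salting one hash function.
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def build_bloom(keys):
    """Return the filter as a Python int used as a bitset."""
    bits = 0
    for key in keys:
        for pos in bit_positions(key):
            bits |= 1 << pos
    return bits


def might_contain(bits, key):
    # No false negatives: every key that was added passes this check.
    return all((bits >> pos) & 1 for pos in bit_positions(key))


def inner_join(a_rows, b_rows):
    """Exact hash join on the first field of each row."""
    b_by_key = defaultdict(list)
    for key, b_val in b_rows:
        b_by_key[key].append(b_val)
    return [
        (key, a_val, b_val)
        for key, a_val in a_rows
        for b_val in b_by_key.get(key, [])
    ]


table_a = [(1, "a1"), (2, "a2")]                  # few keys
table_b = [(k, f"b{k}") for k in range(1000)]     # many extra keys

bloom = build_bloom(key for key, _ in table_a)
b_filtered = [row for row in table_b if might_contain(bloom, row[0])]
joined = inner_join(table_a, b_filtered)
```

The join itself still matches keys exactly, so any false positives that slip through the filter only cost a little extra work and never change the result.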