
Don’t forget to sketch! Running with large datasets By Adam Marcus

Large datasets got you down? Have no fear! Make them small! Sketches are probabilistic data structures: they store a rough outline of a dataset in way less space than the dataset itself takes up. We'll sketch out three sketches to determine if an item is missing from your dataset (Bloom Filters!), count how many times an item appears in your dataset (Count-min Sketches!), and count how many distinct items are in your dataset (HyperLogLogs!). In the spirit of the sketch, this talk will be hand-drawn (!!!) and leave some details to the imagination! Help us caption & translate this video! http://amara.org/v/KYDx/
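For a rough feel of the three sketches named above, here is a minimal Python sketch of each. It is not taken from the talk: the class names, the default sizes (`num_bits`, `width`, `depth`, `precision`) and the `_hash` helper are all assumptions chosen for illustration, and a production library would pack bits more tightly and tune these parameters.

```python
import hashlib
import math


def _hash(item, seed):
    """Hypothetical helper: deterministic 64-bit hash of str(item), varied by seed."""
    digest = hashlib.blake2b(
        str(item).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Answers "definitely missing" or "possibly present"."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        return [_hash(item, seed) % self.num_bits for seed in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means the item was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


class CountMinSketch:
    """Estimates how often an item was added; never underestimates."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][_hash(item, row) % self.width] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the smallest one is the best guess.
        return min(
            self.table[row][_hash(item, row) % self.width]
            for row in range(self.depth)
        )


class HyperLogLog:
    """Estimates the number of distinct items added."""

    def __init__(self, precision=10):
        self.p = precision
        self.m = 1 << precision  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        h = _hash(item, 0)
        index = h >> (64 - self.p)  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

A quick way to try them: add a few strings to a `BloomFilter` and call `might_contain` on one you added (always `True`) and one you did not (usually `False`); add repeated words to a `CountMinSketch` and compare `estimate` with the true count; feed a few thousand distinct values into a `HyperLogLog` and check that `estimate()` lands within a few percent of the real number. With `precision=10` the expected relative error is roughly 1.04 / sqrt(1024), or about 3%.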

May 7, 2016