heavy hitter
Recently Published Documents


TOTAL DOCUMENTS: 59 (FIVE YEARS: 24)
H-INDEX: 8 (FIVE YEARS: 3)

2021 · Vol 46 (4) · pp. 1-35
Author(s): Shikha Singh, Prashant Pandey, Michael A. Bender, Jonathan W. Berry, Martín Farach-Colton, ...

Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Ω(N) words). Thus, in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ≈100K observations per second.
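The accuracy sacrifice the abstract describes for in-RAM algorithms can be seen in the classic Misra-Gries summary, sketched below in Python. This is the textbook streaming algorithm, not the external-memory data structure the paper proposes: with k−1 counters it never misses an item occurring more than N/k times, but surviving candidates may be false positives, because stored counts underestimate true frequencies.

```python
# Misra-Gries summary: a classic small-space heavy-hitters sketch.
# Keeps at most k-1 counters; any item occurring > N/k times in a
# stream of length N is guaranteed to survive as a candidate.
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; evict any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on the stream `a b a c a d a` with k = 2, only `a` survives, but its stored count (1) is far below its true count (4); confirming candidates without false positives requires a second pass over the data or, as the abstract notes, Ω(N) words of state.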


Author(s): Adrian Pekar, Alejandra Duque-Torres, Winston K.G. Seah, Oscar M. Caicedo Rendon

2021
Author(s): Enge Song, Nianbing Yu, Tian Pan, Liang Xu, Yisong Qiao, ...

2021
Author(s): Sahasrajit Sarmasarkar, Kota Srinivas Reddy, Nikhil Karamchandani

2021
Author(s): Jianyuan Lu, Tian Pan, Shan He, Mao Miao, Guangzhe Zhou, ...

2021 · Vol 14 (11) · pp. 2046-2058
Author(s): Graham Cormode, Samuel Maddock, Carsten Maple

Private collection of statistics from a large distributed population is an important problem, and has led to large-scale deployments by several leading technology companies. The dominant approach requires each user to randomly perturb their input, leading to guarantees in the local differential privacy model. In this paper, we place the various approaches that have been suggested into a common framework, and perform an extensive series of experiments to understand the tradeoffs between different implementation choices. Our conclusion is that for the core problems of frequency estimation and heavy hitter identification, careful choice of algorithms can lead to very effective solutions that scale to millions of users.
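To illustrate the local model the abstract refers to, here is a minimal Python sketch of one-bit randomized response and its debiased frequency estimate. This is an illustrative simplification, not one of the mechanisms the paper evaluates; the function names and parameters are hypothetical.

```python
import math
import random

# One-bit randomized response: each user reports their private bit
# truthfully with probability e^eps / (1 + e^eps), otherwise flipped.
# This gives eps-local differential privacy for a binary attribute.
def randomized_response(bit, epsilon, rng):
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

# Debias the aggregate: E[mean(reports)] = f*(2p - 1) + (1 - p),
# so solve for f, the true frequency of ones in the population.
def estimate_frequency(reports, epsilon):
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

if __name__ == "__main__":
    rng = random.Random(0)
    true_freq = 0.3  # hypothetical population frequency
    reports = [randomized_response(1 if rng.random() < true_freq else 0, 2.0, rng)
               for _ in range(100_000)]
    print(estimate_frequency(reports, 2.0))  # close to 0.3
```

The debiasing step is the key design choice: each individual report is noisy and deniable, yet the aggregate estimate concentrates around the true frequency as the number of users grows, which is why such mechanisms scale to millions of users.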


Author(s): Xin Zhe Khooi, Levente Csikor, Jialin Li, Min Suk Kang, Dinil Mon Divakara

2021 · Vol 29 (3)
Author(s): Adrian Pekar, Alejandra Duque-Torres, Winston K. G. Seah, Oscar Caicedo
