data structures Latest Research Papers

A Concept Inventory (CI) is an assessment to measure student conceptual understanding of a particular topic. This article presents the results of a CI for basic data structures (BDSI) that has been previously shown to have strong evidence for validity. The goal of this work is to help researchers or instructors who administer the BDSI in their own courses to better understand their results. In support of this goal, we discuss our findings for each question of the CI using data gathered from 1,963 students across seven institutions.

Download Full-text

Certifying derivation of state machines from coroutines

Proceedings of the ACM on Programming Languages ◽

10.1145/3498685 ◽

2022 ◽

Vol 6 (POPL) ◽

pp. 1-31

Author(s):

Mirai Ikebuchi ◽

Andres Erbsen ◽

Adam Chlipala

Keyword(s):

Data Structures ◽

Runtime Systems ◽

Web Browsers ◽

State Machines ◽

Implementation Challenges ◽

The Coq Proof Assistant ◽

Coq Proof Assistant ◽

Equivalence Proof ◽

High Level

One of the biggest implementation challenges in security-critical network protocols is nested state machines. In practice today, state machines are either implemented manually at a low level, risking bugs easily missed in audits; or are written using higher-level abstractions like threads, depending on runtime systems that may sacrifice performance or compatibility with the ABIs of important platforms (e.g., resource-constrained IoT systems). We present a compiler-based technique allowing the best of both worlds, coding protocols in a natural high-level form, using freer monads to represent nested coroutines , which are then compiled automatically to lower-level code with explicit state. In fact, our compiler is implemented as a tactic in the Coq proof assistant, structuring compilation as search for an equivalence proof for source and target programs. As such, it is straightforwardly (and soundly) extensible with new hints, for instance regarding new data structures that may be used for efficient lookup of coroutines. As a case study, we implemented a core of TLS sufficient for use with popular Web browsers, and our experiments show that the extracted Haskell code achieves reasonable performance.

Download Full-text

Solving constrained Horn clauses modulo algebraic data types and recursive functions

Proceedings of the ACM on Programming Languages ◽

10.1145/3498722 ◽

2022 ◽

Vol 6 (POPL) ◽

pp. 1-29

Author(s):

Hari Govind V K ◽

Sharon Shoham ◽

Arie Gurfinkel

Keyword(s):

Data Structures ◽

General Class ◽

Data Types ◽

Deductive Verification ◽

Recursive Definition ◽

Underlying Theory ◽

Horn Clauses ◽

Verification Conditions ◽

Algebraic Data Types ◽

Inductive Invariants

This work addresses the problem of verifying imperative programs that manipulate data structures, e.g., Rust programs. Data structures are usually modeled by Algebraic Data Types (ADTs) in verification conditions. Inductive invariants of such programs often require recursively defined functions (RDFs) to represent abstractions of data structures. From the logic perspective, this reduces to solving Constrained Horn Clauses (CHCs) modulo both ADT and RDF. The underlying logic with RDFs is undecidable. Thus, even verifying a candidate inductive invariant is undecidable. Similarly, IC3-based algorithms for solving CHCs lose their progress guarantee: they may not find counterexamples when the program is unsafe. We propose a novel IC3-inspired algorithm Racer for solving CHCs modulo ADT and RDF (i.e., automatically synthesizing inductive invariants, as opposed to only verifying them as is done in deductive verification). Racer ensures progress despite the undecidability of the underlying theory, and is guaranteed to terminate with a counterexample for unsafe programs. It works with a general class of RDFs over ADTs called catamorphisms. The key idea is to represent catamorphisms as both CHCs, via relationification , and RDFs, using novel abstractions . Encoding catamorphisms as CHCs allows learning inductive properties of catamorphisms, as well as preserving unsatisfiabilty of the original CHCs despite the use of RDF abstractions, whereas encoding catamorphisms as RDFs allows unfolding the recursive definition, and relying on it in solutions. Abstractions ensure that the underlying theory remains decidable. We implement our approach in Z3 and show that it works well in practice.

Download Full-text

A separation logic for negative dependence

Proceedings of the ACM on Programming Languages ◽

10.1145/3498719 ◽

2022 ◽

Vol 6 (POPL) ◽

pp. 1-29

Author(s):

Jialu Bao ◽

Marco Gaboardi ◽

Justin Hsu ◽

Joseph Tassarotti

Keyword(s):

Data Structures ◽

Algorithm Design ◽

Random Variables ◽

Separation Logic ◽

Negative Dependence ◽

Probabilistic Data ◽

Complete Proof ◽

Partial Operation ◽

Verification Methods ◽

Independence Of Random Variables

Formal reasoning about hashing-based probabilistic data structures often requires reasoning about random variables where when one variable gets larger (such as the number of elements hashed into one bucket), the others tend to be smaller (like the number of elements hashed into the other buckets). This is an example of negative dependence , a generalization of probabilistic independence that has recently found interesting applications in algorithm design and machine learning. Despite the usefulness of negative dependence for the analyses of probabilistic data structures, existing verification methods cannot establish this property for randomized programs. To fill this gap, we design LINA, a probabilistic separation logic for reasoning about negative dependence. Following recent works on probabilistic separation logic using separating conjunction to reason about the probabilistic independence of random variables, we use separating conjunction to reason about negative dependence. Our assertion logic features two separating conjunctions, one for independence and one for negative dependence. We generalize the logic of bunched implications (BI) to support multiple separating conjunctions, and provide a sound and complete proof system. Notably, the semantics for separating conjunction relies on a non-deterministic , rather than partial, operation for combining resources. By drawing on closure properties for negative dependence, our program logic supports a Frame-like rule for negative dependence and monotone operations. We demonstrate how LINA can verify probabilistic properties of hash-based data structures and balls-into-bins processes.

Download Full-text

Tries-Based Parallel Solutions for Generating Perfect Crosswords Grids

Algorithms ◽

10.3390/a15010022 ◽

2022 ◽

Vol 15 (1) ◽

pp. 22

Author(s):

Virginia Niculescu ◽

Robert Manuel Ştefănică

Keyword(s):

Data Structures ◽

Execution Time ◽

Grid Generation ◽

Good Opportunity ◽

Complete Problem ◽

3D Space ◽

Black Boxes ◽

Np Complete

A general crossword grid generation is considered an NP-complete problem and theoretically it could be a good candidate to be used by cryptography algorithms. In this article, we propose a new algorithm for generating perfect crosswords grids (with no black boxes) that relies on using tries data structures, which are very important for reducing the time for finding the solutions, and offers good opportunity for parallelisation, too. The algorithm uses a special tries representation and it is very efficient, but through parallelisation the performance is improved to a level that allows the solution to be obtained extremely fast. The experiments were conducted using a dictionary of almost 700,000 words, and the solutions were obtained using the parallelised version with an execution time in the order of minutes. We demonstrate here that finding a perfect crossword grid could be solved faster than has been estimated before, if we use tries as supporting data structures together with parallelisation. Still, if the size of the dictionary is increased by a lot (e.g., considering a set of dictionaries for different languages—not only for one), or through a generalisation to a 3D space or multidimensional spaces, then the problem still could be investigated for a possible usage in cryptography.

Download Full-text

On the upper bounds for complexities of discrete functions

Asian-European Journal of Mathematics ◽

10.1142/s1793557122501790 ◽

2022 ◽

Author(s):

Slavcho Shtrakov

Keyword(s):

Data Structures ◽

Upper Bounds ◽

Complexity Measures ◽

Discrete Functions

In this paper, we study two classes of complexity measures induced by new data structures (abstract reduction systems) for representing [Formula: see text]-valued functions (operations), namely subfunction and minor reductions. When assigning values to some variables in a function, the resulting functions are called subfunctions, and when identifying some variables, the resulting functions are called minors. The number of the distinct objects obtained under these reductions of a function [Formula: see text] is a well-defined measure of complexity denoted by [Formula: see text] and [Formula: see text], respectively. We examine the maximums of these complexities and construct functions which reach these upper bounds.

Download Full-text

Efficiency improvement techniques for private intersection-sum protocol using Bloom filter

SN Applied Sciences ◽

10.1007/s42452-021-04910-z ◽

2022 ◽

Vol 4 (2) ◽

Author(s):

Hiroyuki Kano ◽

Keisuke Hakuta

Keyword(s):

Data Structures ◽

Bloom Filter ◽

Rational Integer ◽

Bloom Filters ◽

Efficiency Improvement ◽

Probabilistic Data ◽

Set Intersection ◽

Private Set Intersection

AbstractA private set intersection protocol is one of the secure multi-party computation protocols, and allows participants to compute the intersection of their sets without revealing them to each other. Ion et al. proposed the private intersection-sum protocol (PI-Sum). The PI-Sum is one of the two-party private set intersection protocol. In the PI-Sum, two parties (say Alice and Bob) have the private sets A and B. Moreover, Bob additionaly has a rational integer associated with each element of B. The PI-Sum allows Bob to obtain the sum of the rational integers associated with the elements of $$A \cap B$$ A ∩ B . This paper proposes the efficiency improvement techniques for the PI-Sum. The proposed techniques are based on Bloom filters which are probabilistic data structures. More precisely, this paper proposes three protocols which are modifications of the PI-Sum. The proposed protocols are more efficient than the PI-Sum.

Download Full-text

Data Structures

10.1016/b978-0-12-820025-4.00002-6 ◽

2022 ◽

pp. 19-44

Author(s):

Andrew F. Siegel ◽

Michael R. Wagner

Keyword(s):

Data Structures

Download Full-text

Using Heterogeneous Data Structures

Beginning Rust ◽

10.1007/978-1-4842-7208-4_8 ◽

2022 ◽

pp. 107-114

Author(s):

Carlo Milanesi

Keyword(s):

Data Structures ◽

Heterogeneous Data

Download Full-text

Timely Reporting of Heavy Hitters Using External Memory

ACM Transactions on Database Systems ◽

10.1145/3472392 ◽

2021 ◽

Vol 46 (4) ◽

pp. 1-35

Author(s):

Shikha Singh ◽

Prashant Pandey ◽

Michael A. Bender ◽

Jonathan W. Berry ◽

Martín Farach-Colton ◽

...

Keyword(s):

Data Structures ◽

High Speed ◽

High Sensitivity ◽

False Positives ◽

External Memory ◽

False Negatives ◽

Large Space ◽

Heavy Hitters ◽

Heavy Hitter ◽

Reporting Delay

Given an input stream S of size N , a ɸ-heavy hitter is an item that occurs at least ɸN times in S . The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸ N-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection ( TED ) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false-positives requires large space (Ω (N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device’s random I/O throughput, i.e., ≈100K observations per second.

Download Full-text

data structures
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Student Performance on the BDSI for Basic Data Structures

Certifying derivation of state machines from coroutines

Solving constrained Horn clauses modulo algebraic data types and recursive functions

A separation logic for negative dependence

Tries-Based Parallel Solutions for Generating Perfect Crosswords Grids

On the upper bounds for complexities of discrete functions

Efficiency improvement techniques for private intersection-sum protocol using Bloom filter

Data Structures

Using Heterogeneous Data Structures

Timely Reporting of Heavy Hitters Using External Memory

Export Citation Format

data structuresRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Student Performance on the BDSI for Basic Data Structures

Certifying derivation of state machines from coroutines

Solving constrained Horn clauses modulo algebraic data types and recursive functions

A separation logic for negative dependence

Tries-Based Parallel Solutions for Generating Perfect Crosswords Grids

On the upper bounds for complexities of discrete functions

Efficiency improvement techniques for private intersection-sum protocol using Bloom filter

Data Structures

Using Heterogeneous Data Structures

Timely Reporting of Heavy Hitters Using External Memory

data structures
Recently Published Documents