This is yet another library for writing parsers in Rust. What makes this one different is that I've combined some existing academic work in a way that I think is novel. The result is an unusually flexible parsing library while still offering competitive performance and memory usage.
Algorithms, data structures, safety.
Rust’s type system ensures memory safety: well-typed Rust programs are guaranteed to not exhibit problems such as dangling pointers, data races, and unexpected side effects through aliased references. Going beyond memory safety, for instance, to guarantee the absence of assertion failures or functional correctness, requires static program verification. Formal verification of system software is notoriously difficult and requires complex specifications and logics to reason about pointers, aliasing, and side effects on mutable state. This complexity is a major obstacle to a more widespread verification of system software.
In this paper, we present a novel verification technique that leverages Rust’s type system to greatly simplify the specification and verification of Rust programs. We analyse information from the Rust compiler and synthesise a corresponding core proof for the program in a flavour of separation logic tailored to automation. Crucially, our proofs are constructed and checked automatically; users of our work never work with the underlying formal logic. Users can add specifications at the abstraction level of Rust expressions; we show how to interweave these to extend our core proof to prove modularly whether these specifications are correct. We have implemented our technique for a subset of Rust; our initial evaluation on two thousand functions from widely-used Rust crates demonstrates its effectiveness
A while back, there was a discussion comparing the performance of using the hashbrown crate (based on Google’s SwissTable implementation) in the Rust compiler. In the last RustFest, Amanieu was experimenting on integrating his crate into stdlib, which turned out to have some really promising results. As a result, it’s being planned to move the crate into stdlib.
While the integration is still ongoing, there’s currently no blog post out there explaining SwissTable at the moment. So, I thought I’d dig deeper into the Rust implementation to try and explain how its (almost) identical twin hashbrown::HashMap works.
The post before this one covered how shifgrethor handles rooting: how we track for the garbage collector that this object is alive. That isn’t sufficient for implementing a tracing garbage collector though: the idea of a tracing garbage collector is that we can trace from rooted objects through all of the objects they reference. That way, instead of having to root everything you use, you can only root a few objects from which all of the live objects can be traced.
After the digression in the previous post, it’s time to get back to what I promised in the first post: a look at how shifgrethor handles rooting. Shifgrethor’s solution is somewhat novel and takes advantage of some of Rust’s specific features, so I want to start by looking briefly at some of the other options.
I’m really excited to share with you an experiment that I’ve been working on for the past 5 or 6 weeks. It’s a Rust library called shifgrethor. shifgrethor implements a garbage collector in Rust with an API I believe to be properly memory safe.
I’ll be going through all of the technical details in future blog posts, so I want to kick this series off with a high level overview of the project’s purpose and design decisions.
We introduce partially-stateful data-flow, a new streaming data-flow model that supports eviction and reconstruction of data-flow state on demand. By avoiding state explosion and supporting live changes to the data-flow graph, this model makes data-flow viable for building long-lived, low-latency applications, such as web applications. Our implementation, Noria, simplifies the backend infrastructure for read-heavy web applications while improving their performance.
A Noria application supplies a relational schema and a set of parameterized queries, which Noria compiles into a data-flow program that pre-computes results for reads and incrementally applies writes. Noria makes it easy to write high-performance applications without manual performance tuning or complex-to-maintain caching layers. Partial statefulness helps Noria limit its in-memory state without prior data-flow systems’ restriction to windowed state, and helps Noria adapt its data-flow to schema and query changes while on-line. Unlike prior data-flow systems, Noria also shares state and computation across related queries, eliminating duplicate work.
On a real web application’s queries, our prototype scales to 5× higher load than a hand-optimized MySQL baseline. Noria also outperforms a typical MySQL/memcached stack and the materialized views of a commercial database. It scales to tens of millions of reads and millions of writes per second over multiple servers, outperforming a state-of-the-art streaming data-flow system.
Rust is an emerging systems programming language with guaranteed memory safety and modern language features that has been extensively adopted to build safety-critical software. However, there is currently a lack of automated software verifiers for Rust. In this work, we present our experience extending the SMACK verifier to enable its usage on Rust programs. We evaluate SMACK on a set of Rust programs to demonstrate a wide spectrum of language features it supports.
Hi there! This is the prelude to a series of posts to detailing how to build a language VM. If you are familiar with terms like registers, program counter, and assembly, feel free to skip this post. If not, read on. Please note this is nowhere near comprehensive, but enough to understand what we’re building.
There is no general “best” way to minimize a function; different kinds of functions require different strategies. However, Python’s scipy and R’s optim both prominently feature an algorithm called BFGS. I’ll explain what BFGS stands for, the problem that it solves, and how it solves it.
A Brief Course in Computer Hardware. This is the prelude to a series of posts to detailing how to build a language VM. If you are familiar with terms like registers, program counter, and assembly, feel free to skip this post. If not, read on.
Poor usability of cryptographic APIs is a severe source of vulnerabilities. Aim: We wanted to find out what kind of cryptographic libraries are present in Rust and how usable they are. Method: We explored Rust's cryptographic libraries through a systematic search, conducted an exploratory study on the major libraries and a controlled experiment on two of these libraries with 28 student participants. Results: Only half of the major libraries explicitly focus on usability and misuse resistance, which is reflected in their current APIs. We found that participants were more successful using rust-crypto which we considered less usable than ring before the experiment. Conclusion: We discuss API design insights and make recommendations for the design of crypto libraries in Rust regarding the detail and structure of the documentation, higher-level APIs as wrappers for the existing low-level libraries, and selected, good-quality example code to improve the emerging cryptographic libraries of Rust.
This article discusses a possible genetic algorithm implementation in Rust applied to the travelling salesman problem.
This weekend, I implemented logistic regression in Rust. For me, the most interesting parts were learning how to implement a stopping condition and how to automatically set a step size.
In this article, I want to explain the term Generic Associated Types through a concrete example. I noticed that people (especially in video games development) need some tools to iterate in various manners mutably, efficiently and safely. I tried to write some convenient iterators over vectors and slices that solve those problems, but finally, I understood that some tools cannot be written with std::iter::Iterator. Doing so led me to the comprehension of generic associated types that I will abbreviate as GATs in this article. I will explain here what GATs are and why they are needed.
Rust represents a major advancement in production programming languages because of its success in bridging the gap between high-level application programming and low-level systems programming. At the heart of its design lies a novel approach to ownership that remains highly programmable.
In this talk, we will describe our ongoing work on designing a formal semantics for Rust that captures how programmers can understand ownership and borrowing without trying to grasp the details of lifetime analysis.
During the last couple of years, I’ve spent a lot of time writing parsers and parser generators, and I want to write down my thoughts about this topic. Specifically, I want to describe some properties of a parser generator that I would enjoy using. Note that this is not an “introduction to parsing” blog post, some prior knowledge is assumed.
Type-inferred gradually typed languages are a joy to use: easy to write, analyze, and refactor. In this blog post, I will showcase to the other dozen of programmers who are interested in the obscure art of programming language type inference.
Abstract—Search indices are fundamental building blocks of many systems, and there is great interest in running them on encrypted data. Unfortunately, many known schemes that enable search queries on encrypted data achieve efficiency at the expense of security, as they reveal access patterns to the encrypted data. In this paper we present Oblix, a search index for encrypted data that is oblivious (provably hides access patterns), is dynamic (supports inserts and deletes), and has good efficiency. Oblix relies on a combination of novel oblivious-access tech- niques and recent hardware enclave platforms (e.g., Intel SGX). In particular, a key technical contribution is the design and implementation of doubly-oblivious data structures, in which the client’s accesses to its internal memory are oblivious, in addition to accesses to its external memory at the server. These algorithms are motivated by hardware enclaves like SGX, which leak access patterns to both internal and external memory. We demonstrate the usefulness of Oblix in several applications: private contact discovery for Signal, private retrieval of public keys for Key Transparency, and searchable encryption that hides access patterns and result sizes.
Consensus is one of the most important challenges in designing and building distributed systems–how to make sure multiple nodes (or agents, processes, threads, tasks, participants, etc.) in a group agree on a specific value proposed by at least one of the nodes. As an open-source distributed scalable HTAP database, TiDB uses the Raft Consensus Algorithm in its distributed transactional key-value storage engine, TiKV, to ensure data consistency, auto-failover, and fault tolerance.
I just finished my Master's Thesis and part of it was writing a Python compiler in Rust. The goal of the thesis was to evaluate language features of Python that were hypothesized to cause performance issues. Quantifying the cost of these features could be valuable to language designers moving forward. Some interesting results were observed when implementing compiler optimizations for Python. An average speedup of 51% was achieved across a number of benchmarks.
In Rust, objects are either owned or borrowed. While there can be multiple borrows of the same object, each object has a unique owner. When objects are passed by value, ownership is transferred to the new scope, and the old scope can no longer access it. This makes it possible to implement “consuming” methods which take self by value, and therefore can only be called once. In the party and dealer types, such methods consume the previous state and return the next state in the protocol.
Rust is a new and promising high-level system programming language. It provides both memory safety and thread safety through its novel mechanisms such as ownership, moves and borrows. Ownership system ensures that at any point there is only one owner of any given resource. The ownership of a resource can be moved or borrowed according to the lifetimes. The ownership system establishes a clear lifetime for each value and hence does not necessarily need garbage collection. These novel features bring Rust high performance, fine low-level control of C and C++, and unnecessity in garbage collection, which differ Rust from other existing prevalent languages. For formal analysis of Rust programs and helping programmers learn its new mechanisms and features, a formal semantics of Rust is desired and useful as a fundament for developing related tools. In this paper, we present a formal executable operational semantics of a realistic subset of Rust, called KRust. The semantics is defined in K, a rewriting-based executable semantic framework for programming languages. The executable semantics yields automatically a formal interpreter and verification tools for Rust programs. KRust has been thoroughly validated by testing with hundreds of tests, including the official Rust test suite.
A software library for high performance and hardware accelerated simulation of Quantum Computers and Algorithms. Written with Rust and OpenCL.
In my last post, I talked about the new “pinned references” which guarantee that the data at the memory it points to will not, ever, be moved elsewhere. I explained how they enable giving a safe API to code that could previously only be exposed with unsafe, and how one could go about proving such a thing. This post is about another application of pinned references—another API whose safety relies on the pinning guarantees: Intrusive collections. It turns out that pinned references can almost be used for this, but not quite. However, this can be fixed by extending the guarantees provided by pinned references, as suggested by @cramertj.
I’ve been exploring various ways to write parsers. For a long time, I’ve used hand-written recursive descent for its straightforwardness, flexibility, and performance. There is another way—parser generators like Menhir, LALRPOP, or the venerable Bison use the bottom-up LR algorithm. I decided I would try an experiment: write an LR parser by hand, and see how readable I could make it.
Specialization holds the dubious honor of being among the oldest post-1.0 features remaining in unstable limbo. That’s for good reason, though: until recently, we did not know how to make it sound.
I’ve been spending some time thinking about garbage collection in rust. I know, shame on me, it’s a systems language, we hate garbage collection, but… even in a systems programming language, garbage collection is still pretty damn useful.
This library implements several of the more commonly useful immutable data structures for Rust. They rely on structural sharing to keep most operations fast without needing to mutate the underlying data store, leading to more predictable code without necessarily sacrificing performance.
Rust does not have dependent types, or GADTs like Haskell, but with a few tricks, we can use Rust's type system to emulate an Idris-like number system.
I am developing a library for stencil calculation in Rust.
In the previous post, pHash helped us to summarize our photo album. Now it’s time to employ BK-trees and efficiently search through the metric space of perceptual hashes. Let’s roll up the sleeves; more Rust awaits!
Rust is a modern programming language which is marketed primarily on the basis of its very nice type system, and I’d like to tell you about how you can use this type system to reason about your programs in interesting ways. Most of the time when its type system is discussed, the focus is on its guarantee of data race freedom and ability to enable so-called fearless concurrency (and rightfully so—this is a place where Rust truly shines!). Today, I have a different focus in mind, characterized perhaps most succinctly as follows:
Combine is a parser combinator library for the Rust programming language. I first announced version 3 of Combine back in August and back then I definitely expected to have a stable version by now. However other projects (cough gluon cough) got in the way and Combine fell to the wayside. It didn’t help that I didn’t have a killer feature for 3.0 either, user-defined error types make it possible to define parsers usable in #[no_std] crates which is great when you need it but it is still a fairly niche use-case.
To demonstrate the value of Rust's memory safety rules, I contrast the implementation of a simple vector library in C and Rust, highlighting where and how Rust's static analysis can prevent tricky memory errors.