LangSec Workshop

at IEEE Security & Privacy, May 18, 2014

Abstracts

Keynote

Caspar Bowden, "Seven Types of Metadata Ambiguity"

The Internet grew as a multi-layered architecture. Its explosive growth, its considerable censorship resistance to date, and its current state of ubiquitous insecurity can be attributed to the layering being defined "by rough consensus and running code". We are now at a point where new threat models call for more definite and consistent conceptual models for multi-layered systems, for both technical and legal threat models, including "lawful access". This talk will examine the significance of LangSec in providing legislators and security designers with such models, in particular distinguishing processing of metadata (to perform the service) from carriage of content. Such models should respect due process and do not preclude useful technical architectures for security and privacy.

The drafting of laws for interception of communications distinguish between traffic and content in ways which are susceptible to two basic interpretations. A relative interpretation is that a particular layer of data is considered traffic/metadata or content from the perspective of the service operator on whom the order is served. The absolute interpretation is that content is the inner core of many layers of address protocol data, each and any of which may be demanded of any service operator if technically feasible. This talk will examine the comparative drafting of laws in the US, UK, and other countries and the ambiguities inherent, which in some cases have resulted in self-contradictory simultaneous interpretations. Some of the policy history underlying the drafting of US FISA, and the analogous UK RIPA law, will be reviewed, and the complications this can give rise to with encrypted metadata (esp. the LavaBit case). Moreover, surveillance law for foreign intelligence purposes is unusual in that it is often purposely written to be opaque, compounding legal uncertainty. LangSec shines the light on this ambiguity as an additional major source of insecurity.

Papers

Eric Jaeger and Olivier Levillain, "Mind your Language(s): A Discussion about Languages and Security"

Following several studies conducted by the French Network and Information Security Agency (A NSSI), this paper discusses the question of the intrinsic security characteristics of programming languages. Through illustrations and discussions, it advocates for a different vision of well-known mechanisms and is intended to provide some food for thoughts regarding languages and development tools.

updated paper

Dennis Volpano, "LEGO Bricks for Reactive Programming"

A fundamental unit of computation is introduced for reactive programming called the LEGO brick. It is targeted for domains in which JavaScript runs in an attempt to allow a user to build a trustworthy reactive program on demand rather than try to analyze JavaScript. A formal definition is given for snapping bricks together based on the standard product construction for deterministic finite automata.

paper

Julian Bangert, Nickolai Zeldovich, "Nail: A Practical Interface Generator for Binary Formats"

We present Nail, an interface generator that allows programmers to safely parse and generate protocols defined by a Parser-Expression based grammar. Nail uses a richer set of parser combinators that induce an internal representation, obviating the need to write semantic actions. Nail also provides solutions parsing common patterns such as length and offset fields within binary formats that are hard to process with existing parser generators.

paper

W. Michael Petullo, Wenyuan Fei, Jon A. Solworth, Pat Gavlin, "Ethos' Deeply Integrated Distributed Types"

Programming languages have long incorporated type safety, increasing their level of abstraction and thus aiding programmers. Type safety eliminates whole classes of security-sensitive bugs, replacing the tedious and error-prone search for such bugs in each application with verifying the correctness of the type system. Despite their benefits, these protections often end at the process boundary, that is, type safety holds within a program but usually not to the filesystem or communication with other programs. Existing operating system approaches to bridge this gap require the use of a single programming language or common language runtime.

We describe the deep integration of type safety in Ethos, a clean-slate operating system which requires that all program input and output satisfy a recognizer before applications are permitted to further process it. Ethos types are multilingual and runtime-agnostic, and they incorporate a new mechanism for automatically creating type identifiers. Ethos bridges the type-safety gap between programs by (1) providing a convenient mechanism for specifying the types each program may produce or consume, (2) ensuring that each type has a single, distributed-system-wide recognizer implementation, and (3) inescapably enforcing these type constraints.

paper

Research Reports

Travis Goodspeed, "Phantom Boundaries and Cross-layer Illusions in 802.15.4 Digital Radio"

The classic design of protocol stacks, where each layer of the stack receives and unwraps the payload of the next layer, implies that each layer has a parser that accepts Protocol Data Units and extracts the intended Service Data Units from them. The PHY layer plays a special role, because it must create frames, i.e., original PDUs, from a stream of bits or symbols. An important property implicitly expected from these parsers is that SDUs are passed to the next layer only if the encapsulating PDUs from all previous layers were received exactly as transmitted by the sender and were syntactically correct.

The Packet-in-packet attack (WOOT 2011) showed that this false assumption could be easily violated and exploited on IEEE 802.15.4 and similar PHY layers; however, it did not challenge the assumption that symbols and bytes recognized by the receiver were as transmitted by the sender. This work shows that even that assumption is wrong: in fact, a valid received frame may share no symbols with the sent one! This property is due to a particular choice of low-level chip encoding of 802.15.4, which enables the attacker to co-opt the receiver’s error correction. This case study demonstrates that PHY layer logic is as susceptible to the input language manipulation attacks as other layers, or perhaps more so. Consequently, when designing protocol stacks, language-theoretic considerations must be taken into account from the very bottom of the PHY layer; no layer is too low to be considered “mere engineering.”

paper

Robert David Graham, Peter C. Johnson, "Finite State Machine Parsing for Internet Protocols: Faster Than You Think"

A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming.

In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain "practical" reasons for avoiding FSM-based parsers are invalid.

paper

Olivier Levillain, "Parsifal: a Pragmatic Solution to the Binary Parsing Problem"

Parsers are pervasive software basic blocks: as soon as a program needs to communicate with another program or to read a file, a parser is involved. However, writing robust parsers can be difficult, as is revealed by the amount of bugs and vulnerabilities related to programming errors in parsers. It is especially true for network analysis tools, which led the network and protocols laboratory of the French Network and Information Security Agency (ANSSI) to write custom tools. One of them, Parsifal, is a generic framework to describe parsers in OCaml, and gave us some insight into binary formats and parsers. After describing our tool, this article presents some use cases and lessons we learned about format complexity, parser robustness and the role the language used played.

paper

Andreas Bogk, Marco Schöpl, "The Pitfalls of Protocol Design: Attempting to Write a Formally Verified PDF Parser"

Parsers for complex data formats generally present a big attack surface for input-driven exploitation. In practice, this has been especially true for implementations of the PDF data format, as witnessed by dozens of known vulnerabilities exploited in many real world attacks, with the Acrobat Reader implementation being the main target. In this report, we describe our attempts to use Coq, a theorem prover based on a functional programming language making use of dependent types and the Curry-Howard isomorphism, to implement a formally verified PDF parser. We ended up implementing a subset of the PDF format and proving termination of the combinator-based parser. Noteworthy results include a dependent type representing a list of strictly monotonically decreasing length of remaining symbols to parse, which allowed us to show termination of parser combinators. Also, difficulties showing termination of parsing some features of the PDF format readily translated into denial of service attacks against existing PDF parsers --- we came up with a single PDF file that made all the existing PDF implementations we could test enter an endless loop.

paper

Sarat Kompalli, "Using Existing Hardware Services for Malware Detection"

First, we describe our experiments in using hardware-based metrics such as those collected by the BPU and MMU for detection of malware activity at runtime. Second, we sketch a defense-in-depth security model that combines such detection with hardware-aided proof-carrying code and input validation.

Invited Talks

Julien Vanegue, "The Weird Machines in Proof-Carrying Code"

We review different attack vectors on Proof-Carrying Code (PCC) related to policy, memory model, machine abstraction, and formal system. We capture the notion of weird machines in PCC to formalize the shadow execution arising in programs when their proofs do not sufficiently capture and disallow the execution of untrusted computations. We suggest a few ideas to improve existing PCC systems so they are more resilient to memory attacks.

paper

Erik Bosman, "Who needs userland anyway: two words on generic shellcode"

Invited presentation from the IEEE S&P main track.

paper

Felix 'FX' Lindner, "Selling LangSec: Tales from the Alchemist's Apprentice"

Concluding keynote. How to convince real-world organizations to deploy LangSec (and how to not go about it).