Publications

A Verified, Analysis-Focused Interpreter for WebAssembly

2021 - Technische Universität München - Master’s Thesis in Informatics

Abstract

WebAssembly is a standardized general-purpose bytecode language that primarily targets usage as part of web applications. Next to an English prose specification, it also has an official formal specification that describes the language semantics unambiguously in a natural deduction style. This motivates the development of verified tools for static analysis and optimization, since the formal specification can be used as a direct basis to verify against. However, this specification of execution by means of reduction rules has certain inconvenient properties such as mixing data and code, as well as not having any notion of addresses in a program. This makes the application of techniques such as Abstract Interpretation impossible in the sense that information should be determined and stored for specific program points. We overcome these limitations by developing a new interpreter that uses an explicit call stack and value stack, in addition to explicitly tracking a program counter on top of immutable code. This interpreter is developed directly on top of the WebAssembly mechanization developed by Conrad Watt in Isabelle/HOL, a generic theorem proving and logic system. We then present a proof for completeness of our interpreter, showing that it can indeed represent any execution allowed by the WebAssembly specification, which makes it a suitable basis for static analysis. As a side effect of this, we also show that real-world interpreters taking a similar approach using explicit stacks and program counters can also cover the entire WebAssembly language. In addition to the interpreter, we extend the mechanization by formalizing the specification of integer types and operations, as this part had still been missing. We also supply executable implementations for these integer operations and verify them against the specification.

Additional Resources:

Official WasmCert Repository

SMT - Bit Vectors

2020 - Technische Universität München - Master Seminar Thesis

Abstract

Due to how real computers are designed, SMT formulas built on unbounded integers aregenerally unsuited for deciding properties on program code and instead, sequences of bits,calledbit vectors, are used to represent machine integers. After establishing a calculus forSMT formulas on such data, this document elaborates on how these can be automaticallydecided by using theflatteningapproach to translate them into propositional logic, which canbe solved by any SAT solver. This approach is extended toincremental flattening, which isable to decide certain types of formulas significantly faster. For these processes, an exemplarilyimplementation in Haskell is developed, which is finally being used to solve a small CrackMe-challenge as an example application.

Additional Resources:

Shida (SMT Solver developed in thesis)

Volume Rendering of Meteorological Simulation Data

2018 - Technische Universität München - Bachelor’s Thesis in Informatics: Games Engineering

Abstract

This thesis presents the integration of two different volume lighting methods to be used for visualization of meteorological simulation data, in particular cloud data, in Met.3D, an open-source visualization application for meteorological uses. The first method is primarily a visuals-based approach, which simulates single scattering and imitates certain visual phenomena appearing in real-world clouds. The second, while generally requiring more computational effort, is a specific implementation of the photon mapping algorithm and directly simulates multiple scattering by tracing photons through the volume. It takes advantage of the Henyey-Greenstein function in order to be able to only use 3D textures as the data structure of the photon map. We use our implementations of these methods to produce a number of images and analyze them with respect to visual outcome, physical accuracy and performance. In addition, we compare images created with the photon mapping method to results that were generated using MFASIS, a method to simulate radiative transfer of clouds.

Additional Resources:

Case Study on LLVM as suitable intermediate language for binary analysis

2017 - Technische Universität München - Bachelor Seminar Thesis

Abstract

Many binary analysis tools and compilers, instead of directly working on code, use an intermediate representation of it. The idea of this thesis is to use the well-tested intermediate representation from LLVM for binary analysis tasks. We take a look at McSema, a tool to translate x86 and x86_64 binaries to LLVM, describe its translation process in detail and additionally implement Python bindings for it. To practically test McSema, we present five examples of code we translate to LLVM and then recompile again. The last of these demos is an example on using KLEE, a symbolic execution engine for LLVM, on the code produced by McSema in order to successfully solve a CrackMe. We conclude that McSema’s translation approach provides a suitable way to extract functions from binaries to integrate them in other code or to analyse them using symbolic execution, as well as serving as a potential basis to implement an LLVM-based decompiler. We also compare it to Remill, another tool similar to McSema, which generates code that represents the assembly code more explicitly and VEX, the intermediate representation used in Valgind and Angr, which is also more close to the machine code.

Additional Resources: