Summary of Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning
The paper, Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning, is about fuzzing. Purely random fuzzing struggles to produce inputs that satisfy a program's validity constraints, so most generated inputs are rejected early.
Generator-based fuzzing addresses this issue by having a developer write a generator in a general-purpose programming language such as Java or Python. The generator invokes functions that produce randomness (e.g. random.Select(choices)) and assembles the results into a structured input. With a generator, most produced inputs are both valid and random. However, there is a central tension in generator-based fuzzing: a simple generator is easy to write, but may miss semantic constraints and so be ineffective; conversely, a generator that produces diverse valid inputs can fuzz a program well, but is difficult to write.
The paper suggests that a generator can be viewed as a sequence of choice points. For example, generating a binary search tree (BST) is a sequence of choices. At each node, the generator asks the following questions and makes a choice for each (1. what is the node's value? 2. should it have a left child? 3. should it have a right child?).
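As a sketch of what such a generator might look like (the class and function names here are my own, not the paper's, and the depth cap is an assumption to guarantee termination):

```python
import random

class Node:
    """A plain binary tree node."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def generate_tree(depth=0, max_depth=4):
    """Generate a random binary tree, making a choice at each choice point."""
    node = Node(random.randrange(10))                       # choice 1: node value
    if depth < max_depth and random.choice([True, False]):  # choice 2: left child?
        node.left = generate_tree(depth + 1, max_depth)
    if depth < max_depth and random.choice([True, False]):  # choice 3: right child?
        node.right = generate_tree(depth + 1, max_depth)
    return node
```

Each call to `random.randrange` or `random.choice` is one choice point; an input is fully determined by the sequence of values those calls return.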
If the choices are made uniformly at random, the resulting trees are very short. For example, there is a 1/4 chance (a 50% chance of no left child times a 50% chance of no right child) of producing a tree of just a single node. Such a generator produces a large number of small trees and, in fact, is quite likely to produce the same small, boring trees over and over again.
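The 1/4 estimate is easy to sanity-check by simulation. This is a self-contained sketch (the function name and depth cap are my own assumptions) that replays the three choices per node and counts how often the tree is a single node:

```python
import random

def tree_size(depth=0, max_depth=4):
    """Simulate the per-node choices of a naive generator; return node count."""
    size = 1
    _ = random.randrange(10)                       # node value (doesn't affect size)
    if depth < max_depth and random.random() < 0.5:
        size += tree_size(depth + 1, max_depth)    # left child
    if depth < max_depth and random.random() < 0.5:
        size += tree_size(depth + 1, max_depth)    # right child
    return size

random.seed(42)
samples = [tree_size() for _ in range(10_000)]
frac_singletons = sum(s == 1 for s in samples) / len(samples)
print(f"fraction of single-node trees: {frac_singletons:.3f}")  # close to 0.25
```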
Another constraint is that a BST has the binary search property: the value in a given node is greater than or equal to every value stored in its left subtree, and less than or equal to every value in its right subtree. If this property is not explicitly encoded in the generator, then many inputs will not be valid BSTs. On the other hand, if all the constraints are hard-coded into the generator, I suspect it may not be able to produce many interesting inputs.
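For concreteness, the binary search property can be checked by narrowing the allowed value range at each level. A minimal sketch, representing a tree as nested `(value, left, right)` tuples (my own encoding, not the paper's):

```python
def is_valid_bst(node, lo=float("-inf"), hi=float("inf")):
    """Return True iff the tree satisfies the binary search property.

    A tree is a (value, left, right) tuple, or None for an empty subtree.
    Each node's value must lie within the bounds inherited from its ancestors.
    """
    if node is None:
        return True
    value, left, right = node
    if not (lo <= value <= hi):
        return False
    return is_valid_bst(left, lo, value) and is_valid_bst(right, value, hi)
```

A validity oracle like this is what separates "valid" from "invalid" inputs in the reward discussion below.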
Reinforcement learning to the rescue?
The paper describes this as the diversifying guidance problem: guide the choices made at each choice point so that the generator produces diverse yet valid inputs. This looks very similar to the problems solved by Reinforcement Learning (RL).
In my own interpretation, an RL problem is about choosing an action given a particular state, such that reward is maximised (based on an estimate of the reward conditioned on the state and action). As choices are made, rewards are received and the learner updates its policy. This is, of course, a simplification of the description of RL given in this paper and elsewhere.
Since each input can be represented as the sequence of choices made, that sequence is abstracted into a state. For most tasks, this could be something like a trimmed window of recent choices (since the full sequence may grow too long), but one could also include domain-specific information, e.g. that binary search tree nodes depend on their parent.
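A trimmed-window abstraction is simple to sketch. This helper (the name and the window size of 5 are my own assumptions, not the paper's) keeps only the most recent choices and returns a hashable state suitable as a table key:

```python
def abstract_state(choice_sequence, window=5):
    """Abstract a (possibly long) list of choices into a fixed-size state:
    the last `window` choices, as a hashable tuple."""
    return tuple(choice_sequence[-window:])
```

Hashability matters because a tabular learner indexes its value estimates by (state, action) pairs.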
Actions are the choices made at the choice points.
The reward function depends on whether the input is valid and whether it has been seen before (judged by comparing a selected characteristic, such as coverage or input length). The highest reward goes to an input that is both valid and new, followed by one that is valid but not new. The lowest reward goes to an invalid input.
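Putting the pieces together, here is a toy tabular sketch of this scheme. The reward constants, class name, and epsilon-greedy selection are my own placeholders for illustration; the paper's exact constants and update rule may differ:

```python
import random
from collections import defaultdict

# Hypothetical reward constants; the paper's exact values may differ.
R_VALID_NEW, R_VALID_OLD, R_INVALID = 20, 0, -1

def reward(valid, is_new):
    """Highest reward for valid-and-new, lowest for invalid."""
    if not valid:
        return R_INVALID
    return R_VALID_NEW if is_new else R_VALID_OLD

class ChoiceLearner:
    """Tabular learner: a running value estimate per (state, action) pair,
    with epsilon-greedy selection at each choice point."""
    def __init__(self, epsilon=0.25):
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.n = defaultdict(int)     # (state, action) -> visit count
        self.epsilon = epsilon

    def select(self, state, actions):
        """Pick an action: explore with probability epsilon, else exploit."""
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, trajectory, r):
        """Credit the episode's reward to every (state, action) pair used,
        via an incremental running average."""
        for state, action in trajectory:
            key = (state, action)
            self.n[key] += 1
            self.q[key] += (r - self.q[key]) / self.n[key]
```

After each generated input, the fuzzer would compute `reward(...)` from the validity oracle and novelty check, then call `update` with the episode's (state, action) trajectory.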
How did it fare against other tools?
In the evaluation, RLCheck (the tool proposed in the paper) outperforms existing tools at generating valid, diverse inputs quickly, but plateaus early. Interestingly, a greybox variant of RLCheck (which uses coverage information) did not outperform the blackbox version (which has no coverage feedback).
Overall, this was an interesting take on fuzzing, using reinforcement learning to help guide the generation of valid inputs. It is an exciting paper and I found it to be very innovative.
There are many papers on grammar-based fuzzing; it addresses the same problem and is an alternative to writing a generator. Perhaps these techniques can be compared via the Chomsky hierarchy: generators are essentially Turing machines, since the developer can express any sort of constraint in code, while grammars are typically regular or context-free languages.
#fuzzing #reinforcement-learning