Researchers extend the use of deep learning for distinguishing encrypted text from random noise

Researchers extend the use of deep learning for distinguishing encrypted text from random noise

Scientific PaperPerformance Comparison Between Deep Learning-Based and Conventional Cryptographic Distinguishers

Authors: Emanuele Bellini, Matteo Rossi

Cryptanalysis and machine learning have many common methodologies. In late 1991, cryptography pioneer Ron Rivest commented on the similarity between the two domains. However, despite the similarity and the recent progress in deep learning tools and hardware, researchers have made little progress in applying machine learning techniques to cryptography research

Finally, in 2019, researchers began demonstrating some promising results in using neural networks. Now a team of researchers at the Technology Innovation Institute in the United Arab Emirates, in a joint work with Politecnico di Torino in Italy, has analyzed the performance improvements of deep learning for a cryptographic distinguisher.

A distinguisher is a technique for discerning text that has been encrypted by a specific algorithm from random noise. A distinguisher is the first step in launching a key recovery attack. This research could pave the way for the broader adoption of deep learning in cryptography.

Neural networks and cryptography

In some respect, finding the secret key of a cryptographic algorithm is somewhat equivalent to finding the appropriate set of neural network weights in a deep learning algorithm. In his 1991 paper, Rivest observed, “The notion of ‘secret key’ in cryptography corresponds to the notion of ‘target function’ in machine learning theory, and more generally the notion of ‘key space’ in cryptography corresponds to the notion of the ‘class of possible target functions.’” The main workflow for implementing deep learning algorithms lies in training the correct weighting for the neurons in the network. In cryptanalysis, researchers attack a block cipher in order to find the key. The attack in cryptanalysis and training in deep learning are analogous. Similarly, the keys in a cryptographic system are analogous to the weightings in a neural network.

Despite these similarities, there are also differences. One main difference is that machine learning is usually applied to data that has a structure such as images, sounds, and word vectors. Encrypted data must look random by definition. But randomness is one factor that can cause machine learning algorithms to fail. A neural network trained with random data will find no correlation.

A distinguisher attack is a basic kind of attack on symmetric ciphers. A distinguisher can distinguish between data that has been structured by a cipher and random data of the same size. In his seminal 2019 research, Aron Gohr, a researcher at the German Federal Office of Information Security, showed that by using information from classical cipher analytics attacks, it is possible to build distinguishers based on neural networks. These can capture more information than conventional distinguishers. This means that these new distinguishers are more efficient than a classical distinguisher.

How performance is measured

The TII researcher Emanuele Bellini, in collaboration with Matteo Rossi from Politecnico di Torino, built on this research by improving the tools for assessing the performance of conventional versus deep learning-based distinguishers. Two aspects of measuring effectiveness include the number of cryptographic rounds and the number of input/output pairs required to affect an attack.

Gohr’s work focused on using machine learning for building a distinguisher and key recovery for the Speck32/64 algorithm. The TII team looked at applying neural networks to other algorithms, including Tiny Encryption Algorithm and RAIDEN. All three of these algorithms are called ARX ciphers, which means they only use three types of operations: modular Addition, Rotations, and XOR operations. The family of ARX ciphers is important because of the software-friendly design (which allows fast software implementations), and they are widely used in commercial cryptography algorithms such as ChaCha20. Another popular commercial cipher, Sha1, is considered an extension of the ARX family because it has a few extra operations.

Speck32/64 is a toy cipher designed for research because it is possible to crack it with brute force techniques that analyze all possible keys. TEA and RAIDEN have much larger input and key sizes. In principle, it is much harder to break these algorithms with brute force approaches. Symmetric ciphers use iterated functions, in which the same round function is applied many times consecutively. For example, The ChaCha20 cipher applies the same round function 20 times. The cipher is deemed secure with the specified number of rounds. But it is not considered secure when fewer rounds of the function are applied to the data. To evaluate the security of a cipher, what researchers can do is run data through fewer rounds of cryptographic operations than in the full implementation and then analyze the performance of various kinds of attacks against the results.

Another measure of performance relates to the amount of data required to launch a distinguisher attack. The less data needed to succeed in an attack, the more effective the attack is considered. This data usually consists of a set of input/output pairs generated by the cipher and a specific key.

The TII team found that distinguishers with neural networks require less data than would be required with classical differential cryptanalysis techniques. This meant that it would take less data to distinguish encrypted data in the same number of rounds. For example, in the case of TEA, they found that at 7-rounds, a conventional distinguisher reaches an accuracy of 1 with 220 samples (about a million). In comparison, a machine learning-based model could achieve the same accuracy with only 28 samples (256). This implies a performance improvement of several orders of magnitude.

The relative performance of the different types of classical and neural network distinguishers can vary greatly across rounds. For example, another classical distinguisher called the bitflip distinguisher was particularly good at early rounds but fared poorly after 7-rounds.

Relative strengths and weaknesses of neural network distinguishers

One of the main limitations of the new approach is that it requires a classical analysis first. The basic workflow consisted of building a normal differential distinguisher and then improving it with a neural network. The main advantage is that this leads to a far better distinguisher than what is possible with classical techniques by themselves.

There are also differences in the need for a training step depending on the algorithms that are being analyzed. For TEA, the attack is tied to a specific key. If the key is changed, then researchers need to retrain the neural network. With Speck and RAIDEN, the network can be trained once for any key, and this trained network can be reused for distinguishing data generated by other keys.

The memory requirements were one of the most significant limitations with the neural network techniques. This particular research did not rely on a huge server. The team was able to run the neural network distinguisher on up to 8-rounds with TEA. In contrast, a classic distinguisher could go further. Emanuele Bellini, Principal Cryptographer at the TII, said, “In principle, we could have gone farther with a more powerful machine. The main limitation is due to memory resources, rather than CPU resources.”

Future research

Down the road, the team would like to explore different neural network architectures. This research explored both multi-layered perceptrons (MLPs) and convolutional neural networks (CNNs). They found that their approach to implementing CNNs required more samples than their favored MLP approach, but not significantly.

Bellini believes these promising results could be extended by further experiments with larger networks with more hidden layers and more neurons per layer. Another line of research could consider different activation and loss functions.

Other future research could explore different kinds of conventional cryptoanalysis as a first step. In this research, the distinguisher was based on first-order differential cryptanalysis. It may be worth investigating other techniques like linear cryptanalysis or higher-order differential cryptanalysis.

Another goal might be finding ways to remove the need for conducting a classical analysis before training the neural networks. In particular, to identify patterns by using only machine learning. Before building a distinguisher, researchers must identify patterns in the output of the encryption function. In random functions, each bit appears with the same probability as tossing a fair coin. Usually, these patterns are found manually. “It would be of great interest if machine learning algorithms could fully automate the process of discovering the pattern you need to look for,” Bellini said.