Sunday, February 17, 2019


Is SHA-3 (Keccak) already broken?




The other day I needed to add SHA-3 (Keccak) support to a project in order to ensure interoperability with some other applications. I had actually wanted to add SHA-3 support back when it first came out, before I had a specific use for it, but at the time the code samples I looked at seemed way too confusing. However, when looking for code samples the other day, I found some more straightforward implementations that were clear enough for me to figure out how the algorithm works and write my own implementation.

I’ve implemented over a dozen hash algorithms, for reasons I won’t go into here. One upside of doing this is that I get a decent understanding of how each algorithm works internally and become familiar with its pros and cons. Now that I understand how SHA-3 (Keccak) works and can compare it to some other hash algorithms, I’m actually somewhat appalled by what I’m seeing.

Hash Properties Overview


Cryptographic hash algorithms (like the Secure Hash Algorithm family) take a file, document, or some other set of data, and produce a fixed-size series of bytes, typically between 20 and 64, that describes the data. These hashes are supposed to change drastically if even a single bit is altered in the input data being hashed. They’re also supposed to ensure it’s computationally difficult to produce data targeting a specific hash.

There are two main properties desired in a cryptographic hash algorithm:

1) It should be infeasible to generate two documents that have the same hash (collision resistance). This ensures that if party A shows party B a document, and gets B to approve the document’s hash in some sort of whitelist or signature, party A cannot later substitute a different document that B’s whitelist or signature now also approves, without B’s consent.

2) If a certain set of data is whitelisted or signed by a party, it should be infeasible to generate a different set of data that matches the whitelisted or signed hash (second-preimage resistance).

These two properties are really similar. The only difference is whether you’re targeting a preexisting hash that exists somewhere, or generating two different sets of data that can have any hash, as long as both end up with the same one.

If the first property does not hold for a hash algorithm, it may still be usable in cases where you’re not accepting data generated by a third party, but only validating your own. If the second property does not hold, there may still be some use cases where the hash algorithm can be used, especially if only a certain subset of hashes can have matching data generated against them. However, if either property fails to hold, you generally should be using some other hash algorithm, and not rely on something which is broken.

Classical Cryptographic Hash Algorithm Design


For decades hash algorithms basically did the following:
  1. Initialize a “state” of a particular size, generally the output size, with a series of fixed values.
  2. Break the data up into a series of blocks of a larger particular size.
  3. Take each block and use bit operations and basic math operations (addition, subtraction, multiplication) on its data, ignoring overflow, to reduce the block to a much smaller size, generally the output size, to be merged with the “state”. Each block is processed with some fixed data.
  4. Combine each of those states. This might be done by xoring or adding them together, throwing away any overflow.
  5. Append the size of the data to the final block, producing an extra block if necessary, and perform steps 3 and 4 upon it.
  6. Return the state as the result.
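
To make that structure concrete, here’s a rough C99 sketch of the classical layout. The compression function below is a made-up stand-in rather than anything real, and all the names and constants are mine; only the overall shape (fixed initial state, block loop, data size processed at the end) reflects the description above.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a real compression function: mixes one 8-byte block into the
   running state using bit operations and math operations, overflow ignored. */
static uint64_t toy_compress(uint64_t state, uint64_t block)
{
    uint64_t x = state ^ block;
    for (int i = 0; i < 8; i++) {
        x += 0x9E3779B97F4A7C15ULL;            /* fixed per-round data */
        x ^= x >> 29;
        x *= 0xBF58476D1CE4E5B9ULL;
        x ^= (x << 31) | (x >> 33);            /* rotation */
    }
    return state + x;                          /* merge with the prior state */
}

/* Toy classical hash: fixed initial state, block loop, length block at the end. */
static uint64_t toy_hash(const uint8_t *data, size_t len)
{
    uint64_t state = 0x0123456789ABCDEFULL;    /* fixed initial value */
    size_t i = 0;

    /* Steps 3 and 4: fold each full 8-byte block into the state. */
    for (; i + 8 <= len; i += 8) {
        uint64_t block;
        memcpy(&block, data + i, 8);
        state = toy_compress(state, block);
    }

    /* Final partial block, zero padded (empty if the data ended on a boundary). */
    uint64_t last = 0;
    if (i < len)
        memcpy(&last, data + i, len - i);
    state = toy_compress(state, last);

    /* Step 5: process the data size as its own final block. */
    state = toy_compress(state, (uint64_t)len);

    return state;                              /* step 6: the state is the hash */
}

int main(void)
{
    const char *msg = "hello world";
    printf("%016llx\n", (unsigned long long)toy_hash((const uint8_t *)msg, strlen(msg)));
    return 0;
}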

SHA-3 Competition


Sooner or later, every hash algorithm fails to quite hold up to its expectations, although this isn’t always a problem. For example, if a hash algorithm is supposed to take 1000 years to brute force a match to an existing hash, and someone figures out an algorithm to do it in 950 years, it doesn’t provide the security it theoretically advertises, but the margin of security is so high that it doesn’t matter.

However, in recent years, real attacks violating one of the two key cryptographic hash properties, attacks which can be performed in hours or even minutes, have been found against the popular MD5 and SHA-1 algorithms. These attacks don’t necessarily invalidate MD5 and SHA-1 for every potential use, but they’re bad enough that these algorithms should be avoided whenever possible, and should not be relied upon for security.

There’s also an issue with these classical hash algorithms regarding how the states are chained and returned as the hash, which makes it easy for hashes to be misused in some cases. Someone who doesn’t necessarily know what data matches a particular hash can still calculate the hash of data + data2, because the returned hash is essentially the internal state left behind after processing data. This is the basis of length extension attacks, and it can be an issue in certain naïve usages of these hash algorithms.
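
Here’s a deliberately weak C99 toy that shows just that state-continuation idea (real length extension attacks on MD5/SHA-1/SHA-2 also have to account for padding and the length block, which this toy leaves out entirely; everything here is made up for illustration): because the returned hash is the whole internal state, anyone holding the hash of data they’ve never seen can keep feeding blocks into it and obtain the hash of data + data2.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Deliberately weak toy compression: one 8-byte block folded into the state. */
static uint64_t weak_compress(uint64_t state, uint64_t block)
{
    state ^= block;
    state *= 0x9E3779B97F4A7C15ULL;     /* overflow ignored */
    state ^= state >> 32;
    return state;
}

/* Toy hash over whole 8-byte blocks only, with no padding or length block. */
static uint64_t weak_hash(uint64_t start_state, const uint64_t *blocks, size_t n)
{
    uint64_t state = start_state;
    for (size_t i = 0; i < n; i++)
        state = weak_compress(state, blocks[i]);
    return state;                        /* the full state is the "hash" */
}

int main(void)
{
    const uint64_t IV = 0x0123456789ABCDEFULL;
    uint64_t secret_data[2] = { 111, 222 };   /* pretend we never get to see this */
    uint64_t data2[1]       = { 333 };

    /* The victim publishes only the hash of the secret data. */
    uint64_t published = weak_hash(IV, secret_data, 2);

    /* The attacker resumes from the published hash as if it were the state. */
    uint64_t extended = weak_hash(published, data2, 1);

    /* Same value as hashing secret_data + data2 directly. */
    uint64_t direct[3] = { 111, 222, 333 };
    printf("extension matches direct hash: %s\n",
           extended == weak_hash(IV, direct, 3) ? "yes" : "no");
    return 0;
}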

Further, all the classical algorithms were shown not to quite hold up to their expectations, even though they’re still considered secure enough for the time being. This led to a desire to create a new series of hash algorithms with a different structure from the classical ones. Therefore a competition was held to create some new ones, determine the best candidates for future use, and award the title “SHA-3”.

BLAKE and BLAKE2


One of the SHA-3 competition finalists was BLAKE. This hash algorithm was composed of a few different components, and its authors suggested studying variants of BLAKE with each of them left out, to understand how strong BLAKE remained based on a majority of its components. After the competition was over, and after much study, it was realized that one of the components of BLAKE wasn’t necessary, and that the overall amount of operations could be reduced to improve its speed without really hurting its security. Taking that together with some interesting ideas introduced alongside Skein and Keccak, two other finalists, BLAKE2 was created. BLAKE2 is one of the fastest cryptographic hash algorithms, and is also considered to be one of the strongest available.

BLAKE2 works as follows:
  1. Initialize a “state” of a particular size, with a series of fixed values, mostly zeros.
  2. Break the data up into a series of blocks of a larger particular size.
  3. Take each block, its block number, and a flag, and use bit operations and addition, ignoring overflow, to reduce them to half their combined size, to be merged with the “state”. Each block+number+flag is processed with some fixed data.
  4. Combine each of those states.
  5. The flag used alongside the final block is different from the one used for all prior blocks.
  6. Return the state as the result.

Theoretically BLAKE2 is stronger than classical hashes, because there is data added to each block that is processed which cannot simply be influenced by passing data to it. This makes computing data to go from state A to state B more difficult, because the data you would need to do so differs depending on where the block you’re trying to replace sits. Calculating a hash from another hash for data + data2 is also more difficult because of the flag change: the state after processing data as the final block is different from what it would have been had more blocks been appended to the data.
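
Here’s a rough C99 sketch of that difference in shape (this is not real BLAKE2, whose compression function is far more involved; the mixing and constants below are made up): the point is only that the block counter and the finalization flag enter the compression call alongside the data, so the same block contributes differently depending on where it sits and whether it is last.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch of a BLAKE2-shaped compression call: the block, its position in the
   stream, and a "last block" flag are all mixed in. Not the real BLAKE2 round. */
static uint64_t sketch_compress(uint64_t state, uint64_t block,
                                uint64_t block_counter, uint64_t final_flag)
{
    uint64_t x = state ^ block ^ block_counter ^ (final_flag * 0xA5A5A5A5A5A5A5A5ULL);
    for (int i = 0; i < 8; i++) {
        x += 0x9E3779B97F4A7C15ULL;
        x ^= (x << 13) | (x >> 51);
        x *= 0xBF58476D1CE4E5B9ULL;
    }
    return state + x;
}

static uint64_t sketch_hash(const uint64_t *blocks, size_t n)
{
    uint64_t state = 0;                  /* mostly-zero initial state */
    for (size_t i = 0; i < n; i++)
        state = sketch_compress(state, blocks[i], (uint64_t)(i + 1),
                                i + 1 == n ? 1 : 0);
    return state;
}

int main(void)
{
    uint64_t one[1] = { 42 };
    uint64_t two[2] = { 42, 42 };

    /* In the first hash the block 42 is compressed with the final flag set; in the
       second it is compressed without it. So resuming from the first hash's output
       does not give you the second hash, unlike the classical chaining above. */
    printf("%016llx\n", (unsigned long long)sketch_hash(one, 1));
    printf("%016llx\n", (unsigned long long)sketch_hash(two, 2));
    return 0;
}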

Keccak

The actual winner of the SHA-3 competition was the Keccak algorithm. It was chosen because it was really different from classical hashes (not necessarily a good thing), and really fast in hardware implementations.

Keccak works as follows:
  1. Initialize a large “state” of a particular size, with zeros.
  2. Break the data up into a series of blocks of a smaller particular size.
  3. Take each block and use bit operations to merge it with the larger “state”. Each block is processed with some fixed sparse data.
  4. Combine each of those states.
  5. The final block has two bits flipped.
  6. Return a small portion of the state as the result.

Like BLAKE2, Keccak aims to be stronger than classical hashes because there is state data larger than the size of the block, which cannot be immediately influenced by input data (but can still be influenced). Calculating a hash from another hash by appending data is also difficult, because the returned result is only a truncated portion of the state. The bit flips in the final block would help as well.
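
Here’s a minimal C99 sketch of that sponge layout, with a small stand-in permutation instead of the real 24-round Keccak-f[1600] (the sizes, padding byte, and stirring function below are all made up for illustration): data is xored into the “rate” portion of a larger all-zero state, the whole state is permuted after each block, and only a truncated piece of the state is returned.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STATE_WORDS 8     /* toy state: 8 x 64 bits = 512 bits           */
#define RATE_WORDS  4     /* toy rate:  4 x 64 bits = 256 bits per block */

/* Stand-in permutation: NOT Keccak-f, just something that stirs the whole state
   and xors in some sparse round constants. */
static void toy_permute(uint64_t s[STATE_WORDS])
{
    static const uint64_t RC[4] = { 0x01, 0x8082, 0x800000000000808AULL, 0x8000000080008000ULL };
    for (int r = 0; r < 4; r++) {
        for (int i = 0; i < STATE_WORDS; i++) {
            s[i] ^= RC[r];
            s[i] ^= (s[(i + 1) % STATE_WORDS] << 7) | (s[(i + 1) % STATE_WORDS] >> 57);
            s[i] += s[(i + 3) % STATE_WORDS];
        }
    }
}

/* Toy sponge hash: absorb blocks into the rate, permute, return a truncated state. */
static void toy_sponge(const uint8_t *data, size_t len, uint8_t out[32])
{
    uint64_t state[STATE_WORDS] = { 0 };                 /* all-zero initial state */
    uint8_t block[RATE_WORDS * 8];

    for (size_t off = 0; off <= len; off += sizeof(block)) {
        size_t take = len - off < sizeof(block) ? len - off : sizeof(block);
        memset(block, 0, sizeof(block));
        if (take)
            memcpy(block, data + off, take);
        if (take < sizeof(block))
            block[take] = 0x06;                          /* toy padding marker */

        for (int i = 0; i < RATE_WORDS; i++) {           /* xor block into the rate */
            uint64_t w;
            memcpy(&w, block + 8 * i, 8);
            state[i] ^= w;
        }
        toy_permute(state);                              /* stir the whole state */
    }

    memcpy(out, state, 32);                              /* truncated output */
}

int main(void)
{
    uint8_t digest[32];
    const char *msg = "hello world";
    toy_sponge((const uint8_t *)msg, strlen(msg), digest);
    for (int i = 0; i < 32; i++)
        printf("%02x", digest[i]);
    printf("\n");
    return 0;
}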

My thoughts


After implementing Keccak and understanding its design, I became alarmed by how much is missing. The use of pure bit operations makes it easier to compute in reverse, aside from a few AND operations. The computation of the final state uses bit flips inside the block, as opposed to outside beyond it, making it easier to tamper with (although theoretically still difficult). But most importantly, there is an utter lack of a block counter or data size anywhere.

Classical hash algorithms make it difficult to insert blocks anywhere in the middle of data and get a matching hash. Even if you were able to compute some data to go from state A back to state A, you could not insert this somewhere, because the data size at the end must then be different, resulting in a different hash. Same goes for BLAKE2 because of its block counter appended to each block it processes. Keccak has absolutely nothing here.

Keccak’s initial state is all zeros, something which every Keccak hash must use. If you could compute data which is a multiple of the block size that, when processed by Keccak, would go from the all-zero state back to the all-zero state, you would have just broken Keccak. You would be able to prepend this data as many times as you want to the beginning of any other set of data and produce the exact same hash. This would apply to every single Keccak hash produced; it targets all Keccak hashes in existence.
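
To illustrate the concern, here’s a deliberately broken toy sponge in C99 whose permutation has no round constants, so the all-zero state is a fixed point (real Keccak-f’s round constants prevent this particular trivial case; the toy only shows what happens once a zero-to-zero cycle of any kind exists): prepending the cycling blocks to any message leaves the digest unchanged.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STATE_WORDS 8
#define RATE_WORDS  4
#define BLOCK_BYTES (RATE_WORDS * 8)

/* Broken toy permutation: only xor and rotation, and NO round constants,
   so the all-zero state is a fixed point: permute(0) == 0. */
static void broken_permute(uint64_t s[STATE_WORDS])
{
    for (int r = 0; r < 4; r++)
        for (int i = 0; i < STATE_WORDS; i++)
            s[i] ^= (s[(i + 1) % STATE_WORDS] << 7) | (s[(i + 1) % STATE_WORDS] >> 57);
}

/* Toy sponge over whole blocks only (no padding), to keep the effect visible. */
static uint64_t broken_sponge(const uint8_t *data, size_t len)
{
    uint64_t state[STATE_WORDS] = { 0 };
    for (size_t off = 0; off < len; off += BLOCK_BYTES) {
        for (int i = 0; i < RATE_WORDS; i++) {
            uint64_t w;
            memcpy(&w, data + off + 8 * i, 8);
            state[i] ^= w;
        }
        broken_permute(state);
    }
    return state[0];                    /* truncated output */
}

int main(void)
{
    uint8_t msg[BLOCK_BYTES];
    uint8_t prefixed[3 * BLOCK_BYTES];  /* two all-zero prefix blocks + msg */

    memset(msg, 0xAB, sizeof(msg));
    memset(prefixed, 0, sizeof(prefixed));
    memcpy(prefixed + 2 * BLOCK_BYTES, msg, sizeof(msg));

    /* The zero blocks absorb into the zero state and permute right back to it,
       so both digests come out identical. */
    printf("hash(msg)          = %016llx\n",
           (unsigned long long)broken_sponge(msg, sizeof(msg)));
    printf("hash(zeros + msg)  = %016llx\n",
           (unsigned long long)broken_sponge(prefixed, sizeof(prefixed)));
    return 0;
}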

Now, I’m no expert cryptographer. Perhaps Keccak’s bit operations ensure that a state of all zeros could never be produced from it. Maybe there even exists a proof for this which is in a document or article I haven’t seen yet. However, if there is no such proof, then Keccak may be broken much worse than even the classical hash algorithms are. With the classical algorithms that are broken, you generally have to break each hash with a very large amount of computations specific to each hash result you want broken. With Keccak here, once a prefix becomes known, you can just prepend it, no computation necessary.

Improving Keccak


Of course if Keccak is indeed broken in this fashion, it’s not hard to fix. The data size could be processed at the end, or a block counter could be used alongside each block like BLAKE2; the state is certainly large enough to be able to handle it. If one were to change Keccak, I’d also move the bit flips at the end to occur beyond the block size inside the state, just to make it more difficult to tamper with the final result. In the meantime though, is Keccak, and therefore SHA-3, even safe to use?
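
For what it’s worth, here’s what the block counter idea might look like at the sponge level, applied to the same broken toy from above (again, only an illustration of the idea, not a vetted construction, and not how SHA-3 is actually specified): the counter is xored into a capacity lane that the input data can never touch, so each block’s contribution depends on its position and the all-zero prefix trick no longer slips through unnoticed.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STATE_WORDS 8
#define RATE_WORDS  4
#define BLOCK_BYTES (RATE_WORDS * 8)

/* Same broken toy permutation as before: no round constants, permute(0) == 0. */
static void broken_permute(uint64_t s[STATE_WORDS])
{
    for (int r = 0; r < 4; r++)
        for (int i = 0; i < STATE_WORDS; i++)
            s[i] ^= (s[(i + 1) % STATE_WORDS] << 7) | (s[(i + 1) % STATE_WORDS] >> 57);
}

/* Toy sponge with the suggested tweak: a block counter xored into the capacity. */
static uint64_t counted_sponge(const uint8_t *data, size_t len)
{
    uint64_t state[STATE_WORDS] = { 0 };
    uint64_t counter = 0;
    for (size_t off = 0; off < len; off += BLOCK_BYTES) {
        for (int i = 0; i < RATE_WORDS; i++) {
            uint64_t w;
            memcpy(&w, data + off + 8 * i, 8);
            state[i] ^= w;
        }
        state[STATE_WORDS - 1] ^= ++counter;   /* capacity lane, out of the data's reach */
        broken_permute(state);
    }
    return state[0];
}

int main(void)
{
    uint8_t msg[BLOCK_BYTES];
    uint8_t prefixed[3 * BLOCK_BYTES];

    memset(msg, 0xAB, sizeof(msg));
    memset(prefixed, 0, sizeof(prefixed));
    memcpy(prefixed + 2 * BLOCK_BYTES, msg, sizeof(msg));

    /* Even with the zero-fixing permutation, the counter keeps the prepended zero
       blocks from leaving the state untouched, so the two digests should differ. */
    printf("hash(msg)          = %016llx\n",
           (unsigned long long)counted_sponge(msg, sizeof(msg)));
    printf("hash(zeros + msg)  = %016llx\n",
           (unsigned long long)counted_sponge(prefixed, sizeof(prefixed)));
    return 0;
}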

Summary

For the three kinds of length extension attacks that may exist (prepending, inserting, appending), it appears that classical hashes defeat the first two and have some issues with the third. BLAKE2 defeats them all. Keccak does an okay job against the third, but offers no protection against the first two. A known attack existing against the first would be catastrophic. This is fixable if the algorithm is improved.

10 comments:

Stephan said...

A few points:
- Preimage attack (second on your list) implies collision attack (first on your list).
- Brute forcing a collision is much easier (square root) than brute forcing a preimage.
- SHA1 and MD5 are basically just broken with regards to collisions, because their hash lengths are now in the range of bruteforceability. (A hash length of 128 bits means a collision attack of 64 bits (sqrt(2^128) = 2^64), which is in the range of bruteforceability.)
- Your description of how basic hash algorithms work (namely the Merkle–Damgård construction) lacks an important step: transformation of the input into a prefix-free string in order to avoid attacks that append inputs.

I have not looked into them in detail yet, but it looks to me as if the new hash algorithms are heavily inspired by how cryptographic random-number generators work. There is an internal state that is larger than the output and you can feed in as much entropy as you want.

Your analysis is very interesting though. I cannot wait to see some interesting comments on this by people who have looked into it more deeply.

insane coder said...

Hi 施特凡,

I linked to the Wikipedia page about preimages and collisions right before my "list" for those that want to learn more about those.

Obviously if you can brute force a hash you can create two data sets with the same hash. However, there may be techniques available that allow the creation of two data sets with the same hash without being required to brute force a hash. People were generating two documents with the same MD5 in a few seconds on a Pentium 4, several years before it became feasible to brute force an MD5 hash with a server farm or stack of GPUs, which usually takes a few minutes.

Regarding the Merkle–Damgård construction, I did not go into full details, as my objective was to focus on the high level aspects and compare and contrast them with BLAKE2 and Keccak. Stuff like compression, sponges, block ciphers, padding, and more was left out on purpose. I intentionally did not cover why HMAC (or other algorithms) does stuff the way it does to avoid prefix and suffix issues.

I have not looked at every new hash algorithm either. However, from what I recall about BLAKE2, the full output is not larger than the internal state (excluding the counters). However, most of the modern hash algorithms offer features to make KDF, MAC, and PRF construction fairly easy. BLAKE2 is built upon ChaCha20 which is a CSPRNG internally.

I too eagerly await the interesting comments. I also hope someone can prove that a prefix block or blocks breaking Keccak is not possible. Although the people I’ve spoken to so far on the topic don’t think it is.

Bob said...

The simple answer is that the sponge mode of operation, which is the mode used by SHA-3, guarantees that it cannot be distinguished from the ideal hash with effort less than approximately 2^(c/2) operations (https://keccak.team/files/SpongeIndifferentiability.pdf), where c is the capacity of the hash function. This is often called indifferentiability. In SHA-3, c = 448, 512, 768, or 1024 depending on the output length. This rules out any generic attack of cost < 2^(c/2) that could break the hash with reasonable probability.

With respect to the core permutation, which needs to approximate a randomly sampled permutation, it does indeed only use xor, rotation and bitwise and. But repeated 24 times, and with a very well thought-out linear layer, this is sufficient to deter all known attacks. In fact, far fewer rounds would be necessary for this hash to be considered secure; 24 rounds constitute an enormous security margin.

So yes, you could find some output that would lead back to the all-zero state. But this would require either an astronomical computation cost (e.g., more than all the energy in the galaxy put together), or an unknown breakthrough in cryptanalysis against a highly conservative design (for reference, most conservative block ciphers designed in the 90s, such as AES, Serpent, Twofish, etc., are still as secure as they were then). None of these things seems likely to happen anytime soon.

Stephan said...

@Bob: You cannot say that it is secure because it meets the security definition. That is circular. The question is: Why and how does it meet the security definition?
I think the Insane Coder is looking for an intuitive understanding of /why/ and /how/ the transformation (sponge) function makes this attack hard.

insane coder said...

Hi 施特凡,

You are correct.

However, to go a step beyond that: we all know that ciphers and hashes get weaker over time. People find ways to break rounds in the process. People find exploits for certain aspects. The outcome is that attacks come to exist that are better than brute force.

At some point, if an attack exists that can be performed by a government or a large corporation with a massive server farm, you have to wonder how susceptible your algorithm is. The key here is that your typical algorithm does not have an Achilles’ heel. It’s highly unlikely that anyone is going to spend billions of dollars just to break a single digest used by software that people like us are developing. However, if you can spend a billion dollars to break every single digest ever produced by a particular algorithm that’s in use? That might be a feasible target and worth pouring a huge investment into.

4 rounds of Keccak can already be broken with minimal resources. Possibly even 6 rounds. Yet, the Keccak team recently introduced a 12 round variant because they believe that's strong too: https://keccak.team/kangarootwelve.html

Now if I have two hash algorithms in front of me, both with the same theoretical security margin, but one probably has an Achilles’ heel and the other does not, why would I use the one with the Achilles’ heel?

insane coder said...

While talking this over with some colleagues, a truly scary thought came up.

What if Keccak has a kleptographic backdoor in it?


Dual_EC_DRBG is a random number generator that NIST standardized back in the 2000s. Researchers realized its design allowed for a backdoor: if the constants used by it were computed in a certain way, those who computed them could use some secret data generated during those computations to predict the random output. Several years later, the Snowden leaks more or less confirmed that this had indeed occurred, and that the NSA has a backdoor into the algorithm.


Now we look at Keccak. Of all the popular hash algorithms in use, it's the only one which does nothing to prevent prepend and insertion attacks. This could be an oversight by its developers, or it could be just one part of hiding an intentional backdoor to break the hash in certain ways. Although we haven't proven that a zero to zero sequence must exist, we cannot immediately disprove it either (would love to see a proof!).

Now the round and rotation constants look like nothing-up-my-sleeve numbers, but Bernstein et al. demonstrate that using nothing-up-my-sleeve numbers as the starting point in a complex procedure for generating cryptographic objects, such as elliptic curves, may not be sufficient to prevent the insertion of backdoors. If there are enough adjustable elements in the object selection procedure, the universe of possible design choices and of apparently simple constants can be large enough that a search of the possibilities allows construction of an object with the desired backdoor properties. Now this is not a case of elliptic curves, but those round constants for Keccak look really sparse. No other popular hash algorithm uses such sparse constants. The Keccak team describes the algorithm they used to generate their constants, but out of all the possible algorithms they could have used, there is no rationale for why this one was picked over something less sparse.

Now you may say to yourself: But insane coder, so what if skeleton keys to break all instances of Keccak hashes can exist? There is no reason to believe they do, or that some entity already has them. So what if the round constants look too simple? That doesn't mean anything. You're just being paranoid!

To this I respond: cryptography is about being paranoid. Consider the amount of research we now have into post-quantum cryptographic algorithms. Quantum computers that can actually compute and break anything significant are science fiction, and there's a good chance they will never exist in our universe. Yet researchers are currently pouring tons of resources into this paranoia.

Seriously, why should we use an algorithm which might be compromised when we have other options?

Stephan said...

I found some time to look at the algorithm and at least can answer one of your initial questions: There cannot be a transfer from the zero state to the zero state because of the Iota step in the f function, where constants are added in each round (after the rotations, additions and permutations).

insane coder said...

Hello 施特凡,

Yes constants can be xor'ed in. But those same constants can be xor'ed out.

How does this step prevent the f function from ever outputting all zeros?

Or are you saying how it's added with the 5 variables cycling prevents this?

insane coder said...

Hello 施特凡,

Using the inversion code found here: https://github.com/KeccakTeam/KeccakTools/blob/master/Sources/Keccak-f.h#L496

We computed a state that when run through the Keccak F function would output all zeros.

These are the 25 64-bit state values that, when permuted through all 24 Keccak rounds, would result in all zeros:
14721456279641464894
7724296291459751827
10886275407155278477
319949407487385933
2401862744500137422
6621695293454996830
15688751689846876917
2272708723695410598
3914634542405067510
121355999895083400
4275415388935110838
4250087273847779342
11299929515691117547
7305221439536160700
17775658536205013067
4402618029247410770
12748353195565577610
17929335980907442738
908399519615871481
5655375619681861610
7730053291985347927
4776334871276930609
14375008313403715986
11069090975619611366
8758282975667999104

Here it is in C99 if you want to test it:
uint64_t state[25] =
{
UINT64_C(14721456279641464894),
UINT64_C(7724296291459751827),
UINT64_C(10886275407155278477),
UINT64_C(319949407487385933),
UINT64_C(2401862744500137422),
UINT64_C(6621695293454996830),
UINT64_C(15688751689846876917),
UINT64_C(2272708723695410598),
UINT64_C(3914634542405067510),
UINT64_C(121355999895083400),
UINT64_C(4275415388935110838),
UINT64_C(4250087273847779342),
UINT64_C(11299929515691117547),
UINT64_C(7305221439536160700),
UINT64_C(17775658536205013067),
UINT64_C(4402618029247410770),
UINT64_C(12748353195565577610),
UINT64_C(17929335980907442738),
UINT64_C(908399519615871481),
UINT64_C(5655375619681861610),
UINT64_C(7730053291985347927),
UINT64_C(4776334871276930609),
UINT64_C(14375008313403715986),
UINT64_C(11069090975619611366),
UINT64_C(8758282975667999104),
};

If we know of a series of input blocks, totaling a multiple of the block size, that would produce this internal state, then we have successfully determined a zero-to-zero cycle.

We also have some leeway in that we do not need to hit this exact state directly. For SHA3-256, the first 17 values (the rate) can be adjusted by the next block of input, so there should be multiple solutions where the first 17 values differ and this will still work.

My colleagues and I have determined that there are multiple 272-byte inputs to SHA3-256, and multiple 216-byte inputs to SHA3-512, that would cause a state cycle.

insane coder said...

Some math regarding this topic:

In Keccak, for every f(x) = y, there exists a non-empty z such that f(z||x) = y, and the value of z is actually independent of the values of x and y. There's also more than one possible z, and f(z||z||x) = y, as well as f(z||z||z||x) = y, and so on.

For a two-block z (a skeleton key) for SHA3-256, the first block of input would need to produce an internal state where the last 8 64-bit integers match the numbers I posted above. As for the first 17 integers, whatever the state is after the first block, the input for the second block would be the first 17 integers of that state xor'd against the first 17 numbers from above. This means that what we need for the second block of input is fully known, and all we care about targeting with our first block are those 8 64-bit values (which matches the security level).

Since we have 2^1088 possible inputs for the first block, targeting 512 bits, there should exist approximately 2^(1088-512) skeleton keys, which is ~2^576 for SHA3-256. Under NIST's initial desire to lower the security level for SHA3-256, it would have been 2^(1344-256), which is ~2^1088 skeleton keys. These skeleton keys exist independently of the padding bit flips suggested by the initial Keccak proposal and by what SHA-3 ended up using.

Apparently this problem was already identified back in 2011, as a colleague of mine just found this paper: https://eprint.iacr.org/2011/261.pdf
What we know is just a simplification of that, and we've already computed the (much) easier half of the problem. The concept which really bothered me in this article is mentioned by the paper: there is a property present in the older designs (MD5, SHA-1, SHA-2) that we took for granted, and there was no attempt to define it precisely as a property (or requirement) for SHA-3. For all other known cryptographic hash functions (MD5, SHA-1, SHA-2, the other SHA-3 candidates), there is no known explicit form for a class of second preimages for an arbitrary message M. For Keccak, however, one class of second preimages is known and explicitly defined.

Therefore we can conclude that Keccak is an algorithm which allows for backdoors. We cannot assume at this point that anyone has the skeleton keys for this backdoor, but we also cannot assume that no one has a skeleton key.

Troubling to consider is that Keccak has its roots in Panama, which as a hash function offered a 2^128 security level against collisions, but due to some of its mathematical properties actually has a break in 2^6. Panama was created by some of Keccak's authors, and broken by some of them several years later. The highly skilled Keccak team, extremely knowledgeable about the kind of novel structure used in these algorithms, improved Panama against those earlier attacks they found, producing Keccak. However, by the same token, they would also have the skill to hide a backdoor in the algorithm. There's a decent chance they have some skeleton keys at their disposal if they created an algorithm which has some mathematical property others are not yet aware of.

Knowing this, why should we use an algorithm which might be compromised when we have other options?
