Timing Attacks

Time: Column:Backend & Servers views:222

Alternative String Comparison

In Java's Play Framework, there is a piece of code used to verify the validity of data in cookies (sessions), which includes signature validation, as shown below:

boolean safeEqual(String a, String b) {
   if (a.length() != b.length()) {
       return false;
   }
   int equal = 0;
   for (int i = 0; i < a.length(); i++) {
       equal |= a.charAt(i) ^ b.charAt(i);
   }
   return equal == 0;
}

Upon first encountering this code, one might find it quite strange. The function's purpose is to compare whether two strings are equal. A typical approach to check for string equality would look like this (derived from JDK 8's String.equals(), with some parts omitted):

public boolean equals(Object anObject) {
    String anotherString = (String) anObject;
    int n = value.length;
    if (n == anotherString.value.length) {
        char v1[] = value;
        char v2[] = anotherString.value;
        int i = 0;
        while (n-- != 0) {
            if (v1[i] != v2[i]) // <- Exit on the first differing character
                return false;
            i++;
        }
        return true;
    }
    return false;
}

In a typical string comparison, we:

  1. Check if the lengths of the two strings are equal; if not, return false immediately.

  2. If the lengths are equal, we compare each character in sequence; if any character differs, we return false.

  3. If all characters are equal, we return true. The moment we encounter a differing character, we exit with false.

However, the code in the Play Framework operates differently. Particularly in the second step, it utilizes XOR. For those familiar with bitwise operations, this is straightforward. The XOR operation (1^1=0, 1^0=1, 0^0=0) allows for the comparison of each bit. If all bits are equal, the cumulative XOR value stored in the variable equal will be 0 (since identical characters will produce an even count), otherwise it will be 1.

The XOR method does not return false upon encountering the first differing character; instead, it performs a complete comparison. This approach is inefficient. Why is that? The reason is security.

Timing Attacks

Timing attacks (Wikipedia) are a type of side-channel attack, which exploit information gained from the physical implementation of a system rather than weaknesses in the implemented algorithm itself (e.g., cryptanalysis and software bugs). Timing information, power consumption, electromagnetic leaks, and even sounds can provide additional information that can be exploited. In many isolated environments (black boxes), this new type of attack can be significantly more effective than traditional mathematical methods of cryptanalysis. (Note: Attempts to breach password systems through social engineering, such as deceiving or coercing authorized users, are generally not considered side-channel attacks.)

Timing attacks are among the most commonly used attack methods. How can normal string comparisons be exploited by hackers using timing attacks?

We know that in a typical string comparison, the function exits upon encountering each differing character. Thus, theoretically, the time taken to compare two strings with only two matching characters will be shorter than that of two strings with ten matching characters at the beginning. You might say, how much difference can that make? A few microseconds, perhaps. But let’s amplify this situation. For instance, in a web application, we could record the response time for each request (generally in milliseconds). If we repeat this 50 times, we can observe the average time or the p50 time to see which character results in a longer response time. If a particular string we are trying shows a longer response time, we can confidently conclude that the preceding characters must be correct. (Of course, you might say that network request noise is too much, and it’s hard to judge based on millisecond-level requests. This requires statistical methods to filter out the noise, which will be discussed later.)

This situation can be exploited for HMAC attacks. HMAC, or Hash-based Message Authentication Code, allows a client to send a string and its signature to the server. The server uses a private key to sign the string sent by the client, generating the signature, which is then compared. The signature uses a hashing algorithm like MD5 or SHA, which is irreversible.

Pseudocode for this process might look like:

bool verify(message, digest) {
    my_digest = HMAC(key, message);
    return my_digest.equals(digest);
}

In this scenario, an attacker who does not know the signing algorithm or private key but knows the API endpoint can effectively crack the signature by enumerating potential signatures while simultaneously tracking the response times. The article "HMAC Timing Attacks" details the entire attack process. It notes:

If a signature is 40 characters long, such as f5acdffbf0bb39b2cdf59ccc19625015b33f55fe, the attacker might start from 0000000000000000000000000000000000000000 and enumerate through the first character (from 0 to f, which is the range of HMAC values) while tracking timing statistics.

Here are the timing results (in seconds) for the first character tested:

0 0.005450913
1 0.005829198
2 0.004905407
3 0.005286876
4 0.005597611
5 0.004814430
6 0.004969118
7 0.005335884
8 0.004433182
9 0.004440246
a 0.004860263
b 0.004561121
c 0.004463188
d 0.004406799
e 0.004978907
f 0.004887240

We can see that the timing results for the first test (in seconds) show little variance between the value “f” and the rest of the sample. All results appear very close. In other words, there is considerable noise masking the signal. Therefore, multiple samples (scaling the tests) and statistical tools are necessary to filter out the signal from the noise. To separate the signal from the noise, we must scale the tests by an arbitrary constant. Through experimentation, the author found that 500 is a good number. In other words, running the test 500 times and recording the results allows for visual observation of differences; however, this method is difficult to automate.

Thus, the author provides another statistical algorithm, which sends 16 requests to the server from 0 to f, records the response times for each request, and ranks them from 1 to 16 (with 1 being the longest and 16 the shortest). The rankings for each character (0 – f) are recorded, and the process is repeated 500 times. Here’s a sample of the results (only showing 25 samples):

{
"0"=>[7, 1, 3, 3, 15, 5, 4, 9, 15, 10, 13, 2, 14, 9, 4, 14, 7, 9, 15, 2, 14, 9, 14, 6, 11...],
"1"=>[13, 4, 7, 11, 0, 4, 0, 2, 14, 11, 6, 7, 2, 2, 14, 11, 8, 10, 5, 13, 11, 7, 4, 9, 3...],
"2"=>[14, 5, 15, 5, 1, 0, 3, 1, 9, 12, 4, 4, 1, 1, 8, 6, 9, 4, 9, 5, 8, 3, 12, 8, 5...],
"3"=>[15, 2, 9, 7, 2, 1, 14, 11, 7, 8, 8, 1, 4, 7, 12, 15, 13, 0, 4, 1, 7, 0, 3, 0, 0...],
"4"=>[12, 10, 14, 15, 8, 9, 10, 12, 10, 4, 1, 13, 15, 15, 3, 1, 6, 8, 2, 6, 15, 4, 0, 3, 2...],
"5"=>[5, 13, 13, 12, 7, 8, 13, 14, 3, 13, 2, 12, 7, 14, 2, 10, 12, 5, 8, 0, 4, 10, 5, 10, 12...]
"6"=>[0, 15, 11, 13, 5, 15, 8, 8, 4, 7, 12, 9, 10, 11, 11, 7, 0, 6, 0, 9, 2, 6, 15, 13, 14...]
"7"=>[1, 9, 0, 10, 6, 6, 2, 4, 12, 9, 5, 10, 5, 10, 7, 2, 4, 14, 6, 7, 13, 11, 6, 12, 4...],
"8"=>[4, 0, 2, 1, 9, 11, 12, 13, 11, 14, 0, 15, 9, 0, 0, 13, 11, 13, 1, 8, 6, 5, 11, 15, 7...],
"9"=>[11, 11, 10, 4, 13, 7, 6, 3, 2, 2, 14, 5, 3, 3, 15, 9, 14, 7, 10, 3, 0, 14, 1, 5, 15...],
"a"=>[8, 3, 6, 14, 10, 2, 7, 5, 1, 3, 3, 0, 0, 6, 10, 12, 15, 12, 12, 15, 9, 13, 13, 11, 9...],
"b"=>[9, 12, 5, 8, 3, 3, 5, 15, 0, 6, 11, 11, 12, 8, 1, 3, 1, 11, 11, 14, 5, 1, 2, 1, 6...],
"c"=>[6, 7, 8, 2, 12, 10, 9, 10, 6, 1, 10, 8, 6, 4, 6, 4, 3, 2, 7, 11, 1, 8, 7, 2, 13...],
"d"=>[2, 14, 4, 0, 14, 12, 11, 0, 8, 0, 15, 3, 8, 12, 5, 0, 10, 1, 3, 4, 12, 12, 8, 14, 8...],
"e"=>[10, 8, 12, 6, 11, 13, 1, 6, 13, 5, 7, 14, 11, 5, 9, 5, 2, 15, 14, 10, 10, 2, 10, 4, 1...],
"f"=>[3, 6, 1, 9, 4, 14, 15, 7, 5, 15, 9, 6, 13, 13, 13, 8, 5, 3, 13, 12, 3, 15, 9, 7, 10...]
}

Next, by averaging the 500 rankings for each character, we derive an output similar to:

"f", 5.302
"0", 7.17
"6", 7.396
"3", 7.472
"5", 7.562
"a", 7.602
"2", 7.608
"8", 7.626
"9", 7.688
"b", 7.698
"1", 7.704
"e", 7.812
"4", 7.82
"d", 7.826
"7", 7.854
"c", 7.86

Thus, "f" stands out. This algorithm can be repeated for the remaining 39 characters.

This statistical technique allows us to extract real signals from noise. Therefore, the total number of requests made would be: 16 * 500 * 40 = 320,000, whereas brute-force enumeration would require 16^40 requests.

Additionally, a paper in academia claims to have broken the RSA encryption algorithm in OpenSSL 0.9.7 using this timing attack method. The paper titled "Remote Timing Attacks are Practical" (PDF) notes (I've roughly translated the abstract; interested readers can refer to the original text):

"Timing attacks are often used to target devices with weaker performance, such as smart cards. Our experiments demonstrate that they can also be used to attack ordinary software systems. This paper provides experimental evidence that timing attacks can successfully break the private keys of a web server based on OpenSSL. The results show that timing attacks can be practically employed for network attacks, necessitating that major security systems guard against this risk."

References:

Corresponding Functions in Various Languages

Now, let’s take a look at how different languages handle functions to mitigate timing attacks.

PHP: PHP RFC for Timing Attack

bool hash_equals ( string $known_string , string $user_string )
boolean password_verify ( string $password , string $hash )

Java: Java had issues in version 1.6, which were fixed in 1.6.0_17. Below is the source code from JDK 8 for MessageDigest.isEqual():

public static boolean MessageDigest.isEqual(byte[] digesta, byte[] digestb) {
    if (digesta == digestb) return true;
    if (digesta == null || digestb == null) {
        return false;
    }
    if (digesta.length != digestb.length) {
        return false;
    }
    int result = 0;
    // time-constant comparison
    for (int i = 0; i < digesta.length; i++) {
        result |= digesta[i] ^ digestb[i];
    }
    return result == 0;
}

C/C++: There are no related functions found in common libraries, so it’s better to implement your own:

int util_cmp_const(const void * a, const void *b, const size_t size) 
{
  const unsigned char *_a = (const unsigned char *) a;
  const unsigned char *_b = (const unsigned char *) b;
  unsigned char result = 0;
  size_t i;
  for (i = 0; i < size; i++) {
    result |= _a[i] ^ _b[i];
  }
  return result; /* returns 0 if equal, nonzero otherwise */
}

Python: For versions 2.7.7 and above, use hmac.compare_digest(a, b); otherwise, use the following Django code:

# Taken from Django Source Code
def constant_time_compare(val1, val2):
    """
    Returns True if the two strings are equal, False otherwise.
    The time taken is independent of the number of characters that match.
    For the sake of simplicity, this function executes in constant time only
    when the two strings have the same length. It short-circuits when they
    have different lengths.
    """
    if len(val1) != len(val2):
        return False
    result = 0
    for x, y in zip(val1, val2):
        result |= ord(x) ^ ord(y)
    return result == 0

Go: Use the crypto/subtle package:

func ConstantTimeByteEq(x, y uint8) int
func ConstantTimeCompare(x, y []byte) int
func ConstantTimeCopy(v int, x, y []byte)
func ConstantTimeEq(x, y int32) int
func ConstantTimeLessOrEq(x, y int) int
func ConstantTimeSelect(v, x, y int) int

One More Thing

Before concluding the article, I want to mention one more issue.

All of the above code samples have a common problem—they check if the lengths of the strings are equal, returning false if they are not. This means that timing attacks can potentially reveal the length of the string, such as your password length. Theoretically, the length of a string should also be considered “sensitive data” (though this is not the case for signatures).