Week11 CRYPTO: Public Key Crypto Attacking
According to the @CTF101: https://ctf101.org/
Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes which are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.
The word “cryptography” technically means the art of writing codes. When it comes to digital forensics, it’s a method you can use to understand how data is constructed for your analysis.
What is cryptography used for?
Uses in every day software
- Securing web traffic (passwords, communication, etc.)
- Securing copyrighted software code
Malicious uses
- Hiding malicious communication
- Hiding malicious code
Topics
- XOR
- Cesear Cipher
- Substitution Cipher
- Vigenere Cipher
- Hashing Functions
- RSA
XOR
Data Representation
Data can be represented in different bases, an 'A' needs to be a numerical representation of Base 2 or binary so computers can understand them
XOR Basics
An XOR or eXclusive OR is a bitwise operation indicated by ^
and shown by the following truth table:
A | B | A ^ B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
So what XOR'ing bytes in the action 0xA0 ^ 0x2C
translates to is:
1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
---|---|---|---|---|---|---|---|
0b10001100` is equivelent to `0x8C`, a cool property of XOR is that it is reversable meaning `0x8C ^ 0x2C = 0xA0` and `0x8C ^ 0xA0 = 0x2C
What does this have to do with CTF?
XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:
>>> data = 'CAPTURETHEFLAG'
>>> key = 'A'
>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])
>>> encrypted
'\x02\x00\x11\x15\x14\x13\x04\x15\t\x04\x07\r\x00\x06'
>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])
>>> decrypted
'CAPTURETHEFLAG'
This can be extended using a multibyte key by iterating in parallel with the data.
Exploiting XOR Encryption
Single Byte XOR Encryption
Single Byte XOR Encryption is trivial to bruteforce as there are only 255 key combinations to try.
Multibyte XOR Encryption
Multibyte XOR gets exponentially harder the longer the key, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character Frequency Analysis means that we split the cipher text into groups based on the number of characters in the key. These groups then are bruteforced using the idea that some letters appear more frequently in the english alphabet than others.
Substitution Cipher
A Substitution Cipher is system of encryption where different symobls substitute a normal alphabet.
Caesar Cipher/ROT 13
The Caesar Cipher or Caesar Shift is a cipher which uses the alphabet in order to encode texts.
CAESAR` encoded with a shift of 8 is `KIMAIZ` so `ABCDEFGHIJKLMNOPQRSTUVWXYZ` becomes `IJKLMNOPQRSTUVWXYZABCDEFGH
ROT13 is the same thing but a fixed shift of 13, this is a trivial cipher to bruteforce because there are only 25 shifts.
Vigenere Cipher
A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar shifted alphabets.
The following table can be used to encode a message:
Encryption
For example, encrypting the text SUPERSECRET
with CODE
would follow this process:
CODE
gets padded to the length ofSUPERSECRET
so the key becomesCODECODECOD
- For each letter in
SUPERSECRET
we use the table to get the Alphabet to use, in this instance rowC
and columnS
- The ciphertext's first letter then becomes
U
- We eventually get
UISITGHGTSW
Decryption
- Go to the row of the key, in this case
C
- Find the letter of the cipher text in this row, in this case
U
- The column is the first letter of the decrypted ciphertext, so we get
S
- After repeating this process we get back to
SUPERSECRET
Hashing Functions
Hashing functions are one way functions which theoretically provide a unique output for every input. MD5, SHA-1, and other hashes which were considered secure are now found to have collisions or two different pieces of data which produce the same supposed unique output.
String Hashing
A string hash is a number or string generated using an algorithm that runs on text or data.
The idea is that each hash should be unique to the text or data (although sometimes it isn’t). For example, the hash for “dog” should be different from other hashes.
You can use command line tools tools or online resources such as this one. Example: $ echo -n password | md5 5f4dcc3b5aa765d61d8327deb882cf99
Here, “password” is hashed with different hashing algorithms:
- SHA-1: 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
- SHA-2: 5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8
- MD5: 5F4DCC3B5AA765D61D8327DEB882CF99
- CRC32: BBEDA74F
Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.
File Hashing
A file hash is a number or string generated using an algorithm that is run on text or data. The premise is that it should be unique to the text or data. If the file or text changes in any way, the hash will change.
What is it used for? - File and data identification - Password/certificate storage comparison
How can we determine the hash of a file? You can use the md5sum command (or similar).
$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb samplefile.txt
Hash Collisions
A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.
What’s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer, since it’s hard to get the passwords from hashes.
If you bruteforce by trying every possible piece of text or data, eventually you’ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.
Two different files on the same hard drive with the same cryptographic hash can be very interesting.
“It’s now well-known that the cryptographic hash function MD5 has been broken,” said Peter Selinger of Dalhousie University. “In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.”
For example, he cited this famous pair:
and
Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.
Selinger said that “the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.”
Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl’s Illustrated Guide. And here’s a good article from DFI News that explores the same topic.
RSA
RSA, which is an abbreviation of the author's names (Rivest–Shamir–Adleman), is a cryptosystem which allows for asymmetric encryption. Asymmetric cryptosystems are alos commonly referred to as Public Key Cryptography where a public key is used to encrypt data and only a secret, private key can be used to decrypt the data.
Definitions
- The Public Key is made up of (n, e)
- The Private Key is made up of (n, d)
- The message is represented as m and is converted into a number
- The encrypted message or ciphertext is represented by c
- p and q are prime numbers which make up n
- e is the public exponent
- n is the modulus and its length in bits is the bit length (i.e. 1024 bit RSA)
- d is the private exponent
- The totient λ(n) is used to compute d and is equal to the lcm(p-1, q-1), another definition for λ(n) is that λ(pq) = lcm(λ(p), λ(q))
What makes RSA viable?
If public n, public e, private d are all very large numbers and a message m holds true for 0 < m < n, then we can say:
(me)d ≡ m (mod n)
The triple equals sign in this case refers to modular congruence which in this case means that there exists an integer k such that (me)d = kn + m
RSA is viable because it is incredibly hard to find d even with m, n, and e because factoring large numbers is an arduous process.
Implementation
RSA follows 4 steps to be implemented: 1. Key Generation 2. Encryption 3. Decryption
Key Generation
We are going to follow along Wikipedia's small numbers example in order to make this idea a bit easier to understand.
In This example we are using Carmichael's totient function where λ(n) = lcm(λ(p), λ(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is φ(n) = (p − 1)(q − 1)
- Choose two prime numbers such as:
- p = 61 and q = 53
- Find
- n = pq = 3233
- Calculate λ(n) = lcm(p-1, q-1)
- λ(3233) = lcm(60, 52) = 780
- Choose a public exponent such that 1 < e < λ(n) and is coprime (not a factor of) λ(n). The standard is most cases is 65537, but we will be using:
- e = 17
- Calculate d as the modular multiplicative inverse or in english find d such that: d x e mod λ(n) = 1
- d x 17 mod 780 = 1
- d = 413
Now we have a public key of (3233, 17) and a private key of (3233, 413)
Encryption
With the public key, m can be encrypted trivially
The ciphertext is equal to me mod n or:
c = m^17 mod 3233
Decryption
With the private key, m can be decrypted trivially as well
The plaintext is equal to cd mod n or:
m = c^413 mod 3233
Exploitation
From the RsaCtfTool README
Attacks:
- Weak public key factorization
- Wiener's attack
- Hastad's attack (Small public exponent attack)
- Small q (q < 100,000)
- Common factor between ciphertext and modulus attack
- Fermat's factorisation for close p and q
- Gimmicky Primes method
- Past CTF Primes method
- Self-Initializing Quadratic Sieve (SIQS) using Yafu
- Common factor attacks across multiple keys
- Small fractions method when p/q is close to a small fraction
- Boneh Durfee Method when the private exponent d is too small compared to the modulus (i.e d < n0.292)
- Elliptic Curve Method
- Pollards p-1 for relatively smooth numbers
- Mersenne primes factorization
Exercise
(4 pt) Pseudo Random Primes
A wiser once said: "every number can be factored to the sum of primes."
Try to factor the number and find me some flag.
Checkpoint: what's the first 20 numbers in f(pow(10,6))? Find those numbers and gain 2 points.
Hint: try to convert enc into the binary.
(4 pt) GCD Oracle
Some oracle can get you LSB and help solve the challenge. Now break the number N with the GCD Oracle.
Byte
The number N has 8 bits at most. Oracle can give you 10 times of response and then you should give the value of N.
Solving this gives you the checkpoint for 2 points. But you can directly solve the second question and get 4 points.
nc ali.infury.org 10008
Full
The number N has 512 bits at most. If you solve this, don't need to solve the previous question.
Hint: you may need a script to solve the second question.
nc ali.infury.org 10007
(2 pt) RSA in the triangle
A modern public key encryption requires a key pair of 512 bytes usually. Sometimes I just confused about the private key. Therefore, I designed a "super secure" method to find me a private key.
Hint: find a faster way to calculate the triangle.