Week11 CRYPTO: Public Key Crypto Attacking

According to the @CTF101: https://ctf101.org/

Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes which are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.

The word “cryptography” technically means the art of writing codes. When it comes to digital forensics, it’s a method you can use to understand how data is constructed for your analysis.

What is cryptography used for?

Uses in every day software

Securing web traffic (passwords, communication, etc.)
Securing copyrighted software code

Malicious uses

Hiding malicious communication
Hiding malicious code

Topics

XOR
Cesear Cipher
Substitution Cipher
Vigenere Cipher
Hashing Functions
RSA

XOR

Data Representation

Data can be represented in different bases, an 'A' needs to be a numerical representation of Base 2 or binary so computers can understand them

Data Representation

XOR Basics

An XOR or eXclusive OR is a bitwise operation indicated by ^ and shown by the following truth table:

A	B	A ^ B
0	0	0
0	1	1
1	0	1
1	1	0

So what XOR'ing bytes in the action 0xA0 ^ 0x2C translates to is:


1	0	1	0	0	0	0	0
0	0	1	0	1	1	0	0

1	0	0	0	1	1	0	0

0b10001100` is equivelent to `0x8C`, a cool property of XOR is that it is reversable meaning `0x8C ^ 0x2C = 0xA0` and `0x8C ^ 0xA0 = 0x2C

XOR Basics

What does this have to do with CTF?

XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:

>>> data = 'CAPTURETHEFLAG'
>>> key = 'A'
>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])
>>> encrypted
'\x02\x00\x11\x15\x14\x13\x04\x15\t\x04\x07\r\x00\x06'
>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])
>>> decrypted
'CAPTURETHEFLAG'

This can be extended using a multibyte key by iterating in parallel with the data.

Exploiting XOR Encryption

Single Byte XOR Encryption

Single Byte XOR Encryption is trivial to bruteforce as there are only 255 key combinations to try.

Multibyte XOR Encryption

Multibyte XOR gets exponentially harder the longer the key, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character Frequency Analysis means that we split the cipher text into groups based on the number of characters in the key. These groups then are bruteforced using the idea that some letters appear more frequently in the english alphabet than others.

Substitution Cipher

A Substitution Cipher is system of encryption where different symobls substitute a normal alphabet.

Substitution Cipher

Caesar Cipher/ROT 13

The Caesar Cipher or Caesar Shift is a cipher which uses the alphabet in order to encode texts.

CAESAR` encoded with a shift of 8 is `KIMAIZ` so `ABCDEFGHIJKLMNOPQRSTUVWXYZ` becomes `IJKLMNOPQRSTUVWXYZABCDEFGH

ROT13 is the same thing but a fixed shift of 13, this is a trivial cipher to bruteforce because there are only 25 shifts.

Caesar Cipher

Vigenere Cipher

A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar shifted alphabets.

The following table can be used to encode a message: Vigenere Square

Encryption

For example, encrypting the text SUPERSECRET with CODE would follow this process:

CODE gets padded to the length of SUPERSECRET so the key becomes CODECODECOD
For each letter in SUPERSECRET we use the table to get the Alphabet to use, in this instance row C and column S
The ciphertext's first letter then becomes U
We eventually get UISITGHGTSW

Decryption

Go to the row of the key, in this case C
Find the letter of the cipher text in this row, in this case U
The column is the first letter of the decrypted ciphertext, so we get S
After repeating this process we get back to SUPERSECRET

Hashing Functions

Hashing functions are one way functions which theoretically provide a unique output for every input. MD5, SHA-1, and other hashes which were considered secure are now found to have collisions or two different pieces of data which produce the same supposed unique output.

String Hashing

A string hash is a number or string generated using an algorithm that runs on text or data.

The idea is that each hash should be unique to the text or data (although sometimes it isn’t). For example, the hash for “dog” should be different from other hashes.

You can use command line tools tools or online resources such as this one. Example: $ echo -n password | md5 5f4dcc3b5aa765d61d8327deb882cf99 Here, “password” is hashed with different hashing algorithms:

SHA-1: 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
SHA-2: 5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8
MD5: 5F4DCC3B5AA765D61D8327DEB882CF99
CRC32: BBEDA74F

Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.

File Hashing

A file hash is a number or string generated using an algorithm that is run on text or data. The premise is that it should be unique to the text or data. If the file or text changes in any way, the hash will change.

What is it used for? - File and data identification - Password/certificate storage comparison

How can we determine the hash of a file? You can use the md5sum command (or similar).

$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb samplefile.txt

Hash Collisions

A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.

What’s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer, since it’s hard to get the passwords from hashes.

Password to Hash

If you bruteforce by trying every possible piece of text or data, eventually you’ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.

Two different files on the same hard drive with the same cryptographic hash can be very interesting.

“It’s now well-known that the cryptographic hash function MD5 has been broken,” said Peter Selinger of Dalhousie University. “In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.”

For example, he cited this famous pair:

Password to Hash

and

Password to Hash

Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.

Selinger said that “the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.”

Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl’s Illustrated Guide. And here’s a good article from DFI News that explores the same topic.

RSA

RSA, which is an abbreviation of the author's names (Rivest–Shamir–Adleman), is a cryptosystem which allows for asymmetric encryption. Asymmetric cryptosystems are alos commonly referred to as Public Key Cryptography where a public key is used to encrypt data and only a secret, private key can be used to decrypt the data.

Definitions

The Public Key is made up of (n, e)
The Private Key is made up of (n, d)
The message is represented as m and is converted into a number
The encrypted message or ciphertext is represented by c
p and q are prime numbers which make up n
e is the public exponent
n is the modulus and its length in bits is the bit length (i.e. 1024 bit RSA)
d is the private exponent
The totient λ(n) is used to compute d and is equal to the lcm(p-1, q-1), another definition for λ(n) is that λ(pq) = lcm(λ(p), λ(q))

What makes RSA viable?

If public n, public e, private d are all very large numbers and a message m holds true for 0 < m < n, then we can say:

(me)d ≡ m (mod n)

The triple equals sign in this case refers to modular congruence which in this case means that there exists an integer k such that (me)d = kn + m

RSA is viable because it is incredibly hard to find d even with m, n, and e because factoring large numbers is an arduous process.

Implementation

RSA follows 4 steps to be implemented: 1. Key Generation 2. Encryption 3. Decryption

Key Generation

We are going to follow along Wikipedia's small numbers example in order to make this idea a bit easier to understand.

In This example we are using Carmichael's totient function where λ(n) = lcm(λ(p), λ(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is φ(n) = (p − 1)(q − 1)

Choose two prime numbers such as:
p = 61 and q = 53
Find
n = pq = 3233
Calculate λ(n) = lcm(p-1, q-1)
λ(3233) = lcm(60, 52) = 780
Choose a public exponent such that 1 < e < λ(n) and is coprime (not a factor of) λ(n). The standard is most cases is 65537, but we will be using:
e = 17
Calculate d as the modular multiplicative inverse or in english find d such that: d x e mod λ(n) = 1
d x 17 mod 780 = 1
d = 413

Now we have a public key of (3233, 17) and a private key of (3233, 413)

Encryption

With the public key, m can be encrypted trivially

The ciphertext is equal to me mod n or:

c = m^17 mod 3233

Decryption

With the private key, m can be decrypted trivially as well

The plaintext is equal to cd mod n or:

m = c^413 mod 3233

Exploitation

From the RsaCtfTool README

Attacks:

Weak public key factorization

Wiener's attack

Hastad's attack (Small public exponent attack)

Small q (q < 100,000)

Common factor between ciphertext and modulus attack

Fermat's factorisation for close p and q

Gimmicky Primes method

Past CTF Primes method

Self-Initializing Quadratic Sieve (SIQS) using Yafu

Common factor attacks across multiple keys

Small fractions method when p/q is close to a small fraction

Boneh Durfee Method when the private exponent d is too small compared to the modulus (i.e d < n0.292)

Elliptic Curve Method

Pollards p-1 for relatively smooth numbers

Mersenne primes factorization

Exercise

(4 pt) Pseudo Random Primes

A wiser once said: "every number can be factored to the sum of primes."

Try to factor the number and find me some flag.

Checkpoint: what's the first 20 numbers in f(pow(10,6))? Find those numbers and gain 2 points.

Hint: try to convert enc into the binary.

main.py

(4 pt) GCD Oracle

Some oracle can get you LSB and help solve the challenge. Now break the number N with the GCD Oracle.

Byte

The number N has 8 bits at most. Oracle can give you 10 times of response and then you should give the value of N.

Solving this gives you the checkpoint for 2 points. But you can directly solve the second question and get 4 points.

nc ali.infury.org 10008

Full

The number N has 512 bits at most. If you solve this, don't need to solve the previous question.

Hint: you may need a script to solve the second question.

nc ali.infury.org 10007

main.py

(2 pt) RSA in the triangle

A modern public key encryption requires a key pair of 512 bytes usually. Sometimes I just confused about the private key. Therefore, I designed a "super secure" method to find me a private key.

Hint: find a faster way to calculate the triangle.

main.py

challenge.txt