Forensics is the art of recovering the digital trail left on a computer. There are various methods to find data which is seemingly deleted, not stored, or worse, covertly recorded. In a CTF context, “Forensics” challenges can include file format analysis, steganography, memory dump analysis, or network packet capture analysis.
For solving forensics CTF challenges, the three most useful abilities are probably:
- Knowing a scripting language (e.g., Python).
- Knowing how to manipulate binary data (byte-level manipulations) in that language.
- Recognizing formats, protocols, structures, and encodings.
Common Forensics Concepts
Below is a high-level overview of some of the common concepts in forensics CTF challenges, along with recommended commands for performing common tasks.
Forensic CTF challenges often require exploratory steps to determine what to do next. Useful commands to know are:
- strings: search for plaintext strings in a file.
- grep: search for a particular string in a file.
- bgrep: search for non-text data patterns.
- hexdump: displays the content of a file in hexadecimal.
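When none of these commands is available, or when the search needs to be scripted, the same ideas can be approximated in a few lines of Python. The following is a minimal sketch, not a replacement for the real tools: the function names extract_strings and find_pattern are hypothetical, the minimum string length of 4 is an assumption, and only printable ASCII is considered.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len bytes long (strings-like)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_pattern(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs (a simple bgrep-like search)."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Made-up sample data: text fragments surrounded by non-printable bytes.
sample = b"\x00\x01" + b"flag{demo}" + b"\xff\xfe" + b"abc" + b"\x00" + b"PNG" + b"\x89" + b"PNG"

print(extract_strings(sample))       # only runs of 4 or more printable bytes
print(find_pattern(sample, b"PNG"))  # byte offsets of each match
```

For a real challenge file, the bytes would typically come from open(path, "rb").read() instead of the sample literal.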
Binary data is made of 1s and 0s, but it is often transmitted as text. It would be wasteful to transmit literal sequences such as 101010101, so the data is first encoded using one of a variety of methods. When running strings on a file as discussed above, you may uncover binary data encoded as text strings.
Recognizing encodings helps in solving forensic CTF challenges. Some encodings are easy to identify: Base64 content, for example, stands out by its charset (A-Z, a-z, 0-9, “+”, and “/”) and its “=” padding suffix (when present). See the example below (macOS syntax; GNU base64 uses -d rather than -D):
$ echo aGVsbG8gd29ybGQh | base64 -D
hello world!
ASCII-encoded hexadecimal is also identifiable by its charset (0-9 and A-F, in upper or lower case). ASCII characters themselves occupy a certain range of bytes (0x00 through 0x7f, see man ascii), so if you are examining a file and find a string like 68 65 6c 6c 6f 20 77 6f 72 6c 64 21, it’s important to notice the preponderance of values in the 0x60s here: this is lowercase ASCII text. Technically, it’s text (“hello world!”) encoded as ASCII (binary) encoded as hexadecimal (text again).
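Both encodings discussed above can be decoded with Python's standard library. The sketch below reuses the two example strings from the text; the variable names are illustrative only.

```python
import base64

b64_text = "aGVsbG8gd29ybGQh"
hex_text = "68 65 6c 6c 6f 20 77 6f 72 6c 64 21"

# Base64 text back to the original bytes.
decoded_b64 = base64.b64decode(b64_text)

# bytes.fromhex skips the spaces between byte pairs.
decoded_hex = bytes.fromhex(hex_text)

print(decoded_b64.decode("ascii"))  # hello world!
print(decoded_hex.decode("ascii"))  # hello world!
```

In a challenge, decoding often has to be repeated, since a decoded result may itself be another layer of encoded text.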
Common file formats one can encounter during forensics CTF challenges are:
- Archive files (ZIP, TGZ)
- Image file formats (JPG, GIF, BMP, PNG)
- Filesystem images (especially EXT4)
- Packet captures (PCAP, PCAPNG)
- Video (especially MP4) or Audio (especially WAV, MP3)
- Microsoft’s Office formats (RTF, OLE, OOXML)
Many file formats are publicly documented in detail. For example, Ange Albertini offers visual illustrations of well-known file formats. Below is a GIF example.
When analyzing file formats, a format-aware hex editor such as 010 Editor is handy. Many forensic CTF challenges require reconstructing a file whose format fields are missing or zeroed out, so knowledge of common file formats is beneficial.
File extensions are not the only way to identify a file's type. Files carry leading bytes called file signatures (also known as file magic numbers) that identify the format and allow programs to parse the data in a consistent manner. Signatures are generally 2-4 bytes long, though some are longer, and are found at the beginning of the file. Because files sometimes come without an extension, or with an incorrect one, file signature analysis is used to identify the actual format (file type) of the file.
A Hex Editor is recommended to view file signatures. Once you find the file signature, you can check it against file signature repositories such as Gary Kessler’s.
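Signature checking is easy to automate. The following sketch compares the first bytes of a file against a small, hand-picked table of well-known signatures. The table is deliberately incomplete, and the names SIGNATURES and identify are hypothetical; a signature repository or the file command covers far more formats.

```python
# A few well-known signatures; real repositories list hundreds more.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"BM", "BMP image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip data"),
    (b"%PDF-", "PDF document"),
    (b"\x7fELF", "ELF executable"),
]


def identify(data: bytes) -> str:
    """Return a format name based on the leading bytes, ignoring any extension."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


# Stand-in headers; for a real file use open(path, "rb").read(16).
print(identify(b"GIF89a" + bytes(10)))
print(identify(b"PK\x03\x04" + bytes(10)))
print(identify(b"no signature here"))
```

Note that ZIP-based formats such as OOXML documents share the ZIP signature, so a match only narrows the candidates.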
Metadata is data about data. Different types of files have different metadata.
Image File Analysis
An image file’s metadata can be viewed using exiftool. The tool displays metadata for an input file, including file size, dimensions (width and height), file type, and the program used to create it (e.g., Photoshop). To use it, run exiftool with the path of the file as its argument.
Timestamps are data that indicate the time of certain events (MAC):
- Modification: when a file was modified.
- Access: when a file was read or accessed.
- Creation: when a file was created.
Events such as creating, moving, copying, opening, or editing a file can affect the MAC times. If the MAC timestamps can be obtained, a timeline of events can be built from them.
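On a mounted filesystem, these timestamps can be read with Python's os.stat. The sketch below is a minimal illustration: the function name mac_times and the dictionary keys are hypothetical. One caveat worth labeling: st_ctime is the creation time on Windows, but on Unix-like systems it is the time of the last metadata change, so it does not always match the "Creation" entry above.

```python
import os
import tempfile
from datetime import datetime, timezone


def mac_times(path: str) -> dict:
    """Return the three timestamps that os.stat exposes for path, in UTC."""
    st = os.stat(path)

    def to_utc(ts: float) -> datetime:
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # Metadata-change time on Unix, creation time on Windows.
        "changed_or_created": to_utc(st.st_ctime),
    }


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile() as tmp:
    timeline = sorted(mac_times(tmp.name).items(), key=lambda item: item[1])
    for label, when in timeline:
        print(f"{when.isoformat()}  {label}")
```

Sorting the events of many files by timestamp in the same way yields a basic timeline.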
Steganography is the art of hiding data in images or audio. While extraordinarily rare in the real world, steganography is often a popular CTF challenge. Steganography could be implemented using any kind of data as the “cover text” but media file formats are ideal because they tolerate a certain amount of unnoticeable data loss. One example is Least Significant Bit (LSB) Steganography, where data is recorded in the lowest bit of a byte.
Files are made of bytes. Each byte is composed of eight bits. As shown in the images below, changing the least-significant bit doesn’t affect the value very much.
Therefore, one can modify the LSB without changing the file noticeably, allowing for a message to be hidden inside.
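The idea can be sketched on raw bytes, leaving image decoding aside. In the example below, the cover is a stand-in list of 64 byte values rather than real pixel data, the function names embed_lsb and extract_lsb are hypothetical, and the bit order (most significant bit of each message byte first) is an assumption; actual tools differ in bit order, channel order, and which bytes they use.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide message in the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Rebuild length bytes from the lowest bit of successive stego bytes."""
    out = bytearray()
    for i in range(length):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(100, 164))  # 64 stand-in "pixel" values
stego = embed_lsb(cover, b"flag")

print(extract_lsb(stego, 4))
# Largest change to any single cover byte (never more than 1).
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

Each cover byte changes by at most 1, which is why the modification is hard to see in an image or hear in audio.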
The difficulty with steganography is that extracting the hidden message requires not only detecting that steganography has been used, but also knowing the exact steganographic tool used to embed it. A bit of trial and error might be required.
Recommended tools to tackle steganography include:
- Stegsolve: used to apply various steganography techniques to image files in an attempt to detect and extract hidden data.
- Steghide: hide data in various kinds of image and audio files.
- zsteg: detect hidden data in PNG and BMP files.
- OpenStego: free steganography solution.
- Foremost: a forensic program to recover lost files based on their headers, footers, and internal data structures.
- StegOnline: online steganography tool.
Occasionally, a forensic CTF challenge will involve a full disk image. A disk image is a computer file containing the contents and structure of a disk volume or of an entire data storage device, such as a hard disk drive. The first logical step will be to mount the disk image file. Below is an example of mounting a CD-ROM filesystem image:
mkdir /mnt/challenge
mount -t iso9660 challengefile /mnt/challenge
Searching for a flag in a mounted disk image is like looking for a needle in a haystack, so a strategy is required. Once the filesystem is mounted, the tree command can be used to view the directory structure and see whether anything stands out as requiring further analysis. Some understanding of well-known filesystems is also beneficial:
- New Technology File System (NTFS): a modern, well-formed filesystem that is most commonly used by Windows.
- File Allocation Table (FAT): a general purpose file system that is compatible with all major operating systems.
- Extended (EXT) filesystem: created to be used with the Linux kernel (EXT4 is the most recent version).
- Hierarchical File System (HFS) Plus: a file system developed by Apple for Mac OS X.
In certain cases, one might not be looking for a visible file within the filesystem, but rather a hidden volume, unallocated space (disk space that is not a part of any partition), a deleted file, or a non-file filesystem structure. For the recovery of deleted or missing files, the following tools are recommended:
- extundelete: find deleted files in EXT3 and EXT4 filesystems.
- TestDisk: recover missing partition tables, fix corrupted ones, undelete files on FAT or NTFS, etc.
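When a deleted file's bytes are still present in the image, they can sometimes be carved out by searching for the format's header and footer, which is the general approach of header/footer-based recovery tools. The sketch below illustrates the idea for JPEG start and end markers on synthetic data. The function name carve is hypothetical, and the approach assumes contiguous, unfragmented files; it will also be confused by embedded thumbnails or footer bytes that occur by chance.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the next marker byte
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve(data: bytes, header: bytes, footer: bytes) -> list:
    """Return every header..footer span found in data, in order."""
    carved = []
    start = data.find(header)
    while start != -1:
        end = data.find(footer, start + len(header))
        if end == -1:
            break  # header without a footer: stop rather than guess
        end += len(footer)
        carved.append(data[start:end])
        start = data.find(header, end)
    return carved


# Synthetic "disk image": two fake JPEG bodies separated by filler bytes.
image = (
    bytes(10)
    + JPEG_HEADER + b"AAAA" + JPEG_FOOTER
    + b"filler"
    + JPEG_HEADER + b"BB" + JPEG_FOOTER
    + bytes(5)
)

for index, blob in enumerate(carve(image, JPEG_HEADER, JPEG_FOOTER)):
    print(index, len(blob), blob[:3].hex())
```

Each carved span could then be written to its own file and opened with an image viewer to check whether it is valid.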
The Sleuth Kit, together with its accompanying web-based user interface, Autopsy, is a powerful open-source toolkit for filesystem analysis. Although it is geared more toward law-enforcement tasks, its features can be helpful for tasks like searching for a keyword across the entire disk image or looking at the unallocated space.
Network Traffic Analysis
Network traffic is captured and stored as a packet capture (PCAP) file using programs like tcpdump or Wireshark (both based on libpcap). A popular forensic CTF challenge is to provide a PCAP file representing some network traffic and challenge the player to recover or reconstitute a transferred file or transmitted secret. Complicating matters, the packets of interest are usually buried in an ocean of unrelated traffic, so triaging and filtering the data is also required.
For initial analysis, take a high-level view of the packets using Wireshark’s statistics or conversations view, or by running the capinfos command. Wireshark and its command-line version, tshark, both support filters that reduce the scope of the analysis. Alternatively, PCAP files up to 50MB can be submitted to an online service called PacketTotal, which can graphically display timelines of connections and SSL metadata on the secure connections.
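For scripted triage, the classic PCAP container is simple enough to read with Python's struct module: a 24-byte global header followed by records, each with a 16-byte header and the captured bytes. The sketch below is a minimal reader under stated assumptions: it handles only the classic microsecond-resolution format (not PCAPNG or the nanosecond variant), the name parse_pcap is hypothetical, and the capture it parses is synthetic rather than real traffic.

```python
import struct

GLOBAL_HEADER = "IHHiIII"  # magic, version major/minor, thiszone, sigfigs, snaplen, link type
RECORD_HEADER = "IIII"     # ts_sec, ts_usec, captured length, original length


def parse_pcap(data: bytes):
    """Return (link_type, [(timestamp, packet_bytes), ...]) for a classic PCAP."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # written by a little-endian host
    elif magic == 0xD4C3B2A1:
        endian = ">"  # written by a big-endian host
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")

    header_size = struct.calcsize(endian + GLOBAL_HEADER)  # 24 bytes
    record_size = struct.calcsize(endian + RECORD_HEADER)  # 16 bytes
    link_type = struct.unpack(endian + GLOBAL_HEADER, data[:header_size])[6]

    packets = []
    offset = header_size
    while offset + record_size <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + RECORD_HEADER, data[offset:offset + record_size]
        )
        offset += record_size
        packets.append((ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]))
        offset += incl_len
    return link_type, packets


# Synthetic capture with two tiny "packets"; link type 1 is Ethernet.
capture = (
    struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    + struct.pack("<IIII", 1, 0, 5, 5) + b"hello"
    + struct.pack("<IIII", 2, 500000, 3, 3) + b"abc"
)

link_type, packets = parse_pcap(capture)
print(link_type, len(packets))
for timestamp, payload in packets:
    print(timestamp, payload)
```

For real captures, the packet bytes still need protocol decoding (Ethernet, IP, TCP, and so on), which is where Wireshark, tshark, or a dedicated parsing library is the more practical choice.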