James' USENIX 2007 notes: tutorial S1

Computer Forensics
Simson L. Garfinkel, Naval Postgraduate School

Computer forensics is the study of information stored in computer systems for the purpose of learning what happened to that computer at some point in the past—and for making a convincing argument about what was learned in a court of law.

This tutorial was fairly well-attended.

morning session 1

If you're a bad guy, you should be using a Mac, because forensic tools are least heavily developed for Macintosh computers. Additionally, OS X has some of the best anti-forensic tools embedded into it out-of-the-box.

Most of the people who are developing the open-source forensic tools are Linux hackers, so those tools tend to understand Linux operating systems rather well. Most of the commercial tools don't delve deeply into the Linux platform; they mostly target Windows machines.

Security through obscurity really does work against computer forensics, because the forensic tools need to deeply understand the context of the data they analyze. One of the best protections against computer forensics is to use an oddball operating system. (Simson once read a sci-fi novel in which hackers thwarted the government by using 50-year-old computers the government couldn't analyze. I think he might have been referring to Turing's Delirium.)

Most computer forensic investigations that are being conducted today are going after child pornography. This is because most DA offices are judged based on the quantity of their convictions, not the quality, and getting convictions on child porn is low-hanging fruit. (Fire up Kazaa, search for the special child porn terms, gather IPs, subpoena the ISPs, get warrants, conduct raids, seize the evidence, arrest, prosecute, convict, lather, rinse, repeat.)

(And yes, there are specific terms that authorities use to search for child porn. No, Simson did not tell us what they were.)

If you're going to do an investigation, make it a good investigation. Have a process, take notes, record those notes, etc. Even if you're doing an investigation for something simple (e.g.: terminating an employee), a detailed investigation might save you much pain two years later, if you e.g. get hit with a wrongful termination lawsuit, or the FBI comes a-knockin' (because the employee was up to evil things that you didn't know about).

Bruce Sterling has written several books that touch on computer forensics (e.g. The Hacker Crackdown).

You must be qualified by the court before you are permitted to present expert testimony. (Most, but not all, states simply follow the Federal Rules of Evidence, or FRE.)

As an expert witness, you are permitted to present your opinion as evidence. Legally speaking, this is very rare; opinions are almost never permitted as evidence.

At the ACM Conference on Computer and Communications Security in November 2005, Li Zhuang, Feng Zhou, and J. D. Tygar presented a paper entitled Keyboard Acoustic Emanations Revisited. In the paper, they present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. This is partly because different keys tend to sound different, but it is also because people tend to type with distinctive patterns, akin to a ham radio operator's fist.

(For more information about a ham radio operator's fist, read Malcolm Gladwell's Blink. Here's an excerpt that includes the discussion of fists, but you really should go read the book; it's quite fascinating.)

For one investigator in DC, his main clientele consists of wives who want divorces. He tells them to bring their husbands' laptops to him (before they file for divorce), which they can do, because a spouse can give consent to search joint marital property. (About 10% of the time, he finds child pornography on the husband's laptop, which makes the divorce settlement really interesting.)

I've only testified in one court case, and boy—that was fun.

These days, hard drives are essentially computer systems: they boot, they have firmware, they have RAM, they have diagnostics, they have protected areas.

When working with a hard drive, a forensic investigator normally uses a device called a write blocker. This is a passthrough hardware device that sits between the hard drive and the controller; its job is to ensure that the hard drive is not written to in any way.

A write blocker can function in two ways: it can either allow only commands that it knows are read-only (blocking all write commands and unknown commands), or it can block all write commands that it knows about. With the former approach, you run the risk that the drive won't function properly—e.g., because one of the unknown commands that the write blocker blocks is in fact read-only and is critical for initializing the drive properly. With the latter approach, you run the risk that one of the unknown commands that the write blocker permits is in fact a write command that modifies the drive. Deciding which approach to use is a challenging task, and there is no one-size-fits-all answer.
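The two policies can be contrasted in a few lines. This is only a sketch: the command names are hypothetical labels chosen for illustration, not real bus-level opcodes, and an actual write blocker makes this decision in hardware.

```python
# Hypothetical command names; a real write blocker inspects bus-level opcodes.
KNOWN_READ_ONLY = {"IDENTIFY", "READ_SECTORS", "READ_DMA"}
KNOWN_WRITES = {"WRITE_SECTORS", "WRITE_DMA", "SECURITY_ERASE"}


def allow_known_reads(command: str) -> bool:
    """First approach: pass only commands known to be read-only."""
    return command in KNOWN_READ_ONLY


def block_known_writes(command: str) -> bool:
    """Second approach: pass everything except commands known to write."""
    return command not in KNOWN_WRITES


# VENDOR_X stands in for a command that neither list knows about.
for command in ("READ_DMA", "WRITE_DMA", "VENDOR_X"):
    print(command, allow_known_reads(command), block_known_writes(command))
```

The unknown command is where the two approaches disagree: the first blocks it (and risks breaking drive initialization), while the second passes it (and risks a modified drive).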

EnCase is the most popular computer forensics tool out there. It has its own image file format (called E01). AFF is the presenter's own image file format. Another format, sgzip, stands for seekable gzip.

The presenter isn't aware of any case that has been thrown out because the investigator didn't use the EnCase software. Furthermore, there's a growing movement arguing that using a proprietary investigative toolkit violates a defendant's right to face his accuser. (In one case in Europe, both the prosecution and the defense were required to use only open source software.)

The DNA software by AccessData can mine the suspect's hard drive for potential passwords.

morning session 2

If you're using dd, make sure to use conv=noerror,sync.
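To see why those two options matter, here is a rough Python imitation of what they do: noerror keeps going after a read error, and sync pads every short or failed block with zeros so that later data stays at the right offset in the image. The function name and the total_blocks parameter are my own inventions for the sketch; dd itself works differently internally.

```python
import io


def image_with_padding(src, dst, total_blocks, block_size=512):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the indexes of the blocks that could not be read.
    """
    bad_blocks = []
    for index in range(total_blocks):
        src.seek(index * block_size)
        try:
            block = src.read(block_size)
        except OSError:
            # "noerror": record the failure and carry on with the next block.
            block = b""
            bad_blocks.append(index)
        # "sync": pad to a full block so offsets in the image stay aligned.
        dst.write(block.ljust(block_size, b"\x00"))
    return bad_blocks


source = io.BytesIO(b"A" * 700)  # one full block plus a short one
target = io.BytesIO()
print(image_with_padding(source, target, total_blocks=2), len(target.getvalue()))
```

Without the padding, every byte after a bad sector would land at the wrong offset in the image, which breaks any filesystem structure that refers to absolute positions.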

Simson has imaged at least 1,000 drives, and has never managed to damage either his computer or the hard drive by hotplugging the ATA connector, even though this is verboten. He did mention, however, that he always connects the ATA connector before connecting power to the drive.

Most hard drives sold on eBay are sold for a reason. Only about 2/3 of the drives I buy that are advertised as fully working actually work.

One time, Simson asked some people who did drive recovery what techniques they use for trying to get data off of dead drives. Their quick answer: put the drive in the freezer overnight, then spin the drive up while it's still cold. If that fails, allow the drive to warm to room temperature, then try again. He's managed to revive about 1/3 of his dead drives by following these steps.

If the "freeze the drive" trick fails, give up and call DriveSavers. There are other drive recovery companies; they charge e.g. $1K/drive if they can recover in software, and $10K/drive if they have to use their clean room. However, Simson knows the owner of DriveSavers, and the owner told him that some companies' "clean room" is, in fact, "send the drive to DriveSavers."

It doesn't matter if MD5 and SHA1 are secure for generating hashes of raw devices; it matters if the defense can argue that they're insecure. The presenter knows at least one case that was thrown out because the prosecution used MD5 instead of SHA1, and the defense attacked the decision by arguing that MD5 is an inferior tool. The moral of the story: use the best available technology that you have available; if you don't, the defense can argue that you chose to use inferior technology.
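One cheap way to sidestep the argument is to compute several digests in a single pass over the image. The sketch below uses Python's hashlib; the function name is made up, and including SHA-256 alongside the two hashes mentioned in the tutorial is my own assumption about what "best available" might mean.

```python
import hashlib
import io


def hash_stream(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read the stream once and return a hex digest for each algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Any binary file object works, e.g. open("drive.img", "rb").
print(hash_stream(io.BytesIO(b"abc")))
```

Reading the image once matters in practice, since the read is usually far slower than the hashing.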

Simson is trying very hard to convince the Federal Government not to standardize on the EnCase file format (E01), because the format is undocumented.

The Iran-Contra conspiracy was unraveled because deleted files were recovered from backup tapes.

A company Simson was buying refurbished drives from initially shipped him clean drives, but then started to ship him drives with data on them. (These are the drives on the "drives 1-236 are dominated by failed sanitization attempts" slide.) At about the same time, the media reported that the company was experiencing financial difficulties. Simson's hypothesis: the company started cutting corners. In samples from the same vendor, Simson found drives in which a wipe operation had clearly been started, but was aborted before it completed.

DBAN uses a pseudo-random number generator (PRNG): it seeds the PRNG with a random number, writes out the resulting pattern, reinitializes the PRNG with the same seed, and then compares the data the drive returns to what the PRNG generates. If the patterns don't match, then the drive might not actually have performed all of the write operations.
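The write-then-verify idea can be sketched as follows. This is not DBAN's code: the function names are invented, Python's random module stands in for whatever generator DBAN really uses, and the "device" is just a seekable file object.

```python
import io
import random


def pattern_blocks(seed, size, block_size):
    """Yield (offset, bytes) pairs of a reproducible pseudo-random stream."""
    rng = random.Random(seed)
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        yield offset, bytes(rng.getrandbits(8) for _ in range(length))


def wipe_and_verify(device, size, seed, block_size=4096):
    """Overwrite the device, then regenerate the stream and compare.

    Returns the offsets of blocks that did not read back as written.
    """
    device.seek(0)
    for _, chunk in pattern_blocks(seed, size, block_size):
        device.write(chunk)
    device.seek(0)
    return [offset
            for offset, expected in pattern_blocks(seed, size, block_size)
            if device.read(len(expected)) != expected]


print(wipe_and_verify(io.BytesIO(bytes(10000)), 10000, seed=1234))
```

Because the pattern is regenerated from the seed, nothing but the seed has to be kept in memory between the write pass and the verify pass.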

Not even DBAN can wipe bad blocks, or wipe drives that lie about their capacity (e.g., because the drive has a Host Protected Area, or HPA, or because it is using a Device Configuration Overlay, or DCO).

For a good paper about HPAs and DCOs, see Hidden Disk Areas: HPA and DCO, presented at CERIAS 2006 Information Security Symposium.

Very few organizations will actually destroy hard drives when computers are decommissioned.

There are companies that specifically refurbish PCs and send them to third world countries. Simson once asked one of these companies how they sanitize the hard drives. The company replied, "Don't worry; we're installing Linux on them, so the hard drives are sanitized." He replied, "Not only are you not sanitizing the hard drives, but by installing Linux, you are providing people all of the tools they need to recover the data from that hard drive!"

Assertion: nobody in the unclassified world knows how to recover data from drives whose platters have been physically damaged (warped, bent, etc.). There's not so much as a hint that anyone in the classified world knows how to do this, either. But if the technology does exist, one would expect that even the knowledge of that capability would be treated as top secret.

It turns out that public researchers discovered the vulnerability in the original SHA only a few years after the NSA discovered it (at which point NIST replaced it with SHA-1). This might mean that the public is only a few years behind the NSA. (Or not.)

The NSRL is an excellent tool to discard non-contraband data. It's also an excellent tool for making an inventory of what software was installed on the hard drive.

On FAT and NTFS, files greater than 4 KiB are always aligned on sector boundaries. One of the techniques the author is researching is making hashes of sectors. It turns out that sectors of files are often unique, because there's lots of opportunity for uniqueness in 4 KiB.
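A minimal sketch of the sector-hashing idea: hash every 4 KiB block of a known file, then look for those hashes in a drive image. The function names and the choice of SHA-1 are my assumptions, not details from the tutorial.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096):
    """Return the SHA-1 hex digest of each block_size chunk of data."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def find_known_blocks(image: bytes, known: set, block_size: int = 4096):
    """Return offsets in image whose block hash appears in the known set."""
    return [index * block_size
            for index, digest in enumerate(block_hashes(image, block_size))
            if digest in known]


known_file = b"K" * 4096 + b"L" * 4096
image = b"\x00" * 4096 + b"L" * 4096 + b"\x00" * 4096
print(find_known_blocks(image, set(block_hashes(known_file))))
```

This only works because of the alignment property described above: a block of the file starts where a block of the image starts, so the two hashes line up.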

File carving is a technique for recovering files for which the metadata (e.g., the directory entry) is not available. (We don't have the File Carving slides, alas.)

A surprising number of people prefer to transfer JPEG files around by pasting them into Microsoft Word files.

File carving can recover images, text files and documents, and cryptographic keys. Images are the most common, because they're easy to find, and forensic investigations are most commonly searching for child porn.

Header/footer carving with JPEG is fast and effective, but error-prone. This is the strategy used by Foremost and Scalpel.

With simple header/footer carving, objects must be validated after they are saved in files (Carving With Validation).
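A bare-bones version of header/footer carving with a validation hook might look like the following. The JPEG start and end markers are standard; everything else (the function name, the size cap, and the is_valid callback) is a placeholder of mine, and this is not how Foremost or Scalpel are implemented.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the next marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(image: bytes, max_size=10 * 1024 * 1024, is_valid=lambda blob: True):
    """Return (offset, bytes) for each header..footer span that validates."""
    carved = []
    start = image.find(JPEG_HEADER)
    while start != -1:
        # Look for the first footer within max_size bytes of the header.
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end != -1:
            candidate = image[start:end + len(JPEG_FOOTER)]
            if is_valid(candidate):
                carved.append((start, candidate))
        start = image.find(JPEG_HEADER, start + 1)
    return carved


disk = b"junk" + b"\xff\xd8\xff\xe0AAAA\xff\xd9" + b"more"
print(carve_jpegs(disk))
```

The default is_valid accepts everything, which is exactly the error-prone case: the footer bytes can appear inside image data or an embedded thumbnail, so a real validator would try to decode the candidate.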

Header/maximum size carving is when you start at the header and carve until the file is invalid. This technique works for JPEG files, and any other file type that doesn't index backwards from a known footer.

Fragment recovery carving attempts to reassemble fragmented files.

I wrote down this code fragment, but now I'm not sure what it's for:

LEN = S-F+1
for I in range(0,LEN):
  for J in range(0, LEN-I):
    data = blocks[S:S+I] + blocks[F-J:J]
    if valid==
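One plausible reading (mine, not confirmed by the notes) is that the fragment above is a garbled version of gap carving for a file split into two fragments: given the block that holds the header and the block that holds the footer, try dropping every possible run of blocks in between until the remainder validates. A sketch under that assumption, with hypothetical names throughout:

```python
def bifragment_carve(blocks, s, f, is_valid):
    """Try every way to drop one contiguous run of blocks between s and f.

    blocks[s] holds the header and blocks[f] holds the footer; both are kept.
    Returns (gap_start, gap_len, kept_blocks) for the first candidate that
    validates, or None if none does.
    """
    for gap_len in range(1, f - s):
        for gap_start in range(s + 1, f - gap_len + 1):
            kept = blocks[s:gap_start] + blocks[gap_start + gap_len:f + 1]
            if is_valid(b"".join(kept)):
                return gap_start, gap_len, kept
    return None


# Toy data: the file is H, a, b, F with two foreign blocks in the middle.
blocks = [b"H", b"a", b"X", b"Y", b"b", b"F"]
print(bifragment_carve(blocks, 0, 5, lambda blob: blob == b"HabF"))
```

The cost is quadratic in the distance between header and footer, and each attempt needs a validator call, so a fast validator is essential.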

Header/embedded length carving is when you look for structures that code length. Microsoft Office and ZIP files use this, for example. (Coincidentally enough, the structure of Office files largely resembles FAT—because Microsoft had the code.)
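For ZIP, the length lives in the 30-byte local file header that precedes each entry. The sketch below carves a single entry by reading that length; the function name is made up, and it deliberately gives up on entries that defer their sizes to a trailing data descriptor.

```python
import struct

# ZIP local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
ZIP_LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
ZIP_SIGNATURE = b"PK\x03\x04"


def carve_zip_entry(image: bytes, offset: int):
    """Return the bytes of one ZIP entry starting at offset, or None."""
    if offset + ZIP_LOCAL_HEADER.size > len(image):
        return None
    (signature, _version, flags, _method, _time, _date, _crc,
     compressed_size, _size, name_len, extra_len) = ZIP_LOCAL_HEADER.unpack_from(image, offset)
    if signature != ZIP_SIGNATURE:
        return None
    if flags & 0x08:
        return None  # sizes are stored after the data, not in this header
    end = offset + ZIP_LOCAL_HEADER.size + name_len + extra_len + compressed_size
    return image[offset:end] if end <= len(image) else None


name, payload = b"a.txt", b"hello"
header = ZIP_LOCAL_HEADER.pack(ZIP_SIGNATURE, 20, 0, 0, 0, 0, 0,
                               len(payload), len(payload), len(name), 0)
disk = b"junk" + header + name + payload + b"trailing"
print(carve_zip_entry(disk, 4) == header + name + payload)
```

Compared with header/footer carving, there is no guessing about where the object ends: the structure itself says how many bytes to take.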

Carving tools available today:

open source: Foremost, Scalpel, CarvFS, PhotoRec, RevIT & S2
commercial: EnCase, DataLifter

Cross-drive analysis can be used to find correlated information. Currently, though, no one is using this technique.

Credit cards have multiple layers of security/validation codes: the Luhn algorithm, which is a simple checksum; the CVC1 (or CVV1) code, which is encoded on the magnetic stripe of the card; and the CVV2 (or CVC2, or CCID), which is printed on the card.
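The Luhn check is simple enough to show in full. A tool scanning a drive for card numbers could use something like this to throw out most random digit strings; the function name is mine.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test number
print(luhn_valid("4111 1111 1111 1112"))
```

Since it is only a checksum, roughly one in ten random digit strings will pass, so it filters noise but proves nothing on its own.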

The NSRL is essentially a stoplist. We can thank the federal government for maintaining it—a (rare?) example of our tax dollars put to a good use.

For forensic examiners, there's never enough time to find all the data.

If you're evil, it's easy to defeat the NSRL by modifying every single file in a trivial way. This will generate a massive amount of noise that will make a forensic investigator's task miserable. This is one of the reasons why Simson is interested in sector hashes, not file hashes.
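A toy demonstration of why sector hashes hold up better: change one byte of a file and the file hash changes completely, but all of the untouched sectors still hash the same. The helper names and the use of SHA-1 are assumptions for the sketch.

```python
import hashlib


def file_hash(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def sector_hashes(data: bytes, size: int = 4096) -> set:
    return {hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}


# Four distinct 4 KiB sectors, then the same file with its last byte changed.
original = b"".join(bytes([value]) * 4096 for value in range(4))
modified = original[:-1] + b"\xff"

print(file_hash(original) == file_hash(modified))
print(len(sector_hashes(original) & sector_hashes(modified)), "of 4 sectors still match")
```

This holds for in-place changes and for data appended at the end; inserting bytes near the start of a file would shift every later sector and defeat sector hashes too.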

afternoon session 1


Simson talked a little about how the relativity of time affects
forensic investigations.  Not only do you have to worry about
timezones, DST, leap seconds, etc., but you have to worry about
computers with inaccurate clocks, or no internal system clock at all.
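As a small illustration of the bookkeeping involved, here is a sketch that converts a timestamp recorded by a suspect machine to UTC, given the machine's UTC offset and a measured clock skew. The function and its parameters are hypothetical; picking the right offset (including DST) is left to the examiner, and leap seconds are ignored.

```python
from datetime import datetime, timedelta, timezone


def to_utc(recorded: datetime, utc_offset_hours: float,
           clock_skew: timedelta = timedelta(0)) -> datetime:
    """Convert a naive local timestamp from a suspect machine to UTC.

    clock_skew is how far the machine's clock ran ahead of true time.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    aware = recorded.replace(tzinfo=local_zone)
    return (aware - clock_skew).astimezone(timezone.utc)


# A file stamped 09:00 on a machine set to UTC-7 whose clock ran 5 minutes fast.
print(to_utc(datetime(2007, 6, 18, 9, 0), -7, timedelta(minutes=5)))
```

Recording the skew at seizure time (by comparing the machine's clock against a trusted one) is what makes a correction like this defensible later.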

The Sleuth Kit is a good tool to use to deal with disk images.  Simson
stepped through some common usages.

There are two problems with mounting the filesystem: you could
accidentally modify it (if you don't mount it read-only), and you
won't see unlinked files.  (Plus, if the filesystem is corrupt, you
could wind up crashing your OS—in fact, the fsfuzzer tool
was written specifically to find bugs in filesystem implementations by
subtly corrupting filesystems.)

Mounting an ext3 filesystem read-only STILL MODIFIES THE FILESYSTEM,
as it will replay the journal.

If you're going to go into court, you MUST make a copy of the hard
drive.  The only reason you should be touching the original hard drive
in any way, shape or form is to image it.

14:25 - network forensics

Constructing a packet monitoring infrastructure is a very cool
project.

Harvard uses a system called QRadar by www.q1labs.com, and they've
been very pleased with it.

As with many improvements in security, the black hats were performing
network forensics (hostile packet sniffing) before the white hats
were.

I would have been happy to develop Omnivore for the US DoJ for half
that cost.

The very large ISPs had been working with the FBI for years before
Omnivore/Carnivore.

The names were chosen to get FBI agents (who at the time were
surprisingly computer-phobic) excited about the technology, but in
retrospect, they weren't wise names from a PR perspective.

A story someone Simson knows likes to tell: a company hires
consultants to do some development work.  Upon arriving, the
consultants open a VPN connection back to their company in India, and
then scan every single computer on the network.  Problem: the
consultants were hired to write code, not perform network scanning.
(Their explanation was the usual excuse: their PCs were infected with
viruses.)

afternoon session 2


Log files are useful even if you don't look at them daily, because
they give you information about the past.

Simson told a story about the place he formerly worked at.  They had
hundreds of PCs.  Physical access was controlled by card readers.
Simson worked on the 3rd floor.  One night, a major heist occurred on
the 5th floor; someone made off with about a hundred PCs.  It was an
inside job, because the thieves also took the log printout from the
card reader system—which was the ONLY log the card reader kept.

The police nabbed the author of the Melissa virus via Caller ID
information that was generated when he dialed into his ISP to upload
it.  However, a security researcher in Boston (name?) also managed to
identify the author (at about the same time) by analyzing the meta
information embedded in the Microsoft Word document.

The "we put black boxes over the sensitive information" gaffe is
unfortunately common.

Word was developed way back when computers were wimpy and DOS didn't
implement virtual memory systems.  As a result, Word documents are
essentially filesystems, so that they can be paged.  At the time, it
was very ingenious, but this is why it's often possible to recover
information from Word files.

FireWire was designed to be a hard drive replacement, so there are
capabilities within FireWire for reading and writing arbitrary parts
of memory (i.e., via DMA).  Going through the DMA controller largely
bypasses the operating system, which is what makes the "iPod
attack" possible.  (Connect a malicious iPod to your PC via
FireWire, and it can read and write your memory.  All of it.)

(At a recent black hat conference, someone presented a tool to block
the iPod attack.  All her tool could do was detect an attack and crash
the victim's PC, but it might be possible to improve upon the
countermeasure technique.)

Cell phone forensics

The cheaper the cell phone is, the less standardized it is, so more
expensive cell phones are actually easier to perform forensic
investigations upon.  (Remember this if you're up to no good.)

Even with GSM phones with SIM cards, there is unique information that
is still stored in the phone, and this information is transmitted.
(Organized crime members in Sweden, Italy, and other EU countries were
using throwaway prepaid SIM cards in an attempt to make it impossible to
trace them.  Unfortunately for them, law enforcement could correlate
their use of throwaway SIM cards with ease.)

Simson's smartphone has a remote self-destruct capability: if he sends
it an SMS message with a particular password, it will erase all of its
data in a forensically secure manner, and then lock itself.

The StrongHold box is essentially a Faraday cage.

All of the anti-forensic tools are detectable—if you know where to
look.  As with many black hat versus white hat battles, it boils down
to a technological arms race (or a cat-and-mouse game).

Syscall proxying is also called pivoting.

Storage is getting small enough that it's becoming easier to simply
physically hide it (e.g., the sushi USB drive).

42.zip was a ZIP bomb, but sooner or later someone will hide valid
data in all the chaff.

The whole goal of anti-forensics is to make the forensic analyst spend
more time, because time is almost always the analyst's most
limited resource.

Q&A session


There wasn't really a Q&A session, as Simson finished pretty much
exactly at 17:00.  He did show some of the data (2.2GB worth!) that he
sniffed off of the wireless network.

