Computer forensics is the study of information stored in computer systems for the purpose of learning what happened to that computer at some point in the past—and for making a convincing argument about what was learned in a court of law.
This tutorial was fairly well-attended.
If you're a bad guy, you should be using a Mac, because forensic tools are least heavily developed for Macintosh computers. Additionally, OS X has some of the best anti-forensic tools embedded into it out-of-the-box.
Most of the people who are developing the open-source forensic tools are Linux hackers, so those tools tend to understand Linux operating systems rather well. Most of the commercial tools don't delve deeply into the Linux platform; they mostly target Windows machines.
Security through obscurity really does work against computer forensics, because the forensic tools need to deeply understand the context of the data they analyze. One of the best protections against computer forensics is to use an oddball operating system. (Simson once read a sci-fi novel in which hackers thwarted the government by using 50-year-old computers the government couldn't analyze. I think he might have been referring to Turing's Delirium.)
Most computer forensic investigations that are being conducted today are going after child pornography. This is because most DA offices are judged based on the quantity of their convictions, not the quality, and getting convictions on child porn is low-hanging fruit. (Fire up Kazaa, search for the special child porn terms, gather IPs, subpoena the ISPs, get warrants, conduct raids, seize the evidence, arrest, prosecute, convict, lather, rinse, repeat.)
(And yes, there are specific terms that authorities use to search for child porn. No, Simson did not tell us what they were.)
If you're going to do an investigation, make it a good investigation. Have a process, take notes, record those notes, etc. Even if you're doing an investigation for something simple (e.g.: terminating an employee), a detailed investigation might save you much pain two years later, if you e.g. get hit with a wrongful termination lawsuit, or the FBI comes a-knockin' (because the employee was up to evil things that you didn't know about).
Bruce Sterling has written several books that touch on computer forensics (e.g. The Hacker Crackdown).
You must be qualified by the Court to be permitted to present expert testimony. (Most, but not all, states simply follow the FRE.)
As an expert witness, you are permitted to present your opinion as evidence. Legally speaking, this is very rare; opinions are almost never permitted as evidence.
At the ACM Conference on Computer and Communications Security in November 2005, Li Zhuang, Feng Zhou, and J. D. Tygar presented a paper entitled Keyboard Acoustic Emanations Revisited. In the paper, they present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. This is partly because different keys tend to sound different, but it is also because people tend to type with distinctive patterns, akin to a ham radio operator's "fist".
(For more information about a ham radio operator's fist, read Malcolm Gladwell's Blink. Here's an excerpt that includes the discussion of fists, but you really should go read the book; it's quite fascinating.)
For one investigator in DC, his main clientele consists of wives who want divorces. He tells them to bring their husbands' laptops to him (before they file for divorce), which they can do, because a spouse can give consent to search joint marital property. (About 10% of the time, he finds child pornography on the husband's laptop, which makes the divorce settlement really interesting.)
I've only testified in one court case, and boy—that was fun.
These days, hard drives are essentially computer systems: they boot, they have firmware, they have RAM, they have diagnostics, they have protected areas.
When working with a hard drive, a forensic investigator normally uses a device called a write blocker. This is a passthrough hardware device that sits between the hard drive and the controller; its job is to ensure that the hard drive is not written to in any way.
A write blocker can function in two ways: it can either allow only commands that it knows are read-only (blocking all write commands and unknown commands), or it can block all write commands that it knows about. With the former approach, you run the risk that the drive won't function properly—e.g., because one of the unknown commands that the write blocker blocks is in fact read-only and is critical for initializing the drive properly. With the latter approach, you run the risk that one of the unknown commands that the write blocker permits is in fact a write command that modifies the drive. Deciding which approach to use is a challenging task, and there is no one-size-fits-all answer.
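The two filtering strategies can be sketched in a few lines. This is only an illustration: the command names below are hypothetical placeholders, not a real ATA or SCSI command table, and a real write blocker works on opcodes in hardware.

```python
# Hypothetical command names, chosen for illustration only.
KNOWN_READS = {"READ_SECTORS", "READ_DMA", "IDENTIFY_DEVICE"}
KNOWN_WRITES = {"WRITE_SECTORS", "WRITE_DMA"}


def allow_list_policy(command: str) -> bool:
    """Pass only commands known to be read-only; block everything else."""
    return command in KNOWN_READS


def deny_list_policy(command: str) -> bool:
    """Block only commands known to write; pass everything else."""
    return command not in KNOWN_WRITES


# An unknown command is where the two policies disagree.
for cmd in ("READ_DMA", "WRITE_DMA", "VENDOR_SPECIFIC_X"):
    print(cmd, "allow-list:", allow_list_policy(cmd),
          "deny-list:", deny_list_policy(cmd))
```

The unknown command is blocked by the first policy (the drive may fail to initialize) and passed by the second (the drive may be modified), which is exactly the trade-off described above.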
EnCase is the most popular computer forensics tool out there. It has its own image file format (called E01). AFF is the presenter's own image file format. Another format, sgzip, stands for seekable gzip.
The presenter isn't aware of any case that has been thrown out because the investigator didn't use the EnCase software. Furthermore, there's a growing movement arguing that using a proprietary investigative toolkit violates a defendant's right to face his accuser. (In one case in Europe, both the prosecution and the defense were required to use only open source software.)
The DNA software by AccessData can mine the suspect's hard drive for potential passwords.
If you're using dd, make sure to use conv=noerror,sync.
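To see why those two options matter, here is a rough Python sketch of the behavior they ask for: keep going after a read error (noerror) and pad the unreadable block with NULs so that everything after it stays at the same offset in the image (sync). It is not how dd is implemented; `FlakySource` is a made-up stand-in for a failing drive, and the 512-byte block size is an assumption.

```python
import io


def image_with_padding(src, dst, block_size=512):
    """Copy src to dst block by block; unreadable blocks become zeros."""
    bad_blocks = []
    index = 0
    while True:
        try:
            chunk = src.read(block_size)
        except OSError:
            # "noerror": do not abort. "sync": pad with NULs so later
            # data keeps its original offset in the image.
            bad_blocks.append(index)
            chunk = b"\x00" * block_size
            src.seek((index + 1) * block_size)
        if not chunk:
            break
        dst.write(chunk.ljust(block_size, b"\x00"))
        index += 1
    return bad_blocks


class FlakySource(io.BytesIO):
    """Hypothetical failing drive: reading at one offset raises OSError."""

    def __init__(self, data, bad_offset):
        super().__init__(data)
        self.bad_offset = bad_offset

    def read(self, size=-1):
        if self.tell() == self.bad_offset:
            raise OSError("simulated unreadable sector")
        return super().read(size)


source = FlakySource(b"A" * 512 + b"B" * 512 + b"C" * 512, bad_offset=512)
image = io.BytesIO()
bad = image_with_padding(source, image)
print("unreadable blocks:", bad)
```

Without the padding, the third block would land where the second one belongs, and every offset after the first bad sector would be wrong.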
Simson has imaged at least 1,000 drives, and has never managed to damage either his computer or the hard drive by hotplugging the ATA connector, even though this is verboten. He did mention, however, that he always connects the ATA connector before connecting power to the drive.
Most hard drives sold on eBay are sold for a reason. Only about 2/3 of the drives I buy that are advertised as "fully working" actually work.
One time, Simson asked some people who did drive recovery what
techniques they use for trying to get data off of dead drives.
Their quick answer: put the drive in the freezer overnight, then
spin the drive up while it's still cold. If that fails, allow the
drive to warm to room temperature, then try again. He's managed to
revive about 1/3 of his dead
drives by following these steps.
If the "freeze the drive" trick fails, give up and call DriveSavers. There are other drive recovery companies; they charge e.g. $1K/drive if they can recover in software, and $10K/drive if they have to use their "clean room". However, Simson knows the owner of DriveSavers, and the owner told him that some companies' "clean room" is, in fact, "send the drive to DriveSavers".
It doesn't matter if MD5 and SHA1 are secure for generating hashes of raw devices; it matters if the defense can argue that they're insecure. The presenter knows at least one case that was thrown out because the prosecution used MD5 instead of SHA1, and the defense attacked the decision by arguing that MD5 is an inferior tool. The moral of the story: use the best technology that you have available; if you don't, the defense can argue that you chose to use "inferior" technology.
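One cheap way to avoid that argument is to compute several digests in the same pass over the image. The sketch below uses Python's hashlib; including SHA-256 is my assumption about what "best available" might mean, since the talk itself only mentioned MD5 and SHA1, and the in-memory stream stands in for a real image file.

```python
import hashlib
import io


def hash_image(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read the stream once and return a hex digest per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Read in chunks so a multi-gigabyte image never has to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Stand-in for open("drive.img", "rb"); the file name would be yours.
digests = hash_image(io.BytesIO(b"abc"))
for name, value in digests.items():
    print(name, value)
```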
Simson is trying very hard to convince the Federal Government not to standardize on the EnCase file format (E01), because the format is undocumented.
The Iran-Contra conspiracy was unraveled because deleted files were recovered from backup tapes.
The drives from one company that Simson was buying refurbished drives from (see the "drives 1-236 are dominated by failed sanitization attempts" slide) were initially clean, but then the company started to ship him drives with data on them. At about the same time, the media reported that the company was experiencing financial difficulties. Simson's hypothesis: the company started cutting corners. In samples from the same vendor, Simson found drives in which a wipe operation had clearly been started, but was aborted before it completed.
DBAN uses a pseudo-random-number-generator, seeds with a random number, writes out the resulting pattern, reinitializes the PRNG with the same seed, and then compares the data the drive returns to what is generated by the PRNG. If the patterns don't match, then the drive might not actually have implemented all of the write operations.
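The write-then-verify idea can be sketched as follows. This is not DBAN's code: the PRNG is Python's `random` module (not a cryptographic generator), the "devices" are in-memory buffers, and `LyingDevice` is a made-up class that silently drops some writes. The seed is fixed here only to keep the demo repeatable; a real wipe would pick a random one.

```python
import io
import random


def wipe_and_verify(device, size, seed, block_size=4096):
    """Write a seeded pseudo-random pattern, regenerate it, and compare.

    Returns the offsets of blocks whose contents do not match the pattern.
    """
    rng = random.Random(seed)
    device.seek(0)
    for offset in range(0, size, block_size):
        n = min(block_size, size - offset)
        device.write(bytes(rng.getrandbits(8) for _ in range(n)))

    rng = random.Random(seed)  # reinitialize the PRNG with the same seed
    device.seek(0)
    mismatched = []
    for offset in range(0, size, block_size):
        n = min(block_size, size - offset)
        expected = bytes(rng.getrandbits(8) for _ in range(n))
        if device.read(n) != expected:
            mismatched.append(offset)
    return mismatched


class LyingDevice(io.BytesIO):
    """Hypothetical drive that reports success but drops writes past an offset."""

    def __init__(self, size, drop_from):
        super().__init__(b"\xaa" * size)
        self.drop_from = drop_from

    def write(self, data):
        if self.tell() >= self.drop_from:
            self.seek(self.tell() + len(data))  # pretend the write happened
            return len(data)
        return super().write(data)


honest_result = wipe_and_verify(io.BytesIO(bytes(16384)), 16384, seed=1234)
lying_result = wipe_and_verify(LyingDevice(16384, drop_from=8192), 16384, seed=1234)
print("honest drive mismatches:", honest_result)
print("lying drive mismatches:", lying_result)
```

Because the pattern is regenerated from the seed rather than stored, the check needs no extra space, and a drive that ignores writes cannot pass by returning a fixed fill value.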
Not even DBAN can wipe bad blocks, or wipe blocks hidden by a drive that lies about its capacity (e.g., because the drive has an HPA, or because it is using DCO).
For a good paper about HPAs and DCOs, see Hidden Disk Areas: HPA and DCO, presented at CERIAS 2006 Information Security Symposium.
Very few organizations will actually destroy hard drives when computers are decommissioned.
There are companies that specifically refurbish PCs and send them to third world countries. Simson once asked one of these companies how they sanitize the hard drives. The company replied, "Don't worry; we're installing Linux on them, so the hard drives are sanitized." He replied, "Not only are you not sanitizing the hard drives, but by installing Linux, you are providing people all of the tools they need to recover the data from that hard drive!"
Assertion: nobody in the unclassified world knows how to recover data from drives whose platters have been physically damaged (warped, bent, etc.). There's not so much as a hint that anyone in the classified world knows how to do this, either; but if the technology does exist, one would expect that even knowledge of its existence would be treated as top secret.
It turns out that public researchers discovered the vulnerability in SHA only a few years after NIST discovered it (replacing it with SHA-1). This might mean that the public is only a few years behind the NSA. (Or not.)
The NSRL is an excellent tool to discard non-contraband data. It's also an excellent tool for making an inventory of what software was installed on the hard drive.
On FAT and NTFS, files greater than 4 KiB are always aligned on sector boundaries. One of the techniques the author is researching is making hashes of sectors. It turns out that sectors of files are often unique, because there's lots of opportunity for uniqueness in 4 KiB.
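A small sketch of the sector-hash idea, assuming 4 KiB blocks and SHA-256 (the choice of hash is mine; the talk did not name one). It also shows why a trivial one-byte change defeats a whole-file hash but leaves most sector hashes intact.

```python
import hashlib


def sector_hashes(data: bytes, sector_size: int = 4096):
    """Return one hex digest per fixed-size sector of the data."""
    return [
        hashlib.sha256(data[i:i + sector_size]).hexdigest()
        for i in range(0, len(data), sector_size)
    ]


# Three sectors with different contents stand in for a known file.
original = b"".join(bytes([value]) * 4096 for value in range(3))
# Trivial modification: change only the first byte.
tampered = b"X" + original[1:]

file_hashes_match = hashlib.sha256(original).digest() == hashlib.sha256(tampered).digest()
shared_sectors = set(sector_hashes(original)) & set(sector_hashes(tampered))

print("whole-file hashes match:", file_hashes_match)
print("sectors still matching:", len(shared_sectors), "of", len(sector_hashes(original)))
```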
File carving is a technique for recovering files for which the metadata (e.g., the directory entry) is not available. (We don't have the File Carving slides, alas.)
A surprising number of people prefer to transfer JPEG files around by pasting them into Microsoft Word files.
File carving can recover images, text files and documents, and cryptographic keys. Images are the most common, because they're easy to find, and forensic investigations are most commonly searching for child porn.
Header/footer carving with JPEG is fast and effective, but error-prone. This is the strategy used by Foremost and Scalpel.
With simple header/footer carving, objects must be validated after they are saved in files (Carving With Validation).
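As a rough illustration of header/footer carving (not how Foremost or Scalpel are actually implemented), the sketch below scans a raw buffer for the JPEG start-of-image and end-of-image markers and yields candidate byte ranges. The 10 MiB size cap is an arbitrary assumption, and the "disk" is a fabricated buffer.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpeg_candidates(image: bytes, max_size: int = 10 * 1024 * 1024):
    """Yield (start, end) ranges running from each header to the next footer."""
    start = image.find(JPEG_HEADER)
    while start != -1:
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer anywhere after this header
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            yield start, end
        start = image.find(JPEG_HEADER, start + 1)


# Fabricated example: a fake "JPEG" surrounded by zeroed sectors.
fake = JPEG_HEADER + b"\xe0fake-jpeg-body" + JPEG_FOOTER
disk = b"\x00" * 100 + fake + b"\x00" * 50
ranges = list(carve_jpeg_candidates(disk))
print(ranges)
```

Each candidate still has to be validated (for example, by trying to decode it), because the footer bytes can occur by chance and embedded thumbnails carry their own headers and footers. That is where the error-proneness comes from.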
Header/maximum size carving is when you start at the header and carve until the file is invalid. This technique works for JPEG files, and any other file type that doesn't index backwards from a known footer.
Fragment recovery carving attempts to reassemble fragmented files.
I wrote down this code fragment, but now I'm not sure what it's for:
    LEN = S-F+1
    for I in range(0,LEN):
        for J in range(0, LEN-I):
            data = blocks[S:S+I] + blocks[F-J:J]
            if valid==
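One plausible reading of a nested loop like this, and it is only a guess, is fragment recovery for a file split into two pieces: keep the block holding the header and the block holding the footer, then try every way of dropping a contiguous run of blocks in between until the result validates. The sketch below illustrates that idea and is not a reconstruction of the fragment; `is_valid` is a hypothetical validator supplied by the caller.

```python
def bifragment_gap_carve(blocks, s, f, is_valid):
    """Try every contiguous gap between header block s and footer block f.

    Returns (gap_start, gap_end, data) for the first candidate that
    validates, or None if nothing validates.
    """
    for gap_start in range(s + 1, f):                # first fragment: blocks[s:gap_start]
        for gap_end in range(gap_start + 1, f + 1):  # second fragment: blocks[gap_end:f+1]
            data = b"".join(blocks[s:gap_start] + blocks[gap_end:f + 1])
            if is_valid(data):
                return gap_start, gap_end, data
    return None


# Toy example with one-byte "blocks": x and y belong to some other file.
blocks = [b"H", b"a", b"x", b"y", b"b", b"F"]
result = bifragment_gap_carve(blocks, 0, 5, lambda data: data == b"HabF")
print(result)
```

The number of candidates grows with the square of the distance between header and footer, so this only makes sense when validation is fast.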
Header/embedded length carving is when you look for structures that encode a length. Microsoft Office and ZIP files use this, for example. (Coincidentally enough, the structure of Office files largely resembles FAT, because Microsoft had the code.)
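For ZIP, the embedded lengths sit in the 30-byte local file header that precedes each entry. The sketch below reads them to work out where one entry ends. It is a simplified illustration: it ignores ZIP64 and gives up on entries whose sizes are stored in a trailing data descriptor, and the sample archive is generated in memory purely for the demo.

```python
import io
import struct
import zipfile

# Signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def zip_entry_length(data: bytes, offset: int):
    """Return the total byte length of the ZIP entry at offset, or None."""
    if len(data) < offset + LOCAL_HEADER.size:
        return None
    (signature, _version, flags, _method, _time, _date, _crc,
     compressed_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if signature != b"PK\x03\x04":
        return None
    if flags & 0x08:
        return None  # sizes are in a data descriptor after the data; not handled
    return LOCAL_HEADER.size + name_len + extra_len + compressed_size


# Build a one-entry archive in memory to carve from.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive_file:
    archive_file.writestr("a.txt", b"hello")
archive = buffer.getvalue()

length = zip_entry_length(archive, 0)
print("entry occupies bytes 0 to", length)
```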
Carving tools available today...
Cross-drive analysis can be used to find correlated information. Currently, though, no one is using this technique.
Credit cards have multiple layers of security/validation codes: the Luhn algorithm, which is a simple checksum; the CVC1 (or CVV1) code, which is encoded on the magnetic stripe of the card; and the CVV2 (or CVC2, or CCID), which is printed on the card.
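The Luhn check is simple enough to show in full. A minimal sketch; the card numbers used here are well-known test values, not real accounts.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


for candidate in ("4111 1111 1111 1111", "4111 1111 1111 1112", "79927398713"):
    print(candidate, luhn_valid(candidate))
```

A filter like this is useful when scanning a drive for card numbers: it discards most random 16-digit strings before any further analysis.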
The NSRL is essentially a stoplist. We can thank the federal government for maintaining it—a (rare?) example of our tax dollars put to a good use.
For forensic examiners, there's never enough time to find all the data.
If you're evil, it's easy to defeat the NSRL by modifying every single file in a trivial way. This will generate a massive amount of "noise" that will make a forensic investigator's task miserable. This is one of the reasons why Simson is interested in sector hashes, not file hashes.
Simson talked a little about how the relativity of time affects forensic investigations. Not only do you have to worry about timezones, DST, leap seconds, etc., but you have to worry about computers with inaccurate clocks, or no internal system clock at all.
The Sleuth Kit is a good tool to use to deal with disk images. Simson stepped through some common usages.
There are two problems with mounting the filesystem: you could accidentally modify it (if you don't mount it read-only), and you won't see unlinked files. (Plus, if the filesystem is corrupt, you could wind up crashing your OS; in fact, the fsfuzzer tool was written specifically to find bugs in filesystem implementations by subtly corrupting filesystems.)
Mounting an ext3 filesystem read-only STILL MODIFIES THE FILESYSTEM, as it will replay the journal.
If you're going to go into court, you MUST make a copy of the hard drive. The only reason you should be touching the original hard drive in any way, shape or form is to image it.
14:25 - network forensics
Constructing a packet monitoring infrastructure is a very cool project.
Harvard uses a system called QRadar by www.q1labs.com, and they've been very pleased with it.
As with many improvements in security, the black hats were performing network forensics (hostile packet sniffing) before the white hats were.
I would have been happy to develop Omnivore for the US DoJ for half that cost.
The very large ISPs had been working with the FBI for years before Omnivore/Carnivore. The names were chosen to get FBI agents (who at the time were surprisingly computer-phobic) excited about the technology, but in retrospect, they weren't wise names from a PR perspective.
A story someone Simson knows likes to tell: a company hires consultants to do some development work. Upon arriving, the consultants open a VPN connection back to their company in India, and then scan every single computer on the network. Problem: the consultants were hired to write code, not perform network scanning. (Their explanation was the usual excuse: their PCs were infected with viruses.)
Log files are useful even if you don't look at them daily, because they give you information about the past.
Simson told a story about the place he formerly worked at. They had hundreds of PCs. Physical access was controlled by card readers. Simson worked on the 3rd floor. One night, a major heist occurred on the 5th floor; someone made off with about a hundred PCs. It was an inside job, because the thieves also took the log printout from the card reader system, which was the ONLY log the card reader kept.
The police nabbed the author of the Melissa virus via Caller ID information that was generated when he dialed into his ISP to upload it. However, a security researcher in Boston (name?) also managed to identify the author (at about the same time) by analyzing the meta information embedded in the Microsoft Word document.
The "we put black boxes over the sensitive information" gaffe is unfortunately common.
Word was developed way back when computers were wimpy and DOS didn't implement virtual memory systems. As a result, Word documents are essentially filesystems, so that they can be paged. At the time, it was very ingenious, but this is why it's often possible to recover information from Word files.
Firewire was designed to be a hard drive replacement, so there are capabilities within Firewire for reading and writing arbitrary parts of memory, i.e., via DMA. Going through the DMA controller largely bypasses the operating system, which is what makes the "iPod attack" possible. (Connect a malicious iPod to your PC via Firewire, and it can read and write your memory. All of it.) (At a recent black hat conference, someone presented a tool to block the iPod attack. All her tool could do was detect an attack and crash the victim's PC, but it might be possible to improve upon the countermeasure technique.)
Cell phone forensics
The cheaper the cell phone is, the less standardized it is, so more expensive cell phones are actually easier to perform forensic investigations upon. (Remember this if you're up to no good.)
Even with GSM phones with SIM cards, there is unique information that is still stored in the phone, and this information is transmitted. (Organized crime members in Sweden, Italy, and other EU countries were using throwaway prepaid SIM cards in an attempt to make it impossible to trace them. Unfortunately for them, law enforcement could correlate their use of throwaway SIM cards with ease.)
Simson's smartphone has a remote self-destruct capability: if he sends it an SMS message with a particular password, it will erase all of its data in a forensically secure manner, and then lock itself.
The StrongHold box is essentially a Faraday cage.
All of the anti-forensic tools are detectable, if you know where to look. As with many black hat versus white hat battles, it boils down to a technological arms race (or a cat-and-mouse game).
Syscall proxying is also called pivoting.
Storage is getting small enough that it's becoming easier to simply physically hide it (e.g., the sushi USB drive).
42.zip was a ZIP bomb, but sooner or later someone will hide valid data in all the chaff.
The whole goal of anti-forensics is to make the forensic analyst spend more time, because time is almost always the analyst's most limited resource.
There wasn't really a Q&A session, as Simson finished pretty much exactly at 17:00. He did show some of the data (2.2 GB worth!) that he sniffed off of the wireless network.
You can go to the index of my Usenix notes.