SSD Recovery: How Pros Bring Flash Memory Back To Life
1. When Bad Things Happen To Good Flash

In November of last year, I wrote Disaster Strikes: How Is Data Recovered From A Dead Hard Drive?, chronicling the process as some of my personal storage was brought back from the dead by Seagate Recovery Solutions. Of course, these days we have to worry about more than the loss of important files from mechanical disks. Whether it’s the USB drive in your pocket, the eMMC in your phone, or the SSD in your notebook, flash storage is now just as critical to personal and professional interests, and prone to failure, just like hard drives.

As a follow-up to our coverage last year, we connected with Flashback Data, an Austin-based rescue lab that handles all types of storage devices but carries special expertise in flash media. We even got a special kick when calling the company’s main office and finding Queen’s “Flash” as the on-hold music. Flashback agreed to take us on a tour behind the scenes and show what it takes for a top-level recovery lab to salvage your precious NAND-based bits from the jaws of ruin.

2. A Range Of Reading

When Flashback first got into the recovery business, most of its activities focused on swapping out faulty chips. Over time, this became increasingly difficult as vendors started sourcing different components for different manufacturing runs of the same drive model. Encryption also began to appear on some drives, making matters even more complex. This required Flashback to be able to read the drive’s memory directly, which in turn meant that the lab needed a dizzying host of ways to read chips from across the breadth of the flash industry.

Note that when Flashback refers to “encryption,” this state is generally unknown to the user. Since about 2006, for example, SanDisk has been encrypting data on all of its flash drives, according to co-founder and vice president Russell Chozick. As with self-encrypting hard drives, the controller encrypts all data as it proceeds to flash memory. However, since no password is employed to lock the encryption, data is decrypted as it gets fetched from the media. So in the case of a broken PCB, Flashback attempts to move the controller and memory chips to a new board. “If the controller chip fries, though, it’s going to be almost impossible to get that data back. The controller keeps all of the information about how to decrypt the data. Lose that and you’re pretty much...well, you’ve got issues, big time.”

3. Flash Types

These dark gray chips are of the TSOP48 (48-pin) variety. They were fairly standard on USB flash drives, SSDs, SD cards, and CF cards for years, although they have started to give way to other formats recently. The bottom specimen is the underside of a TLGA chip. Notice how, instead of pins on the sides, the TLGA has pads on its underside. TLGAs are common in all types of flash and are found in newer iPhones, as well.

During recovery, Flashback used to plant TSOP48 chips into reader sockets, but TLGAs had to be soldered onto a board. Obviously, this made analysis and data retrieval much harder. Life hasn’t gotten any easier as smartphones have pushed flash memory into newer, smaller package types that make these older “monolithic” formats look simple in comparison.

4. Flash Types, Continued

These SD cards and the LaCie USB-based thumb drive are also monolith chips. Whereas most flash cards have a separate controller chip and memory chip, a monolith has both components contained in one tiny package. Obviously, breaks in such devices can happen at any of many different points. If the controller itself fails, technicians can still access the data through means other than the regular access pins where the device would connect to a card reader or camera/phone. To illustrate, this photo shows the LaCie drive with its traces partially exposed. Recovery technicians have to remove the black solder mask over the traces to find the points they need to connect to a logic analyzer. Once all of the points are identified, the card can be wired up similarly to images later in this article.

To remove the mask, Flashback uses a surprisingly mundane approach: rubbing compound and a buffing wheel. It is possible to use chemicals for the same purpose, but Chozick says that Flashback has better luck with slow, careful buffing. With sanding, it’s too easy to damage the flash product’s very fine traces. We asked Chozick if Flashback could wire up the LaCie drive to illustrate, but we changed our minds upon learning that such work can take a technician an entire day.

5. Typical Flash Drive Failures

We’ve all seen pictures of hard drive damage, most of which tends to involve head crashes with circular grooves plowed into the magnetic media. With SSDs and flash products, nearly all of the damage that Flashback sees is invisible. In rare cases, there might be a burn mark on a PCB, but by and large, broken controllers or burned fuses leave no visible evidence. As a result, technicians have to go through the drive testing each resistor in a long, laborious process of trial and discovery. In comparison, a clean connector break like the one shown here is a cakewalk for repair techs.

6. What About Wear-Out?

We’ve written previously about the tug-o’-war endurance battle between improving leveling algorithms and higher capacities versus shrinking fab processes. In particular, we’ve worried that flash drives and smaller SSDs that have been in service for several years now might start to exhibit wear-out.

Fortunately, Chozick says that most of the SSDs Flashback receives are less than a year old and haven’t had time to show NAND wear-out. In fact, actual wear-out cases are extremely rare. With USB flash drives, though, especially older ones with lower-grade leveling algorithms, wear-out is a bit more common. Technicians can read the chips just fine, but when they check the data, there are so many ECC bit errors that no data can be extracted. The four red dots in a later image show ECC problems. Major wear problems would look like the opposite, with maybe only four green dots.

Chozick says he has seen cases in which techs would do one analysis, take the chip out to, say, clean the solder pads, bring it back, and find the data even worse because of the additional reading. So yes, wear-out is a real danger, but it’s not the ever-looming crisis some might fear.

7. Get It Hot

Many times, chips will need to be removed from PCBs with the help of a solder rework station. One of the first tools in this process is hot air. In this image, technicians are removing a TLGA chip from a USB drive. Technicians control the temperature and air pressure, heating the device just enough to melt the solder points so that the chip can be removed. These reworking stations also contain soldering irons, flux, ohmmeters, and other diagnostic equipment. Several of these stations occupy Flashback’s main lab, which spans roughly 5000 square feet.

8. Removing Memory

This SSD’s controller is toast, so Flashback technicians begin the gentle tearing down of the drive. Each memory chip is hand-numbered for tracking and easier reassembly of the data.

“Sometimes we won’t know exactly which components are bad,” says Chozick. “We just know that this is the type of drive where we see this or that firmware failure. Or this type of drive often has this kind of failure, so we need to remove the chips to start working on it. Obviously, our customers are often in a hurry, so many times you don’t get to know the exact reason why something fried or what is fried. But you do know that you’re not going to get it to read through this controller, and it’s not encrypted, so we can just start pulling chips, get them read, and do a rebuild.”

9. Pulling Chips

Flash drives and SSDs aren’t the only devices to get the heat treatment. Flashback sees a steady stream of cell phones come through its labs, such as this HTC Evo Android-based phone that drew its last breath in a swimming pool. Flash data recovery services run from the hundreds into the thousands of dollars, so it’s a safe bet that this phone’s contents weren’t your typical kid and kitty videos. Chozick says that it’s not uncommon to see phones come in containing the last known images of a departed friend or loved one. They also receive phones regularly as part of criminal investigations. A perp might try to crush his smoking gun underfoot, so to speak, but if the flash memory remains intact, the data can usually be retrieved for judge and jury.

The Evo is a couple of years old now. Newer phones, such as the Samsung Galaxy series and several others from HTC, often contain eMMC technology, which features the controller built into the memory module, as on an SD card. This can make retrieval considerably easier.

10. Hard Drive Vs. Flash Memory

The service area of a hard disk contains information that lets the drive communicate with itself. For the heads to be able to translate data into the read/write channels, the device must have information about where bad sectors are, how many heads there are, which are turned on or off, and so on. This information resides on the platters in a special service area separate from the user-addressable space.

With flash, manufacturers also leave room for a service area. This contains all the information about error correction codes, whether there’s a bit error in any given sector, where those sectors are located, etc.

Whereas a hard drive is composed mostly of 512-byte sectors, flash memory typically uses 528-byte sectors, where 512 bytes are data and 16 bytes are the service area. SSDs end up translating down to that user-accessible 512-byte size. But when Flashback reads the raw data, technicians get both pieces. The data area gets mixed in with the service area, so the resulting dump looks like data, service area, data, service area, alternating over and over. When technicians reassemble the image into workable information, all of the service area parts have to be removed.
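To make the alternating layout concrete, here is a minimal Python sketch of stripping the service area out of a raw dump. It assumes the simple arrangement described above (512 data bytes followed directly by 16 service bytes in every 528-byte sector). Real chips can place the service area differently, and the function name `split_raw_dump` is ours for illustration, not a Flashback tool.

```python
from typing import List, Tuple

SECTOR_SIZE = 528                      # raw sector size cited in the text
DATA_SIZE = 512                        # user-addressable portion
SPARE_SIZE = SECTOR_SIZE - DATA_SIZE   # 16-byte service area

def split_raw_dump(raw: bytes) -> Tuple[bytes, List[bytes]]:
    """Return (contiguous user data, list of per-sector service areas).

    Assumes each raw sector is laid out as data followed by service area.
    """
    if len(raw) % SECTOR_SIZE != 0:
        raise ValueError("dump length is not a multiple of the raw sector size")
    data = bytearray()
    spares = []
    for offset in range(0, len(raw), SECTOR_SIZE):
        sector = raw[offset:offset + SECTOR_SIZE]
        data += sector[:DATA_SIZE]         # keep the 512 data bytes
        spares.append(sector[DATA_SIZE:])  # set the 16 service bytes aside
    return bytes(data), spares
```

Keeping the service bytes instead of discarding them matters in practice, since they carry the ordering and error-correction information needed later in the rebuild.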

Image: http://commons.wikimedia.org/wiki/File%3ADisassembled_HDD_and_SSD.JPG. By Rochellesinger at en.wikipedia [CC-BY-SA-2.5], from Wikimedia Commons.

11. Getting Up Close

Sometimes, technicians need to conduct a close visual examination of flash chips and their fragile innards. The best tool for this job is the Mantis microscope from Vision Engineering. Each unit costs $2000 or so, but they allow recovery workers to go hands-free and examine circuitry in 3D (via twin light paths projecting through a single viewing lens) at up to 20x magnification. The more natural experience and comfort of the Mantis helps technicians discover problems they might otherwise miss with conventional eyepiece microscopes. It also greatly helps with solder work, both in disassembly and repair.

12. Scanning Stations

Once chips are connected in such a way that they can be read by external devices, Flashback brings in its house-built systems for reading the data. The systems are fairly simple except for the fact that they have Flashback’s special imagers in them that allow techs to jump around to different sectors, control time-outs, and so forth. If something is reading more slowly than normal, the imager can force drives to jump to the next good sectors and keep going in order to get the most data possible as quickly as possible.

“We can go forward and backward and jump around,” says Chozick. “We can tell it to just scan the MFT of the drive and only image the allocated data instead of trying to get free space, so we can do the job very quickly. Sometimes, you’re fighting against a device still in the process of failing, and sometimes you’ve got a client who just needs one or two things back with absolute haste.”
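The skip-ahead behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not Flashback's imager: `read_sector` is a hypothetical callable standing in for whatever actually talks to the device, it is assumed to raise `OSError` on a failed or timed-out read, and the skip distance is an arbitrary parameter.

```python
def image_with_skips(read_sector, total_sectors, skip=8, sector_size=512):
    """Copy readable sectors into an image, jumping ahead past trouble spots.

    read_sector(lba) is a hypothetical reader that returns sector_size bytes
    or raises OSError. Skipped ranges are returned so they can be retried
    later, once the easy data is safely captured.
    """
    image = bytearray(total_sectors * sector_size)  # unread areas stay zeroed
    skipped = []
    lba = 0
    while lba < total_sectors:
        try:
            data = read_sector(lba)
        except OSError:
            end = min(lba + skip, total_sectors)
            skipped.append((lba, end))  # half-open range [lba, end)
            lba = end                   # jump ahead instead of grinding on
            continue
        image[lba * sector_size:(lba + 1) * sector_size] = data
        lba += 1
    return bytes(image), skipped
```

Recording the skipped ranges reflects the priority described above: grab as much good data as possible first, then decide whether the failing regions are worth another pass.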

13. Stock It—It’s A Socket

In order to connect flash chips to its reading systems, Flashback employs a startling range of chip mountings. Highlighted here are one type of adapter used to read TSOP48 chips as well as a TLGA reader. Within the adapters, each of the socket pins touches one of the pins on the memory chip. The adapter screws into an underlying board with data contacts for connecting with the TSOP socket. The lower board, in turn, features a USB interface for linking into the scanning station systems.

14. Data Spaghetti

Remember the memory chip pulled off that HTC phone? Here it is again, partially wired up for reading. Flashback had these PCBs custom made for this exact use. They hook up to a USB device programmer. The holes in each corner help secure the chip to an underlying board. With the TSOP adapter shown on the prior page, each of the socket pins touches one of the pins on the memory chip. But in this “spaghetti” shot, all of the chip pads are exposed so that techs can solder right onto them instead of requiring a socket. Since there are so many monoliths and pin-outs, Flashback needs to wire to specific data points and solder directly to the chip.

This is an 8-bit chip, as evidenced by there being eight wires connected to the PCB. If this had been a 16-bit product, there would be twice as many wires.

15. Reading For Hours

Wiring up monolith chips follows much the same “spaghetti” approach as seen on the last page. Different devices require different wiring, but the idea stays the same, with each lead connecting to a distinct feature. The lead on the top-right, for example, is 3.3 V power. Examining this process, you begin to appreciate how time-intensive simply extracting data from chips can be.

16. Welcome To The Jumble

Let’s take a look at what recovery techs have to work with. What you see here are the contents of a raw data dump of an SSD’s master boot record. The data is jumbled up by the algorithms applied by the controller when optimizing read and write speeds, wear leveling, etc…

“Once we’ve read in the chips,” says Chozick, “we simply have a pile of raw data. In this example, the flash memory chip has a 528-byte sector. The first 512 bytes are used for the data. The last 16 bytes are used to store information about what order the data is in and error correction information. We call this the service area. So, when we first look at a flash dump in hex(adecimal code), we need to find known data structures to see how the data is mixed up.”
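One simple way to hunt for known structures is to look for the two-byte boot signature (0x55 0xAA) that ends both an MBR and a FAT boot sector at byte offsets 510 and 511. The Python sketch below scans a raw dump for candidate sectors, assuming the 528-byte layout from the quote and assuming the data has not been inverted, interleaved, or enciphered. The function name is ours for illustration.

```python
def find_boot_signature_sectors(raw, sector_size=528, data_size=512):
    """Return indexes of raw sectors whose data portion ends in 55 AA.

    A hit is only a candidate: the signature marks MBRs and FAT boot
    sectors, but two matching bytes can also occur by chance.
    """
    hits = []
    for index in range(len(raw) // sector_size):
        start = index * sector_size
        data = raw[start:start + data_size]  # ignore the service area
        if data[510:512] == b"\x55\xaa":
            hits.append(index)
    return hits
```

The position of each hit within the dump, compared with where that sector should sit logically, is one clue to how the controller shuffled the data.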

This and the following page show examples of known data structures in hex.

17. FAT Under The Microscope

Shown here are a FAT16 file system and a boot sector.

“The MBR is usually sector 0,” says Chozick. “Now, it’s not going to be 0 in the flash dump, but we can at least find where that is and determine the known structure of data. We know where it is, how far it is from the boot sector, and so on. You can see it in the next image. It’s like evidence gathering. We find the MBR, the boot sector, the FAT. Now we can look at these known structures and figure out how to put everything back together.”
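Knowing "how far it is from the boot sector" comes straight out of the MBR's partition table. Below is a minimal Python sketch that reads the four classic 16-byte partition entries starting at byte 446 of a 512-byte MBR. It assumes the MBR has already been located and stripped of its service area, and the function name is hypothetical.

```python
import struct

def parse_mbr_partitions(mbr: bytes):
    """List the non-empty entries of a classic MBR partition table."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a 512-byte sector with a 55 AA boot signature")
    partitions = []
    for i in range(4):
        # Entry layout: status, CHS start, type, CHS end, LBA start, sector count
        status, _chs_first, ptype, _chs_last, lba_start, count = struct.unpack_from(
            "<B3sB3sII", mbr, 446 + 16 * i
        )
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append(
                {"index": i, "type": ptype, "lba_start": lba_start, "sectors": count}
            )
    return partitions
```

For a FAT16 volume, `lba_start` gives the logical sector where the boot sector should be, so finding that sector somewhere else in the dump tells technicians how blocks were remapped.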

Chozick notes that sometimes techs can’t find any of these structures, usually because of some algorithm applied to the device. Some algorithms will invert all of the data bits. If that approach is discovered, techs know to invert it back. Some algorithms will join everything by a single byte instead of a whole sector, so every byte will be on a different flash chip. This necessitates rejoining those data by byte rather than by the whole sector. Some algorithms will use ciphers, which makes things, understandably, even harder. For a process driven by computer, recovery often proves very manual.
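Two of these transformations are easy to show in Python. The sketch below undoes a full bit inversion and rejoins dumps that were split byte by byte across chips. It assumes the simplest possible scheme (every bit flipped, strict round-robin byte order starting at the first chip); actual controllers may differ, and both function names are ours.

```python
def invert_bits(data: bytes) -> bytes:
    """Flip every bit; applying it twice restores the original."""
    return bytes(b ^ 0xFF for b in data)

def join_by_byte(dumps):
    """Interleave equal-length dumps one byte at a time.

    Output order: byte 0 of chip 0, byte 0 of chip 1, ..., byte 1 of chip 0, ...
    """
    length = len(dumps[0])
    if any(len(d) != length for d in dumps):
        raise ValueError("all dumps must be the same length")
    out = bytearray(length * len(dumps))
    for chip, dump in enumerate(dumps):
        out[chip::len(dumps)] = dump  # every Nth byte belongs to this chip
    return bytes(out)
```

For example, `join_by_byte([b"ACE", b"BDF"])` yields `b"ABCDEF"`. Working out which chip comes first, and whether the inversion happens before or after the split, is part of the manual detective work described above.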

18. Come Back Together

Let’s take a closer look at the sector data in a case wherein information is scattered across multiple memory chips. You can see what the first part of each sector looks like.

Hex numbers are supposed to run in this order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, and so on. On flash chip #1, though, you can see that the hex order is broken twice. First, there’s a jump from 09 to 0E, then there’s another four-sector gap between 11 and 16. What happened to the corresponding data? The answer sits waiting on flash chip #2.

19. Back In Order

Technicians need to rejoin these separated 2112 bytes (528 bytes per sector * 4). When joined, the result will look as you see it here.
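As a rough illustration of that rejoining step, the Python sketch below weaves dumps back together in fixed-size stripes. It assumes plain round-robin striping in 2112-byte units (four 528-byte sectors), starting with the first chip. That is our simplification of the example, not a description of any particular controller.

```python
def join_striped(dumps, stripe_bytes=528 * 4):
    """Rebuild one stream by taking a stripe from each dump in turn.

    Default stripe is 2112 bytes: four raw sectors of 528 bytes each.
    """
    out = bytearray()
    longest = max(len(d) for d in dumps)
    for offset in range(0, longest, stripe_bytes):
        for dump in dumps:
            out += dump[offset:offset + stripe_bytes]  # empty if dump is shorter
    return bytes(out)
```

With a toy stripe size, `join_striped([b"AAAACCCC", b"BBBBDDDD"], stripe_bytes=4)` returns `b"AAAABBBBCCCCDDDD"`. The service areas are still embedded at this stage and would be stripped afterward.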

Now imagine if there are 64 flash dumps that have to be combined! Why 64? Because a single chip might not have only one dump. Some could have four—one for each bank within the chip. So take the 16 flash chips you might find on an SSD, multiply by four, and that’s 64 dumps to put back together.

20. Before And After

It can be difficult to visualize what all of these byte-level gyrations mean at the macro level. An empty cell on a spreadsheet (or a corrupt file, for that matter) doesn't quite capture it.

Flashback supplied this image to illustrate the concept. Some examples will still have their header and part of their data intact, so they may look close, but they could be jumbled up or render as half-pictures with gobs of noise all over them.

Starting with an originally corrupted JPEG, technicians applied ECC correction and block translation to order data and remove bit errors that were handled by the controller. They also reordered and removed the service area from the reassembled data so as to have a clean, continuous data stream.

21. The End Result

After hours of manipulation and repair, even using algorithms that help automate some of the data restitching, Flashback technicians finally have information that looks like files and folders. Everything is back in order. The burning question remains, though, as to whether the data is sound and in its original form.

In part, such things can be checked by file headers. SD cards and similar media tend to contain a lot of images, and those are pretty easy to check visually for errors. ECC errors in specific files are fairly easy to spot. Other data types can be harder. Utilities can tell technicians, based on the file header, whether a file appears sound, but they may not flag a bad sector that would be plainly visible to an observer.

“With most customers, we try to be pretty hands-on,” says Chozick. “We ask them what they’re looking for and test files for them if they ask us to. If it ends up being something where we can’t recover the directory structure, we might just have to do it by file header. This is like a raw recovery where we won’t get file names anymore. We’ll pull off data and sometimes get people more than they thought they had, because we might also get deleted information. Sometimes we might find the FAT table is completely damaged, and then we’ll have to do this sort of raw-type recovery.”
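Recovery "by file header" is often called file carving. The Python sketch below shows the basic idea for JPEGs, which begin with the bytes FF D8 FF and end with the marker FF D9. It assumes each file is stored contiguously and simply takes the first end marker after each start, so embedded thumbnails or fragmented files would trip it up. It is an illustration, not the tooling Flashback uses.

```python
def carve_jpegs(image: bytes):
    """Return (offset, data) pairs for JPEG-looking byte runs in an image."""
    found = []
    pos = 0
    while True:
        start = image.find(b"\xff\xd8\xff", pos)   # start-of-image signature
        if start == -1:
            break
        end = image.find(b"\xff\xd9", start + 3)   # first end-of-image marker
        if end == -1:
            break
        end += 2                                   # include the marker itself
        found.append((start, image[start:end]))
        pos = end                                  # resume after this file
    return found
```

Because nothing here consults the directory structure, carved files come back without names, and deleted files that have not been overwritten are picked up as well, which matches what Chozick describes.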

22. What Matters?

In an earlier article we did on data recovery, at least one commenter noted that essentially anyone could get into the recovery business and that Flashback was a small fry operation on a completely different level than more recognized names. Of course, the proof is in the recovery results and the client roster, which includes a broad spectrum of commercial and government accounts.

According to Chozick, Flashback’s lead engineers have over 15 years of experience in data recovery. The company has hundreds of thousands of dollars in equipment and parts inventory.

“It is very difficult to learn this stuff yourself,” he says. “It has taken years of R&D to get to where we are. Flashback is not as tiny as we seem. We have a 5000 sq. ft. lab with very high security. We have a four-zone biometric access system to our lab with 24x7 video monitoring. We have full ESD (anti-static) flooring in the lab with copper strips run to ground, so there is no risk of electrical damage due to static. We have a steel evidence cage for secure data that is being stored for media involved in litigation. And for hard drives, we have Class 10 and Class 100 laminar flow clean workstations. Our Forensics Lab is the only private ASCLD internationally accredited lab in the world (ISO 17025).”

23. Not So Small, After All

The data recovery portion of Flashback’s lab consists of three rooms. First, there’s a large space lined with computers and containing solder stations, recovery machines, imaging machines, and firmware machines. The area also contains servers for data storage and similar tasks. Another room stores parts, including thousands of hard drives, different firmware versions, and an avalanche of device models and makes and sizes so that techs have parts on-hand, whether they need circuit boards, internal read/write heads, or anything else. Not least of all, there’s the clean room, stocked with forced airflow workstations for working on hard drives.

One additional layer of security guards the forensic area, where all of the law enforcement or litigation cases go. Flashback uses a big evidence cage (see prior page) that is bolted to the ground with motion sensors all around it.

Again, regardless of Flashback’s size, this article should give you a sense of what goes on behind the scenes of a reputable recovery company trusted with your money, your broken flash storage, and your irreplaceable data. It’s not simply a matter of plug-and-copy. A formidable amount of work and expertise goes into reviving your bits from beyond the pale. We all hope never to need such services, but if the time ever comes that you need them, this is what you can expect to happen.