Cache Size Effects - QLC Goes To 8TB: Samsung 870 QVO and Sabrent Rocket Q 8TB SSDs Reviewed

Posted by Tisa Delillo on Monday, September 9, 2024

Whole-Drive Fill

This test starts with a freshly-erased drive and fills it with 128kB sequential writes at queue depth 32, recording the write speed for each 1GB segment. This test is not representative of any ordinary client/consumer usage pattern, but it does allow us to observe transitions in the drive's behavior as it fills up. This can allow us to estimate the size of any SLC write cache, and get a sense for how much performance remains on the rare occasions where real-world usage keeps writing data after filling the cache.

The Sabrent Rocket Q takes the strategy of providing the largest practical SLC cache size, which in this case is a whopping 2TB. The Samsung 870 QVO takes the opposite (and less common for QLC drives) approach of limiting the SLC cache to just 78GB, the same as on the 2TB and 4TB models.

Both drives maintain fairly steady write performance after their caches run out, but the Sabrent Rocket Q's post-cache write speed is twice as high. The post-cache write speed of the Rocket Q is still a bit slower than a TLC SATA drive, and is just a fraction of what's typical for TLC NVMe SSDs.

On paper, Samsung's 92L QLC is capable of a program throughput of 18MB/s per die, and the 8TB 870 QVO has 64 of those dies, for an aggregate theoretical write throughput of over 1GB/s. SLC caching can account for some of the performance loss, but the lack of performance scaling beyond the 2TB model is a controller limitation rather than a NAND limitation. The Rocket Q is affected by a similar limitation, but also benefits from QLC NAND with a considerably higher program throughput of 30MB/s per die.

Working Set Size

Most mainstream SSDs have enough DRAM to store the entire mapping table that translates logical block addresses into physical flash memory addresses. DRAMless drives only have small buffers to cache a portion of this mapping information. Some NVMe SSDs support the Host Memory Buffer feature and can borrow a piece of the host system's DRAM for this cache rather needing lots of on-controller memory.

When accessing a logical block whose mapping is not cached, the drive needs to read the mapping from the full table stored on the flash memory before it can read the user data stored at that logical block. This adds extra latency to read operations and in the worst case may double random read latency.

We can see the effects of the size of any mapping buffer by performing random reads from different sized portions of the drive. When performing random reads from a small slice of the drive, we expect the mappings to all fit in the cache, and when performing random reads from the entire drive, we expect mostly cache misses.

When performing this test on mainstream drives with a full-sized DRAM cache, we expect performance to be generally constant regardless of the working set size, or for performance to drop only slightly as the working set size increases.

The Sabrent Rocket Q's random read performance is unusually unsteady at small working set sizes, but levels out at a bit over 8k IOPS for working set sizes of at least 16GB. Reads scattered across the entire drive do show a substantial drop in performance, due to the limited size of the DRAM buffer on this drive.

The Samsung drive has the full 8GB of DRAM and can keep the entire drive's address mapping mapping table in RAM, so its random read performance does not vary with working set size. However, it's clearly slower than the smaller capacities of the 870 QVO; there's some extra overhead in connecting this much flash to a 4-channel controller.

ncG1vNJzZmivp6x7orrAp5utnZOde6S7zGiqoaenZH53fZJvZqqkk2KFta6MrKqdZaKaw6qx1maqmqWjqruoeZdwZ2appqR6tK3Bq5ynrF2nvKS3xK1kqmdi