Solid State Drives
Extend SSD Lifetime Using the Network Database Model
Solid-state drives are emerging as a replacement for traditional hard drives and flash systems in embedded devices. Managing data efficiently on these devices is increasingly important for meeting application needs without increasing the size of SSDs or recalling devices because of ‘bad blocks’.
BY JOHN PAI, RAIMA DIVISION OF BIRDSTEP TECHNOLOGY
Solid-State Drives (SSDs) have evolved to become a viable option to replace rotating Hard Disk Drives (HDDs) in many embedded systems. This is because SSDs eliminate the single largest failure mechanism in many embedded systems—the moving parts of HDDs.
Despite the clear appeal of this technology, designers are already facing a number of challenges as next-generation devices find their way into embedded applications. The most significant are endurance, limited storage and storage management issues that affect product life and space utilization. Designers therefore need accurate knowledge of these concerns, along with guidance on how to overcome the limited lifetime of a flash-based SSD and the limited capacity of a RAM-based SSD, which is constrained by the cost of RAM.
Device Lifespan and Performance
When deciding on the appropriate SSD for a project, system designers have two practical options: the flash-based SSD or the RAM-based SSD.
System designs with flash-based SSDs use various strategies to manage write endurance, but they share a common task: tracking how many times each block of memory has been written, and then dynamically and transparently reassigning physical blocks to logical blocks in order to spread the load across the disk. In a well-designed flash SSD, the system would have to write to the whole disk as many times as the rated endurance cycle count before the disk is in any danger.
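The reallocation idea can be illustrated with a toy model. The sketch below is not how any particular SSD controller works; the class name WearLeveler and its fields are hypothetical, and it assumes the drive always has spare physical blocks. Each logical write is steered to the least-written free physical block, so even a workload that hammers one logical block spreads its wear evenly.

```python
class WearLeveler:
    """Toy dynamic wear leveling (hypothetical model, not a real controller)."""

    def __init__(self, physical_blocks):
        self.write_counts = [0] * physical_blocks  # writes seen by each physical block
        self.mapping = {}                          # logical block -> physical block
        self.free = set(range(physical_blocks))    # physical blocks not currently mapped

    def write(self, logical_block):
        # Choose the free physical block with the fewest writes so far.
        # Assumes at least one spare block is always free.
        target = min(self.free, key=lambda b: (self.write_counts[b], b))
        self.free.remove(target)
        # The block that previously held this logical block becomes free again.
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free.add(old)
        self.mapping[logical_block] = target
        self.write_counts[target] += 1
        return target


leveler = WearLeveler(physical_blocks=8)
for _ in range(800):
    leveler.write(logical_block=0)  # rewrite the same logical block repeatedly

print(leveler.write_counts)
```

Without remapping, all 800 writes would land on a single physical block; here the per-block counts stay within one write of each other.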
Flash SSDs are not likely to continue performing at the same level as when first operated. That’s important to know, given the speed with which SSDs have proliferated in the marketplace amid claims that they’re faster, use less power and can be more reliable since there are no moving parts. Flash SSD performance and endurance are related because the management overhead of a flash SSD is related to how many writes and erases to the drive take place. The more write/erase cycles there are, the shorter the drive’s service life.
Flash memory cells are nominally guaranteed for only one million write cycles. Once the quota is reached, the disk can become unreliable. Special firmware or flash SSD controller chips help mitigate this problem with dynamic reallocation rather than rewriting files to a single location.
Although less popular than its flash counterpart, the RAM-based SSD is significantly faster at both read and write operations. A typical RAM SSD does not face the same write cycle limitation as flash SSD because most of the I/O is performed in SSD RAM. The data is then copied from volatile memory to nonvolatile memory when instructed or when powering down. RAM SSDs are usually armed with their own batteries, which last long enough to preserve data in case the system unexpectedly powers off.
Two Data Management Strategies
Embedded system designers have a few basic options when deciding on a data management strategy for embedded SSD devices. Currently, the most widespread data management model is the relational model.
The relational model stores data in tables composed of columns and rows. When data from more than one table is needed, a join operation relates the data using a column duplicated in each table (Figure 1). While the relational model is flexible, performance is limited by the need to create new tables holding the results of relational operations and to store redundant columns. Even when designed efficiently, there are several sources of overhead. The main source is the data duplication needed to preserve the integrity of the relational database, together with the foreign keys needed to manage relationships efficiently. This overhead results in excess file size and extra I/O to perform basic database operations. Such overhead is especially expensive on both flash- and RAM-based SSD devices.
Figure 1: Relational model (top). The cost of the relational model as the database grows (bottom).
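A small sketch makes the duplicated column and the result table concrete. The tables, column names and the join helper below are invented for illustration and do not represent any particular database engine.

```python
# Hypothetical tables: each row is a dict, and "customer_id" is stored in both.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "sensor"},
    {"order_id": 11, "customer_id": 1, "item": "relay"},
    {"order_id": 12, "customer_id": 2, "item": "valve"},
]


def join(left, right, key):
    """Equi-join: build a new result table by matching rows on the shared key column."""
    lookup = {}
    for row in left:
        lookup.setdefault(row[key], []).append(row)
    result = []
    for row in right:
        for match in lookup.get(row[key], []):
            result.append({**match, **row})  # a new row is materialized per match
    return result


joined = join(customers, orders, "customer_id")
for row in joined:
    print(row)
```

Every order row carries its own copy of customer_id, and the join produces a third table of combined rows. Both are examples of the overhead described above.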
Embedded systems designers can exploit the network database model for significant advances in data management that mitigate the lifespan limitations of solid-state drives. The network model was conceived as a flexible way of representing objects and their relationships. It predates the relational model and can be viewed as a superset of it. This implies that anything expressed in the relational model can be expressed in the network model, including SQL support. The main advantage lies in the way relationships are modeled.
A primary difference from the relational data model is that the network model allows designers to describe relationships between records using “sets,” where pointers are used to relate objects directly and navigate between them (Figure 2). A set is a linked list representing a one-to-many relationship, in which each member holds pointers to the next and previous members of the set.
Figure 2: Network model (top). The cost of the network model as the database grows (bottom).
Network Model Streamlines Writes and Minimizes Footprint
When compared to the relational model, the network model is faster, more reliable, more efficient with disk space, and requires less I/O to perform the same tasks. In both read and write operations, data structured in the relational model incurs costly overhead from the primary key and foreign key relationship.
Consider writing a record into a relational model, where a write operation can be expensive. After a record is inserted into the table, the database inspects the B-Tree to locate the record’s index position. If there is no room available in the B-Tree, the tree needs to be reorganized to maintain efficiency. This reorganization is write-intensive and unpredictable, because its cost depends on how full the tree is and where in the tree the change must be made. The more nodes a tree contains, the greater the chance of a larger reorganization, which may be space- and time-consuming as well as write-intensive. The reorganization may also require the operating system to devote a large amount of computing resources to it in order to meet time constraints. After the reorganization, the database can perform a write operation to reference the new record in the B-Tree.
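The uneven cost of index inserts can be seen even in a heavily simplified model. The sketch below keeps only the leaf level of a B-Tree as a list of fixed-capacity sorted pages; the class PagedIndex and its write-cost accounting (one page write for an ordinary insert, three for a split) are assumptions made for illustration. A real B-Tree can also propagate splits upward through several levels, which this model ignores.

```python
import bisect


class PagedIndex:
    """Simplified leaf level of a B-Tree: sorted keys in fixed-capacity pages."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pages = [[]]     # each inner list is one index page, kept sorted
        self.page_writes = 0  # total pages written to storage (assumed cost model)

    def insert(self, key):
        # Locate the page whose key range should hold the new key.
        first_keys = [page[0] for page in self.pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = self.pages[i]
        bisect.insort(page, key)
        if len(page) <= self.capacity:
            self.page_writes += 1  # common case: rewrite one page
            return 1
        # Overflow: split the page in two and update the parent that lists the pages.
        mid = len(page) // 2
        self.pages[i:i + 1] = [page[:mid], page[mid:]]
        self.page_writes += 3      # two halves plus the parent page
        return 3


index = PagedIndex(capacity=4)
costs = [index.insert(key) for key in range(1, 11)]
print(costs)
print(index.page_writes)
```

Most inserts cost a single page write, but the ones that hit a full page cost three in this model, and the application cannot easily tell in advance which insert will be the expensive one.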
In a network model, adding a record is relatively simple, is less write-intensive, wastes no space on duplicated data, and is predictable. The process involves adding a new record and setting its pointers to the owner, previous and next records. The owner’s last pointer is then set to the new record. This process is fast, predictable, and does not require reorganization of a B-Tree. Most importantly, it requires minimal write cycles, thus minimizing wear on the SSD, reducing reclaim cycles, and optimizing space by removing unnecessary duplication.
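The insertion steps just described can be sketched as a doubly linked list hanging off an owner record. This is a minimal illustration, not the API of any network database product; the names Record, connect and members are hypothetical, and the count of records written is a simplifying assumption (the new member, the owner, and the previous last member whose next pointer must change).

```python
class Record:
    """A record holding only its data plus set pointers (hypothetical layout)."""

    def __init__(self, data):
        self.data = data
        self.owner = None  # member -> owner of the set
        self.prev = None   # member -> previous member
        self.next = None   # member -> next member
        self.first = None  # owner -> first member
        self.last = None   # owner -> last member


def connect(owner, member):
    """Append member to the owner's set; return how many records were modified."""
    records_written = 2                # the new member and the owner
    member.owner = owner
    member.prev = owner.last
    member.next = None
    if owner.last is None:
        owner.first = member
    else:
        owner.last.next = member       # previous last member now points forward
        records_written += 1
    owner.last = member
    return records_written


def members(owner):
    """Navigate the set from the owner by following next pointers."""
    node = owner.first
    while node is not None:
        yield node
        node = node.next


customer = Record("Acme")
writes = [connect(customer, Record(item)) for item in ("sensor", "relay", "valve")]
print(writes)
print([m.data for m in members(customer)])
```

In this sketch the cost of an insert is bounded by three record updates regardless of how many members the set already holds, which is the predictability the text refers to. No key value is copied into the member records; the relationship lives entirely in the pointers.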
Further examination of the differences between relational and network model databases reveals the space savings of the network model. The savings come from expressing relationships through set pointers instead of through duplicated data and indexes. In the network model, data is inserted with minimal overhead: a record requires only data and pointers. On average, one can expect a relational model database to require at least 30% more space than a network model database. When considering which data management model to use for a system, remember that the relational model overhead is expensive. Consider inserting 1 Mbyte of data into the SSD with the network model. Inserting the same data into a relational database balloons the size to a minimum of 1.3 Mbytes.
In this comparison, after many repeated inserts, a flash-based SSD with a network database will last at least 25% longer, because it avoids the 30% relational model overhead. Similarly, a RAM-based SSD will have at least 30% extra storage available, which reduces how often space must be reclaimed. Once an SSD fills to a certain point, the operating system reclaims space. With the reduced overhead of the network model, the frequency of reclamation by the operating system is significantly reduced as well.
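The arithmetic behind these estimates can be checked with a short calculation. The function below is only a back-of-the-envelope sketch; it assumes that flash wear scales linearly with the number of bytes written, and the name compare and its parameters are made up for this example.

```python
def compare(payload_bytes, relational_overhead=0.30):
    """Compare bytes written under each model for the same payload.

    Assumes the network model writes the payload as-is and that flash
    wear is proportional to bytes written (a simplification).
    """
    network_bytes = payload_bytes
    relational_bytes = payload_bytes * (1 + relational_overhead)
    write_reduction = 1 - network_bytes / relational_bytes  # fewer bytes written
    lifetime_gain = relational_bytes / network_bytes - 1    # extra service life
    return relational_bytes, write_reduction, lifetime_gain


relational_bytes, write_reduction, lifetime_gain = compare(1_000_000)
print(f"relational size: {relational_bytes:,.0f} bytes")
print(f"fewer bytes written with the network model: {write_reduction:.1%}")
print(f"longer life if wear scales with bytes written: {lifetime_gain:.1%}")
```

Under these assumptions a 30% overhead means the network model writes roughly 23% fewer bytes for the same data, which corresponds to about 30% more service life. The 25% figure quoted above sits between those two values.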
Data Management Strategies
To further extend the life or maximize space of an SSD, there are several data management strategies in addition to the network model that designers can consider to enhance performance, minimize disk space and minimize write cycles. The design strategies that designers can add to a network database include sparse indexing, optimizing cache, and combining the use of an in-memory database.
Sparse indexing can save space by referencing the indexed data rather than duplicating it. Traditional databases duplicate the indexed data for search efficiency because of data locality, but this uses vast amounts of space. Referencing data is a non-issue for RAM SSD, allowing application designers to specify full duplication, partial duplication, or no duplication of data to reduce storage utilization.
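The difference between duplicating and referencing indexed data can be shown in a few lines. The record layout, the field name serial and the helper find are hypothetical; the sketch only contrasts an index that stores a copy of every key with one that stores record positions and reads the key through the reference.

```python
# Hypothetical records stored in insertion order.
records = [
    {"serial": "SN-0007", "reading": 21.5},
    {"serial": "SN-0003", "reading": 19.0},
    {"serial": "SN-0012", "reading": 23.1},
    {"serial": "SN-0001", "reading": 18.4},
]

# Full duplication: each index entry stores a copy of the key next to the record slot.
duplicating_index = sorted((rec["serial"], slot) for slot, rec in enumerate(records))

# No duplication: the index stores only record slots, ordered by the key they reference.
referencing_index = sorted(range(len(records)), key=lambda slot: records[slot]["serial"])


def find(serial):
    """Binary search that reads each key through the reference, not from the index."""
    lo, hi = 0, len(referencing_index)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[referencing_index[mid]]["serial"] < serial:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(referencing_index):
        candidate = records[referencing_index[lo]]
        if candidate["serial"] == serial:
            return candidate
    return None


duplicated_key_bytes = sum(len(key) for key, _ in duplicating_index)
print(duplicated_key_bytes)   # key bytes stored a second time by the duplicating index
print(find("SN-0003"))
```

The referencing index costs an extra record read per comparison, which matters on a rotating disk but much less on a RAM SSD, where access time does not depend on data locality.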
Cache optimization customizes the cache to be large enough to minimize write cycles by updating the database only at the end of transactions. Then, when data is inserted into the files, it writes to each file sequentially. This will write the updated pages in each file in ascending order by offset in the file, which may also lengthen the service life of a flash SSD.
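A minimal sketch of this write-back behavior follows. The class PageCache, the file names and the flush_log field are invented for illustration, and the actual page write is replaced by a log entry so the example stays self-contained.

```python
class PageCache:
    """Buffers dirty pages and flushes them once, at commit, in file/offset order."""

    def __init__(self):
        self.dirty = {}      # (file name, page offset) -> latest page contents
        self.flush_log = []  # order in which pages would reach storage

    def write(self, file_name, offset, data):
        # Repeated updates to a page overwrite the cached copy; nothing hits storage yet.
        self.dirty[(file_name, offset)] = data

    def commit(self):
        # Sorted keys visit one file at a time, pages in ascending offset order.
        for file_name, offset in sorted(self.dirty):
            self.flush_log.append((file_name, offset))  # stand-in for a real page write
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


cache = PageCache()
cache.write("orders.dat", 4096, b"v1")
cache.write("customers.dat", 8192, b"a")
cache.write("orders.dat", 0, b"b")
cache.write("orders.dat", 4096, b"v2")  # same page updated again in this transaction
print(cache.commit())
print(cache.flush_log)
```

Four updates inside the transaction turn into three page writes at commit, because the page that was modified twice is written only once, in its final state.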
The use of an in-memory database can be critical to keeping unnecessary write cycles in main memory. Similar to cache optimization, a hybrid in-memory database can reduce unnecessary writes and disk usage by keeping the ordered, duplicated key information in main memory while the data itself is preserved on disk, maintaining the transactional integrity of the system. Such a strategy may also prolong the life of a flash SSD, reduce the reclaim frequency and maximize storage space.
There are many ways to extend the life of a flash SSD and save space on a RAM SSD. With minimal resources and overhead required, a network database combined with sparse indexing, cache optimization and an in-memory database can yield an effective data management solution that helps prolong the service life of a solid-state device.
Raima Division, Birdstep Technology
[ www.ramia.com ]