Software Optimization Can Reduce MCU “Active-Phase” Power Consumption


Figure 1: Task switching in a cooperative scheduler can lead to a highly efficient implementation since the number of context switches required is minimized.

Only a system-level view of a low-power application will get anywhere near the optimal solution for a given product. This applies to both the active and inactive phases. When active-mode power is a significant part of the budget, there is much to analyze, and the optimal solution is likely to come from combining detailed knowledge of the application with deep experience of whatever communication methods are used.

Clearly we need to begin by understanding the functional specification of the product together with the power design goals, and avoid unnecessary feature creep. It is unlikely that a generalized solution is going to be the most power-efficient for any specific set of requirements, although there are some obvious measures that can be taken as a first step; for example, using a compiler that optimizes for minimal power consumption and running it at its highest level of optimization.

There are two main ways to achieve low power consumption on any embedded application: minimizing power in the inactive/passive state and minimizing power in the active state. This article focuses on the role of software and the issue of minimizing the power consumption of a system in the active mode. The net benefit to be gained from working on these areas depends on the ratio of active and passive power consumption:

(average_active_power × active_time) / (average_passive_power × passive_time)

As one area is improved, the relative contribution of the other grows.
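As a rough illustration of that ratio, the sketch below computes it for one duty cycle, together with the resulting average power. The structure, the function names and the choice of milliwatts and seconds are assumptions made for this example only; they are not taken from any particular device or toolchain.

```c
/* Hypothetical description of one wake/sleep cycle; all names are illustrative. */
typedef struct {
    double active_power_mw;   /* average power while active   */
    double active_time_s;     /* time spent active per cycle  */
    double passive_power_mw;  /* average power while inactive */
    double passive_time_s;    /* time spent inactive per cycle */
} duty_cycle_t;

/* Ratio of active energy to passive energy over one cycle. */
double active_passive_ratio(const duty_cycle_t *d)
{
    return (d->active_power_mw * d->active_time_s) /
           (d->passive_power_mw * d->passive_time_s);
}

/* Average power over the whole cycle, in mW. */
double average_power_mw(const duty_cycle_t *d)
{
    double energy_mj = d->active_power_mw * d->active_time_s +
                       d->passive_power_mw * d->passive_time_s;
    return energy_mj / (d->active_time_s + d->passive_time_s);
}
```

A ratio well above 1 suggests that effort spent on the active phase will pay off first; a ratio well below 1 points to the sleep current instead.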

The important principle is that a system-level design approach to any application is essential to achieving optimal power consumption. That means carefully analyzing the requirements of the product so that the activities that take the most power can be analyzed and optimized, then constructing a hardware and software design that fulfills those requirements, eliminating unnecessary or redundant functionality. Most deeply embedded systems have one or more input sources and one or more outputs. Some systems are more complex so naturally the options are more complicated, and this has an impact on the possibility for improvement. Active-phase design goals include reducing the execution time of software required to achieve any specific task as well as reducing the time spent driving external I/O and external peripherals.

System Task Scheduling

The simplest possible solution in any embedded system is for every task to be initiated by an interrupt when it is required. All code is then event-driven: it does only what is required and then returns to a low-power mode. However, this is only likely to be practical for the simplest applications. Where data communication is required, it becomes impractical and the code overhead can build rapidly. The solution is to use a scheduler or similar operating system to handle task and data interactions, but each option has a different impact on power consumption.
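The purely event-driven pattern can be sketched as follows. This is a host-side model, not device code: `adc_isr`, `run_once` and `enter_low_power_mode` are hypothetical names, and the sleep function only counts calls where a real MCU would execute its wait-for-interrupt or vendor-specific sleep instruction.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool sample_ready = false;  /* set by the ISR, cleared by the main code */
static volatile uint16_t latest_sample;
static uint32_t sample_sum;                 /* stand-in for "the work" */
static unsigned sleep_entries;

/* Hypothetical: a real target would enter its low-power mode here. */
static void enter_low_power_mode(void)
{
    sleep_entries++;
}

/* Interrupt handler: record the event and return immediately. */
void adc_isr(uint16_t value)
{
    latest_sample = value;
    sample_ready = true;
}

/* One pass of the main code: do only what the event requires, then sleep.
 * On real hardware the flag test and the sleep entry must be made atomic
 * (for example by masking interrupts around them) so a wake-up is not missed. */
void run_once(void)
{
    if (sample_ready) {
        sample_ready = false;
        sample_sum += latest_sample;
    }
    enter_low_power_mode();
}
```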

Choosing a scheduling mechanism that can support low-power optimization can have a significant impact on the system power consumption, especially if it makes use of low-power modes available on the target device.

For example, use a super loop only if the application is very simple and power requirements are not critical. It provides limited software reuse with no roadmap for future development. This method is less than ideal since by definition some unnecessary processing is required to decide what to execute next. It is also generally difficult, but by no means impossible, to use third-party components with super loops.

Another option is to use a pre-emptive RTOS. The developer immediately gains a benefit because most systems can be event-driven, i.e., only executing code in response to a specific event. This makes for a very flexible system, though it is not always optimal when it comes to low-power design. RTOSs tend to be designed for high performance rather than low power. In particular, the response time of the highest priority task is designed to be minimized. This means that for RTOS-based designs, the most important features are often the context-switch time and the interrupt latency. When looking at low-power systems, interrupt latency is unlikely to be critical in the context of modern processors, and context-switch time is unlikely to be critical given that the system is going to be mostly idle.

A co-operative scheduler (Figure 1) can be a good option since tasks will only relinquish control when they have completed their work, allowing the next highest priority ready task to take over. This minimizes the number of context switches required.
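A minimal sketch of such a scheduler is shown below. It is an illustration only, not HCC's implementation: the task table, `task_add`, `task_post` and `scheduler_step` are hypothetical names, priority is simply the order of registration, and the two demonstration tasks just record the order in which they ran.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

typedef void (*task_fn)(void);

typedef struct {
    task_fn run;          /* runs to completion, then returns */
    volatile bool ready;  /* set by an ISR or by another task */
} task_t;

static task_t tasks[MAX_TASKS];  /* index 0 has the highest priority */
static size_t task_count;

/* Register a task; returns its id, or -1 if the table is full. */
int task_add(task_fn fn)
{
    if (fn == NULL || task_count >= MAX_TASKS) {
        return -1;
    }
    tasks[task_count].run = fn;
    tasks[task_count].ready = false;
    return (int)task_count++;
}

/* Mark a task as ready to run. */
void task_post(int id)
{
    if (id >= 0 && (size_t)id < task_count) {
        tasks[id].ready = true;
    }
}

/* Run the highest-priority ready task to completion.
 * Returns false when nothing is ready, which is the cue to enter a low-power mode. */
bool scheduler_step(void)
{
    for (size_t i = 0; i < task_count; i++) {
        if (tasks[i].ready) {
            tasks[i].ready = false;
            tasks[i].run();
            return true;
        }
    }
    return false;
}

/* Demonstration tasks that log the order of execution. */
static char trace[16];
static size_t trace_len;

static void task_radio(void)  { if (trace_len < sizeof trace) trace[trace_len++] = 'R'; }
static void task_sensor(void) { if (trace_len < sizeof trace) trace[trace_len++] = 'S'; }
```

Because a task is never interrupted by another task, there is no stack to save and restore between them; the only "context switch" is a function return followed by a function call.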

Regardless of the scheduling method chosen, careful consideration should be given to interrupt handling. After an interrupt has occurred, the amount of work required to service that interrupt is not particularly critical to power consumption. The most important objective is to minimize the number of interrupts in the system by making the best use of FIFOs and timeouts, and choosing a scheduling mechanism that can support low-power modes available on the target device (Figure 2).
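The effect of a receive FIFO on the interrupt count can be estimated with a few lines of arithmetic. The function below is a simplified model, assuming one interrupt each time the FIFO reaches its threshold and one timeout interrupt to flush any remaining bytes; `rx_interrupts` and its parameters are names invented for this sketch.

```c
/* Number of receive interrupts needed for a burst of `bytes` bytes when the
 * peripheral interrupts every `fifo_threshold` bytes and uses a timeout
 * interrupt to flush a partial FIFO. Returns 0 for an invalid threshold. */
unsigned rx_interrupts(unsigned bytes, unsigned fifo_threshold)
{
    if (fifo_threshold == 0) {
        return 0;
    }
    unsigned full = bytes / fifo_threshold;
    unsigned tail = (bytes % fifo_threshold != 0) ? 1u : 0u;  /* flushed by timeout */
    return full + tail;
}
```

Under this model a 250-byte message costs 250 wake-ups with a threshold of one byte, but only 16 with a 16-byte threshold.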

Figure 3: Smart Meter File System. An advanced Smart-Meter File System (SMFS) from HCC was custom-designed to meet the low power consumption requirements of smart-energy and smart-meter applications. The structured database reduces complexity of the application by using a minimum number of flash operations to preserve both the flash and the battery.

Flash Storage

Storing data in flash can be very complex, but it potentially provides the source of the largest savings. There is a vast range of storage options including NOR flash, NAND flash, SD cards and eMMC, to name a few of the more popular options used in embedded systems. Each has its own strengths and weaknesses, which are far too varied to examine here, but some general principles apply.

All these devices have complex internal erase, read and write architectures that are most efficiently handled by mapping your use to the geometry of the flash device. Most also require wear-leveling but this can also depend on the use case; to wear-level a lightly used flash device would be wasteful of both time and power.

It is worth examining the benefits of using a file system. A file system has two primary roles: to make data accessible by an external system (e.g., a FAT file system running on a PC), and to provide an easy-to-use API for the application to hide the underlying flash complexity. Naturally this implies some overhead that may not be appropriate for a system that needs optimized battery life.

HCC’s Smart-Meter File System (SMFS) seen in Figure 3 is a good example of a power-optimized approach to flash data storage. Instead of taking a layered “file system plus driver” approach, it maps databases directly to the flash device. This makes the application simpler and the required amount of flash management is minimized. These are appropriate attributes for a low-power solution since as little time as possible is expended processing complex file system operations, and the number of read/write/erase cycles required is reduced dramatically. This reduction in required flash accesses alone, compared to using a full-featured file system, can result in significant system power savings.

It is also the case that some flash devices have low-power modes that must be handled carefully to ensure the contents of their RAM buffers will not be lost. File systems that handle these types of flash devices most effectively should ensure that hooks are built in to indicate when the mode is entered and exited.

Aggregating data is very important when attempting to minimize power consumption. Practically all flash devices operate most efficiently when the size of data to be stored is a multiple of the base storage unit of the device. In some cases the difference in work required is large. Aggregation introduces other difficulties, however: if data is held back and the system resets, that data can be lost. It may be useful to consider a hybrid solution where some small static storage area (e.g., a small FRAM) is used for intermediate storage until data is ready for committing to the flash device.
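The hybrid approach might look something like the sketch below. It is a host-side model under stated assumptions: a 256-byte flash page, a staging buffer that would be placed in FRAM on a real design, and a hypothetical `flash_program_page` that only counts page programs instead of driving a device.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 256u  /* assumed base storage unit of the flash device */

typedef struct {
    uint8_t staging[PAGE_SIZE];  /* would live in FRAM so it survives a reset */
    size_t used;                 /* bytes currently staged */
    unsigned pages_programmed;   /* number of flash page programs issued */
} aggregator_t;

/* Hypothetical stand-in for the flash driver: commit the staged page. */
static void flash_program_page(aggregator_t *a)
{
    a->pages_programmed++;
    a->used = 0;
}

/* Stage one record; flash is touched only when a page is full or the
 * record would not fit. Returns 0 on success, -1 for an invalid length. */
int aggregator_append(aggregator_t *a, const void *rec, size_t len)
{
    if (len == 0 || len > PAGE_SIZE) {
        return -1;
    }
    if (a->used + len > PAGE_SIZE) {
        flash_program_page(a);   /* commit the partial page before it overflows */
    }
    memcpy(a->staging + a->used, rec, len);
    a->used += len;
    if (a->used == PAGE_SIZE) {
        flash_program_page(a);   /* exactly one full page: the efficient case */
    }
    return 0;
}
```

With 16-byte records, this issues one page program per 16 records rather than one flash write per record.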

There are also situations where reducing the amount of RAM may be counterproductive if you are trying to reduce power consumption. Being able to maintain cache and file system metadata in RAM will greatly reduce the need to access flash to retrieve information.
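To make the trade-off concrete, here is a small direct-mapped metadata cache held in RAM. Everything in it is an assumption for illustration: the four-slot size, the `metadata_get` name, and `flash_read_metadata`, which stands in for a slow, power-hungry flash access and simply returns a made-up value while counting calls.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 4u

typedef struct {
    bool valid;
    uint32_t block;  /* which flash block this slot describes */
    uint32_t value;  /* cached metadata for that block */
} slot_t;

static slot_t cache[CACHE_SLOTS];
static unsigned flash_reads;  /* how often the flash was actually accessed */

/* Hypothetical stand-in for reading metadata from flash. */
static uint32_t flash_read_metadata(uint32_t block)
{
    flash_reads++;
    return block * 2u + 1u;  /* dummy content */
}

/* Return metadata for a block, going to flash only on a cache miss. */
uint32_t metadata_get(uint32_t block)
{
    slot_t *s = &cache[block % CACHE_SLOTS];
    if (!s->valid || s->block != block) {
        s->value = flash_read_metadata(block);
        s->block = block;
        s->valid = true;
    }
    return s->value;
}
```

Repeated lookups of the same block cost one flash read in total; shrinking the cache to save RAM raises the miss rate and with it the number of flash accesses.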


Time spent actively driving an external network interface is likely to use a significant amount of power. Returning from this state to a low-power mode as quickly as possible is essential. If possible, the designer should also minimize the number of data transfers required to ensure minimal power consumption.

Sockets interfaces used by many network stacks are inherently inefficient. Practically speaking, they enforce a copy of the data and additional handling that is not beneficial to a low-power system. Creating a system design where data can be directly accumulated to the buffer to be transmitted and then read directly from the Ethernet buffer is likely to be the most efficient solution. As with flash storage, aggregation of data into packets will further reduce the system load, but the implications of a system reset with untransmitted data must be carefully considered.
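One way to avoid the extra copy on the transmit side is to let the application write its readings straight into the buffer the driver sends from. The sketch below models that idea; `tx_reserve`, `tx_flush`, the 1024-byte payload size and the counting `driver_send` are all hypothetical and do not correspond to any specific network stack.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_PAYLOAD_MAX 1024u  /* assumed maximum payload per frame */

typedef struct {
    uint8_t payload[TX_PAYLOAD_MAX];  /* the buffer the driver transmits from */
    size_t used;                      /* bytes accumulated so far */
    unsigned frames_sent;             /* number of transmissions issued */
} tx_frame_t;

/* Hypothetical stand-in for handing the frame to the network driver. */
static void driver_send(tx_frame_t *f)
{
    f->frames_sent++;
    f->used = 0;
}

/* Reserve `len` bytes inside the transmit buffer and return a pointer to them,
 * so the caller fills the data in place with no intermediate copy.
 * A full frame is sent first if the new data would not fit.
 * Returns NULL if `len` is zero or can never fit. */
uint8_t *tx_reserve(tx_frame_t *f, size_t len)
{
    if (len == 0 || len > TX_PAYLOAD_MAX) {
        return NULL;
    }
    if (f->used + len > TX_PAYLOAD_MAX) {
        driver_send(f);
    }
    uint8_t *p = f->payload + f->used;
    f->used += len;
    return p;
}

/* Send whatever has been accumulated, e.g. before entering a low-power mode. */
void tx_flush(tx_frame_t *f)
{
    if (f->used > 0) {
        driver_send(f);
    }
}
```

Two hundred 8-byte readings then leave the device in two transmissions rather than two hundred, at the cost of holding unsent data in RAM until the flush.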

Other important possibilities for improvement include switching off TCP checksums where the underlying media is reliable (typically because it has its own checksums), or using UDP to send data. But UDP must be used with caution; it is a fire-and-forget protocol, and if you need to make sure the message arrives, you must add some management—which effectively implies a protocol very similar to TCP!
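The management that UDP needs can be as small as a bounded retry loop, sketched below. The transfer callback, which is assumed to send one datagram and report whether an acknowledgement arrived before a timeout, and the `lossy_xfer` stub used to exercise it are both inventions for this example; no real socket API is called.

```c
#include <stdbool.h>

/* Hypothetical callback: send one datagram and wait for its acknowledgement.
 * Returns true if the acknowledgement arrived before the timeout. */
typedef bool (*xfer_fn)(const void *msg, unsigned len, void *ctx);

/* Try to deliver a message, retransmitting up to `max_attempts` times.
 * Returns the number of transmissions used, or 0 if delivery was abandoned. */
unsigned send_with_retry(xfer_fn xfer, const void *msg, unsigned len,
                         void *ctx, unsigned max_attempts)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (xfer(msg, len, ctx)) {
            return attempt;
        }
    }
    return 0;
}

/* Test stub modelling a link that drops the first few datagrams. */
typedef struct {
    unsigned drops_remaining;
} lossy_link_t;

static bool lossy_xfer(const void *msg, unsigned len, void *ctx)
{
    lossy_link_t *link = (lossy_link_t *)ctx;
    (void)msg;
    (void)len;
    if (link->drops_remaining > 0) {
        link->drops_remaining--;
        return false;  /* datagram or acknowledgement lost */
    }
    return true;
}
```

Every retransmission is additional time with the network interface powered, so the retry limit is itself a power-budget decision.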


SSL/TLS modules can be power-optimized and it’s worth looking for these modules. Remember that TLS software offers three different functions: authentication to ensure you are talking to who you think you are talking to, encryption to ensure your data is not readable by a third party, and integrity to ensure your message has not been modified in transit.

Different algorithms are used for the different functions, and by default three algorithms are agreed to in a TLS handshake. The developer needs to look carefully at the design requirements and verify what is really required so that unnecessary processing can be avoided; for instance, data may be in a format that is not useful to an intruder, or tampering with the data is not practical because of the nature of the data used. If encryption and integrity are enforced with TLS, all data will pass through two additional processing steps—an encrypt/decrypt module and a hashing module.

Software-based security algorithms tend to be CPU-intensive. From a power-consumption perspective, it is clearly advantageous to reduce the usage of these algorithms to the minimum required by the system rather than accept a default setup.

As this article stresses, analyzing the requirements and developing a design that accurately meets those goals is critical. But this concept should also be extended to all levels of the software implementation. Selecting components that are developed using a defined process is the best way of guaranteeing that they do what you want them to do, and that they do no more than you want them to. A very small "corner case" with seemingly little effect on the device's functionality could have serious implications for power consumption; for example, an I/O that is not switched off after a particular exception condition could be extremely difficult to detect in normal testing. Risk will inevitably be minimized by the use of tried and tested methods such as a full V-model design process.

HCC Embedded USA
New York, NY
(212) 734-1345