ARM’s new high-performance Cortex-M7 processor opens up new opportunities for engineers looking to bring more connectivity and performance to their embedded applications. By understanding its advantages over the Cortex-M4 and the adaptations a migration requires, developers can realize substantial gains and reduce time-to-market.
BY JOSEPH YIU, ARM
The ARM Cortex-M processor family is a range of scalable, compatible, energy-efficient and easy-to-use processors designed to help developers meet the needs of tomorrow’s smart and connected embedded applications. The Cortex-M4, unveiled in 2010, built on the Cortex-M3 foundation with instruction set extensions explicitly tailored for digital signal processing, along with an optional single-precision floating-point unit, and delivers 1.25 DMIPS/MHz. Since its launch, more than 10 semiconductor vendors have introduced Cortex-M4 based general-purpose MCU products, along with a wide range of Cortex-M4 based sensor hub products.
In the past few years, the capability and processing needs of connected embedded systems have become more demanding, with even the simplest systems expected to offer a graphical user interface, HMI, audio recognition or other natural means of interaction, along with multiple connectivity options. Processors need to become more capable and offer more local processing. Microcontrollers in growing automotive and industrial automation applications must handle heavier workloads, which calls for an uplift in CPU performance. Industrial plants require increasing precision and operate on large amounts of data in a short space of time. These future system demands include delivering more features at a lower cost, increased connectivity, better code reuse and improved energy efficiency. It is with this future in mind that ARM, along with its partners, designed the ARM Cortex-M7 processor, the most recent and highest-performance member of the Cortex-M family.
A Closer Look at the Cortex-M7
Doubling the performance of the Cortex-M4 and delivering 5 CoreMark/MHz, the Cortex-M7 is designed to address the more demanding applications and remove the barrier that previously faced Cortex-M CPU based solutions. Cortex-M7 is designed for a wide range of embedded applications including microcontrollers, automotive controllers, industrial control systems and wireless communication controllers (e.g. Wi-Fi). For those who are familiar with the wide range of Cortex-M family CPUs available for embedded applications, Cortex-M7 is based on the ARMv7-M architecture and brings architectural compatibility all the way from Cortex-M0 (Figure 1).
Figure 1: ARM Cortex-M7 processor features
The Cortex-M7 sports a six-stage superscalar pipeline and provides integer, floating point and DSP performance along with tightly coupled memories, caches and options to enable larger memory systems while maintaining deterministic behavior. This pipeline, more advanced than that of the Cortex-M4, allows the Cortex-M7 to execute up to two instructions per clock cycle.
A large focus of the development of the Cortex-M7 was on improving instructions-per-clock (IPC) efficiency relative to earlier Cortex-M family processors. The Cortex-M7 is the first Cortex-M processor to offer optional instruction and data caches, of up to 64KB each. The caches enable efficient operation with a larger memory system, which is typically slower than the processor. Engineers can execute a large proportion of code from the internal caches, reducing the number of reads and writes to external memory and thereby saving power. The processor also provides tightly coupled memory (TCM) interfaces, whose fast access enables time-critical interrupt handling and real-time application tasks, and each of these interfaces supports a custom Error Correction Code (ECC) implementation.
The Cortex-M7 also offers the option of ECC support for each of the cache memories, enhancing the reliability of the system: if a memory location is corrupted by a single-bit error, the data can be corrected and restored. In addition to ECC, system reliability can be improved through the optional Memory Protection Unit (MPU) with 8 or 16 regions.
The memory system has also been advanced to support the increased CPU capabilities, with a 64-bit AXI bus interface offering greater bandwidth than the 32-bit AHB and allowing multiple outstanding transfers for maximum bus system performance. For easy integration with legacy peripherals used in previous Cortex-M designs, there is an optional low-latency AHB peripheral bus interface. To allow flexible interrupt management and low interrupt latency, the integrated Nested Vectored Interrupt Controller (NVIC), with 1 to 240 interrupts and 3- to 8-bit programmable priority level registers, is closely coupled to the processor. There is also support for ETM, designed for use with CoreSight, ARM’s extensible, system-wide debug and trace architecture.
Cortex-M7 further expands the family’s floating-point facilities to include a double-precision option; the simultaneous issue of integer and floating point instructions is also now supported if the FPU is present. Given the range of applications that the Cortex-M7 based MCUs may enable in the future, it is fully supported with powerful debug features, with optional full instruction and data trace. These features make the processor an attractive solution for applications requiring a performance upgrade on devices already using the Cortex-M4 processor.
Migrating Designs to Cortex-M7
Given that most embedded engineers and developers are familiar with the Cortex-M4, let’s look at some of the software development benefits the Cortex-M7 brings. From a developer’s perspective, the Cortex-M7 supports all the instructions available on the Cortex-M4 processor, and uses the same exception model for interrupt handling. In most cases, program code written for the Cortex-M4 processor should run on the Cortex-M7 processor without any problem. However, there are a few cases where changes may be needed, and software developers must understand these to reduce the time required when migrating applications from the Cortex-M4 to the Cortex-M7.
In order to get the best performance out of the Cortex-M7 processor, a number of C compilers and their runtime libraries have been optimized and updated (Figure 2). In addition, a number of changes in the debug system for the Cortex-M7 processor compared to Cortex-M4 mean that software developers must update their tool chains to newer versions in order to debug applications on Cortex-M7 based microcontroller products. In a few cases the firmware on the debug adaptor might also need an update. As a result, updating to the latest development tool chain is strongly recommended.
Figure 2: 2X performance improvement over the Cortex-M4
Typically, the following changes should be made when migrating software from the Cortex-M4 to the Cortex-M7 processor:
• Update the CMSIS-CORE header to use the Cortex-M7 header files. The CMSIS-CORE header files for the Cortex-M7 processor are available from CMSIS version 4.2. The latest CMSIS package is available from www.arm.com/cmsis
• Update CMSIS-DSP library to the Cortex-M7 specific version. The Cortex-M7 specific version is optimized for the pipeline behavior of the Cortex-M7 processor and therefore can offer higher performance.
• New APIs are included in the CMSIS-CORE headers for cache configuration. If the Cortex-M7 device being used executes program code from slow memory (e.g. flash memory) via the AXI interface, the caches should be enabled for better performance.
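As a minimal sketch of the last point, the CMSIS-CORE cache functions SCB_EnableICache() and SCB_EnableDCache() can be called once during system start-up. The wrapper name system_cache_init and the host-side stand-ins are assumptions made purely for illustration, so that the fragment also compiles off-target. On a real build the vendor device header supplies the functions.

```c
#ifndef __CORTEX_M
/* Host-side stand-ins (assumption, for illustration only). On a target
 * build the vendor device header includes core_cm7.h, which defines
 * __CORTEX_M and provides the real SCB_Enable*Cache() functions. */
static int icache_enabled;
static int dcache_enabled;
static void SCB_EnableICache(void) { icache_enabled = 1; }
static void SCB_EnableDCache(void) { dcache_enabled = 1; }
#endif

/* Hypothetical start-up hook: turn on both caches before running code
 * from slow memory such as flash behind the AXI interface. */
static void system_cache_init(void)
{
    SCB_EnableICache();  /* cache instruction fetches */
    SCB_EnableDCache();  /* cache data accesses */
}
```

A call to system_cache_init() would typically sit early in the start-up code, before the main application begins to run from the slower memory.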
In addition, all code should be recompiled in order to allow the compiler to better optimize instruction sequencing for the Cortex-M7 processor pipeline. In some cases, additional cache maintenance operations might be needed at runtime, for example if a cacheable memory location is shared between the processor and a separate bus master such as a DMA controller:
a. If the memory location updated by the Cortex-M7 processor has to be accessed by another bus master, a cache clean is needed to ensure the other bus master can see the new data.
b. If the memory location has been updated by a different bus master, the Cortex-M7 processor has to do a cache invalidate so that next time it reads the memory location, it will fetch the information from the main memory system.
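Cache maintenance by address works on whole cache lines, which are 32 bytes on the Cortex-M7. The sketch below shows one way to widen a buffer to whole lines before a clean (case a) or an invalidate (case b). The type cache_range_t and the function cache_range_for are hypothetical helpers, not part of CMSIS. The CMSIS-CORE calls that would consume the result appear only in a comment, because they need the device header.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u  /* Cortex-M7 data cache line size */

/* Hypothetical helper type: an address range widened to whole lines. */
typedef struct {
    uintptr_t start;  /* rounded down to a line boundary */
    size_t    bytes;  /* multiple of the line size */
} cache_range_t;

static cache_range_t cache_range_for(const void *buf, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_BYTES - 1u);
    uintptr_t first = (uintptr_t)buf & mask;
    uintptr_t last  = ((uintptr_t)buf + len + DCACHE_LINE_BYTES - 1u) & mask;
    cache_range_t r = { first, (size_t)(last - first) };
    return r;
}

/* On a target build with the CMSIS device header included:
 *   (a) before another bus master reads data the CPU wrote:
 *       SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.bytes);
 *   (b) after another bus master wrote data the CPU will read:
 *       SCB_InvalidateDCache_by_Addr((uint32_t *)r.start, (int32_t)r.bytes);
 */
```

Because an invalidate discards every byte in the affected lines, it is usually safer to align and pad shared buffers to the line size, so that widening the range never touches unrelated data.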
The Cortex-M7 processor supports several floating point options: no FPU, a single precision FPU, or a single and double precision FPU. If the application can benefit from double precision floating point support, it should be updated and recompiled to make use of the double precision FPU. Even if the application uses only single precision floating point operations, recompiling the code for the Cortex-M7 processor can still be beneficial, because the FPU in the Cortex-M7 is based on FPv5, whereas the FPU in the Cortex-M4 processor is FPv4. FPv5 has additional floating point processing instructions, which might help speed up floating point data processing in the target application.
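Which FPU option a piece of C code exercises depends on the types in each expression. The sketch below, which is an illustration and not taken from the migration guide, shows how an unsuffixed literal promotes an otherwise single precision calculation to double precision. On a device with only the single precision FPU, such a double precision operation would fall back to software library routines. The function names are hypothetical.

```c
/* Stays in single precision: the literal carries an 'f' suffix. */
static float halve_sp(float x)
{
    return 0.5f * x;
}

/* Promoted to double: 0.5 is a double literal, so the multiply is a
 * double precision operation and the result is converted back. */
static float halve_dp(float x)
{
    return (float)(0.5 * x);
}
```

Both functions return the same value for simple inputs. The difference lies in the instructions or library calls the compiler has to generate for the chosen FPU option.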
Program Code Changes
A number of areas in the program code might need changes. Because the Cortex-M7 executes code faster, timing-dependent code might need adjusting. This most commonly affects applications that use hard-coded timing delay loops.
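One common alternative to a hard-coded loop count is to wait on a free-running cycle counter, so that the delay scales with the clock rather than with pipeline behavior. The following is a sketch under stated assumptions: read_cycle_counter is a stand-in that simply advances on each read so the code runs on a host, the function names are hypothetical, and the clock frequency is passed in by the caller. On a Cortex-M7 target the stand-in would typically be replaced by a read of the DWT cycle counter after it has been enabled.

```c
#include <stdint.h>

/* Stand-in for a free-running 32-bit cycle counter (assumption). It
 * advances by a fixed step on every read so the sketch runs on a host. */
static uint32_t sim_cycles;
static uint32_t read_cycle_counter(void)
{
    sim_cycles += 7u;
    return sim_cycles;
}

/* Convert microseconds to core clock cycles. The 64-bit intermediate
 * keeps the multiplication from overflowing. */
static uint32_t us_to_cycles(uint32_t us, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)us * core_hz) / 1000000u);
}

/* Busy-wait for at least `us` microseconds. Unsigned subtraction keeps
 * the elapsed count correct when the counter wraps around. Delays longer
 * than one full counter period are not supported by this sketch. */
static void delay_us(uint32_t us, uint32_t core_hz)
{
    uint32_t start = read_cycle_counter();
    uint32_t wait  = us_to_cycles(us, core_hz);
    while ((uint32_t)(read_cycle_counter() - start) < wait) {
        /* spin */
    }
}
```

With this structure, moving from one processor or clock frequency to another only changes the core_hz argument, not the delay code itself.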
System memory maps often change when migrating from one microcontroller device to another. Also, in the Cortex-M7 processor the initial vector table does not necessarily start at address 0x00000000. If application code assumes that the initial vector table is at address 0, users might need to update the code so that it determines the vector table location from the Vector Table Offset Register.
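A minimal sketch of the address arithmetic involved: instead of treating a vector number as an offset from address 0, code can add the offset to the base read from the Vector Table Offset Register. The helper names below are hypothetical, and the base addresses used in any call are example values only. On a target, the vtor argument would come from SCB->VTOR in the CMSIS-CORE headers.

```c
#include <stdint.h>

/* Address of the vector table slot for a given exception number.
 * Slot 0 holds the initial main stack pointer, slot 1 the reset vector. */
static uint32_t vector_slot_address(uint32_t vtor, uint32_t exception_number)
{
    return vtor + 4u * exception_number;
}

/* External interrupt n is exception number 16 + n. */
static uint32_t irq_slot_address(uint32_t vtor, uint32_t irq)
{
    return vector_slot_address(vtor, 16u + irq);
}
```

For example, with a table based at 0x00200000 the reset vector sits at 0x00200004 rather than at 0x00000004.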
Due to the multiple bus interfaces and more capable write buffers in the Cortex-M7 processor, users might find it necessary to insert additional memory barrier instructions in the program code. The guideline for memory barrier usage is documented in ARM application note AN321 – ARM Cortex-M Programming Guide to Memory Barrier Instructions. In the Cortex-M4 processor, due to the simple nature of the pipeline, omitting the memory barriers does not usually cause any issue. In the Cortex-M7 processor the memory barrier requirements are stricter.
Not only does the Cortex-M7 inherit the characteristics of the Cortex-M processor series, such as energy efficiency, high performance, ease of use and smaller code, but it is also designed with exceptional memory and connectivity options for design flexibility, making it especially suited to the automotive, IoT and industrial connectivity markets. Announcements of Cortex-M7 based MCUs followed soon after the launch of the processor itself:
• STM32 F7 series from STMicroelectronics announced at ARM TechCon in October 2014
• SAM E70 and the SAM S70 series from Atmel targeted at connectivity and general purpose industrial applications announced at CES 2015
• Automotive-qualified Atmel SAM V70 and V71 series, which take advantage of the Cortex-M7 DSP extensions and target infotainment connectivity and audio applications, also announced at CES 2015
• Freescale has also publicly announced plans to use the high performance of the Cortex-M7 for power conversion, motor control and industrial automation
Given the many architectural similarities between the ARM Cortex-M4 and Cortex-M7 processors, the majority of application code is ready for migration as it stands, and software developers can start now to ensure their applications are suited to the next generation of embedded connected intelligence. Migration does require some adaptation and changes by the user. Developers can follow the migration process in further detail in the whitepaper titled “Migrating Applications from an ARM Cortex-M4 Processor to a Cortex-M7 Processor – A Software Developer’s Guide” on the ARM Connected Community, where they can also access in-depth technical discussions.
San Jose, CA