Code optimization with Stm32 Cube IDE

I’m using the STM32 Cube IDE suite for STM32 development. This suite is based on Eclipse and GCC, and it works quite well. In a recent project I was looking for optimizations I could apply on top of the default settings to reduce flash and RAM usage.

In this post I want to share what I found and how it helped me.

Display the size

Before optimizing something, we need to measure it. Several options can produce a size report during the link step. The option

 -Wl,--print-memory-usage 

passed to the linker is an interesting one because it prints the result even when the code base or memory footprint is too large to fit.

Memory region         Used Size  Region Size  %age Used
             RAM:       22816 B        25 KB     89.12%
           FLASH:      149540 B       192 KB     76.06%
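The percentages in this report are simple to reproduce, which is handy when you want to track the remaining margin in a script or a test. Here is a minimal sketch, assuming the linker counts 1 KB as 1024 bytes; the function names usage_percent and free_bytes are made up for this example.

```c
#include <stdint.h>

/* Percentage of a memory region in use. Sizes are in bytes;
 * the region size is assumed to be given as KB * 1024. */
double usage_percent(uint32_t used_bytes, uint32_t region_bytes)
{
    if (region_bytes == 0u) {
        return 0.0; /* avoid dividing by zero for an empty region */
    }
    return 100.0 * (double)used_bytes / (double)region_bytes;
}

/* Bytes still free in the region (0 if the region is already overfull). */
uint32_t free_bytes(uint32_t used_bytes, uint32_t region_bytes)
{
    return (used_bytes < region_bytes) ? (region_bytes - used_bytes) : 0u;
}
```

With the RAM line above, usage_percent(22816, 25 * 1024) gives roughly 89.1 and free_bytes(22816, 25 * 1024) gives 2784 bytes of margin.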

Flash size optimization

The default settings in STM32 Cube IDE are good: most of the optimizations are already enabled, but we can get some more.

The first simple option to reduce the code size is to tell the compiler to optimize for size, using the GCC option -Os. Probably nothing new for you here.

This has the side effect of generating code that is less optimized for speed and, as a consequence, less optimized for power consumption. So for certain pieces of code, like the wake-up/sleep procedure your system calls on a regular basis, it can be better to keep optimizing for speed even if the rest of the project is optimized for size. You can do this in your code with a function attribute:

void __attribute__((optimize("O3"))) lowPower_switch()

If you need to apply this to a whole C file, or to a part of it, you can use pragmas the following way:

#pragma GCC push_options
#pragma GCC optimize ("O3")

... all the code you want to optimize that way

#pragma GCC pop_options
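To make both forms concrete, here is a small sketch that puts them side by side in one file. The function names are hypothetical, and the rest of the file is assumed to be built with the project default (for example -Os). Both forms are GCC-specific; other compilers may ignore them with a warning.

```c
#include <stdint.h>

/* Form 1: the attribute changes the optimization level of one function. */
__attribute__((optimize("O3")))
uint32_t sum_bytes_fast(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Form 2: everything between push_options and pop_options uses O3. */
#pragma GCC push_options
#pragma GCC optimize ("O3")

uint8_t xor_bytes_fast(const uint8_t *buf, uint32_t len)
{
    uint8_t acc = 0u;
    for (uint32_t i = 0u; i < len; i++) {
        acc ^= buf[i];
    }
    return acc;
}

#pragma GCC pop_options
/* Code from here on is compiled with the command-line options again. */
```

The GCC documentation describes the optimize attribute as intended mainly for debugging, so it is worth checking the generated code or the map file to confirm the function really was compiled the way you expect.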

The next nice thing, which I found in a really good blog post, is link time optimization (LTO). Basically, this feature asks the linker to optimize the generated code. As the linker has a view of the entire program, it can produce some really good optimizations. To apply LTO, you need to add the -flto option to both the compiler (CFLAGS) and the linker (LDFLAGS). This is done by adding the option in the project properties.

You have to do the same in the MCU GCC Compiler > Miscellaneous menu.

The result is good. Let's look at this example of a big firmware, where 174 KB became 145 KB:

Without -flto option
   text	   data	    bss	    dec	    hex	filename
 177892	   1480	  21344	 200716	  3100c	xxxxx-stm32.elf

With -flto option
   text	   data	    bss	    dec	    hex	filename
 153540	   1476	  21344	 170884	  29b84	xxxxx-stm32.elf

As you can see, no big savings in RAM are to be expected this way.

When using LTO, you need to make some changes in the startup file. It seems to be a bug with weak functions in the GCC version used by STM32 Cube IDE, so it may be fixed later. The current problem is: when weak interrupt handlers are declared in the Core/Startup/startup_stm32xxxxx.S file, the IRQ handlers defined in the Core/Src/stm32l0xx_it.c file are ignored and removed. The consequence is that the device does not boot.

You can fix this by modifying Core/Startup/startup_stm32xxxxx.S and commenting out the weak declarations of all the interrupt handlers that are defined in C.

/*
   .weak      NMI_Handler
   .thumb_set NMI_Handler,Default_Handler

   .weak      HardFault_Handler
   .thumb_set HardFault_Handler,Default_Handler

   .weak      SVC_Handler
   .thumb_set SVC_Handler,Default_Handler

   .weak      PendSV_Handler
   .thumb_set PendSV_Handler,Default_Handler

   .weak      SysTick_Handler
   .thumb_set SysTick_Handler,Default_Handler
*/
   .weak      WWDG_IRQHandler
   .thumb_set WWDG_IRQHandler,Default_Handler

   .weak      PVD_IRQHandler
   .thumb_set PVD_IRQHandler,Default_Handler
/*
   .weak      RTC_IRQHandler
   .thumb_set RTC_IRQHandler,Default_Handler
*/

RAM size optimization

RAM usage is more difficult to optimize because the compiler can’t be of much help: you need to optimize your code. But to optimize code you need to know where to look. The map file has the information, but it’s a bit hard to read.

Eclipse has a helper for this: in the bottom right tabs, you can take a look at the Build Analyzer tab.

Here we see the different zones of the RAM:

  • BSS – the uninitialized (zero-filled) data, basically global or static variables declared as uint8_t tab[128];
  • DATA – the initialized data, basically global or static variables declared as uint8_t var1 = 0xA5; the data segment affects both the RAM size and the flash size, as the initial values have to be stored in flash to be copied into RAM at startup.
  • User_heap_stack
    • heap is used for malloc/calloc memory allocation
    • stack is used for local variables, the function call chain, and function parameters/return values.
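The list above can be mapped onto a few lines of C. This is only an illustrative sketch: the function demo_regions is invented for the example, and the exact section names depend on the linker script.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

uint8_t tab[128];       /* BSS: no initializer, zero-filled at startup   */
uint8_t var1 = 0xA5;    /* DATA: RAM copy plus initial value in flash    */

/* Touches one object in each zone and returns a sum of what it read. */
uint32_t demo_regions(void)
{
    uint8_t local[16];              /* stack: exists only during the call */
    uint8_t *dyn = malloc(16);      /* heap: taken from the heap area     */
    uint32_t sum = 0u;

    if (dyn == NULL) {
        return 0u;                  /* heap too small or exhausted        */
    }
    memset(local, 1, sizeof local);
    memset(dyn, 2, 16);

    sum += tab[0];                  /* 0, because BSS is zero-filled      */
    sum += var1;                    /* 0xA5, copied from flash at startup */
    for (uint32_t i = 0u; i < 16u; i++) {
        sum += local[i] + dyn[i];
    }
    free(dyn);                      /* give the heap block back           */
    return sum;
}
```

In the Build Analyzer you would expect tab under BSS and var1 under DATA, while local and the malloc block only consume the User_heap_stack reservation at run time.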

The heap size and stack size are defined in file STM32Lxxxxx_FLASH.ld at the root of the Cube IDE project.

_Min_Heap_Size = 0x200 ;	/* required amount of heap  */
_Min_Stack_Size = 0x400 ;	/* required amount of stack */

Depending on your project, you can change these settings. The heap can be drastically reduced if you have no malloc/free in your project.

Stack size details

The stack size depends on your code. The Static Stack Analyzer tab can help you here, but you need to compile with the stack-usage option enabled. You can generate the static stack usage files by adding the option -fstack-usage to the GCC compiler flags (CFLAGS). This option is not compatible with the -flto option seen previously: if you use both, you will get empty stack usage files. The stack usage files are generated per C file and are located in the same folder as the object files (.o).

To get the Static Stack Analyzer tab updated, the project has to build successfully. So when the available memory is not large enough for the build to succeed, you can temporarily increase the RAM size in STM32L0xxxx_FLASH.ld to get it built and analyze the result.
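As an illustration of what the stack usage report captures, here is a hypothetical function with a large local buffer. When a file like this is compiled with -fstack-usage, GCC writes one line per function to a .su file, giving the function name, its stack size in bytes, and a qualifier such as static or dynamic. The exact figure depends on the target and the optimization level, so none is quoted here.

```c
#include <stdint.h>
#include <string.h>

/* The 256-byte scratch buffer is a local variable, so it is taken
 * from the stack on every call and released when the function returns. */
uint32_t fill_and_sum(uint8_t seed)
{
    uint8_t scratch[256];
    uint32_t sum = 0u;

    memset(scratch, seed, sizeof scratch);
    for (uint32_t i = 0u; i < sizeof scratch; i++) {
        sum += scratch[i];
    }
    return sum;
}
```

A function like this is a typical candidate to review when the analyzer shows a deep call chain: shrinking the buffer or making it static trades stack for BSS.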

BSS and DATA segment details

Once you have identified these zones, you can look at the objects in each zone and optimize your code to reduce their size.

Here you can see a library with a big memory reservation (3 KB), basically a buffer. This library would be better off with an API to request memory, so the user would be able to use the heap for it: the buffer is only needed during a Sigfox transmission and could be released the rest of the time.

Regarding the data segment, constant values can be moved from the data segment to flash or EEPROM. This is a way to save this precious space.
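Such an API could look like the sketch below. Everything in it is hypothetical (the radio_ names, the 3072-byte size, the context structure); it is not the actual Sigfox library interface, only an illustration of borrowing the buffer from the heap for the duration of a transmission.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RADIO_WORK_SIZE 3072u   /* assumption: the 3 KB working buffer */

typedef struct {
    uint8_t *work;              /* NULL while no transmission is active */
} radio_ctx_t;

/* Allocate the working buffer just before a transmission. */
int radio_begin(radio_ctx_t *ctx)
{
    ctx->work = malloc(RADIO_WORK_SIZE);
    return (ctx->work != NULL) ? 0 : -1;
}

/* Stand-in for the transmission: only usable between begin and end. */
int radio_send(radio_ctx_t *ctx, const uint8_t *payload, size_t len)
{
    if (ctx->work == NULL || len > RADIO_WORK_SIZE) {
        return -1;
    }
    memcpy(ctx->work, payload, len);    /* placeholder for real work */
    return 0;
}

/* Release the buffer so the RAM is available the rest of the time. */
void radio_end(radio_ctx_t *ctx)
{
    free(ctx->work);
    ctx->work = NULL;
}
```

The trade-off is that the heap must then be large enough to satisfy the request at transmission time, so the saving only pays off if other code can reuse that memory in between.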
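For the flash case, the change is often just a const qualifier. In this sketch the table names are invented, and the placement described in the comments assumes a typical STM32 linker script where read-only data is kept in flash.

```c
#include <stdint.h>

/* Not const: the values are stored in flash AND copied to RAM (DATA). */
uint8_t gain_table_ram[8] = {1, 2, 4, 8, 16, 32, 64, 128};

/* const: typically placed in read-only data, so it costs flash only. */
static const uint8_t gain_table_flash[8] = {1, 2, 4, 8, 16, 32, 64, 128};

/* Both lookups return the same value; only the RAM cost differs. */
uint8_t gain_lookup_ram(uint8_t idx)
{
    return gain_table_ram[idx & 7u];    /* mask keeps the index in range */
}

uint8_t gain_lookup_flash(uint8_t idx)
{
    return gain_table_flash[idx & 7u];
}
```

After the change, the table should disappear from the DATA zone in the Build Analyzer, which is an easy way to confirm the saving.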

To be continued

Optimization is a never-ending process of research. This post will be updated regularly.
