Memory assignments optimizations

user2162550 Source

It is well known fact that compilers can mix assignments order in order to optimize execution, so-

a=b;
c=d;

Can be actually performed

c=d;
a=b;

However with the following code:

a=b;
x=a;
func(x);

By the time func(x) is called, x must contain b first, or else result can be unpredictable.

Now, what about the following code:

int *addr1 = some_addr;
int *addr2 = (int *)0xf00;

/* The following applies:
 *      some_other_addr >= some_addr
 */
for (addr1; addr1 < some_other_addr; addr1++) 
{
    *addr1 += 1;
}

*addr2 *= 8;

When addr2 points to an address in the range of the for loop, we need to know whether or not it is promised that *addr2 will be incremented before the multiplition in 8, as if not, and some optimization step placed *addr2 *= 8; before the for loop, the result of *addr2 will be different than if it would have been executed without an optimization.

Would the answer be different in case some_addr and some_other_addr are defined in the scope and in case they are passed as arguments? because at the first case it is quite easy for the compiler to know that *addr2 is inside the range of the for loop, while at the second case it's not that obvious.

And also, if we look at it from the Assembly perspective, let's take for example a sample reset_handler code snippet of bss section initialization:

    ldr  r1, =__BSS_SIZE__
    cmp  r1, #0
    beq  FINISHED_LABEL
    ldr  r0, =__BSS_START__
    ldr  r2, =0
LOOP_LABEL:
    str  r2, [r0]
    add  r0, r4
    subs r1, r4
    bne  LOOP_LABEL

If the next instruction after this code (at FINISHED_LABEL) loads a value (ldr) from address in the bss range, is it promised that the content will be valid (0) at that time?

cgccassemblyoptimizationmemory-barriers

Answers

answered 3 months ago Peter Cordes #1

What the compiler has to do to get this right is called "alias analysis".

If the compiler can prove that addr2 isn't in the range that addr1 loops over, it can reorder it or keep *addr2 in a register throughout the loop.

This is a very useful optimization for a case like for(...; addr1++) { *addr1 += *addr2; } to avoid reloading addr2 every time, and one of the reasons the restrict keyword exists.

If inputs might overlap, compilers can (and do) emit code that checks for overlap and runs an optimized (e.g. auto-vectorized) loop if there's no overlap, or runs a safe loop if there is overlap.


If a compiler can't prove that a transformation will give the same final results as the C abstract machine, it can't do the transformation. (I say "final" because the order of stores to memory aren't part of the observable results, unless you use std::atomic. So compile-time transformations aren't allowed to break single-threaded code, very similar to what out-of-order CPUs do: provide the illusion of everything happening in program order for a single thread.)

The as-if rule only allows optimizations that work in all cases that don't lead to UB, including obscure stuff like unsigned size = 0xffffffff, which can often lead to compilers not being allowed to do optimizations you hoped for, unless you tweak your source.

UB is key for allowing some optimizations (like not redoing sign-extension of a signed array index inside a loop). See What Every C Programmer Should Know About Undefined Behavior #1/3.

comments powered by Disqus