I'm working on some templated container classes, and my goal is to eliminate as much overhead as possible. Consider the following assembly, generated by an optimized build:
53: a.allocate(10);
004010C1 mov esi,dword ptr [__imp__malloc (40823Ch)]
004010C7 push 28h
004010C9 mov dword ptr [a+4 (407134h)],0Ah
004010D3 call esi
54: len = 10;
55: ba = (int*) malloc(sizeof(int)*len);
004010D5 push 28h
004010D7 mov dword ptr [a (407130h)],eax
004010DC mov dword ptr [len (40712Ch)],0Ah
004010E6 call esi
56:
57: a[3] = 3;
004010E8 mov ecx,dword ptr [a (407130h)]
004010EE mov dword ptr [ba (407128h)],eax
004010F3 mov eax,3
004010F8 mov dword ptr [ecx+0Ch],eax
58: ba[3] = 3;
004010FB mov edx,dword ptr [ba (407128h)]
00401101 add esp,8
00401104 mov dword ptr [edx+0Ch],eax
Now, if we reorder the instructions, group them by the variable they operate on, and add some comments:
004010C1 mov esi,dword ptr [__imp__malloc (40823Ch)] ;move addr of malloc into a register
53: a.allocate(10);
mov dword ptr [a+4 (407134h)],0Ah ;put value of 10 into a._size
push 28h ;push value of 40 onto the stack
call esi ;call malloc with arg of 40
mov dword ptr [a (407130h)],eax ;move ret value of malloc into a._begin
54: len = 10;
mov dword ptr [len (40712Ch)],0Ah ;put value of 10 into len
55: ba = (int*) malloc(sizeof(int)*len);
push 28h ;push value of 40 onto the stack
call esi ;call malloc with arg of 40
mov dword ptr [ba (407128h)],eax ;put ret value of malloc into ba
;;;
mov eax,3 ;move value of 3 into a register
57: a[3] = 3;
mov ecx,dword ptr [a (407130h)] ;load the pointer stored in a._begin into a register
mov dword ptr [ecx+0Ch],eax ;store eax (3) at a._begin[3] (byte offset 0Ch = 3*sizeof(int))
58: ba[3] = 3;
mov edx,dword ptr [ba (407128h)] ;load the pointer stored in ba into a register
mov dword ptr [edx+0Ch],eax ;store eax (3) at ba[3] (byte offset 0Ch = 3*sizeof(int))
add esp,8 ;pop the two malloc arguments (2 x 4 bytes) off the stack
...we see that the optimized assembly generated for my container class is identical to that for manually allocating an array with malloc. Because the allocation is wrapped in a templated class, you are also less likely to make mistakes when calling malloc. The downside is that the compiler can occasionally keep a standalone len variable in a register, which it may not do for the container's size member. In practice you need to keep the length around anyway, so this isn't much of an issue.
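For reference, here is a minimal sketch of what such a container might look like. Only allocate(), operator[], _begin and _size are implied by the listing; the class name Array, the size() accessor, the destructor and the deleted copy operations are my assumptions, so treat this as an illustration rather than the actual class.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of a thin malloc-backed container.
// The member order (_begin first, _size second) matches the listing,
// where a._begin sits at offset 0 and a._size at offset 4.
template <typename T>
class Array {
public:
    Array() : _begin(nullptr), _size(0) {}
    ~Array() { std::free(_begin); }  // free(nullptr) is a no-op

    // Copying would double-free the buffer, so it is disabled here (assumption).
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    // Mirrors `a.allocate(10)`: one store to _size and one malloc call.
    // Assumes it is called at most once per object; a fuller version
    // would free any previous buffer and check for a null result.
    void allocate(std::size_t n) {
        _size = n;
        _begin = static_cast<T*>(std::malloc(sizeof(T) * n));
    }

    // Unchecked access: after inlining, this is a load of _begin
    // followed by an indexed store or load, as in the listing.
    T& operator[](std::size_t i) { return _begin[i]; }
    const T& operator[](std::size_t i) const { return _begin[i]; }

    std::size_t size() const { return _size; }

private:
    T* _begin;
    std::size_t _size;
};
```

Because every member function is defined inside the class body, it is implicitly inline, which is what gives the optimizer the chance to reduce a.allocate(10) and a[3] = 3 to the same instructions as the hand-written malloc version. Whether it actually does so depends on the compiler and optimization settings.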
