Sunday, June 25, 2006

I'm working on some templated container classes. My goal is to eliminate as much overhead as possible. Consider the following assembler, generated by the compiler in an optimized build:

53: a.allocate(10);
004010C1 mov esi,dword ptr [__imp__malloc (40823Ch)]
push 28h
004010C9 mov dword ptr [a+4 (407134h)],0Ah
004010D3 call esi
54: len = 10;
55: ba = (int*) malloc(sizeof(int)*len);
004010D5 push 28h
004010D7 mov dword ptr [a (407130h)],eax
004010DC mov dword ptr [len (40712Ch)],0Ah
004010E6 call esi
56:
57: a[3] = 3;
004010E8 mov ecx,dword ptr [a (407130h)]
004010EE mov dword ptr [ba (407128h)],eax
004010F3 mov eax,3
004010F8 mov dword ptr [ecx+0Ch],eax
58: ba[3] = 3;
004010FB mov edx,dword ptr [ba (407128h)]
00401101 add esp,8
00401104 mov dword ptr [edx+0Ch],eax


Now, when we reorder the instructions, group them according to which variables they operate on, and add some comments...:

004010C1 mov esi,dword ptr [__imp__malloc (40823Ch)] ;move addr of malloc into a register

53: a.allocate(10);
mov dword ptr [a+4 (407134h)],0Ah ;put value of 10 into a._size

push 28h ;push value of 40 onto the stack
call esi ;call malloc with arg of 40

mov dword ptr [a (407130h)],eax ;move ret value of malloc into a._begin


54: len = 10;

mov dword ptr [len (40712Ch)],0Ah ;put value of 10 into len

55: ba = (int*) malloc(sizeof(int)*len);
push 28h ;push value of 40 onto the stack
call esi ;call malloc with arg of 40
mov dword ptr [ba (407128h)],eax ;put ret value of malloc into ba


;;;

mov eax,3 ;move value of 3 into a register

57: a[3] = 3;
mov ecx,dword ptr [a (407130h)] ;load the pointer stored in a._begin into a register
mov dword ptr [ecx+0Ch],eax ;store value of eax (3) at a._begin+3 (byte offset 0Ch = 3 ints)

58: ba[3] = 3;
mov edx,dword ptr [ba (407128h)] ;load the pointer stored in ba into a register
mov dword ptr [edx+0Ch],eax ;move value of eax (3) into ba+3


add esp,8 ;increment the stack pointer by 8, discarding the two arguments pushed for malloc



...we see that the optimized assembler generated for my container class is identical to that for manually allocating an array using malloc. Since the allocation is wrapped up in a templated class, you are also less likely to make mistakes while calling malloc. The downside is that with a plain array the compiler will occasionally be able to keep the len variable in a register, which it may not do for the size stored inside the container. In practice, you will need to keep the length around anyway, so this isn't much of an issue.
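
For reference, here is a minimal sketch of the kind of container that could produce the listing above. It is a guess at the shape, not the actual class: only allocate(), operator[], _begin and _size are implied by the disassembly, while the name Array, the size() accessor, the destructor and the demo() helper are assumptions made for illustration. It skips error checking and, because it uses malloc, only suits types that need no constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical reconstruction. Member order matches the listing:
// _begin at offset 0 and _size at offset 4 in the 32-bit build shown above.
template <typename T>
class Array {
public:
    Array() : _begin(nullptr), _size(0) {}
    ~Array() { std::free(_begin); }

    // Copying would double-free the buffer, so it is disabled in this sketch.
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    // Same work as the manual version: record the length, then malloc.
    void allocate(std::size_t n) {
        std::free(_begin);  // free(nullptr) is a no-op
        _size = n;
        _begin = static_cast<T*>(std::malloc(sizeof(T) * n));
    }

    // Unchecked access; trivially inlined to a load of _begin plus an offset.
    T& operator[](std::size_t i) { return _begin[i]; }
    const T& operator[](std::size_t i) const { return _begin[i]; }

    std::size_t size() const { return _size; }

private:
    T* _begin;
    std::size_t _size;
};

// Mirrors source lines 53 and 57 of the listing.
int demo() {
    Array<int> a;
    a.allocate(10);
    a[3] = 3;
    return a[3];
}
```

Because every member function is defined inside the class template, the compiler can inline all of them, which is why nothing but the malloc call and a couple of stores is left in the optimized output.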

Saturday, June 24, 2006

I've received several emails about MacSolitaireX, including one bug submission and a few deck backing image submissions (all from the same individual). I'm working on the next version... slowly. I've fixed the bug. It turned out that I didn't update the OpenGL context after a game had been won. When the user won a game in Vegas scoring, the old game image would still be in the framebuffer, even though a new hand had been dealt. This would cause the user to deal again, which would cause the view to redraw, but would also subtract an additional $52 from the score.