-
Modern CPUs fetch memory in 64-byte cache lines and prefetch adjacent memory.
Problems with OOP
-
Not cache friendly. Each
a->brequires 3 memory accesses. One fora, other for thevtable, other forb. It would be much better ifa[i], so only one memory access is required.-
See the talk from Sergiy Migdalskiy where he goes further into this.
-
-
Thought exercise:
-
A
Creatureis anEntity, whereEntitydefines function pointers (.process_physics) for specialized behavior, so a creature could call for examplebat.process_physics; it's basically a method. -
Problems:
-
Intrinsic slower performance.
-
Looping over an array of entities while only affecting one "specialization" (looping over bats) would be bad, as:
-
An
Entitydoesn't store a "specialization" enum, so there's no way to know which entity belongs to that specialization. -
Even if
Entitywould store an enum (or similar) to indicate the "species variant", this is still not cache friendly at all, as the whole array with lots of entities that don't belong would have to be loaded into cache. Instead of getting the bats right away, it would be necessary to loop over all entities and branch out using a switch statement for the "species variant"; this sounds terrible for cache.
-
-
-
Performance: Fixed Array (Small Array) vs Dynamic Array
Discussion
-
Considering the struct
IK_Chain:IK_Chain :: struct { joints: [dynamic]^Joint, bone_lengths: [dynamic]f32, target: eng.Transform_Node, is_target_moving: bool, placement: IK_Placement, } -
A question about performance: I'm storing
bone_lengths: [dynamic]f32as a cache inside aIK_Chain. I opted for a[dynamic]array as I don't know for sure what the length of a chain will be. Tho, now knowing about the existence of aSmall_Array, I question whether I should make this abone_length: sarray.Small_Array(SOME_REASONABLE_NUMBER, f32), for cache locality. Seems like a trade-off between memory and speed, as by using a Small_Array I will overestimate the size of this array, to be sure to fit into all major cases of aIK_ChainI would build. For context, now aIK_Chainonly has 4~5 Joints, but it could have 20+ or more, for some creatures. I update theIK_Chainevery frame, for visuals. So, using a small array with ~20 cap would be a nice trade off, instead of using a dynamic array? Oh, probably relevant, but using this for bone_lengths would also imply that I would also use this for the joints:[dynamic]^Jointin theIK_Chain; consider that the ^Joint are on the stack. -
TLDR :
-
SmallArray wins because the entire struct + bone data often fits in 1-2 cache lines
-
Location and continuity
-
Dynamic Array :
-
A
[dynamic]f32stores its metadata (pointer, length, capacity) in the struct, but the actual data is heap-allocated. -
When you do
make([dynamic]f32, 0, capacity), Odin allocates a contiguous block of memory on the heap. All elements are stored sequentially in this block: -
[elem0][elem1][elem2]... -
As long as you don't exceed this, you're going to have the values in a fixed contiguous region.
-
Indirection / Pointer chasing.
-
-
Small_Array :
-
Is embedded directly in the struct (whether the struct is on stack/heap depends on context).
-
Elements are also contiguous, but embedded within the parent struct.
-
-
Example :
-
Ex1: Array of structs :
chains: [100]IK_Chain-
Dynamic Array :
-
Only struct metadata is contiguous. Actual data is fragmented:
[Chain0] → (heap_ptr0 → bones0) → (heap_ptr1 → joints0) [Chain1] → (heap_ptr2 → bones1) → (heap_ptr3 → joints1)
-
-
Small_Array :
-
All data (struct fields + bone lengths + joints) is in one contiguous memory block.
[Chain0][bones0][joints0][Chain1][bones1][joints1]...
-
-
-
Ex2: Single struct :
-
Dynamic Array :
-
Requires at least two separate memory loads:
-
Load
IK_Chainstruct (metadata) -
Load bone data from heap pointer
-
-
-
Small_Array :
-
All data loaded in 1-2 cache lines.
-
-
-
-
They are both backing arrays fixed at point in memory and should benefit from caching.
-
"What about the
joints?" :-
The
[dynamic]^Jointcan be problematic due to double indirection (pointer to array + pointer to Joint).-
This is worse than just dynamic arrays of values.
-
-
Dynamic Array reallocation
-
Reallocations are rare if capacity is preset, but when they occur:
-
Invalidates all caches for bone data.
-
Costs CPU cycles for allocation/memcpy.
-
-
A Small_Array avoids this entirely.
Deciding between memory efficiency and performance
-
Considerations :
-
If you only need AI data in one system and skeletons in another → Cache locality benefits diminish.
-
If you allocate for 50 creatures but typically have 10-15 → 70-80% memory wasted.
-
Bad if creature size > cache line (typically 64-128B).
-
-
DO use Small_Array for :
-
Core metadata (position, health).
-
Hot components (AI state, IK chains).
-
Fixed-size subsystems.
-
-
AVOID Small_Array for :
-
The entire Creature struct.
-
Large/variable data (animations).
-
The creatures container itself.
-