Caio Raphael

A normal function.
Is logic or a process that operates on one or more components of entities to modify their states or perform some operation.

With Sparse Sets

Intersection into an array of Entity

"give me all entities that have both Name and Person component".
Pros
- Contiguous index buffer → easy to split into chunks for parallel workers.
- Great for data-local processing (pull components by index into tight loops).
- Enables SIMD easily because you can iterate contiguous component arrays by index with known bounds.
- Allows simple scheduling (divide vec into ranges, spawn jobs, no shared iterator state).
- Good throughput when query result is large or will be iterated multiple times.
Cons
- Upfront cost to compute intersection (time) and memory for the vector.
- Allocation overhead unless you reuse buffers or reserve capacity.
- If the query result is small or you only need first few matches, cost may be wasted.
Giving windows for optimization :
- Avoiding re-queries :
  - The entities rarely change, and so the query is mostly the the same every frame, but all the query is repeated every frame.
  - Solution :
    - Store the query upfront, re-querying on modification to the stored Components for that query, via a dirty flag.
- Parallelization :
  - Threads:
    - Doesn't sound good, as the threads wouldn't be able to sleep, causing high cpu consumption.
    - Without an iterator:
      - Use a parallel_for , dividing the work into batches, sent to different threads; classic usage.
    - With an iterator:
      - Trick. The next operator is only gathered after the first one is finished, as it's a sequential operation.
      - A pseudo-parallel_for could be used, as a way to divide the work between threads; a parallel_iterator , so to speak.
        
        The problem with this solution is that differently from the parallel_for , who divides the work in a homogenous way, the parallel_iterator would divide not the result , but the source of the iterator (the .data array), which means that some threads might have more work to do if the results of the iterator are heterogenous across the source.
  - SIMD operations.
    - The code idea seems to require that all elements be grouped together in a SIMD Vector ( #simd[N]T ) and so intrinsics operations are performed.
      - Seems quite inconvenient.
    - I'm not that worried about SIMD, as this seems a bit advanced and maybe too specific, meanwhile there's always the option to send this to the GPU, which is probably better than SIMD anyway.
    - Without an iterator:
      - .
    - With an iterator:
      - .
  - GPU Compute:
    - The data would have to previously exist in GPU memory, or be copied to it before the compute.
    - If the data were to be sent to the GPU, ideally this data should be pretty small to avoid being bandwidth bound.
    - One way would be to have the whole ECS on GPU.
    - Without an iterator:
      - .
    - With an iterator:
      - .
  - Solution :
    - Focus on a full packed loop, instead of getting the next .data via an iterator.
Other optimization ideas :
- Iterate the smallest component’s dense array and use sparse lookups for other components.

Iterator

"iterate over every Name component for entities that also have a Person component".
Pros
- Low memory overhead, no allocation when you only consume a subset or stream.
- Lower latency to obtain first match(s).
- Useful for low-cardinality queries or queries that short-circuit early.
Cons
- Harder to parallelize unless the iterator supports splitting.
- Potentially worse cache behavior because membership checks may bounce between sparse maps.
- Harder to vectorize because you don’t have fixed contiguous ranges to operate on.
- Iterator state may create contention if shared between threads.
Makes any type of parallelization harder (SIMD, threads or GPU Compute).
I will consider this only if its API/Safety benefits are so large that it out-weights the lack of optimization options.
Ideally, I'd prefer a different option because of that.

view: ecs.View
ecs.view_init(&view, &my_ecs, {&ais, &positions})

it: ecs.Iterator
ecs.iterator_init(&it, &view)
for ecs.iterator_next(&it) {
    eid = ecs.get_entity(&it)
    pos1 = ecs.get_component(&positions, &it)
    ai = ecs.get_component(&ais, &it)
    fmt.println("Iterating over view: ", eid, pos1, ai)
}

I don't love the pattern. The programmer still have to type the name of the component twice: once on view_init and again when inside the loop.

In Rust:
```
for (pos, vel) in (&positions, &velocities).join() {
    pos.x += vel.x
}
```
- .join() here is just a method from some class. There's nothing special here; I believe it's somewhat equivalent to the Odin version.

Ideas

Small overview for solving the problems :
- The existence of _query_result: [COMPONENT_MAX_ENTITIES]Entity_ID is quite ugly.
  - Solution :
    - Store the data inside a struct, instead of storing in global scope.
- Fetching component.sparse[entity] twice.
  - This can have its ups and downs:
    - Having to fetch again the .sparse array means that the operation is usually safer, as there's window for error handling in case of the entity not being found inside that array.
    - Tho, having to handle this possibility of error is annoying from the API standpoint, also if you consider that if the query is still valid, the entity is still be valid, so re-fetching the data through the .sparse array just introduces an unnecessary error handling.
    - So, when is it safe to avoid double fetching and storing the dense_idx right away? Only if you can ensure that the query is still valid. For that, it's only safe to store the dense_idx if:
      1. The query is constructed every frame.
      2. The query is constructed upfront, but with usage of dirty flags to invalidate the query and re-query when necessary.
  - Solution :
    - Store the []dense_idx , instead of storing the Entity_ID .
      - Basically an "Archetype" as the result of a query, with the option to only change only when any of its components are dirty.
- The programmer has to type the name of the component twice.
  - Solution :
    - Store a pointer to the component inside a struct, so it can be referenced later.

Idea 10: Return many []^T arrays :

@(require_results)
intersect2 :: proc(
    c0: ^Component($T0),
    c1: ^Component($T1),
    ) -> (
    sa0: [COMPONENT_MAX_ENTITIES]^T0,
    sa1: [COMPONENT_MAX_ENTITIES]^T1,
    idx: int,
    ) {
    
    for entity, d0 in c0.dense {
        d1 := c1.sparse[entity].? or_continue

        sa0[idx] = &c0.data[d0]
        sa1[idx] = &c1.data[d1]
        idx += 1
    }

    return
}

@(require_results)
intersect3 :: proc(
    c0: ^Component($T0),
    c1: ^Component($T1),
    c2: ^Component($T2),
    ) -> (
    sa0: [COMPONENT_MAX_ENTITIES]^T0,
    sa1: [COMPONENT_MAX_ENTITIES]^T1,
    sa2: [COMPONENT_MAX_ENTITIES]^T2,
    idx: int,
    ) {
    
    for entity, d0 in c0.dense {
        d1 := c1.sparse[entity].? or_continue
        d2 := c2.sparse[entity].? or_continue

        sa0[idx] = &c0.data[d0]
        sa1[idx] = &c1.data[d1]
        sa2[idx] = &c2.data[d2]
        idx += 1
    }

    return
}

intersect :: proc{
    intersect2,
    intersect3
}

intersect :: proc{
    intersect2,
    intersect3
}

{
    health_bars, healths, size := ecs.intersect(&_health_bars, &_healths)
    for i in 0..<size {
        draw_health_bar(cmd, health_bars[i], healths[i]^)
    }
}

{
    dyn_bodies, mov_dirs, velocities_max, velocities_max_base, healths, size := ecs.intersect(&_dynamic_bodies, &_movement_directions, &_velocities_max, &_velocities_max_base, &_healths)
    for i in 0..<size {
        movement_direction_via_input_and_network_and_debug(dyn_bodies[i]^, mov_dirs[i], velocities_max[i], velocities_max_base[i]^, healths[i], 
            &_game.character_net, _game.fixed_cycle_physics.tick_count)
    }
}

Explicit and direct.
Problems :
- Wastes memory for having to store some extra arrays; the memory is not that big of a deal, as it's on the stack and the size is uintptr * COMPONENT_MAX_ENTITIES .
- I'm not sure how I feel about having an array of pointers. How does the CPU cache compare to just having an array of indices? Also, the safety can be a little bit more annoying if I intent to store the query result with caching and dirty flags.
- It's a lot of things to type from the API user side...

Idea 4: Iterator with or without static idx :

@(private="file") _iter_idx: u32

intersect2_next:: proc(
    c0: ^Component($T0),
    c1: ^Component($T1),
    ) -> (
    t0:    ^T0,
    t1:    ^T1,
    found: bool,
    ) {
    
    /* 
    The last element in the `.dense` slice was a match, now the last iteration should be false.
    This is a valid behavior.
    */
    if _iter_idx >= u32(len(c0.dense) - 1) {
        _iter_idx = 0
        return
    }
    
    for entity, d0 in c0.dense[_iter_idx:] {
        _iter_idx += 1

        d1 := c1.sparse[entity].? or_continue

        t0    = &c0.data[d0]
        t1    = &c1.data[d1]
        found = true
        return
    }

    /*
    Looped over the whole `.dense` slice and didn't find a match; reset the _iter_idx. 
    This is a valid behavior.
    */
    _iter_idx = 0
    return
}

for dyn_bodies, mov_dirs, velocities_max, velocities_max_base, healths in ecs.intersect_next(&_dynamic_bodies, &_movement_directions, &_velocities_max, &_velocities_max_base, &_healths) {
    movement_direction_via_input_and_network_and_debug(dyn_bodies^, mov_dirs, velocities_max, velocities_max_base^, healths, 
        &_game.character_net, _game.fixed_cycle_physics.tick_count)
}

See the Odin note for an explanation on this pattern.
Problems :
- Parallelization?
  - Just use an option different from iterators; this is not a good option for parallelization.

Idea 9: Query and Execute :
```
query_and_execute :: proc(
    $FN: $F, 
    c0: ^Component($T0),
    c1: ^Component($T1),
    c2: ^Component($T2) = nil,
    c3: ^Component($T3) = nil,
    c4: ^Component($T4) = nil,
    c5: ^Component($T5) = nil,
    c6: ^Component($T6) = nil,
    ) 
    where
        intrinsics.type_proc_parameter_count(F) >= 2
    {
```
- Doesn't allow caching with dirty flags, but that's ok; Not a big deal.
- Problems :
  - -> Procedures like draw_health_bar :: proc(cmd: rd.Command_Buffer, health_bar: ^Health_Bar, health: Health) { are a problem, as they receive the result of a component, BUT also receive random data (in this case: cmd ). This creates a much more complicated API, which I didn't went too much into.
  - The components should be constant for better comp-time checks, but they can't; this is not a big deal, but makes the comp-time error be more confusing.
    - "Constant parameters cannot have a default value".
    - "Cannot determine polymorphic type from parameter: 'untyped nil' to '^Component($T)'".
    - "'c3' of type '^Component($T)' has no field 'sparse'".

Idea 6: A Query stores many []dense_idx arrays :

Query3_Idea6 :: struct($A, $B, $C: typeid) {
    ca:       ^Component(A),
    cb:       ^Component(B),
    cc:       ^Component(C),

    dense_a:  sa.Small_Array(COMPONENT_MAX_ENTITIES, u32),
    dense_b:  sa.Small_Array(COMPONENT_MAX_ENTITIES, u32),
    dense_c:  sa.Small_Array(COMPONENT_MAX_ENTITIES, u32),
    len:      int,

    iter_idx: u32,
}

@(require_results)
query3_via_intersect_id6 :: proc(ca: ^Component($A), cb: ^Component($B), cc: ^Component($C)) -> (query: Query3_Idea6(A, B, C)) {
    query.ca = ca
    query.cb = cb
    query.cc = cc

    for entity, dense_a in ca.dense {
        dense_b, ok_cb := cb.sparse[entity].?
        if !ok_cb do continue
        dense_c, ok_cc := cc.sparse[entity].?
        if !ok_cc do continue

        sa.append(&query.dense_a, u32(dense_a))
        sa.append(&query.dense_b, dense_b)
        sa.append(&query.dense_c, dense_c)
    }
    
    query.len = sa.len(query.dense_a)

    return
}

query := query3_via_intersect_id6(&_dynamic_bodies, &_velocities_max, &_movement_directions)

for i in 0..<query.len {
    dynamic_body_velocity_with_movement_direction(
        &query.ca.data[query.dense_a.data[i]], // dense_a.data here is so I get the actual thing inside the Small_Array.
        query.cb.data[query.dense_b.data[i]],
        query.cc.data[query.dense_c.data[i]],
    )
}

This can be used as an iterator, for example; this was for an old code, but the principle is the same:

query3_next:: proc(q: ^Query3($A, $B, $C)) -> (A, B, C, bool) {
    if q.idx == len(q.queried_data_indices[0]) - 1 do return
    q.idx += 1
    
    return
        &q.ca.dense[q.queried_data_indices[0][q.idx]],
        &q.cb.dense[q.queried_data_indices[1][q.idx]],
        &q.cc.dense[q.queried_data_indices[2][q.idx]], 
        true
}

for dynamic_body, vel_max, mov_direction in query3_next(&query) {
    dynamic_body_velocity_with_movement_direction(dynamic_body, vel_max, mov_direction)
}

Problems :
- More memory waste than Idea 1, as every Query has many []dense_idx .

~~Idea 8: A Query stores a []Entity_ID array~~ :

Pattern query3_intersect :: proc(ca: Component($A), cb: Component($B), cc: Component($C)) -> Query(A, B, C) {

Query3_Idea8 :: struct($A, $B, $C: typeid) {
    ca:       ^Component(A),
    cb:       ^Component(B),
    cc:       ^Component(C),
    entities: sa.Small_Array(COMPONENT_MAX_ENTITIES, Entity_ID),

    iter_idx: u32,
}

@(require_results)
query3_via_intersect_id8 :: proc(ca: ^Component($A), cb: ^Component($B), cc: ^Component($C)) -> (query: Query3_Idea8(A, B, C)) {
    query.ca = ca
    query.cb = cb
    query.cc = cc

    for entity in ca.dense {
        if cb.sparse[entity] == nil || cc.sparse[entity] == nil do continue
        
        sa.append(&query.entities, entity)
    }
    return
}

query := query3_via_intersect_id8(_dynamic_bodies, _velocities_max, _movement_directions)

for entity in sa.slice(&query.entities) {
    dynamic_body_velocity_with_movement_direction(
        component_get(query.ca, entity),
        component_get(query.cb, entity)^,
        component_get(query.cc, entity)^,
    )
}

Same extra memory from Idea 1.
The []Entity_ID is a stack allocated array.
Advantages :
- The query can be made upfront, caching the results with dirty flags.
Problems :
- Fetches component.sparse[entity] twice.
- Confusing API.

~~Idea 1: Array of Entity with a global _query_result~~ :

Pattern intersect2 :: proc(ca: Component($T), cb: Component($W)) -> []Entity_ID {

_query_result: [COMPONENT_MAX_ENTITIES]Entity_ID

@(require_results)
intersect3 :: proc(ca: Component($A), cb: Component($B), cc: Component($C)) -> []Entity_ID {
    idx := 0
    for entity in ca.dense {
        if cb.sparse[entity] == nil || cc.sparse[entity] == nil do continue
        
        _query_result[idx] = entity
        idx += 1
    }
    return _query_result[:idx]
}

for entity in intersect3(_dynamic_bodies, _velocities_max, _movement_directions) {
    dynamic_body_velocity_with_movement_direction(
        component_get(&_dynamic_bodies,      entity),
        component_get(&_velocities_max,      entity)^,
        component_get(&_movement_directions, entity)^,
    )
}

Advantages :
- No extra struct. Every thing is just a call to intersect.
  - This by it self can be an issue, as the query doesn't support caching upfront.
Problems :
- The existence of _query_result: [COMPONENT_MAX_ENTITIES]Entity_ID is quite ugly.
- Fetches component.sparse[entity] twice.
- The programmer has to type the name of the component twice.

~~Idea 2: Many []dense_idx arrays, using stack arrays from the global scope~~ :
- Pattern intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (r1, r2: []u32) {
- Sort-of an extension from Idea 1.
```
dyn_body_indices, vel_max_indices, mov_indices := intersect3(_dynamic_bodies, _velocities_max, _movement_directions)
for i in 0..<len(dyn_body_indices) {
    dynamic_body_velocity_with_movement_direction(
        _dynamic_bodies.dense[dyn_body_indices[i]],
        _velocities_max.dense[vel_max_indices[i]]^,
        _movement_directions.dense[mov_indices[i]]^,
    )
}
```
- Avoids fetching component.sparse[entity] twice.
- Problems :
  - ~There has to be lots of different arrays to store each one of dyn_body_indices, vel_max_indices, mov_indices .
  - The programmer has to type the name of the component twice.

~~Idea 8: Each Component(T) has a array of query_result array of ^T~~ :

Pattern intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (count: int) {
The query intersect2 function returns a length to be used inside the for loop.

component_get_query :: #force_inline proc(component: Component($T), idx: int) -> ^T {
    return component.query_result[idx]
}
for i in 0..<intersect3(_dynamic_bodies, _velocities_max, _movement_directions) {
    dynamic_body_velocity_with_movement_direction(
        component_get_query(_dynamic_bodies,      i),
        component_get_query(_velocities_max,      i)^,
        component_get_query(_movement_directions, i)^,
    )
}

Problems :
- When comparing to Idea 4, it comes down to what's lighter, as the ideas are basically the same.
  - Idea 4 seems better, as a u32 is lighter than a uintptr (size of int / uint ).

~~Idea 3: Many arrays of T~~ :
- Pattern intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (r1: []T, r2: []W) { .
- Problems :
  - The array being []T doesn't make sense, as I want a pointer to the element, but doing so would give me a pointer to a copy of the element.
  - It's necessary that the array be defined as []^T (same as Idea 6) or []u32 (indices to .data , which is exactly the same as Idea 4).
  - The generic nature of T and W makes this quite difficult.
    1. Make a Component be generic enough so it doesn't need T or W .
    2. ~~The returns are allocated~~.
      - Seems very bad for performance.
~~Idea 5: Each Component(T) has a array of query_result array of T~~ :
- Problems :
  - The array being []T doesn't make sense, as I want a pointer to the element, but doing so would give me a pointer to a copy of the element.
  - It's necessary that the array be defined as []^T (same as Idea 6) or []u32 (indices to .data , which is exactly the same as Idea 4).

With Archetypes

void movement_system(float dt) {
    for (int i = 0; i < a_pv.count; i++) {
        a_pv.pos[i].x += a_pv.vel[i].vx * dt;
        a_pv.pos[i].y += a_pv.vel[i].vy * dt;
    }

    for (int i = 0; i < a_pvh.count; i++) {
        a_pvh.pos[i].x += a_pvh.vel[i].vx * dt;
        a_pvh.pos[i].y += a_pvh.vel[i].vy * dt;
    }
}

No has_component checks.