-
A normal function.
-
Is logic or a process that operates on one or more components of entities to modify their states or perform some operation.
With Sparse Sets
Intersection into an array of Entity
-
"give me all entities that have both Name and Person component".
-
Pros
-
Contiguous index buffer → easy to split into chunks for parallel workers.
-
Great for data-local processing (pull components by index into tight loops).
-
Enables SIMD easily because you can iterate contiguous component arrays by index with known bounds.
-
Allows simple scheduling (divide vec into ranges, spawn jobs, no shared iterator state).
-
Good throughput when query result is large or will be iterated multiple times.
-
-
Cons
-
Upfront cost to compute intersection (time) and memory for the vector.
-
Allocation overhead unless you reuse buffers or reserve capacity.
-
If the query result is small or you only need first few matches, cost may be wasted.
-
-
Giving windows for optimization :
-
Avoiding re-queries :
-
The entities rarely change, and so the query is mostly the the same every frame, but all the query is repeated every frame.
-
Solution :
-
Store the query upfront, re-querying on modification to the stored Components for that query, via a dirty flag.
-
-
-
Parallelization :
-
Threads:
-
Doesn't sound good, as the threads wouldn't be able to sleep, causing high cpu consumption.
-
Without an iterator:
-
Use a
parallel_for, dividing the work into batches, sent to different threads; classic usage.
-
-
With an iterator:
-
Trick. The next operator is only gathered after the first one is finished, as it's a sequential operation.
-
A
pseudo-parallel_forcould be used, as a way to divide the work between threads; aparallel_iterator, so to speak.-
The problem with this solution is that differently from the
parallel_for, who divides the work in a homogenous way, theparallel_iteratorwould divide not the result , but the source of the iterator (the.dataarray), which means that some threads might have more work to do if the results of the iterator are heterogenous across the source.
-
-
-
-
SIMD operations.
-
The code idea seems to require that all elements be grouped together in a SIMD Vector (
#simd[N]T) and so intrinsics operations are performed.-
Seems quite inconvenient.
-
-
I'm not that worried about SIMD, as this seems a bit advanced and maybe too specific, meanwhile there's always the option to send this to the GPU, which is probably better than SIMD anyway.
-
Without an iterator:
-
.
-
-
With an iterator:
-
.
-
-
-
GPU Compute:
-
The data would have to previously exist in GPU memory, or be copied to it before the compute.
-
If the data were to be sent to the GPU, ideally this data should be pretty small to avoid being bandwidth bound.
-
One way would be to have the whole ECS on GPU.
-
Without an iterator:
-
.
-
-
With an iterator:
-
.
-
-
-
Solution :
-
Focus on a full packed loop, instead of getting the next
.datavia an iterator.
-
-
-
-
Other optimization ideas :
-
Iterate the smallest component’s dense array and use sparse lookups for other components.
-
Iterator
-
"iterate over every Name component for entities that also have a Person component".
-
Pros
-
Low memory overhead, no allocation when you only consume a subset or stream.
-
Lower latency to obtain first match(s).
-
Useful for low-cardinality queries or queries that short-circuit early.
-
-
Cons
-
Harder to parallelize unless the iterator supports splitting.
-
Potentially worse cache behavior because membership checks may bounce between sparse maps.
-
Harder to vectorize because you don’t have fixed contiguous ranges to operate on.
-
Iterator state may create contention if shared between threads.
-
-
Makes any type of parallelization harder (SIMD, threads or GPU Compute).
-
I will consider this only if its API/Safety benefits are so large that it out-weights the lack of optimization options.
-
Ideally, I'd prefer a different option because of that.
-
ode_ecs.view: ecs.View ecs.view_init(&view, &my_ecs, {&ais, &positions}) it: ecs.Iterator ecs.iterator_init(&it, &view) for ecs.iterator_next(&it) { eid = ecs.get_entity(&it) pos1 = ecs.get_component(&positions, &it) ai = ecs.get_component(&ais, &it) fmt.println("Iterating over view: ", eid, pos1, ai) }-
I don't love the pattern. The programmer still have to type the name of the component twice: once on
view_initand again when inside the loop.
-
-
In Rust:
for (pos, vel) in (&positions, &velocities).join() { pos.x += vel.x }-
.join()here is just a method from some class. There's nothing special here; I believe it's somewhat equivalent to the Odin version.
-
Ideas
-
Small overview for solving the problems :
-
The existence of
_query_result: [COMPONENT_MAX_ENTITIES]Entity_IDis quite ugly.-
Solution :
-
Store the data inside a struct, instead of storing in global scope.
-
-
-
Fetching
component.sparse[entity]twice.-
This can have its ups and downs:
-
Having to fetch again the
.sparsearray means that the operation is usually safer, as there's window for error handling in case of the entity not being found inside that array. -
Tho, having to handle this possibility of error is annoying from the API standpoint, also if you consider that if the query is still valid, the entity is still be valid, so re-fetching the data through the
.sparsearray just introduces an unnecessary error handling. -
So, when is it safe to avoid double fetching and storing the
dense_idxright away? Only if you can ensure that the query is still valid. For that, it's only safe to store thedense_idxif:-
The query is constructed every frame.
-
The query is constructed upfront, but with usage of dirty flags to invalidate the query and re-query when necessary.
-
-
-
Solution :
-
Store the
[]dense_idx, instead of storing theEntity_ID.-
Basically an "Archetype" as the result of a query, with the option to only change only when any of its components are dirty.
-
-
-
-
The programmer has to type the name of the component twice.
-
Solution :
-
Store a pointer to the component inside a struct, so it can be referenced later.
-
-
-
-
Idea 10: Return many
[]^Tarrays :@(require_results) intersect2 :: proc( c0: ^Component($T0), c1: ^Component($T1), ) -> ( sa0: [COMPONENT_MAX_ENTITIES]^T0, sa1: [COMPONENT_MAX_ENTITIES]^T1, idx: int, ) { for entity, d0 in c0.dense { d1 := c1.sparse[entity].? or_continue sa0[idx] = &c0.data[d0] sa1[idx] = &c1.data[d1] idx += 1 } return } @(require_results) intersect3 :: proc( c0: ^Component($T0), c1: ^Component($T1), c2: ^Component($T2), ) -> ( sa0: [COMPONENT_MAX_ENTITIES]^T0, sa1: [COMPONENT_MAX_ENTITIES]^T1, sa2: [COMPONENT_MAX_ENTITIES]^T2, idx: int, ) { for entity, d0 in c0.dense { d1 := c1.sparse[entity].? or_continue d2 := c2.sparse[entity].? or_continue sa0[idx] = &c0.data[d0] sa1[idx] = &c1.data[d1] sa2[idx] = &c2.data[d2] idx += 1 } return } intersect :: proc{ intersect2, intersect3 } intersect :: proc{ intersect2, intersect3 } { health_bars, healths, size := ecs.intersect(&_health_bars, &_healths) for i in 0..<size { draw_health_bar(cmd, health_bars[i], healths[i]^) } } { dyn_bodies, mov_dirs, velocities_max, velocities_max_base, healths, size := ecs.intersect(&_dynamic_bodies, &_movement_directions, &_velocities_max, &_velocities_max_base, &_healths) for i in 0..<size { movement_direction_via_input_and_network_and_debug(dyn_bodies[i]^, mov_dirs[i], velocities_max[i], velocities_max_base[i]^, healths[i], &_game.character_net, _game.fixed_cycle_physics.tick_count) } }-
Explicit and direct.
-
Problems :
-
Wastes memory for having to store some extra arrays; the memory is not that big of a deal, as it's on the stack and the size is
uintptr * COMPONENT_MAX_ENTITIES. -
I'm not sure how I feel about having an array of pointers. How does the CPU cache compare to just having an array of indices? Also, the safety can be a little bit more annoying if I intent to store the query result with caching and dirty flags.
-
It's a lot of things to type from the API user side...
-
-
-
Idea 4: Iterator with or without static idx :
@(private="file") _iter_idx: u32 intersect2_next:: proc( c0: ^Component($T0), c1: ^Component($T1), ) -> ( t0: ^T0, t1: ^T1, found: bool, ) { /* The last element in the `.dense` slice was a match, now the last iteration should be false. This is a valid behavior. */ if _iter_idx >= u32(len(c0.dense) - 1) { _iter_idx = 0 return } for entity, d0 in c0.dense[_iter_idx:] { _iter_idx += 1 d1 := c1.sparse[entity].? or_continue t0 = &c0.data[d0] t1 = &c1.data[d1] found = true return } /* Looped over the whole `.dense` slice and didn't find a match; reset the _iter_idx. This is a valid behavior. */ _iter_idx = 0 return } for dyn_bodies, mov_dirs, velocities_max, velocities_max_base, healths in ecs.intersect_next(&_dynamic_bodies, &_movement_directions, &_velocities_max, &_velocities_max_base, &_healths) { movement_direction_via_input_and_network_and_debug(dyn_bodies^, mov_dirs, velocities_max, velocities_max_base^, healths, &_game.character_net, _game.fixed_cycle_physics.tick_count) }-
See the Odin note for an explanation on this pattern.
-
Problems :
-
Parallelization?
-
Just use an option different from iterators; this is not a good option for parallelization.
-
-
-
-
Idea 9: Query and Execute :
query_and_execute :: proc( $FN: $F, c0: ^Component($T0), c1: ^Component($T1), c2: ^Component($T2) = nil, c3: ^Component($T3) = nil, c4: ^Component($T4) = nil, c5: ^Component($T5) = nil, c6: ^Component($T6) = nil, ) where intrinsics.type_proc_parameter_count(F) >= 2 {-
Doesn't allow caching with dirty flags, but that's ok; Not a big deal.
-
Problems :
-
-> Procedures like
draw_health_bar :: proc(cmd: rd.Command_Buffer, health_bar: ^Health_Bar, health: Health) {are a problem, as they receive the result of a component, BUT also receive random data (in this case:cmd). This creates a much more complicated API, which I didn't went too much into. -
The components should be constant for better comp-time checks, but they can't; this is not a big deal, but makes the comp-time error be more confusing.
-
"Constant parameters cannot have a default value".
-
"Cannot determine polymorphic type from parameter: 'untyped nil' to '^Component($T)'".
-
"'c3' of type '^Component($T)' has no field 'sparse'".
-
-
-
-
Idea 6: A Query stores many
[]dense_idxarrays :Query3_Idea6 :: struct($A, $B, $C: typeid) { ca: ^Component(A), cb: ^Component(B), cc: ^Component(C), dense_a: sa.Small_Array(COMPONENT_MAX_ENTITIES, u32), dense_b: sa.Small_Array(COMPONENT_MAX_ENTITIES, u32), dense_c: sa.Small_Array(COMPONENT_MAX_ENTITIES, u32), len: int, iter_idx: u32, } @(require_results) query3_via_intersect_id6 :: proc(ca: ^Component($A), cb: ^Component($B), cc: ^Component($C)) -> (query: Query3_Idea6(A, B, C)) { query.ca = ca query.cb = cb query.cc = cc for entity, dense_a in ca.dense { dense_b, ok_cb := cb.sparse[entity].? if !ok_cb do continue dense_c, ok_cc := cc.sparse[entity].? if !ok_cc do continue sa.append(&query.dense_a, u32(dense_a)) sa.append(&query.dense_b, dense_b) sa.append(&query.dense_c, dense_c) } query.len = sa.len(query.dense_a) return } query := query3_via_intersect_id6(&_dynamic_bodies, &_velocities_max, &_movement_directions) for i in 0..<query.len { dynamic_body_velocity_with_movement_direction( &query.ca.data[query.dense_a.data[i]], // dense_a.data here is so I get the actual thing inside the Small_Array. query.cb.data[query.dense_b.data[i]], query.cc.data[query.dense_c.data[i]], ) }-
This can be used as an iterator, for example; this was for an old code, but the principle is the same:
query3_next:: proc(q: ^Query3($A, $B, $C)) -> (A, B, C, bool) { if q.idx == len(q.queried_data_indices[0]) - 1 do return q.idx += 1 return &q.ca.dense[q.queried_data_indices[0][q.idx]], &q.cb.dense[q.queried_data_indices[1][q.idx]], &q.cc.dense[q.queried_data_indices[2][q.idx]], true } for dynamic_body, vel_max, mov_direction in query3_next(&query) { dynamic_body_velocity_with_movement_direction(dynamic_body, vel_max, mov_direction) }-
Problems :
-
More memory waste than Idea 1, as every Query has many
[]dense_idx.
-
-
-
Idea 8: A Query stores a:[]Entity_IDarray-
Pattern
query3_intersect :: proc(ca: Component($A), cb: Component($B), cc: Component($C)) -> Query(A, B, C) {
Query3_Idea8 :: struct($A, $B, $C: typeid) { ca: ^Component(A), cb: ^Component(B), cc: ^Component(C), entities: sa.Small_Array(COMPONENT_MAX_ENTITIES, Entity_ID), iter_idx: u32, } @(require_results) query3_via_intersect_id8 :: proc(ca: ^Component($A), cb: ^Component($B), cc: ^Component($C)) -> (query: Query3_Idea8(A, B, C)) { query.ca = ca query.cb = cb query.cc = cc for entity in ca.dense { if cb.sparse[entity] == nil || cc.sparse[entity] == nil do continue sa.append(&query.entities, entity) } return } query := query3_via_intersect_id8(_dynamic_bodies, _velocities_max, _movement_directions) for entity in sa.slice(&query.entities) { dynamic_body_velocity_with_movement_direction( component_get(query.ca, entity), component_get(query.cb, entity)^, component_get(query.cc, entity)^, ) }-
Same extra memory from Idea 1.
-
The
[]Entity_IDis a stack allocated array. -
Advantages :
-
The query can be made upfront, caching the results with dirty flags.
-
-
Problems :
-
Fetches
component.sparse[entity]twice. -
Confusing API.
-
-
-
Idea 1: Array of Entity with a global:_query_result-
Pattern
intersect2 :: proc(ca: Component($T), cb: Component($W)) -> []Entity_ID {
_query_result: [COMPONENT_MAX_ENTITIES]Entity_ID @(require_results) intersect3 :: proc(ca: Component($A), cb: Component($B), cc: Component($C)) -> []Entity_ID { idx := 0 for entity in ca.dense { if cb.sparse[entity] == nil || cc.sparse[entity] == nil do continue _query_result[idx] = entity idx += 1 } return _query_result[:idx] } for entity in intersect3(_dynamic_bodies, _velocities_max, _movement_directions) { dynamic_body_velocity_with_movement_direction( component_get(&_dynamic_bodies, entity), component_get(&_velocities_max, entity)^, component_get(&_movement_directions, entity)^, ) }-
Advantages :
-
No extra struct. Every thing is just a call to intersect.
-
This by it self can be an issue, as the query doesn't support caching upfront.
-
-
-
Problems :
-
The existence of
_query_result: [COMPONENT_MAX_ENTITIES]Entity_IDis quite ugly. -
Fetches
component.sparse[entity]twice. -
The programmer has to type the name of the component twice.
-
-
-
Idea 2: Many:[]dense_idxarrays, using stack arrays from the global scope-
Pattern
intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (r1, r2: []u32) { -
Sort-of an extension from Idea 1.
dyn_body_indices, vel_max_indices, mov_indices := intersect3(_dynamic_bodies, _velocities_max, _movement_directions) for i in 0..<len(dyn_body_indices) { dynamic_body_velocity_with_movement_direction( _dynamic_bodies.dense[dyn_body_indices[i]], _velocities_max.dense[vel_max_indices[i]]^, _movement_directions.dense[mov_indices[i]]^, ) }-
Avoids fetching
component.sparse[entity]twice. -
Problems :
-
~There has to be lots of different arrays to store each one of
dyn_body_indices, vel_max_indices, mov_indices. -
The programmer has to type the name of the component twice.
-
-
-
Idea 8: Each:Component(T)has a array ofquery_resultarray of ^T-
Pattern
intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (count: int) { -
The query
intersect2function returns a length to be used inside the for loop.
component_get_query :: #force_inline proc(component: Component($T), idx: int) -> ^T { return component.query_result[idx] } for i in 0..<intersect3(_dynamic_bodies, _velocities_max, _movement_directions) { dynamic_body_velocity_with_movement_direction( component_get_query(_dynamic_bodies, i), component_get_query(_velocities_max, i)^, component_get_query(_movement_directions, i)^, ) }-
Problems :
-
When comparing to Idea 4, it comes down to what's lighter, as the ideas are basically the same.
-
Idea 4 seems better, as a
u32is lighter than auintptr(size ofint/uint).
-
-
-
-
Idea 3: Many arrays of:T-
Pattern
intersect2 :: proc(ca: Component($T), cb: Component($W)) -> (r1: []T, r2: []W) {. -
Problems :
-
The array being
[]Tdoesn't make sense, as I want a pointer to the element, but doing so would give me a pointer to a copy of the element. -
It's necessary that the array be defined as
[]^T(same as Idea 6) or[]u32(indices to.data, which is exactly the same as Idea 4). -
The generic nature of
TandWmakes this quite difficult.-
Make a
Componentbe generic enough so it doesn't needTorW. -
The returns are allocated.-
Seems very bad for performance.
-
-
-
-
-
Idea 5: Each:Component(T)has a array of query_result array ofT-
Problems :
-
The array being
[]Tdoesn't make sense, as I want a pointer to the element, but doing so would give me a pointer to a copy of the element. -
It's necessary that the array be defined as
[]^T(same as Idea 6) or[]u32(indices to.data, which is exactly the same as Idea 4).
-
-
With Archetypes
void movement_system(float dt) {
for (int i = 0; i < a_pv.count; i++) {
a_pv.pos[i].x += a_pv.vel[i].vx * dt;
a_pv.pos[i].y += a_pv.vel[i].vy * dt;
}
for (int i = 0; i < a_pvh.count; i++) {
a_pvh.pos[i].x += a_pvh.vel[i].vx * dt;
a_pvh.pos[i].y += a_pvh.vel[i].vy * dt;
}
}
-
No
has_componentchecks.