v1.1.{4-5}: Sorting Propagators

28 August 2024. We perform two experiments to reduce thread divergence by sorting propagators, for the reasons mentioned in the previous post. Given a list of propagators representing constraints, we sort on the syntactic shape of the constraint:

  • v1.1.4: Sort on the symbol of the outermost predicate of the formula, e.g. x < y is equal to z + y < x * x, but less than x <= y for an arbitrary ordering between < and <=.
  • v1.1.5: Lexicographic sort on the first predicate symbol followed by the length of the formula, e.g. length(x < y) = 2 < 3 = length(z + y < x * x).

These are quite simple solutions to reduce divergence while avoiding spending too much time on a sorting algorithm specialized to minimize divergence. The results are not impressive, and it shows that thread divergence is perhaps not the biggest bottleneck of Turbo. Nevertheless, given how cheap this optimization is, I decided to keep it. Note that on some benchmarks where the propagators are very unsorted (e.g. gfd_schedule), sorting increases by 50% the number of nodes explored.

Results v1.1.4

MetricsAverageΔ v1.1.3MedianΔ v1.1.3
Nodes per seconds4439.68+3%1236.870%
Fixpoint iterations per second21659.62+1%6492.200%
Fixpoint iterations per node8.69+1%5.07-5%
#Problems with IDLE SMs at timeout99
Propagators memory9.01MB0%8.08MB0%
Variables store memory72.29KB0%84.10KB0%
#Problems at optimality1111
#Problems satisfiable2222
#Problems unknown22
#Problem with store in shared memory1010
#Problem with prop in shared memory11

TurboGPU-v1.1.3 vs TurboGPU-v1.1.4

Results v1.1.5

MetricsAverageΔ v1.1.3MedianΔ v1.1.3
Nodes per seconds4513.37+4%1243.610%
Fixpoint iterations per second21787.61+2%6612.26+1%
Fixpoint iterations per node8.50-1%5.38+1%
#Problems with IDLE SMs at timeout99
Propagators memory9.01MB0%8.08MB0%
Variables store memory72.29KB0%84.10KB0%
#Problems at optimality1111
#Problems satisfiable2222
#Problems unknown22
#Problem with store in shared memory1010
#Problem with prop in shared memory11

TurboGPU-v1.1.3 vs TurboGPU-v1.1.5