Roadmap 2026
30 March 2026. The year 2025 marks the end of the FNR-funded project COMOC.
Over the course of the project, we developed Turbo, a general-purpose, exact, discrete constraint programming solver that runs entirely on GPU architectures. Turbo supports the discrete subset of MiniZinc and can solve instances from the MiniZinc Challenge benchmark suite. It is already competitive with sequential CP solvers such as Choco, and on some problems it can even outperform OR-Tools.
The algorithms behind the current version of the solver (v1.2.8) were published at AAAI 2026. This paper is the final scientific deliverable of the COMOC project.
The foundations are now in place, and in 2026 we will build a new set of projects on top of them.
This year also marks the start of a new FNR-funded project, NEURESA, in collaboration with Cognifyer. NEURESA stands for Neuro-Symbolic Reasoning for Embedded System Architectures Design. In this project, we investigate how LLMs can support both the modelling stage—specialized here to the design of embedded system architectures—and the solving stage, where they may help improve Turbo’s decisions.
The team is also growing:
- Clément Poncelet joins the NEURESA team as a postdoctoral researcher.
- Yi-Nung Tsao (PhD candidate) is investigating Turbo for continuous problems and neural network verification (NNV).
- Hakan Hasan begins a PhD on distributed constraint solving, and recently had his first paper accepted at the PaPoC 2026 workshop.
- Anisa Meta joins us to explore GPU-based preprocessing algorithms in the context of her Master's thesis.
- Wei Huang continues his work on reducing the number of iterations in fixpoint computation, also as part of his Master's thesis.
- A team of motivated first-year students–Wenbo Han, Eduardo Rodrigez and Ching-Yao Tang–is working on reducing the number of memory transactions during fixpoint computation.
We have an exciting research agenda for 2026. In this post, I discuss some directions we plan to explore.
Continuous Constraint Problems
In recent years, NVIDIA GPUs have tended to include more floating-point cores (FP32) than integer cores (INT32), largely because of their heavy use in machine learning workloads. On the Hopper H100 architecture, there are twice as many FP32 cores as INT32 cores. For Turbo, this opens two research directions:
- extending Turbo to continuous constraint solving;
- solving discrete constraint problems using floating-point representations (FP32, FP64, or even lower-precision formats such as FP16, FP8 and FP4).
The first direction is not trivial, because floating-point arithmetic introduces many subtle difficulties. Fortunately, there is already a rich body of work in this area. The following survey papers and references provide a strong starting point:
- Interval arithmetic: Hickey et al., Interval Arithmetic: from Principles to Implementation, 2003 and Kulisch U., Mathematics and Speed for Interval Arithmetic: A Complement to IEEE 1788
- Inverse interval functions: Goualard F., Interval Extensions of Multivalued Inverse Functions, 2008
- Midpoint of an interval: Goualard F., How do you compute the midpoint of an interval?, 2014
- Computing an over-approximation (branch-and-prune): Granvilliers L., Algorithm 852: RealPaver: an interval solver using constraint satisfaction techniques, 2006 and Chabert et al., Contractor Programming, 2009
- Optimization problems (branch-and-bound): Araya I. et al., Interval Branch-and-Bound algorithms for optimization and constraint satisfaction: a survey and prospects, 2016
- Computing an under-approximation: Araya I. et al., Upper Bounding in Inner Regions for Global Optimization under Inequality Constraints, 2014
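To illustrate one of these subtleties: a sound interval operation must round its lower bound toward -oo and its upper bound toward +oo, so that the exact real result is always enclosed. A minimal sketch in Python, which does not expose directed rounding modes, so we conservatively step one ulp outward instead (the helper name is ours, not Turbo's):

```python
import math

def add_outward(xl, xu, yl, yu):
    # Sound interval addition under floating point: after the (round-to-nearest)
    # additions, step the lower bound one ulp down and the upper bound one ulp
    # up. One ulp outward is enough to enclose a single rounded operation.
    lo = math.nextafter(xl + yl, -math.inf)
    hi = math.nextafter(xu + yu, math.inf)
    return lo, hi

# [0.1, 0.1] + [0.2, 0.2]: the exact real sum 0.3 is not a binary64 value,
# but the outward-rounded interval is guaranteed to enclose it.
lo, hi = add_outward(0.1, 0.1, 0.2, 0.2)
assert lo <= 0.3 <= hi
```

Production interval libraries instead switch the hardware rounding mode (as standardized by IEEE 1788), which yields tighter bounds than the one-ulp step used here.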
The second extension—solving discrete problems with floating-point domains—is not new. It was introduced in CLP(BNR) as a way to unify the representation of Boolean, natural and real-valued domains. An integer domain can be represented by carefully rounding the bounds of a real interval. The same strategy is implemented in IC, the Hybrid Finite Domain / Real Number Interval Constraint Solver of the ECLiPSe constraint logic programming system.
The key intuition is that a 32-bit floating-point number can exactly represent every integer in the range [-2^24, 2^24].
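This is easy to check by round-tripping integers through a binary32 value (a small Python check using the standard library; `to_f32` is our helper name):

```python
import struct

def to_f32(x: float) -> float:
    # Round-trip through an IEEE-754 binary32 value ('f' format code).
    return struct.unpack('f', struct.pack('f', x))[0]

# Every integer of magnitude up to 2^24 survives the conversion exactly...
assert to_f32(2**24) == 2**24
assert to_f32(-(2**24)) == -(2**24)
# ...but 2^24 + 1 does not fit in the 24-bit significand: it rounds to 2^24.
assert to_f32(2**24 + 1) == 2**24
```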
There are then two possible ways to propagate constraints over such domains.
As in CLP(BNR) and ECLiPSe, one can apply standard floating-point propagators and then round the result back to an integer domain.
However, this may produce weaker propagation.
For example, dividing [1,1] by [0,1] yields [1,+oo) with a continuous propagator, since the divisor may be arbitrarily close to 0, whereas an integer propagator knows the divisor must be exactly 1 and infers the tighter domain [1,1].
This suggests an interesting research direction: using integer propagators over floating-point domains. That could lead to a novel contribution. Additional speedups may also be possible by using lower-precision floating-point formats whenever the domains fit safely within the available representation.
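The bound-rounding step used by CLP(BNR)-style solvers can be sketched in a few lines (a hypothetical helper for illustration, not Turbo code): after a continuous propagation step leaves an integer variable with fractional bounds, the lower bound is rounded up and the upper bound rounded down.

```python
import math

def tighten_to_int(lo: float, hi: float) -> tuple[int, int]:
    # An integer-valued variable whose bounds are stored as floats can be
    # tightened for free: round the lower bound up and the upper bound down.
    return math.ceil(lo), math.floor(hi)

# Suppose continuous propagation left x in [0.3, 2.7]. Since x is known to
# be an integer, its domain shrinks to [1, 2].
assert tighten_to_int(0.3, 2.7) == (1, 2)
```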
Towards Efficient Search Strategies
So far, we have compared Turbo against Choco and OR-Tools using a fixed search strategy–the one specified in the MiniZinc model. In practice, however, adaptive search strategies such as dom/wdeg, FRBA or pick/dom are often much more effective.
As highlighted in this paper, there are three main ingredients behind the success of modern CP solvers:
- adaptive search strategies;
- solution-based phase saving;
- restarts.
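As a concrete illustration of the second ingredient, solution-based phase saving: when branching on a variable, the solver prefers the value that variable took in the best solution found so far, falling back to a default heuristic otherwise. A minimal sketch (the helper names are ours):

```python
def branch_value(var, domain, best_solution):
    # Solution-based phase saving: reuse the value `var` took in the best
    # solution found so far, provided it is still in the current domain.
    if best_solution is not None and best_solution.get(var) in domain:
        return best_solution[var]
    # Fallback heuristic: smallest value in the domain.
    return min(domain)

# The saved value 2 is still available, so it is preferred over min = 1.
assert branch_value("x", {1, 2, 3}, {"x": 2}) == 2
# The saved value 2 has been pruned, so we fall back to the smallest value.
assert branch_value("x", {1, 3}, {"x": 2}) == 1
```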
The ACE solver has recently shown strong performance in the XCSP3 competition, including against hybrid SAT+CP solvers based on lazy clause generation.
At the moment, it remains unclear how to parallelize the learning component of SAT solvers effectively on GPUs, so this is not a direction we plan to pursue yet. By contrast, the three components listed above appear much more amenable to efficient GPU implementation.
This extension could become a major milestone for Turbo: it should allow the solver to handle a broader class of problems, and begin competing more seriously in the open category of constraint solving competitions.
Fixpoint Computation Improvement
Turbo’s current fixpoint loop is deliberately simple.
Each warp iterates over all constraints, synchronizes with the others, and, if anything changed, starts another pass.
This mode is exposed as -fp ac1.
A small improvement is available with -fp wac1, where each warp computes a local fixpoint over groups of 32 constraints before synchronizing globally.
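In sequential form, the ac1 scheme boils down to the following loop (a simplified Python model of what the GPU does; the `-fp` flags above are Turbo's, the code is our sketch):

```python
def ac1_fixpoint(domains, propagators):
    # Naive AC1-style loop: apply every propagator; if any domain changed,
    # start another full pass. Each propagator returns True iff it narrowed
    # a domain.
    changed = True
    while changed:
        changed = False
        for prop in propagators:
            if prop(domains):
                changed = True
    return domains

# Example: x in 0..5, y in 0..5, with constraints x < y and y < 4.
doms = {"x": (0, 5), "y": (0, 5)}

def x_lt_y(d):
    (xl, xu), (yl, yu) = d["x"], d["y"]
    nx, ny = (xl, min(xu, yu - 1)), (max(yl, xl + 1), yu)
    if (nx, ny) != (d["x"], d["y"]):
        d["x"], d["y"] = nx, ny
        return True
    return False

def y_lt_4(d):
    yl, yu = d["y"]
    if yu > 3:
        d["y"] = (yl, 3)
        return True
    return False

ac1_fixpoint(doms, [x_lt_y, y_lt_4])
# At the fixpoint: x in [0, 2] and y in [1, 3].
assert doms == {"x": (0, 2), "y": (1, 3)}
```

The inefficiency is visible even here: every pass re-runs every propagator, including those whose inputs did not change, which is exactly what the scheduling and memory-layout work below aims to mitigate.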
We are currently exploring two orthogonal optimization directions:
- converging faster, by reducing the number of fixpoint iterations;
- reducing memory transactions, by improving memory access patterns.
The first direction is primarily algorithmic: it aims to statically schedule constraints in a way that accelerates convergence to the fixpoint.
The second direction is more hardware-oriented: it aims to improve memory coalescing and reduce memory traffic by reordering variables and constraints appropriately.
From preliminary experiments, we expect each of these optimizations to provide a speedup of at least 2x. It is still unclear, however, how strongly they interact, and whether their gains will combine cleanly in practice.