CUD@SAT

The material presented here is provided as a companion to the paper:

A. Dal Palù, A. Dovier, A. Formisano, and E. Pontelli
CUD@SAT: SAT Solving on GPUs
JETAI (online since September 2014). DOI: 10.1080/0952813X.2014.954274

(and to its preliminary version presented at CILC 2012)

This page contains source code, installation instructions and results.

Source code

The code discussed in the paper can be downloaded here: cudasat.cu (10 Apr 2013). Save it with the .cu suffix. The code has been tested under Linux, Mac OS, and Cygwin.

  • An example of an UNSAT CNF formula: hole6
  • An example of a SAT CNF formula: 4queens

Installation instructions

Make sure you have a CUDA-capable GPU (we suggest compute capability 2.1 or higher), gcc, and the CUDA SDK installed on your system. You can follow the instructions for your specific OS at docs.nvidia.com/cuda/index.html

Compile your code with (for Unix/Linux/Mac):

> nvcc -O3 -arch=sm_21 cudasat.cu -o cudasat
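
The -arch flag should match the compute capability of your GPU (sm_21 corresponds to capability 2.1). On a more recent card a different value may be needed, for instance (the 6.1 value below is only an example):

> nvcc -O3 -arch=sm_61 cudasat.cu -o cudasat
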
Usage:
./cudasat mode N_vars_blocks N_vars_thread Maxv learning varSelectStrategyCPU varSortGPU filename

  • mode: integer that specifies how the parallel search is performed:
    • 0 = whole search performed by CPU only (no watched literals)
    • 1 = search tree on CPU, unit propagation (no watched literals) on GPU at each node of the tree
    • 2 = the lower part of the search tree is passed to the GPU whenever at most Maxv variables are unassigned. Each thread implements an independent DPLL procedure, where N_vars_blocks + N_vars_thread variables are uniquely assigned according to the block and thread indices (a sketch of this partitioning is given after this list).
    • 3 = mode 1 combined with mode 2
    • 4 = whole search performed by CPU only (unit propagation with watched literals)
    • 5 = unit propagation with watched literals on both CPU and GPU. The GPU is used on the lower part of the tree with the same rules as in mode 2.
  • N_vars_blocks: integer that controls the number of variables assigned by different blocks (for modes 2 and 5)
  • N_vars_thread: integer that controls the number of variables assigned by different threads. This is limited by the capability of the GPU (usually up to 10 variables = 1024 threads per block – for modes 2 and 5)
  • Maxv: integer threshold on the number of unassigned variables; the GPU search starts when at most Maxv variables are unassigned (for modes 2 and 5)
  • learning: {0,1}. If 1, enables the clause learning procedure (modes <= 3, on the search tree handled by the CPU)
  • varSelectStrategyCPU: 0..4.
    • 0 = clause with the minimum number of free variables (ties broken by clause code)
    • 1 = JW+- (two-sided Jeroslow-Wang)
    • 2 = JW (one-sided Jeroslow-Wang)
    • 3 = DLIS (Dynamic Largest Individual Sum)
    • 4 = DLCS (Dynamic Largest Combined Sum)
  • varSortGPU: {0,1}. If 1, the varSelectStrategyCPU heuristic is applied to sort the unassigned variables before they are passed to the GPU (modes 2 and 3).
  • filename: the input formula in CNF format
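
To make the partitioning used in modes 2 and 5 concrete, the sketch below illustrates how the bits of the block and thread indices can fix a unique partial assignment for the delegated variables. It is only an illustration: the kernel name, parameters, and data layout are hypothetical and are not taken from cudasat.cu.

// Illustrative sketch (not the actual cudasat.cu code): the bits of
// blockIdx.x and threadIdx.x fix a unique partial assignment for the
// first N_vars_blocks + N_vars_thread delegated variables.
__global__ void dpll_leaf_search(const int *delegated_vars, // indices of the unassigned variables
                                 int n_vars_blocks,         // variables decided by the block index
                                 int n_vars_thread,         // variables decided by the thread index
                                 int n_total_vars,
                                 signed char *assignments)  // one slice of n_total_vars cells per thread
{
    // Each thread works on its own private copy of the assignment.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    signed char *my_assign = assignments + (size_t)tid * n_total_vars;

    // The bits of blockIdx.x fix the first n_vars_blocks delegated variables...
    for (int i = 0; i < n_vars_blocks; i++)
        my_assign[delegated_vars[i]] = ((blockIdx.x >> i) & 1) ? 1 : -1;   // 1 = true, -1 = false

    // ...and the bits of threadIdx.x fix the next n_vars_thread ones.
    for (int i = 0; i < n_vars_thread; i++)
        my_assign[delegated_vars[n_vars_blocks + i]] = ((threadIdx.x >> i) & 1) ? 1 : -1;

    // Each thread then runs an independent DPLL search on the remaining
    // free variables (at most Maxv - n_vars_blocks - n_vars_thread of them).
    // run_local_dpll(my_assign, ...);   // hypothetical helper
}

With such a scheme the kernel would be launched with 1 << N_vars_blocks blocks of 1 << N_vars_thread threads each, which is why N_vars_thread is in practice limited to about 10 (1024 threads per block).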

Examples of usage:

> ./cudasat 0 0 0 0 0 0 0 hole8.cnf
Simple search on CPU

> ./cudasat 1 0 0 0 1 0 0 hole8.cnf
Search with learning and GPU unit propagation

> ./cudasat 3 8 6 50 1 0 0 hole8.cnf
Search with learning in the higher part of the search tree (with GPU unit propagation); the lower part is explored on the GPU when 50 or fewer variables are unassigned, with 8+6 variables (2^8 blocks x 2^6 threads per block, i.e. 16K threads in total) driving the parallel DPLL (no learning in the lower part of the tree)

> ./cudasat 4 0 0 0 0 0 0 hole8.cnf
CPU implementation of watched literals

> ./cudasat 5 8 6 50 1 0 0 hole8.cnf
Watched literals are also used in the lower part of the search tree, in the DPLL procedure run by each thread (see the mode 3 example)

Additional material and results

We provide a comprehensive set of runs with different problems from SAT competitions and different launch modes.

The results referred to in the journal paper can be downloaded from THIS LINK. Other tables are available here.

Additional Source codes

Search tree on GPU

The following code, as mentioned in Section 5.4 of the paper, handles a search tree stored in the GPU’s memory. The non-deterministic variable assignment is performed by a kernel (one block, one thread), which also takes care of backtracking. The CPU simply interleaves the two kernels and periodically checks whether the search has completed.
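
A minimal sketch of this host-side control loop is shown below; the kernel and variable names (decide_or_backtrack, unit_propagation, d_formula, d_tree, d_status) are illustrative and do not correspond to the actual identifiers used in cudasat-onlyGPU.cu.

// Illustrative host loop: names are hypothetical, not from cudasat-onlyGPU.cu.
int status = 0;                       // 0 = still searching, 1 = SAT, -1 = UNSAT
int *d_status;
cudaMalloc(&d_status, sizeof(int));
cudaMemcpy(d_status, &status, sizeof(int), cudaMemcpyHostToDevice);

while (status == 0) {
    // One block, one thread: choose the next branching variable or
    // backtrack in the search tree stored in device memory.
    decide_or_backtrack<<<1, 1>>>(d_formula, d_tree, d_status);

    // Many threads: unit propagation over the clauses.
    unit_propagation<<<n_blocks, n_threads>>>(d_formula, d_tree, d_status);

    // Periodically copy back a flag to detect the end of the search.
    cudaMemcpy(&status, d_status, sizeof(int), cudaMemcpyDeviceToHost);
}
printf(status == 1 ? "SAT\n" : "UNSAT\n");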

Source file:     cudasat-onlyGPU.cu

Usage:

./cudasat-onlyGPU learning filename

where learning (either 0 or 1) enables clause learning when set to 1.
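
For instance (assuming the hole6 instance above has been saved as hole6.cnf):

> ./cudasat-onlyGPU 1 hole6.cnf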

Learning during GPU DPLL

We provide here another version of the source code, developed by one of our students, that implements the learning procedure on the GPU for the lower part of the search tree when activated in mode 2. Each thread, while performing the DPLL procedure, can learn clauses from local conflicts and prune its local search tree.
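
The general idea is sketched below; the data structures, limits, and names are purely illustrative and may differ from the actual code. Each thread keeps a small private buffer of learnt clauses, filled when its local propagation reaches a conflict and consulted, together with the original formula, during the rest of its DPLL search.

// Illustrative only: per-thread storage for locally learnt clauses.
// Names and sizes are hypothetical, not taken from the actual source.
#define MAX_LEARNT       32   // learnt clauses kept per thread
#define MAX_LEARNT_LITS   8   // maximum literals per learnt clause

struct LocalLearnt {
    int n_clauses;
    int lits[MAX_LEARNT][MAX_LEARNT_LITS + 1];   // 0-terminated lists of literals
};

// Called by a thread when its local propagation finds a conflict: the
// negations of the decision literals responsible for the conflict form a
// new clause, visible only to this thread.
__device__ void learn_local_clause(LocalLearnt *db,
                                   const int *conflict_lits, int n_lits)
{
    if (db->n_clauses >= MAX_LEARNT || n_lits > MAX_LEARNT_LITS)
        return;                                  // buffer full or clause too long: skip
    int c = db->n_clauses++;
    for (int i = 0; i < n_lits; i++)
        db->lits[c][i] = -conflict_lits[i];      // negate the responsible literals
    db->lits[c][n_lits] = 0;                     // terminator
}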

The code can be compiled as described above.

Usage:
cudasat mode N_vars_blocks N_vars_thread Maxv learning filename

where the parameters have the same meaning as discussed above.
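
For instance, a run corresponding to the (5,6) column of the table below could be launched as (assuming the hole8 instance is stored as hole8.cnf):

> ./cudasat 2 5 6 70 1 hole8.cnf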

Results:

Learning provides little help, since the learnt clauses have only local scope, i.e., they do not help in pruning the trees of the other workers. The following table reports some computation times for the HOLE8 problem (UNSAT) and for the same problem with its first clause removed (SAT). The CPU column refers to a CPU run, while (X,Y) means X=N_vars_blocks and Y=N_vars_thread. Learning is on. The speedup is the ratio between the time of the sequential version run on the GPU, column (0,0), and that of the best parallel configuration, column (5,6) (e.g., 21.677 / 2.334 for the SAT instance). For these instances, MaxV=70 starts the GPU search immediately.

Instance      MaxV   CPU     (0,0)    (5,0)   (5,2)   (5,4)   (5,6)   SpeedUp
hole8 SAT     70     2.464   21.677   2.632   2.633   2.632   2.334   9.28399
hole8 UNSAT   70     2.527   24.629   5.240   5.239   5.239   5.224   4.71413

For this example, there is only a minimal advantage over the CPU version on the SAT instance, while the UNSAT instance takes half the time on the CPU, even though the use of multiple blocks and threads provides a moderate speedup with respect to the sequential version run on the GPU.

For any information or request, please send an email to any of the authors.

 
