Thanks to Ed, Bert and Harm, I was able to play a "little" DamExchange tournament (158 ballot games per match) with Flits, Truus and Dam2.2 (free download). I would like to share the results with you.
The computer I used is a desktop with an Intel Core 2 Duo E8400 (3.0 GHz) and 4 GB RAM, running Windows XP Pro x64.
- Flits : 75 moves in 9 min (and pondering), 6p databases, 128 MB hash table (160 MB vs. Dam2.2)
- Truus : 75 moves in 9 min (and pondering), 6p databases, 128 MB hash table (160 MB vs. Dam2.2)
- Dam2.2: 75 moves in 12 min (no pondering), 6p databases, 24 MB hash table
The matches Truus vs. Dam2.2 and Flits vs. Dam2.2 were played with Dam2.2 as initiator, using the file 2move_ballots.pdn. Using setup positions seems to disable the Dam2.2 opening book, so in effect none of the three programs uses an opening book. Dam2.2 stops a game when there are 6 or fewer pieces on the board, which gives reliable game results.
The match Flits vs. Truus was played with the truus_dxp_server -a option. I checked the outcomes and had to correct 7 results from draws to wins/losses.
Here are the results of the tournament.
Round 1. Truus vs. Dam2.2: 60 wins, 2 losses, 96 draws
Round 2. Flits vs. Dam2.2: 78 wins, 6 losses, 74 draws
Round 3. Flits vs. Truus: 13 wins, 10 losses, 135 draws
The games between Flits and Dam2.2 were played "to the end" (6 pieces or fewer), which gives Flits more opportunity to make glitches. I found 5, all in or near database positions, each turning a drawn position into a lost one. I found only one glitch by Flits vs. Truus, again a losing move in a drawn ending (#6 below).
Final standings (Flits glitches not corrected):
1. Flits 391 pts 1.237 avg
2. Truus 371 pts 1.174 avg
3. Dam2.2 186 pts 0.589 avg
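For anyone who wants to recheck the standings, here is a minimal sketch that recomputes them from the three match results above. It assumes the usual draughts scoring of 2 points for a win and 1 for a draw (which reproduces the point totals listed) and takes "avg" to mean points per game played. The names `MATCHES` and `standings` are just illustrative.

```python
# Match results as reported above:
# (first program, second program, wins, losses, draws), seen from the first program.
MATCHES = [
    ("Truus", "Dam2.2", 60, 2, 96),
    ("Flits", "Dam2.2", 78, 6, 74),
    ("Flits", "Truus", 13, 10, 135),
]

# Assumed scoring: 2 points per win, 1 per draw, 0 per loss.
WIN_POINTS, DRAW_POINTS = 2, 1


def standings(matches):
    """Return (name, points, points per game) rows, best score first."""
    points, games = {}, {}
    for a, b, wins, losses, draws in matches:
        n = wins + losses + draws
        points[a] = points.get(a, 0) + WIN_POINTS * wins + DRAW_POINTS * draws
        # A loss for the first program is a win for the second.
        points[b] = points.get(b, 0) + WIN_POINTS * losses + DRAW_POINTS * draws
        games[a] = games.get(a, 0) + n
        games[b] = games.get(b, 0) + n
    table = [(name, points[name], points[name] / games[name]) for name in points]
    return sorted(table, key=lambda row: row[1], reverse=True)


for rank, (name, pts, avg) in enumerate(standings(MATCHES), start=1):
    print(f"{rank}. {name} {pts} pts {avg:.3f} avg")
```

The three point totals add up to 948, i.e. 2 points for each of the 474 games, which is a quick sanity check on the win/loss/draw counts.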
To see whether the much larger hash table makes a big difference, I also played a match Truus vs. Dam2.2 with a Truus hash table of 24 MB, equal to that of Dam2.2. (At the same time I raised the 'partijendatabase ruimte' (game database space) from 4 to 36 MB; the effect of this is unknown to me.) The result shows little difference:
Truus/24 MB vs. Dam2.2: 57 wins, 0 losses, 101 draws
I noticed a difference from the Flits vs. Truus match played by Ed: more draws on my side. It would be interesting if a number of people could repeat this experiment, so we can see how much difference (randomness) there is in the outcomes.
Maybe we can even make a benchmark out of this: the number of points a program scores on the 474-game FTD benchmark (under some standard conditions) would give a good indication of its playing strength. In addition to internet engine matches, this could help in determining more reliable and up-to-date estimates for program ratings.
Anyway, it is great that any draughts programmer who implements DamExchange can now run automated matches against these programs!
In case somebody is interested in the glitch positions:
1. [FEN "B:W29,32,34:B8,13,31."] 1. ... 31-37
2. [FEN "B:W6,34,38:B12,13,26,28."] 1. ... 28-32 2. 38x27 26-31 3. 27x36
3. [FEN "B:W18,23,36,39:B1,3,6,35."] 1. ... 35-40 2. 39-34 40x29 3. 23x34 6-11
4. [FEN "W:W29,37,47:B17,19,36."] 1. 47-41
5. [FEN "W:W31,39,50:B8,24,45."] 1. 50-44
6. [FEN "B:W14,21,40:B6,11,24."] 1. ... 24-30
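If you want to feed these positions to your own program, here is a small sketch that splits the FEN strings into side to move and piece squares and counts the pieces. It only handles the subset of the notation used in the six positions above (men only, trailing period); kings and square ranges are not handled. The names `GLITCH_FENS` and `parse_fen` are just illustrative.

```python
# The six glitch positions listed above, copied verbatim.
GLITCH_FENS = [
    "B:W29,32,34:B8,13,31.",
    "B:W6,34,38:B12,13,26,28.",
    "B:W18,23,36,39:B1,3,6,35.",
    "W:W29,37,47:B17,19,36.",
    "W:W31,39,50:B8,24,45.",
    "B:W14,21,40:B6,11,24.",
]


def parse_fen(fen):
    """Return (side_to_move, {"W": [...], "B": [...]}) for a men-only FEN string."""
    side, *groups = fen.rstrip(".").split(":")
    pieces = {"W": [], "B": []}
    for group in groups:
        colour, squares = group[0], group[1:]
        # Assumption: every entry is a plain square number (no kings, no ranges).
        pieces[colour] = [int(sq) for sq in squares.split(",") if sq]
    return side, pieces


for number, fen in enumerate(GLITCH_FENS, start=1):
    side, pieces = parse_fen(fen)
    total = len(pieces["W"]) + len(pieces["B"])
    print(f"{number}. {side} to move, {total} pieces")
```

Counting this way, positions 2 and 3 have 7 and 8 pieces and the other four have 6, which fits the "in/near database positions" remark given the 6p databases.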