And some additional info (my machine has 64 GByte of RAM).
I used a 16 GByte cache for previously computed DB results.
DBs are stored uncompressed at 1 bit/position.
DBs are loaded into the cache in 4K blocks.
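With 1 bit per position and 4K cache blocks, locating a result is pure arithmetic. A minimal sketch of that addressing (my own illustration, not the author's code; function names are hypothetical):

```python
# Sketch: addressing a 1-bit/position result inside 4K cache blocks.
BLOCK_BYTES = 4096                 # one disk/cache block
BLOCK_POSITIONS = BLOCK_BYTES * 8  # 32768 positions fit in one block

def locate(index):
    """Return (block number, byte offset in block, bit mask) for a position index."""
    block = index // BLOCK_POSITIONS
    bit_in_block = index % BLOCK_POSITIONS
    return block, bit_in_block // 8, 1 << (bit_in_block % 8)

def probe(cache_block, index):
    """Read the 1-bit result for `index` from its already-loaded 4K block."""
    _, byte_off, mask = locate(index)
    return bool(cache_block[byte_off] & mask)
```

So a probe costs one block lookup in the cache plus one masked byte read.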
The program I used was based upon the work of Michel and Harm (you can still find some references in the source).
For DB generation (mostly in pairs), 2 blocks of 16 GByte are allocated.
As a further optimization I could use the whole 32 GByte for symmetrical DBs, which I haven't implemented yet.
The first phase is called CP, for Capture and Promotion.
This is the only phase where other DBs are consulted (and since we are in Breakthrough mode, we don't need to access other DBs for the score).
After this the iteration process starts, but so far it moves through all indexes in a forward scan, not (as Michel implemented) a queue-based backward scan, which is far more efficient in the later stages.
The parallel implementation divides the index space into equal parts, and synchronization is done via the ::WaitForMultipleObjects function.
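The partitioning idea is simple to sketch. The real program uses Win32 threads joined with ::WaitForMultipleObjects; the sketch below (my assumption of the structure, not the author's code) models the same equal-parts split with Python threads and join():

```python
# Sketch: divide the index space into equal ranges, one worker per range,
# then wait for all workers (the ::WaitForMultipleObjects equivalent).
import threading

def solve_range(start, end, results):
    # Placeholder for the real per-range iteration pass.
    results.append((start, end))

def parallel_pass(total_positions, workers):
    chunk = (total_positions + workers - 1) // workers  # ceiling division
    results, threads = [], []
    for w in range(workers):
        start = w * chunk
        end = min(start + chunk, total_positions)
        t = threading.Thread(target=solve_range, args=(start, end, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # block until every range is done
    return sorted(results)
```

Equal index ranges are the simplest split; they only balance well if work is spread evenly across the index space.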
I still need to check that the partitioning for large DBs is free of errors; if it is, I will distribute the source to those interested.
Of course my implementation is quite different, for at least two reasons: 1) I can use only 32 GByte of RAM, 2) I added a compression mechanism.
Let's take the 6x5 db as an example. The number of positions is 12 423 500 232, but with white to play there are 2 000 174 981 positions without capture,
and with black to play there are 1 974 384 083 positions without capture. Because I am unable to calculate an index without holes for the positions without captures, and because an index without holes over 12 423 500 232 positions is of no use, I decided to build a quite simple index over 37 026 007 200 slots (number of black configurations * number of white configurations).
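An index "with holes" of this kind is just the row-major product of the two configuration enumerations; illegal pairs (e.g. overlapping pieces) simply leave unused slots. A hedged sketch of that formula, under my reading of the post (the function name is hypothetical):

```python
# Sketch: simple product index with holes over
# (number of black configurations) * (number of white configurations) slots.
def product_index(black_cfg_idx, white_cfg_idx, num_white_configs):
    """Row-major slot number for a (black, white) configuration pair.
    Pairs that are not legal positions leave holes in the index space."""
    return black_cfg_idx * num_white_configs + white_cfg_idx
```

For the 6x5 db this gives 37 026 007 200 slots for 12 423 500 232 legal positions, so roughly two thirds of the slots are holes; the compression step below makes that waste affordable.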
Now I had to choose a compression mechanism as simple as possible, with access to the db as fast as possible. I immediately eliminated all sophisticated solutions and chose simply to store all relevant positions in sequential order at 32 bits per position (the least significant bits of the index), hoping that less than 1% of the positions would need to be stored in the db, which is really the case.
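Since the index space is split into blocks of 4G positions, the low 32 bits uniquely identify a position within its block, and a sorted per-block array of 32-bit words supports lookup by binary search. A sketch under those assumptions (my illustration; which positions count as "relevant" is not spelled out in the post):

```python
# Sketch: per-block sequential storage of relevant positions as sorted
# 32-bit low words, with membership tested by binary search.
import bisect

BLOCK_SPAN = 1 << 32  # one block covers 2**32 consecutive index slots

def low32(index):
    """Least significant 32 bits of a global index: unique within a block."""
    return index & 0xFFFFFFFF

def build_block(stored_indices):
    """Sorted array of 32-bit low words for one 4G-position block."""
    return sorted(low32(i) for i in stored_indices)

def contains(block_words, index):
    """Binary-search one block's word array for a position."""
    key = low32(index)
    pos = bisect.bisect_left(block_words, key)
    return pos < len(block_words) and block_words[pos] == key
```

With under 1% of positions stored, 32 bits each, this beats the 1 bit/position bitmap by more than a factor of three in space while keeping lookups O(log n) per block.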
To work around my lack of memory I divide the index space into blocks of 4G positions.
As soon as a block of 4G positions has been handled, its results are stored directly in the db, and that part of the db becomes accessible through the same standard access path as the rest of the db.
On disk the db is divided into blocks of 4K.
Our implementations being completely different, we could be 100% confident when we reach the same figures, couldn't we?