Ed/Gerard, i have a question related to the DB cache implementation.
So far (with 6p databases) I preload all the databases during start-up, and all are directly accesible in main memory.
Although i also don not use a cache for the 7p database generation (so far), it is evident that i need to implement a cache mechanism in my program Damage.
So far I understood that you use a double linked list of 4K data blocks.
I assume that you use the double link to easily replace the Least-Recently-Used datablock.
What I still not completely understand, how you find a specific blocknumber based on a index.
In the preloaded implementation, i have a list of memory pointers, where i can find the location of a specific database (based on white/black man/king counts). The index for every specific database starts at 0.
When using a cache, i think i need a unique index (so not incorporating the specific database) for a position. Do you do something similar, or do you keep track on the specific database information (like mancount) for all blocks?
If you only use the double link list, you go linear trough the blocks, which is quite time consuming.
On the other hand if you keep a parallel sorted list (sorted on index) with the associated blocknumbers, the sorting will cost quite some time.
So i assume i overlooked something trivial.....
Bert, take a look at the Chinook db access code: http://games.cs.ualberta.ca/~chinook/databases/code.c
For the larger databases (>6 pieces), they have for every database subdivision (wm, wk, bm, wk, wr, br tuple) a plain text index file with for every 4K block of data in the database the 32-bit index of the position that starts the block. Suppose you build the full 8 pc databases and ultimately have a 512 Gb database on disk. That means 128M blocks of indices, 4 bytes per starting index so about 512 Mb of indexing data to be in RAM at all times. Suppose you have enough memory left to use 16Gb for caching. That means that you can cache 4M blocks of 4K each. During a search, once you have computed the index of the current position, you can do a binary search on the indexing data (29 steps) to find the location on disk, and then read in the specific 4K block that contains the value of the current position.
Your question is how to locate this block in RAM if it is already among the cached blocks? For this, you need another array of 128M entries, this time with 4-byte indices into the array of cached 4K blocks. That means you have another 512 Mb of indexing data to be in RAM at all times. After every disk read of a 4K block not in cache, you simply update the location of the block in the cache array, using the LRU entry (that's where the double-linked list comes in) as the one to replace.
In total you have 1 Gb of RAM overhead for 16Gb of caching. Note that this first overhead is proportional to the size of the db on disk, not to the size of the cache! However, there is also overhead of using a double linked list, and more importantly, also from having a mini-cache of 64 4-byte indices of 64-byte mini-blocks per cached 4K block (this is used to speedup locating the position within the 4K block). That stuff is about 2*4 bytes for the list pointers + 256 bytes for the mini-cache, so about 1/16 of the 4K data block itself, so another 1 Gb in total for 16Gb of cached blocks. This overhead, however, scales with the number of data blocks. But still, 18 Gb of memory to have 16 Gb in cache is not bad.
This is how I understand the Chinook code, and also from what I have learned from correspondence with Ed and his online descriptions of building the checkers 10 pc and draughts 8 pc dbs.