Self-learning and Monte Carlo algorithm

Discussion about the development of draughts in the age of computers and the Internet.
Sidiki
Posts: 118
Joined: Thu Jan 15, 2015 16:28
Real name: Coulibaly Sidiki

Post by Sidiki » Tue Dec 12, 2017 13:28

Hi all,

Recently, on 4 December 2017, Google's DeepMind held a match that pitted their new chess engine, AlphaZero, against Stockfish, the best chess engine.
100 games were played, and AlphaZero scored 28 wins, 72 draws and 0 losses.
Four hours of self-learning was enough for AlphaZero to master chess (without an opening book or endgame database) and crush the best chess engine.
I know that only Windames, Plus500 and Aurora Borealis have a learning option.
Can self-learning or teaching be an excellent way to improve at draughts, or even to solve it?
Isn't it a kind of endgame generator?

https://chess24.com/en/read/news/deepmi ... shes-chess#

Sidiki--

Fabien Letouzey
Posts: 285
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Post by Fabien Letouzey » Thu Dec 14, 2017 03:29

Hi Sidiki,
There is so much confusion (not just from you).

First of all, the AlphaZero learning duration was 9 hours (on thousands of enhanced computers), not 4 as everybody is copy/pasting. It is claimed that AlphaZero *reached* the level of Stockfish after 4 hours, but that's not the version that played the match.

Secondly, machine learning of evaluation (as in Scan and AlphaZero, although they differ a lot) has nothing to do with the good old opening-book learning that you are mentioning. The latter only recognises positions it has seen before. So it's basically a search tree (analysis) stored on disk.
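To make the distinction concrete, here is a minimal Python sketch of opening-book style "learning": game results are stored per position and only recalled for positions that were already played. The class name, the move-string keys and the scoring convention are illustrative assumptions, not how Windames, Plus500 or Aurora Borealis actually work.

```python
from collections import defaultdict


class LearningBook:
    """Toy opening-book "learning": remember results of positions already played.

    Keys are hypothetical position strings; a real program would hash the board.
    Nothing here generalises to positions that were never seen.
    """

    def __init__(self):
        # position key -> [wins, draws, losses] from the book owner's point of view
        self.stats = defaultdict(lambda: [0, 0, 0])

    def record_game(self, positions, result):
        """Store a game result (1 win, 0 draw, -1 loss) for every visited position."""
        index = {1: 0, 0: 1, -1: 2}[result]
        for pos in positions:
            self.stats[pos][index] += 1

    def lookup(self, pos):
        """Return the average score in [0, 1], or None if pos was never seen."""
        if pos not in self.stats:  # 'in' does not create a defaultdict entry
            return None
        wins, draws, losses = self.stats[pos]
        return (wins + 0.5 * draws) / (wins + draws + losses)


book = LearningBook()
book.record_game(["start", "32-28", "32-28 18-23"], 1)
book.record_game(["start", "32-28", "32-28 19-23"], 0)
```

Asking about a position that was never played, such as book.lookup("33-28"), returns None: nothing is generalised to new positions, unlike a learned evaluation function.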

Endgame-table generators are a form of exhaustive search: they make sure every possible position (with a specific material signature) is looked at. Again, nothing to do with evaluation (or learning).
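The exhaustive character of endgame generation can be sketched on a toy game instead of draughts (an assumption made to keep the example short): a pile of stones, a move removes 1, 2 or 3 stones, and the side that cannot move loses. Every position up to a given size is visited and labelled exactly, with no evaluation function and nothing learned. Real draughts generators enumerate every placement of pieces for a material signature and are considerably more involved.

```python
def build_table(max_stones, moves=(1, 2, 3)):
    """Label every position 0..max_stones as WIN or LOSS for the side to move.

    Toy stand-in for an endgame database: smaller piles are solved first,
    so each position only depends on positions already in the table.
    """
    table = {}
    for stones in range(max_stones + 1):  # every position is visited exactly once
        successors = [stones - m for m in moves if m <= stones]
        # Won if some move leaves the opponent in a lost position; no move means lost.
        table[stones] = "WIN" if any(table[s] == "LOSS" for s in successors) else "LOSS"
    return table


table = build_table(12)
```

In this toy game the lost positions come out as exactly the multiples of 4 (0, 4, 8, 12); the result is exact because every position was examined.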

Fabien.

Luzimar
Posts: 36
Joined: Tue Jul 25, 2017 01:35
Real name: Luzimar Jacinto Araujo

Post by Luzimar » Thu Dec 14, 2017 15:13

Congratulations, Fabien! I did not know that you are also a great programmer in chess.
Luzimar Araujo

Sidiki
Posts: 118
Joined: Thu Jan 15, 2015 16:28
Real name: Coulibaly Sidiki

Post by Sidiki » Thu Dec 14, 2017 21:05

Hi Fabien,

Thanks for all these clarifications. So the truth is different from what is claimed on many websites and blogs.

Fabien Letouzey
Posts: 285
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Post by Fabien Letouzey » Fri Dec 15, 2017 06:25

If you have doubts, you can post some links and I will have a look.

Some of the confusion is understandable. "Learning" is a vague term: it's supposed to describe programs that improve with time. For example, a learning program might get better at solving combinations the more you use it. By contrast, just computing something is not "learning" in itself; you seem to be suggesting that.

The modern variant, usually called "machine learning", goes well beyond rote learning used to remember positions. It's usually used "offline", which means that the programmer runs the learning once during development. And then you get the resulting program, which doesn't learn afterwards (that would be both complicated and pointless). That's why the "learning" functionality doesn't appear; it's already been used by the author.
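A minimal sketch of the offline workflow described above, assuming a linear evaluation over hypothetical features trained with logistic regression by plain gradient descent. This is a deliberately simple stand-in, not Scan's or AlphaZero's actual training method (AlphaZero trains a deep neural network): train_offline is run once by the author, and the finished program only calls evaluate with frozen weights.

```python
import math


def train_offline(samples, epochs=2000, lr=0.1):
    """Fit the weights of a linear evaluation with logistic loss.

    Run once during development. samples is a list of (features, result)
    pairs, where result is 1.0 for a win and 0.0 for a loss.
    """
    n = len(samples[0][0])
    weights = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for features, result in samples:
            score = sum(w * f for w, f in zip(weights, features))
            pred = 1.0 / (1.0 + math.exp(-score))  # predicted win probability
            for i, f in enumerate(features):
                grad[i] += (pred - result) * f  # gradient of the logistic loss
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return tuple(weights)  # frozen weights, shipped with the program


def evaluate(features, weights):
    """Called during play: the weights are only read, never updated."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical toy data: a single feature, the material difference.
toy_samples = [((1.0,), 1.0), ((2.0,), 1.0), ((-1.0,), 0.0), ((-2.0,), 0.0)]
weights = train_offline(toy_samples)
```

With these made-up samples the learned weight comes out positive, so evaluate prefers being a piece up; during play nothing changes any more, which is why no "learning" option is visible to the user.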

Sidiki
Posts: 118
Joined: Thu Jan 15, 2015 16:28
Real name: Coulibaly Sidiki

Post by Sidiki » Tue Dec 19, 2017 14:50

Hi Fabien,

I remember that in one of your posts about Scan 2.0, you wrote that learning happens in the preparation step of the program, and that afterwards, i.e., when the program plays, this option is no longer available. So what is the truth in this AlphaZero story?
Perhaps it has a very, very large database built by learning, or a revolutionary evaluation function. They said that it uses Monte Carlo, which is based on deep search.

Sidiki

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Tue Dec 19, 2017 16:53

Hi Sidiki,

I do not understand your question.
What is the problem with using Monte Carlo as a search algorithm during a game?
Using Monte Carlo does not mean you are in a learning process. By the way, in the past I experimented a little with this algorithm as the search algorithm in Damy.
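The point that Monte Carlo is simply a search method can be illustrated with flat Monte Carlo move selection: each candidate move is scored by random playouts, and nothing is kept after the call returns. The game is a hypothetical toy stand-in for draughts (remove 1, 2 or 3 stones; the side that cannot move loses), and this sketch is not Damy's implementation.

```python
import random


def random_playout(stones, rng):
    """Play random moves to the end; True if the side to move at the start wins."""
    start_side_to_move = True
    while stones > 0:
        stones -= rng.choice([m for m in (1, 2, 3) if m <= stones])
        start_side_to_move = not start_side_to_move
    # The side that must move with no stones left loses.
    return not start_side_to_move


def monte_carlo_move(stones, playouts=2000, seed=None):
    """Flat Monte Carlo: pick the move whose random playouts win most often.

    Everything is recomputed on each call; nothing is learned or stored.
    """
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for move in (m for m in (1, 2, 3) if m <= stones):
        # After our move the opponent is to move, so we win when their playout loses.
        wins = sum(not random_playout(stones - move, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

With 5 stones the sketch should normally pick 1 (leaving 4), since random playouts from 4 stones are lost more often by the side to move. Calling it again repeats all the work, because no learning takes place.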
Gérard

Sidiki
Posts: 118
Joined: Thu Jan 15, 2015 16:28
Real name: Coulibaly Sidiki

Post by Sidiki » Wed Dec 20, 2017 14:56

Hi Gérard,

My question was (I would say it is more a hypothesis than a question) whether AlphaZero perhaps uses a very large database of learning results.
I just pointed out that they said its evaluation function is Monte Carlo.

Maurits Meijer
Posts: 221
Joined: Thu Nov 27, 2008 19:22

Post by Maurits Meijer » Wed Dec 20, 2017 16:09

I don't think AlphaZero's search should be called Monte Carlo; it selects moves in the search tree based on the advice of the evaluation function, so it's a deliberate way of pruning. This is, I think, the main innovation of AlphaZero, but it is hard to tell how it impacts performance.
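The selection rule described here is called PUCT in the AlphaGo Zero and AlphaZero papers: each move's average value Q is combined with an exploration bonus proportional to the prior P that the network assigns to that move. A minimal Python sketch follows; the Child class, the move strings and the constant c_puct = 1.5 are illustrative assumptions, and treating unvisited moves as Q = 0 is only one of several conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a): move probability suggested by the network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values, from the parent's point of view

    @property
    def q(self):
        # Mean value Q(s, a); unvisited moves count as 0 in this sketch.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(children, c_puct=1.5):
    """Return the move maximising Q + U, with U = c_puct * P * sqrt(N_parent) / (1 + N)."""
    parent_visits = sum(ch.visits for ch in children.values())

    def score(move):
        ch = children[move]
        u = c_puct * ch.prior * math.sqrt(parent_visits) / (1 + ch.visits)
        return ch.q + u

    return max(children, key=score)


children = {
    "32-28": Child(prior=0.6, visits=10, value_sum=5.0),
    "33-29": Child(prior=0.3, visits=2, value_sum=1.6),
    "31-27": Child(prior=0.1),
}
```

Here select_child(children) picks "33-29": its good average value plus a moderate bonus beats the heavily visited "32-28". A move with a low prior gets little bonus and is rarely explored, which is the deliberate pruning effect.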

The main power of AlphaZero, besides computational power and setting the match conditions, seems to be the massive neural net it uses for evaluation. I don't believe it uses a database during play.

AlphaZero's publicity is absolutely fantastic.
http://slagzet.com

CheckersGuy
Posts: 17
Joined: Mon Oct 17, 2016 09:05
Real name: Robin Messemer

Post by CheckersGuy » Tue Feb 27, 2018 15:29

Having looked at the AlphaZero/AlphaGo Zero papers, one shouldn't call the algorithm MCTS, because there are no longer random playouts at leaf nodes.
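The difference is in how a leaf node is valued. A minimal Python sketch contrasting the two on a hypothetical toy take-away game (remove 1, 2 or 3 stones; the side that cannot move loses): classic MCTS plays random moves to the end, while an AlphaZero-style search asks a value network once. The value_net argument and perfect_net are assumed stand-ins for a trained network.

```python
import random
from typing import Callable, Optional


def rollout_value(stones: int, rng: random.Random) -> float:
    """Classic MCTS: random playout to the end, +1 if the side to move at the leaf wins."""
    leaf_side_to_move = True
    while stones > 0:
        stones -= rng.choice([m for m in (1, 2, 3) if m <= stones])
        leaf_side_to_move = not leaf_side_to_move
    # The side that must move with no stones left has lost.
    return -1.0 if leaf_side_to_move else 1.0


def evaluate_leaf(stones: int,
                  value_net: Optional[Callable[[int], float]] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Value of a leaf for the side to move, in [-1, 1]."""
    if value_net is not None:
        return value_net(stones)  # AlphaZero-style: one network call, no playout
    return rollout_value(stones, rng or random.Random())  # classic MCTS playout


# Hypothetical "network" that happens to be perfect for this toy game.
def perfect_net(stones: int) -> float:
    return -1.0 if stones % 4 == 0 else 1.0
```

A rollout gives a noisy sample that must be averaged over many visits, whereas the network returns its estimate directly; that is why the AlphaZero search is not Monte Carlo in the original sense.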
