top of page

LightZero

Image Source : opendilab:lightzero

Project Description

  • This is a project I participated during my internship in SenseTime Group Limit. 

  • This project is a part of projects of OpenDILab.

  • This project focused on combining Monte Carlo Tree Search (MCTS) and Deep Reinforcement learning.

  • This project aims to implement various state-of-the-art algorithms, ranging from AlphaZero to Muzero Series.

  • More information can be found in github link and paper.

My Contribution

  • Preproduced the MuZero Algorithm, an innovative method that extends the applicablity of techniques akin to enabling tree search in environments with unkonwn transition dynamics.

  • Implemented the Sampled MuZero method, an extension of MuZero, to facilitate learning in domains with arbitrary complex action spaces through strategic planning over sampled actions.

  • Reproduced the Stochastic Muzero Method, enabling comprehensive incorporation of the stochastic nature of the envrionment in the tree search process.

Algorithm Framework

  • Muzero

muzero_frame.jpg
  • Sampled Muzero

sampled framework.jpg
  • Stochastic Muzero

stochastic framework2.jpg

Experimental Result

  • Muzero

mz_figure.jpg
  • Sampled Muzero

sez_figure.jpg
  • Stochastic Muzero

2048_figure.jpg
bottom of page