First, we collected a new dataset consisting of all the curated levels from Beast Saber, as recommended by some people on Reddit. We also had to upgrade the code to work with the new level format.
Secondly, we reimplemented the C-LSTM model from Dance Dance Convolution (DDC) in PyTorch and integrated it into our training pipeline. We did this because, in DeepSaber v1, we found that DDC produced better results for stage 1. While doing this, we found a bug in our code: for stage 1, we were training only on the first 10 seconds of every song. We believe this is why our stage 1 model wasn’t working so well, and we plan to revisit the WaveNet model after fixing this bug.
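To illustrate the kind of bug described above, here is a minimal sketch. It is not the actual DeepSaber code: the function `sample_window_start`, the `max_start_seconds` cap, the 10-second window, and the 180-second song are all hypothetical, chosen only to show how a sampler that never moves past the start of a song starves the model of most of the data.

```python
import math
import random


def sample_window_start(song_seconds, window_seconds, rng, max_start_seconds=None):
    """Pick a start time (in seconds) for one training window.

    max_start_seconds is a hypothetical cap used here to mimic the bug:
    setting it to 0.0 forces every window to begin at the start of the song.
    """
    latest_start = max(0.0, song_seconds - window_seconds)
    if max_start_seconds is not None:
        latest_start = min(latest_start, max_start_seconds)
    return rng.uniform(0.0, latest_start)


def coverage(starts, window_seconds, song_seconds):
    """Fraction of whole seconds of the song touched by at least one window."""
    seen = set()
    total = int(song_seconds)
    for start in starts:
        end = min(math.ceil(start + window_seconds), total)
        seen.update(range(int(start), end))
    return len(seen) / total


rng = random.Random(0)
song_seconds = 180.0    # illustrative song length
window_seconds = 10.0   # illustrative training window

# Buggy behaviour: every window starts at 0, so only the first 10 s are seen.
buggy_starts = [
    sample_window_start(song_seconds, window_seconds, rng, max_start_seconds=0.0)
    for _ in range(1000)
]
# Intended behaviour: windows are drawn from anywhere in the song.
fixed_starts = [
    sample_window_start(song_seconds, window_seconds, rng)
    for _ in range(1000)
]

buggy_coverage = coverage(buggy_starts, window_seconds, song_seconds)
fixed_coverage = coverage(fixed_starts, window_seconds, song_seconds)
print(f"buggy coverage: {buggy_coverage:.3f}")
print(f"fixed coverage: {fixed_coverage:.3f}")
```

Under these assumptions, the capped sampler only ever touches the first 10 of 180 seconds, while the uncapped one covers nearly the whole song.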
The stage 2 model was left the same, with one change: we extended the maximum sequence length from 512 to 2048, to allow full generation of longer songs, even at ExpertPlus difficulty (which can have lots of notes).
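As a rough back-of-the-envelope check of why the longer sequence length matters, here is a small sketch. It assumes that each note takes up one position in the sequence and that a dense level averages 8 notes per second. Both are illustrative assumptions, not measurements from the dataset or a description of how the model actually encodes notes.

```python
def max_song_seconds(max_sequence_length, notes_per_second):
    """Longest song (in seconds) that fits, assuming one position per note."""
    if notes_per_second <= 0:
        raise ValueError("notes_per_second must be positive")
    return max_sequence_length / notes_per_second


# Hypothetical note density for a dense level (an assumption, not a measured value).
ASSUMED_NOTES_PER_SECOND = 8.0

for max_len in (512, 2048):
    seconds = max_song_seconds(max_len, ASSUMED_NOTES_PER_SECOND)
    print(f"max length {max_len}: about {seconds:.0f} s of song")
# prints:
# max length 512: about 64 s of song
# max length 2048: about 256 s of song
```

At that assumed density, 512 positions would run out after about a minute, while 2048 positions cover a bit over four minutes.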
We used Google Cloud to retrain the new DDC stage 1 model, as well as the stage 2 model, on the new curated Beast Saber dataset. We trained on all Expert and ExpertPlus levels, and we didn’t use data augmentation this time. The results are very promising. Stage 1 now works so well that the quality of stage 2 appears to be the bottleneck! In plain language: the model is now very good at identifying the rhythm and where to put notes, and it seems to be a bit better than before at selecting which notes to put (although it still sometimes lacks a bit of variety). Here are a couple of examples generated by the new model:
Last, but definitely not least, we have finally open sourced the code! It can now be found on this GitHub page, and the README has instructions on how to run the model. We plan to work on a simpler-to-use interface and to keep improving the model!