The best result in the OffWorldMonolithDiscreteReal-v0 environment is 0.96 by Ashish Kumar!
Run your policy in evaluation mode by calling gym.make(..., mode='test') to see it here.
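As a rough illustration of what an "average test reward" run might look like, here is a minimal sketch. It assumes the classic Gym interface in which reset() returns a state and step() returns a (state, reward, done, info) tuple. The helper average_test_reward and the StubEnv class are hypothetical names introduced only for this example and are not part of OffWorld Gym; the remaining arguments to gym.make are left elided, as above.

```python
def average_test_reward(env, policy, episodes=10):
    """Run `policy` for `episodes` episodes and return the mean episode reward.

    Assumes the classic Gym API: reset() -> state,
    step(action) -> (state, reward, done, info).
    """
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        done = False
        episode_reward = 0.0
        while not done:
            action = policy(state)
            state, reward, done, _info = env.step(action)
            episode_reward += reward
        total += episode_reward
    return total / episodes


class StubEnv:
    """Hypothetical stand-in environment so the sketch runs without a robot.

    An episode lasts `horizon` steps and pays 1.0 only if the final
    action is 0. This is NOT how the real environment computes rewards.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        reward = 1.0 if (done and action == 0) else 0.0
        return self.t, reward, done, {}


if __name__ == "__main__":
    # With the real environment, the env would instead come from
    # gym.make(..., mode='test'), with the elided arguments filled in
    # according to the OffWorld Gym documentation.
    print(average_test_reward(StubEnv(), policy=lambda state: 0, episodes=4))
```

Keeping the evaluation loop separate from environment construction means the same function can be pointed at the stub while debugging and at the real environment afterwards.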
Experiments conducted: 144
Episodes finished: 24391
Hours of real learning logged: 610
sac_monolith_discrete_real_12_12_16AM_Jun-26-2020
by JB (offworld), average test reward 0.93
ddqn_e2e_experiment_3
by Ashish Kumar, average test reward 0.92
sac_monolith_discrete_real_12_41_28PM_Apr-13-2020
by JB (offworld), average test reward 0.48
dqn_depth_hl_experiment_5
by Ashish Kumar, average test reward 0.96