The best result in the OffWorldMonolithContinuousReal-v0 environment is 0.88, by Felix Lu!
Run your policy in evaluation mode by calling gym.make(..., mode='test') to see it here.
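As a rough illustration of what an evaluation run involves, the sketch below averages episode rewards over a few test episodes. It assumes the classic Gym `reset()` / `step()` interface; `StubEnv` and `average_test_reward` are hypothetical names used only so the example runs without the real robot, and the way the leaderboard itself computes "average test reward" is not stated here, so treat the averaging as an assumption.

```python
class StubEnv:
    """Hypothetical stand-in for the real environment (classic Gym reset/step API)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        # Sparse reward: 1.0 at the end of the episode if the last action was positive.
        reward = 1.0 if (done and action > 0) else 0.0
        return float(self.t), reward, done, {}


def average_test_reward(env, policy, episodes):
    """Run `episodes` full episodes and return the mean total reward per episode."""
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        episode_reward = 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            episode_reward += reward
        total += episode_reward
    return total / episodes


# With the real environment, `env` would instead come from
# gym.make(..., mode='test'); the remaining arguments are elided in the text above.
env = StubEnv()
score = average_test_reward(env, policy=lambda obs: 1.0, episodes=10)
print(f"average test reward: {score:.2f}")
```

Keeping the evaluation loop separate from environment construction means the same function can be pointed at the real environment once it has been created in test mode.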
Experiments conducted: 96
Episodes finished: 22092
Hours of real learning logged: 552
SAC-REAL-Continuous-7
by Felix Lu, average test reward 0.88
SAC-REAL-Continuous-6
by Felix Lu, average test reward 0.87
Macro_net6
by Karl Aru, average test reward 0.80