Performance tests check whether an algorithm still performs as well as before, e.g. after upgrading a Python package, changing a common utility function, or making other changes to the code. If you only want to check whether an algorithm or environment works with the rest of the framework at all, use smoke tests.
We use conf/performance/FetchReach/sac_her-test.yaml as an example. Here is the config:
# @package _global_
# Changes specified in this config should be interpreted as relative to the _global_ package.
defaults:
  - override /algorithm: sac
env: 'FetchReach-v2'
algorithm:
  replay_buffer_class: HerReplayBuffer
  buffer_size: 1000000
  batch_size: 256
  learning_rate: 0.001
  gamma: 0.96
  tau: 0.07
  policy_kwargs:
    n_critics: 1
    net_arch:
      - 256
      - 256
      - 256
  replay_buffer_kwargs:
    n_sampled_goal: 4
    goal_selection_strategy: 'future'
This just specifies that the performance test is for the sac algorithm with the HER replay buffer and the FetchReach-v2 environment.
performance_testing_conditions:
  # In 2 out of 3 runs, eval/success_rate should be at least 0.9 after 10k steps.
  total_runs: 3  # How many runs in total.
  succ_runs: 2  # This number of runs should meet the conditions.
  eval_columns: eval/success_rate  # This is what we evaluate to determine success. Will use this to override the 'early_stop_data_column' parameter of main.yaml.
  eval_value: 0.9  # This is the value we determine for success. Will use this to determine and override the 'early_stop_threshold' parameter of main.yaml.
  # ca. 15 minutes on GPU
  max_steps: 10_000  # This is the time limit for checking the success. Will use this and the 'eval_after_n_steps' parameter of main.yaml to determine the n_epochs parameter in main.yaml.
As the comments in the config already explain, the performance test checks whether sac with HER reaches an eval/success_rate of at least 0.9 (90%) within 10k steps in at least 2 of 3 runs.
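The pass criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual PerformanceTestingSweeper code: the class PerformanceTestingConditions and the functions conditions_met and derive_n_epochs are hypothetical names, the rounding in derive_n_epochs is an assumption about how max_steps and eval_after_n_steps are combined, and the value 2000 for eval_after_n_steps is made up for the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerformanceTestingConditions:
    """Hypothetical container mirroring the keys under performance_testing_conditions."""
    total_runs: int = 3
    succ_runs: int = 2
    eval_columns: str = "eval/success_rate"
    eval_value: float = 0.9
    max_steps: int = 10_000


def derive_n_epochs(max_steps: int, eval_after_n_steps: int) -> int:
    """Assumption: one epoch per evaluation interval, rounded up to cover max_steps."""
    return math.ceil(max_steps / eval_after_n_steps)


def conditions_met(best_values: Sequence[float], cond: PerformanceTestingConditions) -> bool:
    """Return True if enough runs reached the target value.

    best_values holds, for each run, the best value of cond.eval_columns
    observed within cond.max_steps.
    """
    if len(best_values) != cond.total_runs:
        raise ValueError(f"expected {cond.total_runs} runs, got {len(best_values)}")
    successful = sum(value >= cond.eval_value for value in best_values)
    return successful >= cond.succ_runs


if __name__ == "__main__":
    cond = PerformanceTestingConditions()
    # Two of the three runs reach 0.9, so the test counts as passed.
    print(conditions_met([0.95, 0.4, 1.0], cond))  # True
    # With a made-up eval_after_n_steps of 2000, 10k steps give 5 epochs.
    print(derive_n_epochs(cond.max_steps, eval_after_n_steps=2000))  # 5
```

Requiring only succ_runs out of total_runs successes makes the test tolerant of a single unlucky random seed while still catching a real regression.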
hydra:
  sweeper:
    n_jobs: 3
    _target_: hydra_plugins.hydra_custom_optuna_sweeper.performance_testing_sweeper.PerformanceTestingSweeper
    study_name: sac_her_FetchReach-test
The last part of the config sets the number of parallel runs with n_jobs, specifies that we use the PerformanceTestingSweeper and names the study.
You can start this performance test with
python src/main.py +performance=FetchReach/sac_her-test --multirun
If the test is successful, the PerformanceTestingSweeper prints something like the following to the console:
[2022-04-22 13:10:25,558][HYDRA] Performance test FetchReach/sac_her-test successful! The value for eval/success_rate was at least 0.9 in 3 runs.
Use run_performance_tests.sh to run multiple performance tests sequentially.
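For illustration, running several tests one after another could look roughly like the Python sketch below. It is not a transcription of run_performance_tests.sh, whose contents are not shown here: build_command and run_sequentially are hypothetical helpers, the command line is the one from the example above (with sys.executable standing in for python), and treating a non-zero exit code as a failed test is an assumption, since the documented success signal is the console message.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(test_name: str) -> List[str]:
    """Build the command from the example above for one performance test config."""
    return [sys.executable, "src/main.py", f"+performance={test_name}", "--multirun"]


def run_sequentially(test_names: Sequence[str], dry_run: bool = False) -> List[str]:
    """Run each test only after the previous one has finished.

    Returns the names of the tests whose process exited with a non-zero code
    (assumption: a failing test is reflected in the exit code).
    """
    failed = []
    for name in test_names:
        cmd = build_command(name)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        # check=False: continue with the remaining tests after a failure.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Dry run: prints the command instead of starting a training run.
    run_sequentially(["FetchReach/sac_her-test"], dry_run=True)
```

Running the tests one at a time keeps each test's n_jobs parallel runs from competing with another test for the same GPU.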