Retry Step On Spot VM
Situation
QuickBuild can launch Amazon/Azure spot VMs on demand to reduce cost. Because a spot VM can be terminated at any time, the build should be retried on another node when the VM is terminated.
Resolution
A QuickBuild step can be retried via the retry setting in the advanced section of the step definition. However, this retry only happens on the current node, so a retry setting on the step that runs on the spot VM will not work: once the VM is terminated, there is no node left to retry on. Instead, retry the parent step, and run the parent step on a non-spot VM. A demo configuration is set up to explain the idea:
- Step build is set up to run on any build agent; we assume that build agents are launched on demand as spot VMs
- Step master is configured to run on the server, which is not a spot VM. The master step is retried up to 3 times when the step error message matches .*Error testing job.*, which is reported if the node running any child step is terminated
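
The retry behaviour of this demo configuration can be sketched in plain Python. This is only a simulation of the idea, not QuickBuild's API or implementation: StepError, run_with_retry, make_build_step, and the example error text are hypothetical names made up for illustration, and the sketch assumes the error pattern is searched for anywhere in the message and that "retried 3 times" means one initial run plus up to three retries.

```python
import re


class StepError(Exception):
    """Hypothetical error type carrying a step's error message."""


def run_with_retry(step, max_retries, error_pattern):
    """Run `step`; rerun it while its error message matches `error_pattern`
    and fewer than `max_retries` retries have been used.

    Returns a (result, attempts) tuple on success.
    """
    pattern = re.compile(error_pattern)
    attempts = 0
    while True:
        attempts += 1
        try:
            return step(), attempts
        except StepError as err:
            # Give up if retries are exhausted or the error is unrelated.
            if attempts > max_retries or not pattern.search(str(err)):
                raise


def make_build_step(terminations):
    """Hypothetical child step: fails `terminations` times, as if its
    spot VM had been terminated, and then succeeds."""
    remaining = {"count": terminations}

    def build():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            # Made-up message text that contains the phrase from the demo.
            raise StepError("Error testing job: node unreachable")
        return "successful"

    return build


if __name__ == "__main__":
    # The "master" role: it stays alive on a non-spot node and reruns the child.
    result, attempts = run_with_retry(
        make_build_step(terminations=2),
        max_retries=3,
        error_pattern=r".*Error testing job.*",
    )
    print(result, attempts)
```

The point of the design is that the retry loop lives in the parent, which survives the termination, rather than in the child, which disappears together with its node.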