Recovering from deploy failures
You might need to destroy the Fly builder app. It gets auto-created again on the next deploy, so once the builder is gone, retry.
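A minimal sketch of locating the builder before destroying it. It assumes builder apps appear in `fly apps list` with names starting with `fly-builder-` and that the app name is the first column of that output; neither is confirmed on this page. The helper name `list_builder_apps` is hypothetical.

```shell
# Print the names of apps that look like Fly builders.
# Assumption: the first column of `fly apps list` is the app name, and
# builder apps are named fly-builder-*.
list_builder_apps() {
  fly apps list | awk '$1 ~ /^fly-builder-/ { print $1 }'
}

# Usage (run by hand, after double-checking the name it prints):
#   list_builder_apps
#   fly apps destroy BUILDER_APP_NAME
```

Listing first and destroying by hand keeps the destructive step deliberate, since the name match is only a guess.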
Just retry. It's fine. :)
Start by surveying the scene to see how many machines are on the new image versus the old one, or in replacing, failed, or created status.
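One way to get that overview quickly is to tally machines by image and state. This is a sketch only: the helper name `tally_machines` is hypothetical, and the usage line assumes `jq` is installed and that the JSON from `fly m list --json` exposes `config.image` and `state` fields.

```shell
# Read "IMAGE STATE" pairs from stdin, one machine per line, and print
# a count for each distinct pair, most common first.
tally_machines() {
  awk '{ count[$0]++ } END { for (k in count) print count[k], k }' | sort -rn
}

# Usage (the JSON field names here are assumptions):
#   fly m list -a "$FLY_APP_NAME" --json \
#     | jq -r '.[] | "\(.config.image) \(.state)"' \
#     | tally_machines
```

Plain `fly m list -a $FLY_APP_NAME` shows the same information as a table if you would rather read it by eye.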
If you're here, the app is probably online but no longer processing background jobs (because all the Sidekiq processes were instructed to enter quiet mode during ).
Handle this by rebooting one of the worker_autoscale machines. That should be enough to start bringing machines back online.
Once you've verified that the app is doing work again, wait for it to catch up on the run backlog, and then retry the deploy.
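The reboot step above can be sketched as a small wrapper around `fly m restart`. The function name `restart_worker` is hypothetical, and MACHINE_ID stands for whichever worker_autoscale machine you picked from `fly m list`.

```shell
# Restart a single machine in the app named by $FLY_APP_NAME.
# $1: ID of one machine in the worker_autoscale process group.
restart_worker() {
  fly m restart "$1" -a "$FLY_APP_NAME"
}

# Usage:
#   restart_worker MACHINE_ID
```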
Manually redo the deploy.
Manually update the rest of the machines.
Start by examining the output of fly m list -a $FLY_APP_NAME, and build a list of machine IDs that are stuck on the old image.
For each one, do something like this:
fly m destroy MACHINE_ID
Add --force if the machine is stubborn and won't stop.
Then use fly scale count to scale back up to the desired machine count. Search for fly scale count in the internal Slack and you'll see example usage.
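The per-machine steps above can be wrapped in a loop once you have the list of stale IDs. This is a sketch under assumptions: the function name `destroy_stale_machines` is hypothetical, and the scale-up line uses placeholder values (DESIRED_COUNT, PROCESS_GROUP) that you would take from the internal examples.

```shell
# Destroy each listed machine in the given app.
# $1: app name; remaining arguments: machine IDs stuck on the old image.
destroy_stale_machines() {
  stale_app="$1"
  shift
  for stale_id in "$@"; do
    if ! fly m destroy "$stale_id" -a "$stale_app"; then
      # Deliberately no automatic --force: leave that call to the operator.
      echo "destroy failed for $stale_id; rerun with --force if it won't stop" >&2
    fi
  done
}

# Usage:
#   destroy_stale_machines "$FLY_APP_NAME" MACHINE_ID_1 MACHINE_ID_2
#   fly scale count DESIRED_COUNT -a "$FLY_APP_NAME" --process-group PROCESS_GROUP
```

The loop reports failures instead of forcing, so a stubborn machine gets a second look before it is killed.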
Do this using a , using the Docker image URI from the build step.