When doing bulk deletes like this, what safeguards do you put in place, other than testing the script up/down in another environment, turning off app servers, etc. (which I'm guessing they did not do)?
Naive approach: replace the delete with a select, keeping the same WHERE clause, and see if you're surprised at the results.
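A minimal sketch of that dry run, assuming Python's built-in sqlite3 and a made-up `sessions` table. The function name, the table, and the `expected_max` threshold are all hypothetical, and the predicate is assumed to be written by the operator, never taken from user input:

```python
import sqlite3


def preview_then_delete(conn, where_sql, params, expected_max):
    """Count the rows a DELETE would hit; only delete if the count is unsurprising.

    where_sql is a trusted, operator-written predicate (never user input).
    """
    # Step 1: the "replace delete with select" dry run.
    preview = conn.execute(
        f"SELECT COUNT(*) FROM sessions WHERE {where_sql}", params
    ).fetchone()[0]
    if preview > expected_max:
        raise RuntimeError(
            f"refusing to delete: {preview} rows match, expected at most {expected_max}"
        )

    # Step 2: the real delete, with the same predicate and parameters.
    cur = conn.execute(f"DELETE FROM sessions WHERE {where_sql}", params)
    if cur.rowcount != preview:
        # The data changed between preview and delete; undo and let a human look.
        conn.rollback()
        raise RuntimeError(
            f"deleted {cur.rowcount} rows but previewed {preview}; rolled back"
        )
    conn.commit()
    return cur.rowcount


def make_sessions_db():
    """Build a tiny in-memory table to try the guard against (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (id INTEGER PRIMARY KEY, tenant TEXT, expired INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [(1, "a", 1), (2, "a", 0), (3, "b", 1), (4, "b", 1)],
    )
    conn.commit()
    return conn
```

With the demo data, `preview_then_delete(conn, "expired = ?", (1,), expected_max=2)` refuses to run because three rows match, which is exactly the kind of surprise you want before the delete rather than after.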
More mature approach, especially in an environment where engineers are running bulk changes against the database: you don't do bulk deletes. You change that delete into an update that marks rows for later collection.
One tactic I've seen that worked, assuming you have straightforward relational tables: you add a "marked for deletion" column whose value is an identifier for the single run of the bulk job you just did. Then you can query rows with that value in that column to ensure it had the desired effect. If you're satisfied, you run another bulk job which doesn't re-run your original query; it just deletes rows with that "marked" value.
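Roughly what that mark-then-purge flow could look like, again as a sketch with Python's sqlite3. The `orders` table, the `deletion_run` column, and all function names are invented for illustration; the run identifier here is just a random UUID:

```python
import sqlite3
import uuid


def mark_for_deletion(conn, where_sql, params=()):
    """Phase 1: tag matching rows with a fresh run id instead of deleting them.

    where_sql is a trusted, operator-written predicate (never user input).
    """
    run_id = uuid.uuid4().hex
    conn.execute(
        "UPDATE orders SET deletion_run = ? "
        f"WHERE deletion_run IS NULL AND ({where_sql})",
        (run_id, *params),
    )
    conn.commit()
    return run_id


def review_marked(conn, run_id):
    """Inspect exactly what this run marked, before anything is removed."""
    return conn.execute(
        "SELECT id, customer, status FROM orders WHERE deletion_run = ? ORDER BY id",
        (run_id,),
    ).fetchall()


def unmark(conn, run_id):
    """Undo a bad run: clear the mark, leaving the rows untouched."""
    conn.execute(
        "UPDATE orders SET deletion_run = NULL WHERE deletion_run = ?", (run_id,)
    )
    conn.commit()


def purge_marked(conn, run_id):
    """Phase 2: delete by run id only; the original query is never re-run."""
    cur = conn.execute("DELETE FROM orders WHERE deletion_run = ?", (run_id,))
    conn.commit()
    return cur.rowcount


def make_orders_db():
    """Build a tiny in-memory table to try the flow against (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "status TEXT, deletion_run TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (id, customer, status) VALUES (?, ?, ?)",
        [(1, "acme", "open"), (2, "acme", "cancelled"),
         (3, "globex", "open"), (4, "globex", "cancelled")],
    )
    conn.commit()
    return conn
```

The point of keying phase 2 on the run id is that the purge can only touch rows you already reviewed, even if the data has shifted since the original query ran. If the review looks wrong, `unmark` puts everything back.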
Lots of places rely on schema-enforced foreign keys and cascading deletes though. In that case, my recommendation is: don't.
Canary deploys, i.e. start with a couple of customers and do manual validations, then wait a little while (maybe a few days) before incrementally rolling it out to larger numbers of customers.
It's not clear if the issue affected all tenants where the script ran, though it sounds like it did. A canary wouldn't be as effective if it only affected certain tenants (maybe those with a specific config).