-
Couldn't load subscription status.
- Fork 143
Description
Why do we need this ?
Right now, if our Dora metrics sync process fails partway through, we have to restart the whole thing. That wastes time and makes it hard to know exactly where it broke. If we save “checkpoints” after each step and retry failed steps automatically, we can:
- Pick up where we left off, instead of starting over.
- Save time and compute resources.
- See exactly which step failed.
What we’ll do
-
Split into steps
Define clear stages (e.g. “fetch data,” “calculate lead time,” “save results”).
-
Save checkpoints
After each stage finishes, record that it succeeded (e.g. in a small database table or file).
-
Add retries
If a stage fails, try it again a few times before giving up.
-
Keep “force” option
If someone runs sync --force, we clear all checkpoints and start from the first step. -
Better logs
Log each stage’s start, end, attempts, and any errors so we can debug easily.
Further Comments / References
This shall be a prerequisite to #350