How to think about durable execution - https://news.ycombinator.com/item?id=46245238 - Dec 2025 (37 comments)
- Ensure the workflow is idempotent: if it stops or fails at any point, you should be able to restart it from scratch and have it skip, or safely redo, the steps that already ran.
- Store the messages which trigger workflows.
- Track failures (if your log aggregation is good, even that's enough to start).
Then when the odd thing fails (or sometimes a bunch of things fail because, e.g., a core integration goes down), you can look up the messages and have a little script or tool re-queue them. This is an easy starting point that can keep you going for a long time, until you really approach huge scale.
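
To make that concrete, here is a rough sketch of the three pieces plus the re-queue script, using SQLite from the Python standard library as the message store. All of it is assumed for illustration: the table layout and the names `open_store`, `store_message`, `run_workflow` and `requeue_failed` are hypothetical, and a real setup would use whatever queue and database you already have.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) message store and create its tables."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id TEXT PRIMARY KEY, body TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending', error TEXT)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS done_steps ("
        " message_id TEXT, step TEXT, PRIMARY KEY (message_id, step))"
    )
    return db


def store_message(db, msg_id, body):
    """Store the triggering message; storing the same id twice is a no-op."""
    db.execute(
        "INSERT OR IGNORE INTO messages (id, body) VALUES (?, ?)",
        (msg_id, json.dumps(body)),
    )
    db.commit()


def run_workflow(db, msg_id, steps):
    """Run steps in order, skipping any already recorded as done.

    Returns True on success, False if a step raised (the failure is tracked).
    """
    row = db.execute("SELECT body FROM messages WHERE id = ?", (msg_id,)).fetchone()
    body = json.loads(row[0])
    try:
        for name, fn in steps:
            done = db.execute(
                "SELECT 1 FROM done_steps WHERE message_id = ? AND step = ?",
                (msg_id, name),
            ).fetchone()
            if done:
                continue  # completed on an earlier attempt, skip it
            fn(body)
            db.execute("INSERT INTO done_steps VALUES (?, ?)", (msg_id, name))
            db.commit()
    except Exception as exc:
        # Track the failure so the message can be found and re-queued later.
        db.execute(
            "UPDATE messages SET status = 'failed', error = ? WHERE id = ?",
            (repr(exc), msg_id),
        )
        db.commit()
        return False
    db.execute(
        "UPDATE messages SET status = 'done', error = NULL WHERE id = ?", (msg_id,)
    )
    db.commit()
    return True


def requeue_failed(db, queue):
    """The 'little script': mark failed messages pending and push their ids."""
    ids = [
        r[0]
        for r in db.execute(
            "SELECT id FROM messages WHERE status = 'failed' ORDER BY id"
        )
    ]
    for msg_id in ids:
        db.execute(
            "UPDATE messages SET status = 'pending', error = NULL WHERE id = ?",
            (msg_id,),
        )
        queue.append(msg_id)
    db.commit()
    return ids
```

One caveat with this sketch: a step can still run twice if the process dies after the step finishes but before it is recorded in `done_steps`, so each step has to tolerate being redone. The bookkeeping only saves work; it does not replace idempotent steps.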