Auto Shutdown Script

Post by jefftk (jkaufman) · 2025-03-29


I run a lot of one-off jobs on EC2 machines. This usually looks like:

For short jobs this is fine, but when I run a long job there are two issues:

Ideally I could tell the machine to shut itself off if no one was logged in and there weren't any active jobs.

I didn't see anything like this (though I didn't look very hard), so I wrote something (github):

$ prevent-shutdown long-running-command

As long as that command is still running, or someone is logged in over ssh, the machine will stay on. Every five minutes a systemd timer checks whether this is the case and, if not, shuts the machine down. Note that you still need screen or something similar to keep the long-running command from exiting when you log out.
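The decision the timer makes every five minutes can be sketched in Python. This is not the author's check-shutdown.sh; it is a minimal sketch assuming the caller has already collected the PIDs of registered jobs and a count of logged-in ssh sessions. `pid_alive` and `should_shutdown` are hypothetical names.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks without
    actually delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def should_shutdown(job_pids, ssh_sessions, is_alive=pid_alive):
    """Shut down only if nobody is logged in and no registered job is alive."""
    if ssh_sessions > 0:
        return False
    return not any(is_alive(pid) for pid in job_pids)
```

A real timer job would gather the PIDs from wherever prevent-shutdown records them, count sessions (for example from `who`), and run `shutdown -h now` only when this returns True. Injecting `is_alive` keeps the decision testable without touching real processes.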

(This is an example of the kind of thing that I find goes a lot faster with an LLM. I used Claude 3.7, prompted it with essentially the beginning of this blog post, took the scripts it generated as a starting point, and then fixed some things. It did make some mistakes (the big ones: writing $ where it needed $$, a regex looking for PID: that should have looked for ^PID:, and not initially planning for stale jobs), but that's also about what I'd expect if I'd asked a junior engineer to write this for me. And with much faster turnaround on my code reviews!)
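The three bugs are easy to illustrate. The sketch below assumes a hypothetical job-file format (one `*.job` file per job containing a line `PID: <n>`); the real scripts may store jobs differently. `os.getpid()` plays the role of the shell's `$$`, the regex is anchored so `PPID: 1` is not mistaken for the job's PID, and files whose process has died are treated as stale.

```python
import os
import re
from pathlib import Path

# Anchored with ^ and compiled with re.MULTILINE: only a line that *starts*
# with "PID:" matches, so a line such as "PPID: 1" is ignored.
PID_LINE = re.compile(r"^PID:\s*(\d+)", re.MULTILINE)


def parse_pid(text):
    """Return the PID recorded in a job file's contents, or None if absent."""
    match = PID_LINE.search(text)
    return int(match.group(1)) if match else None


def register_job(job_dir, command):
    """Record the current process's PID (Python's equivalent of shell $$)."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    pid = os.getpid()
    path = job_dir / f"{pid}.job"
    path.write_text(f"PID: {pid}\nCMD: {command}\n")
    return path


def live_job_pids(job_dir, is_alive):
    """Return PIDs of jobs that are still running.

    Files with no PID line, or whose process has died (e.g. a job killed
    before it could clean up), are stale and get deleted, so a recorded PID
    is not later mistaken for a live job if the kernel reuses that number.
    """
    pids = []
    for path in Path(job_dir).glob("*.job"):
        pid = parse_pid(path.read_text())
        if pid is not None and is_alive(pid):
            pids.append(pid)
        else:
            path.unlink()
    return pids
```

Recording the wrapper's own PID works because the wrapper stays alive for exactly as long as the command it runs. With the unanchored pattern `PID:\s*(\d+)`, the text `"PPID: 1\nPID: 42"` would yield 1 rather than 42.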

5 comments


Comment by Dagon · 2025-03-29

I've seen scripts (though I don't have links handy) that do this based on no active logins and no CPU load for X minutes as well. On another tack, I've seen a lot of one-off processes that trigger a shutdown when they complete (and write their output/logs to S3 or somewhere durable). Often a Lambda is used for the control plane: it responds to signals and runs outside the actual host.
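The "no active logins and no CPU load for X minutes" rule can be sketched as a small state machine that is fed periodic samples. This assumes the 1-minute load average stands in for "CPU load" (on Unix, Python's `os.getloadavg()[0]` returns it); `IdleTracker`, the threshold, and the window are hypothetical values to tune.

```python
class IdleTracker:
    """Report idle once the machine has been continuously idle for a window.

    A sample counts as idle when nobody is logged in and the load average
    is below load_threshold.
    """

    def __init__(self, window_seconds, load_threshold):
        self.window_seconds = window_seconds
        self.load_threshold = load_threshold
        # Time of the first sample in the current unbroken idle run.
        self.idle_since = None

    def record(self, now, load, logins):
        idle = logins == 0 and load < self.load_threshold
        if not idle:
            # Any busy sample restarts the clock.
            self.idle_since = None
        elif self.idle_since is None:
            self.idle_since = now

    def should_shutdown(self, now):
        return (self.idle_since is not None
                and now - self.idle_since >= self.window_seconds)
```

A timer would call `record(time.time(), os.getloadavg()[0], logins)` on each tick and shut down once `should_shutdown` returns True. Tracking only the start of the current idle run keeps the state to a single timestamp.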

Reply by jefftk (jkaufman) · 2025-03-31

I like this idea a lot, but I'm nervous about setting the right CPU threshold. Too low and it never shuts off; too high and it shuts down in the middle of something while waiting on a slow download. But possibly if I looked at load logs I'd see it's so clearly either ~zero or >>zero that it's not fussy?
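One way to answer that question from logged load samples (for example, `os.getloadavg()[0]` recorded once a minute) is to see how many samples each candidate threshold would classify as idle. A minimal sketch; `idle_fraction_by_threshold` is a hypothetical helper, and the sample data in any test is made up.

```python
def idle_fraction_by_threshold(load_samples, thresholds):
    """Map each candidate threshold to the fraction of samples below it.

    A sample below the threshold would have counted as idle. If the
    fractions barely change across a wide range of thresholds, the load
    is effectively bimodal and the exact threshold is not fussy.
    """
    if not load_samples:
        raise ValueError("need at least one load sample")
    n = len(load_samples)
    return {t: sum(1 for x in load_samples if x < t) / n for t in thresholds}
```

Flat fractions across, say, 0.1 to 2.0 would suggest the "~zero or >>zero" picture holds. A slow download that keeps load near zero would still look idle, though, so this only shows how the data are distributed; it doesn't resolve that concern.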

Reply by Dagon · 2025-03-31

The rabbit hole can go deep, and it probably isn't worth getting too fancy for a single-digit number of hosts. Fleets of thousands of spot instances benefit from the effort. As with everything, balancing dev time, runtime complexity, and cost efficiency is tough.

When I was doing this often, I had separate modes: "dev mode", which allowed for human-timeframe messing about, and "prod mode", which was only for monitored workloads. In both cases, automating "provision, spin up, and initial setup" as well as "auto-shutdown if not measurably used for N minutes" (60 was my default) with a one-command script made my life much easier.

Comment by Gurkenglas · 2025-03-31

You could reduce check-shutdown.sh to the ssh part and prevent-shutdown.sh to "run long-running-command using ssh".
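The suggestion amounts to having prevent-shutdown run the command as `ssh localhost <command>`, so that a running job is simply an open ssh session. A minimal sketch assuming key-based ssh to localhost is already set up; `via_ssh` is a hypothetical helper. ssh joins the remote arguments into one string that the remote shell re-parses, so each argument is quoted with `shlex.join` (Python 3.8+).

```python
import shlex


def via_ssh(argv, host="localhost"):
    """Build an ssh command line that runs argv on host.

    ssh concatenates remote arguments with spaces and the remote shell
    splits them again, so quoting keeps arguments containing spaces intact.
    """
    return ["ssh", host, shlex.join(argv)]
```

This would be run with something like `subprocess.run(via_ssh(["long-running-command"]))`. One caveat we haven't verified here: a non-interactive `ssh host command` usually doesn't allocate a terminal, so it may not show up in `who`, and the reduced check might need to count ssh connections some other way.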

Reply by jefftk (jkaufman) · 2025-04-02

That's elegant in some sense, but somehow doesn't feel like the right way to do it.