# VM Idle Management
Use VM Idle Management to reduce compute cost without changing workflow logic.
Path: Settings -> VM Management
## Default behavior
Auto-stop is enabled by default for all organizations. New orgs are created with:

- **Auto-Stop Idle Agents**: enabled
- **Idle Timeout**: 30 minutes
- **Auto-Wake on Job Dispatch**: enabled
This makes cost efficiency the baseline operating mode. Adjust or disable auto-stop in Settings as needed.
## Configure org-level VM policy
- Toggle **Auto-Stop Idle Agents** on or off.
- Set **Idle Timeout** to a value from 10 to 120 minutes.
- Enable **Auto-Wake on Job Dispatch** if stopped agents should boot automatically when work is queued.
- Save VM settings.
## What counts as activity
Idle logic uses `lastActivityAt` (work activity), not just the connection heartbeat.
Activity is touched by:

- job dispatch
- step-status updates
- worker log ingestion batches via `POST /api/job-logs` (resolved from `jobId`)
- handshake and healthy reconnect events
- workspace keepalive pings via `POST /api/agent/{id}/activity`
- other execution-affecting agent actions
Idle stop also checks for active jobs before stopping an agent: jobs in the `queued`, `processing`, or `running` state block idle-stop.
## Per-agent override (Always-On)
On **Agents -> Agent View**, use the status chip toggle:

- **Auto-Stop**: the agent follows the org idle policy.
- **Always-On**: the agent is exempt from auto-stop.

This sets `idleAutoStopExempt` for that agent.
## Dispatch behavior with stopped agents
When dispatching via agent action endpoints, scheduled runs, or retries:
- If the agent is `stopped` and auto-wake is enabled, Mimic starts it and waits until it is `online` (up to 2 minutes).
- If auto-wake is disabled, dispatch returns a conflict/error response and the run is not started.
- The same auto-wake policy applies consistently across manual API calls, scheduled runs, and retry attempts.
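The auto-wake path can be sketched as a start-then-poll loop. This is a minimal sketch: `get_status` and `start_agent` are stand-ins for calls to the status and action endpoints, and the 5-second poll interval is an assumption; the 2-minute cap comes from this page.

```python
import time

WAKE_TIMEOUT_S = 120    # doc: wait up to 2 minutes for the agent to come online
POLL_INTERVAL_S = 5     # assumed polling cadence

def wake_and_wait(get_status, start_agent, sleep=time.sleep) -> str:
    """Start a stopped agent, then poll until it is online or the wait
    times out. Returns the terminal lifecycle event name."""
    if get_status() != "stopped":
        return "wake_ready"            # already up, nothing to do
    start_agent()                      # wake_requested is emitted upstream
    waited = 0
    while waited < WAKE_TIMEOUT_S:
        if get_status() == "online":
            return "wake_ready"
        sleep(POLL_INTERVAL_S)
        waited += POLL_INTERVAL_S
    return "wake_timeout"
```

Injecting `sleep` keeps the loop testable without real waiting.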
From `/app/agents`, operators can issue Start and Stop directly from the fleet table. These actions map to `POST /api/agent/{id}/action` with `start`/`stop` and emit `manual_start`/`manual_stop` lifecycle events.
**Resume latency:** starting a stopped agent takes 60-120 seconds. Schedule time-sensitive RCM workflows with this buffer in mind.
## Lifecycle events
Every idle stop and auto-wake transition is recorded as a persistent lifecycle event:
| Event Type | When |
|---|---|
| `idle_stop_attempt` | Idle monitor detects an agent past its timeout |
| `idle_stop_success` | Agent successfully stopped |
| `idle_stop_failed` | Stop attempt failed (EC2 error) |
| `wake_requested` | Auto-wake initiated for a job/schedule |
| `wake_blocked` | Auto-wake disabled by org policy |
| `wake_ready` | Agent came online after wake |
| `wake_timeout` | Agent did not come online within the timeout |
| `wake_failed` | EC2 start command failed |
| `manual_start` | User-initiated start via UI/API |
| `manual_stop` | User-initiated stop via UI/API |
| `manual_reboot` | User-initiated reboot via UI/API |
These events are visible in:

- the Agent Events SSE stream (`GET /api/agents/{id}/events`) as `lifecycle` events
- the Agent Logs endpoint (`GET /api/agents/{id}/logs`), merged into the unified log timeline
- the Agent View UI, as status badges showing the "Waking from idle" state
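Filtering lifecycle events out of the SSE stream might look like the following. The exact wire framing (`event:`/`data:` lines with JSON payloads) is an assumption about how `GET /api/agents/{id}/events` serializes messages, not a documented contract.

```python
import json

def lifecycle_events(sse_lines):
    """Yield parsed payloads for `lifecycle` events from raw SSE lines.

    Assumes standard SSE framing: an `event: <type>` line, one `data: <json>`
    line, and a blank line terminating each message.
    """
    current_event = None
    for line in sse_lines:
        line = line.strip()
        if line.startswith("event:"):
            current_event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and current_event == "lifecycle":
            yield json.loads(line.split(":", 1)[1])
        elif line == "":
            current_event = None   # blank line ends an SSE message
```

A real client would feed this generator from a streaming HTTP response body.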
## Agent View performance behavior
Agent View now separates critical status polling from heavy tab data fetches:
- Live transition polling uses `GET /api/agents/{id}/status` (lightweight payload).
- Recent runs load from `GET /api/agents/{id}/runs`.
- Portals, functions, and script library data load on demand when tabs are opened.
This reduces DB load during provisioning/reboot transitions while keeping operator feedback near real time.
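The on-demand tab loading described above amounts to a fetch-once cache keyed by tab. `AgentView` and the `fetchers` mapping below are illustrative names, not the real client code; the stand-in callables represent the portals/functions/script-library fetches.

```python
class AgentView:
    """Sketch of lazy tab loading: heavy data is fetched only when a tab
    is first opened, keeping status polling on its own lightweight path."""

    def __init__(self, fetchers):
        self._fetchers = fetchers   # tab name -> callable returning tab data
        self._cache = {}

    def open_tab(self, name):
        if name not in self._cache:          # first open triggers the fetch
            self._cache[name] = self._fetchers[name]()
        return self._cache[name]
```

Repeated opens of the same tab return cached data instead of re-querying, which is what spares the DB during provisioning and reboot transitions.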
## Verification checklist
- Enable auto-stop with a low timeout (e.g., 10 minutes) in dev.
- Wait for an idle agent to be stopped; confirm an `idle_stop_success` event appears.
- Dispatch a run to the stopped agent; confirm `wake_requested` and `wake_ready` events.
- Check that the Agent View UI shows the "Waking from idle" badge during resume.
- Verify the run completes successfully after wake.
## Watchdog behavior for stale runs
Background monitors enforce runtime limits:
- jobs whose heartbeats are stale and whose max runtime is exceeded are failed
- hung worker processes are killed on the VM
- agent status is reset for subsequent dispatch
- the retry pipeline can requeue work according to scheduling policy
Use this with RPA-first scripts to keep long-running automations deterministic and recoverable.
## Session lock recovery (Windows)
Idle policy and session recovery solve different failure classes:
- idle policy controls compute stop/start economics
- session recovery prevents lock-screen-induced mid-run termination
For guarded watchdog rollout and live patch procedures, see Windows Session Recovery.