VM Idle Management

Use VM Idle Management to reduce compute cost without changing workflow logic.

Path: Settings -> VM Management

Default behavior

Auto-stop is enabled by default for all organizations. New orgs are created with:

  • Auto-Stop Idle Agents: enabled
  • Idle Timeout: 30 minutes
  • Auto-Wake on Job Dispatch: enabled

This ensures cost efficiency is the baseline operating mode. Adjust or disable in Settings as needed.

Configure org-level VM policy

  1. Toggle Auto-Stop Idle Agents on or off.
  2. Set Idle Timeout from 10 to 120 minutes.
  3. Enable Auto-Wake on Job Dispatch if stopped agents should boot automatically when work is queued.
  4. Save VM settings.
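The steps above can be sketched as a small validation helper. The key names in the returned payload are illustrative, not the real API schema; only the three settings and the 10–120 minute bound come from this page.

```python
def validate_vm_policy(auto_stop: bool, idle_timeout_min: int, auto_wake: bool) -> dict:
    """Validate org-level VM settings before saving.

    Payload key names are hypothetical; the documented constraint is that
    Idle Timeout must fall between 10 and 120 minutes.
    """
    if not 10 <= idle_timeout_min <= 120:
        raise ValueError("Idle Timeout must be between 10 and 120 minutes")
    return {
        "autoStopIdleAgents": auto_stop,
        "idleTimeoutMinutes": idle_timeout_min,
        "autoWakeOnDispatch": auto_wake,
    }
```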

What counts as activity

Idle logic uses lastActivityAt (work activity), not just connection heartbeat.

The lastActivityAt timestamp is refreshed by:

  • job dispatch
  • step-status updates
  • worker log ingestion batches via POST /api/job-logs (resolved from jobId)
  • handshake and healthy reconnect events
  • workspace keepalive pings via POST /api/agent/{id}/activity
  • other execution-affecting agent actions

Idle stop also checks for active jobs before stopping an agent. Jobs in queued, processing, or running state block idle-stop.
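A minimal sketch of this eligibility check, assuming the monitor sees the agent's lastActivityAt and the states of its current jobs (function and parameter names are illustrative):

```python
from datetime import datetime, timedelta

# Per the idle-stop rules: jobs in these states block stopping the agent.
ACTIVE_STATES = {"queued", "processing", "running"}

def eligible_for_idle_stop(last_activity_at, job_states, timeout_minutes, now=None):
    """True when the agent is past the idle timeout AND holds no active jobs."""
    now = now or datetime.utcnow()
    if any(state in ACTIVE_STATES for state in job_states):
        return False  # active work always blocks idle-stop
    return now - last_activity_at > timedelta(minutes=timeout_minutes)
```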

Per-agent override (Always-On)

On Agents -> Agent View, use the status chip toggle:

  • Auto-Stop means the agent follows the org idle policy.
  • Always-On means the agent is exempt from auto-stop.

This sets idleAutoStopExempt for that agent.

Dispatch behavior with stopped agents

When dispatching via agent action endpoints, scheduled runs, or retries:

  • If the agent is stopped and auto-wake is enabled, Mimic starts it and waits until it is online (up to 2 minutes).
  • If auto-wake is disabled, dispatch returns a conflict/error response and the run is not started.
  • The same auto-wake policy applies consistently across manual API calls, scheduled runs, and retry attempts.
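The dispatch-side wake logic above can be sketched as follows. `start_fn` and `status_fn` are hypothetical stand-ins for the EC2 start call and the agent status poll; only the 2-minute wait and the conflict-on-disabled behavior come from this page.

```python
import time

WAKE_TIMEOUT_S = 120  # dispatch waits up to 2 minutes for the agent to come online

def wake_and_wait(agent_id, auto_wake_enabled, start_fn, status_fn,
                  poll_s=5, sleep_fn=time.sleep):
    """Start a stopped agent and block until it reports online.

    Returns "online" on success; raises when auto-wake is disabled
    (dispatch is rejected) or the wake times out.
    """
    if status_fn(agent_id) == "online":
        return "online"
    if not auto_wake_enabled:
        raise RuntimeError("conflict: agent stopped and auto-wake disabled")
    start_fn(agent_id)  # corresponds to the wake_requested lifecycle event
    waited = 0
    while waited < WAKE_TIMEOUT_S:
        if status_fn(agent_id) == "online":
            return "online"  # corresponds to wake_ready
        sleep_fn(poll_s)
        waited += poll_s
    raise TimeoutError("wake_timeout: agent did not come online within 2 minutes")
```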

From /app/agents, operators can issue Start and Stop directly from the fleet table. These actions map to POST /api/agent/{id}/action with start/stop and emit manual_start/manual_stop lifecycle events.
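A hedged sketch of building that action request; the JSON body shape (`{"action": ...}`) is an assumption, while the path and the start/stop actions come from this page:

```python
def agent_action_request(base_url, agent_id, action):
    """Build a manual fleet action request for POST /api/agent/{id}/action.

    The fleet table exposes start/stop, which emit manual_start/manual_stop
    lifecycle events. The request body shape here is illustrative.
    """
    if action not in {"start", "stop"}:
        raise ValueError("action must be 'start' or 'stop'")
    return {
        "method": "POST",
        "url": f"{base_url}/api/agent/{agent_id}/action",
        "json": {"action": action},
    }
```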

Resume latency: Starting a stopped agent takes 60-120 seconds. Schedule time-sensitive RCM workflows with this buffer in mind.

Lifecycle events

Every idle stop and auto-wake transition is recorded as a persistent lifecycle event:

Event Type          When
idle_stop_attempt   Idle monitor detects an agent past its timeout
idle_stop_success   Agent successfully stopped
idle_stop_failed    Stop attempt failed (EC2 error)
wake_requested      Auto-wake initiated for a job/schedule
wake_blocked        Auto-wake disabled by org policy
wake_ready          Agent came online after wake
wake_timeout        Agent did not come online within timeout
wake_failed         EC2 start command failed
manual_start        User-initiated start via UI/API
manual_stop         User-initiated stop via UI/API
manual_reboot       User-initiated reboot via UI/API

These events are visible in:

  • Agent Events SSE stream (GET /api/agents/{id}/events) as lifecycle events
  • Agent Logs endpoint (GET /api/agents/{id}/logs) merged into the unified log timeline
  • Agent View UI as a status badge showing the “Waking from idle” state
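For consumers of the SSE stream, a minimal filter for lifecycle events might look like the following. The `data:` line format is standard SSE; the JSON payload shape (`{"type": ...}`) is an assumption for illustration.

```python
import json

# Lifecycle event types from the table above.
LIFECYCLE_EVENTS = {
    "idle_stop_attempt", "idle_stop_success", "idle_stop_failed",
    "wake_requested", "wake_blocked", "wake_ready", "wake_timeout",
    "wake_failed", "manual_start", "manual_stop", "manual_reboot",
}

def lifecycle_events(sse_lines):
    """Yield parsed lifecycle payloads from raw SSE lines, skipping
    comments/keepalives and non-lifecycle events."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("type") in LIFECYCLE_EVENTS:
            yield event
```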

Agent View performance behavior

Agent View now separates critical status polling from heavy tab data fetches:

  • Live transition polling uses GET /api/agents/{id}/status (lightweight payload).
  • Recent runs load from GET /api/agents/{id}/runs.
  • Portals, functions, and script library data load on demand when tabs are opened.

This reduces DB load during provisioning/reboot transitions while keeping operator feedback near real time.

Verification checklist

  1. Enable auto-stop with a low timeout (e.g., 10 minutes) in dev.
  2. Wait for an idle agent to be stopped — confirm idle_stop_success event appears.
  3. Dispatch a run to the stopped agent — confirm wake_requested and wake_ready events.
  4. Check the agent view UI shows the “Waking from idle” badge during resume.
  5. Verify the run completes successfully after wake.

Watchdog behavior for stale runs

Background monitors enforce runtime limits:

  • jobs with stale heartbeats that have exceeded their max runtime are failed
  • hung worker processes are killed on the VM
  • agent status is reset for subsequent dispatch
  • retry pipeline can requeue according to scheduling policy
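The first check above can be sketched as a pure decision function. The heartbeat staleness threshold here is an assumption, not a documented value; the rule from this page is that a stale heartbeat combined with exceeded max runtime fails the job.

```python
from datetime import datetime, timedelta

def watchdog_verdict(last_heartbeat_at, started_at, max_runtime_min,
                     heartbeat_stale_min=5, now=None):
    """Return "fail" when the run's heartbeat is stale AND it has exceeded
    its max runtime, else "keep". heartbeat_stale_min is illustrative."""
    now = now or datetime.utcnow()
    stale = now - last_heartbeat_at > timedelta(minutes=heartbeat_stale_min)
    over_limit = now - started_at > timedelta(minutes=max_runtime_min)
    return "fail" if (stale and over_limit) else "keep"
```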

Use this with RPA-first scripts to keep long-running automations deterministic and recoverable.

Session lock recovery (Windows)

Idle policy and session recovery solve different failure classes:

  • idle policy controls compute stop/start economics
  • session recovery prevents lock-screen induced mid-run termination

For guarded watchdog rollout and live patch procedures, see Windows Session Recovery.