
CubeSandbox Core Operations Performance Benchmark Report (PVM Cloud Server)

1. Overview

CubeSandbox is designed for AI Agent code execution, where ultra-fast cold-start and high concurrency are the two most critical metrics. This post presents performance benchmark data measured on a Tencent Cloud standard CVM (running a PVM kernel), split into two parts:

  • Chapter 3: Create sandbox from Template (cold-start latency, concurrency scaling, single-host deployment density)
  • Chapter 4: Snapshot operations (snapshot creation, create-from-snapshot, Rollback, Clone, Pause/Resume)

Every section includes the exact commands needed to reproduce the results on your own hardware.

Important: all benchmark numbers are highly dependent on the test environment and workload. Contributing factors include (but are not limited to) host CPU, memory, IO performance, and sandbox internal workload (e.g. the more complex the program running inside the sandbox and the more dirty pages generated, the longer snapshot creation takes). Please evaluate against your own hardware and workload when planning deployments.

Compared with the bare-metal benchmark report, this post uses a standard virtualized CVM (SA9.4XLARGE32) with fewer CPU cores and less memory, so its numbers can serve as a reference baseline for small- to medium-scale deployments.


2. Test Environment

2.1 Hardware

| Item | Detail |
|---|---|
| Machine | Tencent Cloud Standard CVM SA9.4XLARGE32 (available for purchase from the Tencent Cloud console) |
| Availability Zone | |
| OS | OpenCloudOS 9.4 |
| Kernel | 6.6.69-opencloudos9.cubesandbox.pvm.host |
| CPU Model | AMD EPYC 9K65 |
| CPU Config | 1 Socket × 16 Core × 1 Thread = 16 logical cores |
| NUMA Nodes | 1 (node0: 0-15) |
| Total Memory | 32 GiB |
| System Disk | /dev/vda 200 GiB Enhanced SSD cloud disk, formatted as XFS, mounted at / |

SA9.4XLARGE32 is a Tencent Cloud ninth-generation standard instance powered by AMD EPYC 9K65 processors, suited for general-purpose computing. The host runs a PVM (Pagetable-based Virtual Machine) kernel, which supports nested virtualization and therefore lets CubeSandbox run on an ordinary cloud server. To reproduce the tests in this post, select the same model on the Tencent Cloud CVM purchase page.

To install CubeSandbox, refer to the Quick Start Guide.

2.2 Sandbox Spec and Template Creation

All tests use sandboxes with the following spec:

| Item | Detail |
|---|---|
| Spec | 2 vCPU / 2 GiB memory |
| Test Image | cube-sandbox-cn.tencentcloudcr.com/cube-sandbox/sandbox-code:latest |
| Storage | CoW reflink (XFS, /data/cubelet/storage/) |
| Memory Tracking | soft-dirty (/proc/PID/clear_refs) |

Build the template before running any tests (use the cn registry in mainland China and the int registry elsewhere):

```bash
cubemastercli tpl create-from-image \
  --image cube-sandbox-cn.tencentcloudcr.com/cube-sandbox/sandbox-code:latest \
  --writable-layer-size 1G \
  --expose-port 49999 \
  --expose-port 49983 \
  --probe 49999
```

After the build finishes, note the template ID:

```bash
# List templates; note the tpl- prefixed ID of the template you just built
cubemastercli tpl list
```

2.3 Metric Definitions

| Metric | Meaning |
|---|---|
| avg | Mean across all rounds |
| min | Minimum observed |
| p95 | 95th percentile (95% of requests complete within this time) |
| max | Maximum observed |
| wall | End-to-end elapsed time for the entire batch (first request sent → last one done); used in concurrency scenarios |
| per | Amortized per-operation time (wall ÷ number of operations in the batch); used in concurrency scenarios |

All times are in milliseconds (ms). A warm-up round is run before each scenario (results discarded) to eliminate page-cache cold-read noise. Concurrent test rounds run serially — no cross-round concurrency — to avoid mutual interference.
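The definitions above can be made concrete with a few small helpers. This is a sketch, not cube-bench's code: the post does not say which percentile method its tools use, so nearest-rank is an assumption here, and `nearest_rank`, `summarize` and `batch_metrics` are hypothetical names. With nearest-rank, p95 over 3 or 5 rounds always lands on the largest sample, which is consistent with the p95 and max columns being identical in the 3- and 5-round tables of Chapter 4.

```python
import math
from statistics import mean


def nearest_rank(sorted_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms):
    """avg / min / p95 / max over per-request (or per-round) latencies in ms."""
    s = sorted(latencies_ms)
    return {"avg": mean(s), "min": s[0], "p95": nearest_rank(s, 95), "max": s[-1]}


def batch_metrics(first_sent_s, last_done_s, ops):
    """wall = first request sent -> last one done; per = wall / operations in the batch."""
    wall_ms = (last_done_s - first_sent_s) * 1000.0
    return {"wall": wall_ms, "per": wall_ms / ops}


# Synthetic input, 5 rounds: ceil(0.95 * 5) = 5, so p95 is the largest sample
print(summarize([10, 20, 30, 40, 50]))
```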


3. Create Sandbox from Template

This chapter measures the end-to-end time to start a ready-to-use sandbox, from calling POST /sandboxes (with template_id) until the sandbox reaches the running state. This is the most common usage pattern.

3.1 Setup and Verification

Step 1: Install the Python SDK and set environment variables

```bash
pip install e2b-code-interpreter

export E2B_API_URL=http://<your-server-ip>:3000
export E2B_API_KEY=e2b_000000           # any non-empty string for local deploys
export CUBE_TEMPLATE_ID=<your-template-id>  # from cubemastercli tpl list
export SSL_CERT_FILE=/root/.local/share/mkcert/rootCA.pem  # mkcert certificate path
```

Step 2: Run a Hello World to verify the environment

Before running any benchmarks, run the following script to confirm sandboxes can be created and execute code:

```python
import os
from e2b_code_interpreter import Sandbox

with Sandbox.create(template=os.environ["CUBE_TEMPLATE_ID"]) as sandbox:
    result = sandbox.run_code("print('Hello from Cube Sandbox, safely isolated!')")
    print(result)
    print("✅ Environment verification passed — ready for benchmarking")
```

Save as hello.py and run:

```bash
python hello.py
```

If you see ✅ Environment verification passed, CubeSandbox is deployed correctly and you can proceed. If it errors, refer to the Quick Start to troubleshoot.

3.2 Cold-Start Latency and Concurrency Scaling

Use the cube-bench tool to measure sandbox creation latency at different concurrency levels. cube-bench drives CubeAPI via Go goroutines and reports full percentile statistics.

Build (requires Go 1.21+):

```bash
cd examples/cube-bench
make
# output: ./bin/cube-bench
```

Run:

```bash
# Set environment variables
export E2B_API_URL=http://<your-server-ip>:3000
export E2B_API_KEY=e2b_000000
export CUBE_TEMPLATE_ID=<your-template-id>

# 1-concurrent, 20 total (create then immediately delete)
./bin/cube-bench -c 1 -n 20 -w 3

# 10-concurrent, 200 total
./bin/cube-bench -c 10 -n 200 -w 3

# 20-concurrent, 300 total
./bin/cube-bench -c 20 -n 300 -w 3
```

-w 3 runs 3 warm-up rounds whose results are discarded before measurement.

Results (Tencent Cloud SA9.4XLARGE32 PVM, 2 vCPU / 2 GiB sandbox):

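| Concurrency | Requests | avg | min | P50 | P90 | P95 | P99 | max |
|---|---|---|---|---|---|---|---|---|
| 1 | 20 | 66.7 ms | 55.9 ms | 64.5 ms | 77.5 ms | 78.2 ms | 80.2 ms | 80.2 ms |
| 10 | 200 | 170.9 ms | 85.4 ms | 168.5 ms | 206.4 ms | 216.7 ms | 286.1 ms | 323.5 ms |
| 20 | 300 | 364.6 ms | 116.5 ms | 356.2 ms | 459.0 ms | 521.4 ms | 673.8 ms | 744.0 ms |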

Each tier is tested independently — all sandboxes are cleaned up and the resource pool is given time to recover between tiers to avoid interference. 100% success rate across all tiers.

Key findings:

  • Serial creation latency is ~67 ms (min 55.9 ms, P95 78.2 ms): extremely low and stable
  • At 10-concurrent, avg is 171 ms and the amortized per-sandbox cost (avg ÷ concurrency) is just 17.1 ms, showing strong concurrency scaling
  • At 20-concurrent, avg is 365 ms and the amortized per-sandbox cost is 18.2 ms; the P99 of 674 ms reflects modest tail latency under queue pressure
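To cross-check cube-bench from Python, the loop below is a rough equivalent built on the e2b SDK installed in §3.1. It is a hypothetical sketch, not part of the repository: `run_tier` and `create_latency_ms` are illustrative names. It times `Sandbox.create(...)` from the client side, so SDK and HTTP overhead are included and the numbers will not match cube-bench exactly. It reads the same environment variables as hello.py and deletes each sandbox with `kill()` outside the timed region. The `warmup` loop is a simplification of cube-bench's `-w` flag.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def create_latency_ms():
    """Create one sandbox from CUBE_TEMPLATE_ID, return creation time in ms, then delete it."""
    # Lazy import so run_tier can be exercised without the SDK installed.
    from e2b_code_interpreter import Sandbox

    t0 = time.perf_counter()
    sandbox = Sandbox.create(template=os.environ["CUBE_TEMPLATE_ID"])
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    sandbox.kill()  # cleanup is deliberately outside the timed region
    return elapsed_ms


def run_tier(measure_once, concurrency, total, warmup=3):
    """Run `total` measurements with at most `concurrency` in flight.

    Returns (per-request latencies in ms, batch wall time in ms).
    """
    for _ in range(warmup):  # warm-up results are discarded
        measure_once()
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: measure_once(), range(total)))
    wall_ms = (time.perf_counter() - t0) * 1000.0
    return latencies, wall_ms


if __name__ == "__main__" and "CUBE_TEMPLATE_ID" in os.environ:
    lat, wall = run_tier(create_latency_ms, concurrency=10, total=200)
    print(f"avg={sum(lat) / len(lat):.1f} ms  max={max(lat):.1f} ms  wall={wall:.1f} ms")
```

Passing the measurement function in, rather than hard-coding the SDK call, keeps the concurrency logic testable on its own.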

3.3 Single-Host Deployment Density (Memory Overhead)

CubeSandbox uses kernel sharing and Copy-on-Write (CoW) to compress its per-instance overhead to extremely low levels. This section measures net per-instance cost by "clearing the machine → launching sandboxes in batches → recording memory changes."

⚠️⚠️⚠️ Important Safety Warning

Before each batch, always run free -h to confirm there is enough memory left. Launch only a small batch at a time, check memory after each batch, and proceed only when it is safe; never launch too many at once! Running out of memory triggers the OOM killer, which at best kills processes and at worst corrupts the running environment, requiring redeployment. Choose batch sizes based on your machine's actual available memory.

Step 1: Record the baseline (empty machine memory)

```bash
export E2B_API_URL=http://<your-server-ip>:3000
export E2B_API_KEY=e2b_000000
export CUBE_TEMPLATE_ID=<your-template-id>

# Ensure no leftover sandboxes
cubemastercli list

# Record empty-machine memory usage (in MB, the same units as Step 2)
free -m
# Also record shim process count (should be 0)
ps --no-headers -C containerd-shim-cube-rs | wc -l
```

Step 2: Launch sandboxes in batches and record memory with free -m after each batch

Use cube-bench in create-only mode to create sandboxes and keep them alive:

```bash
# Set environment variables (same as §3.2; re-export if you open a new terminal)
export E2B_API_URL=http://<your-server-ip>:3000
export E2B_API_KEY=e2b_000000              # any non-empty string for local deploys
export CUBE_TEMPLATE_ID=<your-template-id> # from cubemastercli tpl list

./bin/cube-bench -c 1  -n 1  -m create-only && free -m   # cumulative: 1
./bin/cube-bench -c 4  -n 4  -m create-only && free -m   # cumulative: 5
./bin/cube-bench -c 5  -n 5  -m create-only && free -m   # cumulative: 10
./bin/cube-bench -c 10 -n 10 -m create-only && free -m   # cumulative: 20
```

Step 3: Calculate per-instance overhead

Per-VM amortized overhead = (baseline available − current available) ÷ number of live sandboxes, using the available column of free -m

Results (Tencent Cloud SA9.4XLARGE32 PVM, 2 vCPU / 2 GiB sandbox):

| Live Sandboxes | System Available (MB) | Per-VM Amortized Overhead |
|---|---|---|
| 0 (baseline) | 25570 MB | n/a |
| 1 | 25536 MB | ~34 MB |
| 5 | 25436 MB | ~27 MB |
| 10 | 25252 MB | ~32 MB |
| 20 | 24990 MB | ~29 MB |

Measured per-VM amortized overhead is approximately 27–34 MB. CoW on-demand allocation is clearly effective — a 2 GiB sandbox at idle uses only ~30 MB in practice.
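The Step 3 arithmetic is easy to script. The sketch below uses hypothetical helper names. It parses the quantity that free -m reports as available, the MemAvailable field of /proc/meminfo (reported in kB), and then reproduces the per-VM column of the table above from its raw readings. free -m may round slightly differently.

```python
def parse_mem_available_mb(meminfo_text):
    """Return MemAvailable from /proc/meminfo text, in MiB (free -m's 'available' column)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # /proc/meminfo reports kB
    raise ValueError("MemAvailable not found")


def per_vm_overhead_mb(baseline_available_mb, readings):
    """readings maps live-sandbox count -> available MB; returns MB of overhead per sandbox."""
    return {n: (baseline_available_mb - avail) / n for n, avail in readings.items() if n > 0}


# Readings from the table above (free -m "available", in MB)
readings = {1: 25536, 5: 25436, 10: 25252, 20: 24990}
for n, mb in per_vm_overhead_mb(25570, readings).items():
    print(f"{n:>2} sandboxes: ~{mb:.1f} MB per VM")
```

On the host, `parse_mem_available_mb(open("/proc/meminfo").read())` can replace reading the number off free -m by hand after each batch.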

Estimated single-host capacity (SA9.4XLARGE32, 32 GiB memory):

Total memory:                     32768 MB
System baseline usage (measured): 7198 MB  (= 32768 - 25570, from empty-machine available)
Safety headroom reserved (10%):   3276 MB
Available for sandboxes:         22294 MB  (= 32768 - 7198 - 3276)

Idle/light-load scenario (CoW on-demand allocation, ~30 MB amortized per sandbox):
  22294 ÷ 30 ≈ 743 sandboxes

Full-load scenario (each sandbox writes the full 2 GiB):
  22294 ÷ (2048 + 30) ≈ 10 sandboxes
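Here is the same estimate, parameterized so you can substitute your own baseline and per-VM measurements. This is a sketch; the 10% headroom and the two load scenarios are the assumptions stated above, and `estimate_capacity` is a hypothetical name.

```python
def estimate_capacity(total_mb, baseline_available_mb, per_vm_mb, sandbox_mem_mb, headroom=0.10):
    """Return (MB available for sandboxes, idle-case count, full-load count)."""
    baseline_used_mb = total_mb - baseline_available_mb
    reserve_mb = int(total_mb * headroom)             # safety headroom, rounded down
    usable_mb = total_mb - baseline_used_mb - reserve_mb
    idle = usable_mb // per_vm_mb                     # CoW: only the per-VM overhead is resident
    full = usable_mb // (sandbox_mem_mb + per_vm_mb)  # every sandbox touches all of its memory
    return usable_mb, idle, full


usable, idle, full = estimate_capacity(32768, 25570, per_vm_mb=30, sandbox_mem_mb=2048)
print(f"usable={usable} MB, idle~{idle}, full-load~{full}")  # usable=22294 MB, idle~743, full-load~10
```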

4. Snapshot Operations

Snapshot is a core CubeSandbox feature, supporting memory + filesystem snapshots on running sandboxes that can be restored near-instantly (Clone / Rollback).

Install dependencies:

```bash
cd examples/snapshot-rollback-clone
pip install -r requirements.txt   # installs the cubesandbox SDK

# The following environment variables are prerequisites for all 4.x benchmark scripts;
# export in each new shell (or write to env.sh and source it)
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>          # from cubemastercli tpl list
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>       # use 127.0.0.1 when running on the CubeProxy host
export CUBE_PROXY_PORT_HTTP=80                      # CubeProxy listen port (default 80)
```

Sections 4.1–4.6 below assume you have completed the above exports in your current shell (the scripts read these variables via env.py). Re-export them if you open a new terminal.
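The repository's env.py is not reproduced in this post. As an illustration only, a loader like the hypothetical `load_cube_env` below fails fast with a readable message when a new shell is missing the exports above, instead of failing later with a connection error. The dataclass and its field names are assumptions, not the scripts' real interface.

```python
import os
from dataclasses import dataclass

REQUIRED = ("CUBE_API_URL", "CUBE_TEMPLATE_ID", "CUBE_PROXY_NODE_IP")


@dataclass(frozen=True)
class CubeEnv:
    api_url: str
    template_id: str
    proxy_node_ip: str
    proxy_port_http: int


def load_cube_env(environ=None):
    """Read the exports above, failing fast with a readable message if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise SystemExit("missing environment variables: " + ", ".join(missing)
                         + " (re-export them in this shell)")
    return CubeEnv(
        api_url=environ["CUBE_API_URL"].rstrip("/"),
        template_id=environ["CUBE_TEMPLATE_ID"],
        proxy_node_ip=environ["CUBE_PROXY_NODE_IP"],
        proxy_port_http=int(environ.get("CUBE_PROXY_PORT_HTTP", "80")),  # default 80, as above
    )
```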

4.1 Snapshot Creation vs Concurrency

How it works: calls POST /sandboxes/{id}/snapshots on a running sandbox. N concurrent requests target N independent sandboxes simultaneously, measuring wall time until all snapshots complete.

CubeSandbox serializes snapshot requests on a single sandbox internally, so the concurrency test targets N distinct sandboxes (one snapshot request per sandbox), and the actual success count equals the concurrency.
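The request pattern described above can be sketched with the standard library: one snapshot request per distinct sandbox, all issued at once, timed until the last one completes. This is not bench_snapshot_concurrency.py. The endpoint path comes from the text, but the empty JSON body, the absence of auth headers and the helper names are assumptions. `post` is injectable so the fan-out logic can be exercised without a server.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_json(url, body=None, timeout=60):
    """POST a JSON body (empty object by default; an assumption) and decode any JSON reply."""
    data = json.dumps(body or {}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def snapshot_fanout(api_url, sandbox_ids, post=post_json):
    """Snapshot N distinct sandboxes concurrently; return (responses, wall ms, per-snapshot ms)."""
    if not sandbox_ids or len(set(sandbox_ids)) != len(sandbox_ids):
        # Snapshots on one sandbox are serialized server-side, so each request needs its own sandbox.
        raise ValueError("need a non-empty list of distinct sandbox IDs")
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sandbox_ids)) as pool:
        responses = list(
            pool.map(lambda sid: post(f"{api_url}/sandboxes/{sid}/snapshots"), sandbox_ids)
        )
    wall_ms = (time.perf_counter() - t0) * 1000.0
    return responses, wall_ms, wall_ms / len(sandbox_ids)
```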

Run: (script: bench_snapshot_concurrency.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_snapshot_concurrency.py -c 1  -n 5
python bench_snapshot_concurrency.py -c 5  -n 5 --no-header
python bench_snapshot_concurrency.py -c 10 -n 5 --no-header
```

Results (fresh sandboxes snapshotted as-is, with ~8 MB of dirty pages as confirmed by the "PagemapAnon snapshot saved" entry in /data/log/CubeVmm/vmm.log; this is the sandbox's baseline anonymous memory footprint and is held constant in this section):

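| Concurrency | Rounds | wall avg | wall min | wall p95 | wall max | per-snapshot avg |
|---|---|---|---|---|---|---|
| 1 | 5 | 41.4 ms | 37.6 ms | 48.7 ms | 48.7 ms | 41.4 ms |
| 5 | 5 | 58.2 ms | 51.0 ms | 66.1 ms | 66.1 ms | 11.6 ms |
| 10 | 5 | 114.1 ms | 66.2 ms | 285.2 ms | 285.2 ms | 11.4 ms |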

Serial snapshot ~41 ms; at 5-concurrent, batch wall ~58 ms, per-snapshot amortized drops to ~11.6 ms; at 10-concurrent, batch wall ~114 ms, amortized further drops to ~11.4 ms — significant concurrency amortization.

4.2 Snapshot Creation vs Dirty Page Size

Background: CubeSandbox uses the soft-dirty mechanism to save only memory pages modified since the last snapshot. Actual write volume = dirty page count × 4 KiB, typically far less than total sandbox memory (2 GiB).

The test controls dirty page size by pre-writing data to /dev/shm (tmpfs). The "Dirty Page" column shows the bytes actually written to the snapshot, as read from /data/log/CubeVmm/vmm.log. It is larger than the write size because it also includes the ~8 MB baseline and Guest OS background activity.
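The pre-write step can be reproduced inside a sandbox with a few lines of Python. This is a hypothetical helper, not the code bench_snapshot_dirty.py actually runs. It writes random bytes into a file on /dev/shm; tmpfs pages live in guest memory, so every written page becomes dirty. Random rather than zero bytes is an arbitrary choice that keeps the data from being trivially compressible.

```python
import os


def dirty_memory(size_mb, directory="/dev/shm", name="dirty.bin", chunk_mb=4):
    """Write size_mb MiB of random bytes to a (tmpfs) file; returns the file path."""
    path = os.path.join(directory, name)
    remaining = size_mb * 1024 * 1024
    chunk = chunk_mb * 1024 * 1024
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))  # on tmpfs, each write allocates and dirties fresh pages
            remaining -= n
    return path
```

One way to trigger it in a sandbox is to send the function source plus a call such as `dirty_memory(100)` through `sandbox.run_code(...)`, as in hello.py, before requesting the snapshot. Remove the file afterwards to release the memory.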

Run: (script: bench_snapshot_dirty.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_snapshot_dirty.py -d 0    -n 3
python bench_snapshot_dirty.py -d 10   -n 3 --no-header
python bench_snapshot_dirty.py -d 50   -n 3 --no-header
python bench_snapshot_dirty.py -d 100  -n 3 --no-header
python bench_snapshot_dirty.py -d 200  -n 3 --no-header
python bench_snapshot_dirty.py -d 500  -n 3 --no-header
python bench_snapshot_dirty.py -d 800  -n 3 --no-header
python bench_snapshot_dirty.py -d 1024 -n 3 --no-header
```

Tests run in serial mode; one warm-up is discarded before each data point, then 3 measured runs are averaged. The "create sandbox avg" column shows the time to create a new sandbox from that snapshot, reflecting whether dirty page size affects restore speed.

Results:

| Write Size | Dirty Page | snapshot avg | snapshot min | snapshot p95 | snapshot max | create sandbox avg | create sandbox min | create sandbox p95 | create sandbox max |
|---|---|---|---|---|---|---|---|---|---|
| 0 MB | 8.3 MB | 42.1 ms | 37.6 ms | 45.9 ms | 45.9 ms | 71.6 ms | 65.4 ms | 77.7 ms | 77.7 ms |
| 10 MB | 41.2 MB | 55.3 ms | 54.1 ms | 56.6 ms | 56.6 ms | 73.1 ms | 60.4 ms | 82.5 ms | 82.5 ms |
| 50 MB | 122.6 MB | 67.7 ms | 66.5 ms | 69.6 ms | 69.6 ms | 70.3 ms | 63.9 ms | 81.4 ms | 81.4 ms |
| 100 MB | 195.2 MB | 85.7 ms | 82.5 ms | 88.7 ms | 88.7 ms | 68.3 ms | 62.3 ms | 71.6 ms | 71.6 ms |
| 200 MB | 296.8 MB | 100.9 ms | 98.5 ms | 102.6 ms | 102.6 ms | 65.9 ms | 62.7 ms | 71.2 ms | 71.2 ms |
| 500 MB | 602.6 MB | 168.6 ms | 165.4 ms | 172.9 ms | 172.9 ms | 68.1 ms | 54.5 ms | 75.7 ms | 75.7 ms |
| 800 MB | 908.3 MB | 215.8 ms | 212.1 ms | 217.6 ms | 217.6 ms | 68.1 ms | 60.9 ms | 79.1 ms | 79.1 ms |
| 1024 MB | 1136.3 MB | 257.5 ms | 251.2 ms | 267.6 ms | 267.6 ms | 62.3 ms | 56.5 ms | 69.6 ms | 69.6 ms |

Key findings:

  • Snapshot creation time scales near-linearly with dirty page size: ~42 ms at the 8.3 MB baseline, rising by roughly 19 ms per additional 100 MB of dirty data, to ~258 ms for a 1024 MB write (1136 MB dirty)
  • Create-from-snapshot time is independent of dirty page size: it stays within 54–83 ms regardless of snapshot size, because restore uses CoW on-demand loading
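To check the slope quoted above, fit a line through the table's Dirty Page and snapshot avg columns. The plain least-squares fit below is written out by hand so it does not depend on a particular Python version. It gives a slope of roughly 0.19 ms per MB (about 19 ms per 100 MB) and an intercept in the mid-40s ms. `predict_snapshot_ms` is a hypothetical helper that describes this machine's data only.

```python
# Dirty Page (MB) and snapshot avg (ms) columns from the table above
DIRTY_MB = [8.3, 41.2, 122.6, 195.2, 296.8, 602.6, 908.3, 1136.3]
SNAPSHOT_AVG_MS = [42.1, 55.3, 67.7, 85.7, 100.9, 168.6, 215.8, 257.5]


def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = least_squares(DIRTY_MB, SNAPSHOT_AVG_MS)


def predict_snapshot_ms(dirty_mb):
    """Rough snapshot-time estimate for this machine; do not extrapolate to other hosts."""
    return intercept + slope * dirty_mb


print(f"{slope * 100:.1f} ms per 100 MB dirty, intercept {intercept:.1f} ms")
```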

4.3 Create Sandbox from Snapshot

How it works: creates a snapshot first, then launches N sandboxes concurrently via POST /sandboxes (with snapshot_id), measuring end-to-end wall time until all sandboxes reach running.

Run: (script: bench_create_concurrency.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_create_concurrency.py -c 1  -n 3
python bench_create_concurrency.py -c 10 -n 3 --no-header
python bench_create_concurrency.py -c 20 -n 3 --no-header
```

Results:

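| Concurrency | n total | Rounds | wall avg | wall min | wall p95 | wall max | per-sandbox avg |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 66.7 ms | 65.8 ms | 68.3 ms | 68.3 ms | 66.7 ms |
| 10 | 10 | 3 | 387.9 ms | 364.4 ms | 420.3 ms | 420.3 ms | 38.8 ms |
| 20 | 20 | 3 | 701.3 ms | 660.5 ms | 742.4 ms | 742.4 ms | 35.1 ms |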

Single sandbox startup ~67 ms; at 10-concurrent, wall ~388 ms, amortized just 38.8 ms/sandbox; at 20-concurrent, wall ~701 ms, amortized just 35.1 ms/sandbox, roughly 1.9× the serial throughput.
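One way to put a number on concurrency scaling is to convert wall time into throughput. The short calculation below uses only the wall avg column above. It gives roughly 15 sandboxes/s serially, about 26/s at 10-concurrent and about 28.5/s at 20-concurrent, so throughput improves about 1.9× while the batch wall time grows. `throughput_per_s` is a hypothetical helper.

```python
# (concurrency, wall avg in ms) from the create-from-snapshot table above
TIERS = [(1, 66.7), (10, 387.9), (20, 701.3)]


def throughput_per_s(n, wall_ms):
    """Sandboxes started per second for a batch of n that finished in wall_ms."""
    return n / (wall_ms / 1000.0)


serial = throughput_per_s(*TIERS[0])
for n, wall in TIERS:
    tput = throughput_per_s(n, wall)
    print(f"c={n:>2}: {tput:5.1f} sandboxes/s  ({tput / serial:.2f}x serial, {wall / n:.1f} ms amortized)")
```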

4.4 Rollback

How it works: calls POST /sandboxes/{id}/rollback on running sandboxes to restore memory and filesystem state in-place to the specified Snapshot, without recreating the sandbox.

Snapshot ownership constraint: CubeSandbox only allows a sandbox to roll back to a checkpoint it created itself. Therefore each concurrent sandbox must independently complete the full "snapshot + rollback" flow.

Run: (script: bench_rollback_concurrency.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_rollback_concurrency.py -c 1  -n 5
python bench_rollback_concurrency.py -c 5  -n 5 --no-header
python bench_rollback_concurrency.py -c 10 -n 5 --no-header
```

Results:

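| Concurrency | Rounds | wall avg | wall min | wall p95 | wall max | per-rollback avg |
|---|---|---|---|---|---|---|
| 1 | 5 | 90.0 ms | 82.0 ms | 98.3 ms | 98.3 ms | 90.0 ms |
| 5 | 5 | 325.5 ms | 322.9 ms | 329.4 ms | 329.4 ms | 65.1 ms |
| 10 | 5 | 821.4 ms | 778.7 ms | 858.1 ms | 858.1 ms | 82.1 ms |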

Single Rollback flow ~90 ms; at 5-concurrent, batch wall ~326 ms and the per-rollback amortized cost drops to ~65 ms; at 10-concurrent, batch wall ~821 ms and the amortized cost rises back to ~82 ms/rollback.

Note: Because CubeSandbox requires sandboxes to roll back only to their own checkpoints, shared snapshots cannot be reused — each concurrent sandbox must independently complete the full "snapshot + rollback" flow.
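Because of that constraint, each worker in a concurrent rollback run handles the complete snapshot-then-rollback sequence for its own sandbox. The sketch below shows that structure. `create_snapshot` and `rollback` are injected, hypothetical callables standing in for POST /sandboxes/{id}/snapshots and POST /sandboxes/{id}/rollback, since the post does not document their request or response fields.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def snapshot_then_rollback(sandbox_id, create_snapshot, rollback):
    """One worker's full flow: snapshot its own sandbox, then roll that sandbox back to it."""
    t0 = time.perf_counter()
    snapshot_id = create_snapshot(sandbox_id)  # the snapshot is owned by sandbox_id
    rollback(sandbox_id, snapshot_id)          # never a snapshot taken by another sandbox
    return (time.perf_counter() - t0) * 1000.0


def rollback_batch(sandbox_ids, create_snapshot, rollback):
    """Run the flow for every sandbox concurrently; return (per-flow ms, wall ms, per-rollback ms)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sandbox_ids)) as pool:
        flows = list(
            pool.map(lambda sid: snapshot_then_rollback(sid, create_snapshot, rollback), sandbox_ids)
        )
    wall_ms = (time.perf_counter() - t0) * 1000.0
    return flows, wall_ms, wall_ms / len(sandbox_ids)
```

Timing the pair together, as here, measures the whole flow. Time the rollback call alone if you want to separate the two costs.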

4.5 Clone

How it works: calls POST /sandboxes/{id}/clone to fork N new sandboxes from a running source sandbox, fully preserving the source's memory and filesystem state (including dirty pages).

Note: disk files in this test were already in Page Cache; results exclude cold-read IO overhead.

Run: (script: bench_clone_concurrency.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_clone_concurrency.py -n 1  -c 1  --rounds 5
python bench_clone_concurrency.py -n 10 -c 5  --rounds 3 --no-header
python bench_clone_concurrency.py -n 20 -c 10 --rounds 3 --no-header
```

Results (source sandbox dirty pages ~10 MB):

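| Scenario | n | Concurrency | Rounds | wall avg | wall min | wall p95 | wall max | per-clone avg |
|---|---|---|---|---|---|---|---|---|
| 1 sandbox, 1-concurrent | 1 | 1 | 5 | 270.6 ms | 260.8 ms | 280.5 ms | 280.5 ms | 270.6 ms |
| 10 sandboxes, 5-concurrent | 10 | 5 | 3 | 541.6 ms | 522.9 ms | 557.7 ms | 557.7 ms | 54.2 ms |
| 20 sandboxes, 10-concurrent | 20 | 10 | 3 | 789.7 ms | 757.2 ms | 815.3 ms | 815.3 ms | 39.5 ms |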

Single sandbox clone ~271 ms; 10 sandboxes at 5-concurrent, batch wall ~542 ms, per-clone amortized drops to ~54 ms; 20 sandboxes at 10-concurrent, batch wall ~790 ms, amortized further drops to ~40 ms/sandbox — significant concurrency amortization.
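The 10-sandbox / 5-concurrent and 20-sandbox / 10-concurrent scenarios are a bounded fan-out: n clone operations with at most c in flight. The sketch below shows that pattern, assuming one POST /sandboxes/{id}/clone call per new sandbox; the post does not say whether a single call can fork several. `clone` is an injected, hypothetical callable. `InFlight` records the peak number of concurrent calls so you can confirm a run actually reached the intended concurrency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlight:
    """Wrap a callable and record the peak number of simultaneous calls."""

    def __init__(self, fn):
        self.fn = fn
        self.current = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, *args):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        try:
            return self.fn(*args)
        finally:
            with self._lock:
                self.current -= 1


def clone_many(source_id, n, concurrency, clone):
    """Fork n sandboxes from source_id with at most `concurrency` clone calls in flight."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        new_ids = list(pool.map(lambda _: clone(source_id), range(n)))
    wall_ms = (time.perf_counter() - t0) * 1000.0
    return new_ids, wall_ms, wall_ms / n
```

The pool size, not the number of clones, is what caps concurrency, so `clone_many("sb-1", 20, 10, InFlight(clone))` mirrors the last row of the table above.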

4.6 Pause / Resume

How it works: creates N sandboxes (N = the concurrency level), pauses all of them concurrently via POST /sandboxes/{id}/pause, then resumes all of them concurrently via POST /sandboxes/{id}/resume. Wall time and per-sandbox amortized latency are recorded for both operations.

⚠️ Current implementation note: Pause currently uses full-memory-copy mode. On pause, all anonymous memory pages of the sandbox are written to persistent storage, so latency scales linearly with sandbox memory size (~371 ms per sandbox at 2 GiB on PVM). A future release will switch to soft-dirty incremental mode, which saves only pages dirtied since the last checkpoint. For an idle sandbox this is expected to cut pause latency by 80–90%.

Run: (script: bench_pause_resume_concurrency.py)

```bash
cd examples/snapshot-rollback-clone
export CUBE_API_URL=http://<your-server-ip>:3000
export CUBE_TEMPLATE_ID=<your-template-id>
export CUBE_PROXY_NODE_IP=<your-cubeproxy-ip>
export CUBE_PROXY_PORT_HTTP=80

python bench_pause_resume_concurrency.py -c 1  -n 5
python bench_pause_resume_concurrency.py -c 10 -n 5 --no-header
```

Pause results:

| Concurrency | Rounds | wall avg | wall min | wall p95 | wall max | per-pause avg |
|---|---|---|---|---|---|---|
| 1 | 5 | 370.8 ms | 351.0 ms | 384.0 ms | 384.0 ms | 370.8 ms |
| 10 | 5 | 1586.0 ms | 1529.5 ms | 1679.8 ms | 1679.8 ms | 158.6 ms |

Resume results:

| Concurrency | Rounds | wall avg | wall min | wall p95 | wall max | per-resume avg |
|---|---|---|---|---|---|---|
| 1 | 5 | 18.9 ms | 9.5 ms | 32.8 ms | 32.8 ms | 18.9 ms |
| 10 | 5 | 26.6 ms | 19.3 ms | 39.9 ms | 39.9 ms | 2.7 ms |

Key findings:

  • Resume is extremely fast with excellent concurrency scaling: single resume ~19 ms; at 10-concurrent, per-resume amortized just 2.7 ms/sandbox
  • Pause is the current bottleneck: in full-copy mode, single pause ~371 ms, 10-concurrent per-pause amortized 158.6 ms/sandbox
  • After soft-dirty mode lands: pause latency is expected to drop significantly, with 10-concurrent per-pause falling into single-digit milliseconds

full-copy → soft-dirty optimization: the current full-copy mode writes up to 2 GiB of VM anonymous memory to disk on every pause, creating high IO pressure. The soft-dirty incremental mode tracks pages dirtied since the last checkpoint via /proc/PID/clear_refs, so a pause writes only the pages actually modified (typically a few MB for an idle sandbox). This is expected to reduce pause latency by 80–90% and significantly increase high-concurrency throughput.
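For readers unfamiliar with soft-dirty, the sketch below exercises the underlying Linux interface on an ordinary process. It is not CubeSandbox's implementation, and the helper names are hypothetical. Per the kernel's soft-dirty documentation, writing 4 to /proc/PID/clear_refs clears the soft-dirty bits, and bit 55 of each 64-bit /proc/PID/pagemap entry then reports whether that page has been written since. It needs a kernel built with CONFIG_MEM_SOFT_DIRTY. `demo()` is meant to be called by hand on such a host and is not run automatically.

```python
import ctypes
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
SOFT_DIRTY = 1 << 55  # pagemap bit 55: page written since soft-dirty bits were cleared


def clear_soft_dirty(pid="self"):
    """Reset soft-dirty tracking for a process (the '4' command of clear_refs)."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def is_soft_dirty(entry):
    return bool(entry & SOFT_DIRTY)


def count_soft_dirty(start, length, pid="self"):
    """Count soft-dirty pages in [start, start + length) using /proc/PID/pagemap."""
    first = start // PAGE_SIZE
    last = (start + length + PAGE_SIZE - 1) // PAGE_SIZE
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(first * 8)  # one 8-byte, native-endian entry per virtual page
        raw = f.read((last - first) * 8)
    entries = struct.unpack(f"={len(raw) // 8}Q", raw)
    return sum(1 for e in entries if is_soft_dirty(e))


def write_volume_mb(dirty_pages, page_size=4096):
    """Incremental write size: dirty page count x page size, in MiB."""
    return dirty_pages * page_size / (1024 * 1024)


def demo(touch_pages=10, total_pages=64):
    """Dirty a few pages of a fresh anonymous mapping and count them."""
    buf = mmap.mmap(-1, total_pages * PAGE_SIZE)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    clear_soft_dirty()  # clear after mapping so untouched pages read as clean
    for i in range(touch_pages):
        buf[i * PAGE_SIZE] = 1  # first byte of page i
    return count_soft_dirty(addr, total_pages * PAGE_SIZE)
```

`write_volume_mb` is the "dirty page count × 4 KiB" relation from §4.2. For example, 2,560 dirty pages come to 10 MiB, which is why an idle sandbox's incremental pause can be far smaller than a full 2 GiB copy.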


Appendix: Benchmark Script Index

All benchmark scripts used in this post are located in the repository directories:
