Reproducible Laboratory Comparative Study of VPS vs Dedicated

Executive summary

VPS and Dedicated differ less in “average speed” than in latency predictability (p95/p99) and resilience to multi-tenant interference (“noisy neighbor”), which tends to be most visible in storage I/O and networking.
A key diagnostic indicator of CPU contention on VPS is CPU steal time (visible in /proc/stat, top, vmstat), reflecting CPU time taken by the hypervisor in favor of other guests.
This article proposes a reproducible benchmark protocol (fio/sysbench/iperf3/pgbench) with ≥10 repeated 60-second runs plus long runs (1–6 hours, recommended 4h) with telemetry (iostat/vmstat/sar/pidstat).
The Results section provides a demonstration dataset and charts that are explicitly marked as simulated, constructed to reflect published patterns of variability in shared environments (tail events, interference, and diagnostic value of %steal).
We show how to compare p95/p99 statistically via bootstrap confidence intervals for differences of medians and the Mann–Whitney U test, and how to report effect sizes (Cliff’s delta / probabilistic dominance).
Practical implication: for IO-bound systems (databases, logs, persistent queues), disk tail latency predominates; for CPU-bound workloads, %steal is critical; for network-bound services, jitter/p99 RTT plus port throughput are decisive.

Introduction

In infrastructure hosting, comparing VPS and Dedicated is more rigorous when framed as a comparison of resource-sharing models and their resulting latency distributions, rather than a direct comparison of “cores and gigabytes.”
In large and distributed services, rare latency spikes and tail latency events can disproportionately affect user experience; eliminating variability sources completely—especially in shared environments—is usually impractical.
Multi-tenancy can cause performance interference: a “noisy neighbor” may degrade other tenants’ metrics, and shared storage/network subsystems can show noticeable variability even when average throughput looks acceptable.
Therefore, reproducible comparisons should publish not only throughput/IOPS but also p50/p95/p99 latency, plus system contention signals such as %steal and %iowait, and time dynamics in long runs.

Problem, hypotheses, and assumptions

Problem statement

Goal: define a reproducible protocol that allows a hosting provider (or customer) to compare VPS and Dedicated service classes under representative configurations and workloads, then interpret results in terms of tail latency, variability, and root causes (CPU scheduling, I/O contention, network behavior).

Research questions

RQ1: Do VPS and Dedicated differ in p95/p99 disk latency for database/log-relevant profiles (fsync-write)?
RQ2: How different is network variability (RTT/jitter) and throughput (iperf3) between VPS and Dedicated?
RQ3: Do %steal and %iowait explain observed tail-latency spikes and degradations?

Operational hypotheses

H1: Dedicated shows a lower median p99 latency for synchronous writes (fsync) and lower temporal spread of p99 than shared VPS.
H2: VPS exhibits non-zero %steal correlated with increased p99 latency (host-level CPU contention as a tail-latency factor).
H3: A premium VPS with guaranteed resources approaches Dedicated in tail latency and %steal but may still lag under shared storage/network contention.

Representative benchmark configurations

When providers/tariffs are unspecified, we evaluate these representative classes (treated as classes of setups, not vendor products):

small VPS: 1 vCPU, 2 GB RAM, SSD (shared CPU)
medium VPS: 2 vCPU, 4 GB RAM, NVMe (shared CPU)
premium VPS: 4 vCPU, 8 GB RAM, NVMe + “guaranteed resources” (assumption: reduced oversubscription/CPU pinning and disk QoS)
dedicated small: 4 cores, 16 GB RAM, 1×NVMe, 1G
dedicated large: 8 cores, 32 GB RAM, 2×NVMe, 10G

Explicit assumptions (fixed for reproducibility)

OS/kernel: Ubuntu Server 24.04 LTS, Linux kernel 6.8.x.
CPU governor: performance (reduce DVFS-related variance during tests).
I/O scheduler:
- NVMe: none
- SATA/SAS SSD (small VPS “SSD” class): mq-deadline
Filesystem/mount: ext4 with noatime (on Linux, noatime already implies nodiratime, so listing both is redundant but harmless).
VPS oversubscription is unknown and not measured directly; %steal is used as a proxy for host-level CPU contention.
The VPS storage model is assumed to be a shared pool, so heavier latency tails are expected than on single-tenant local NVMe.

Materials and methods

What exactly is compared

This study interprets “VPS vs Dedicated” as a comparison of two deployment architectures:

VPS as a guest OS under a hypervisor (typical stack: KVM + paravirtualized virtio devices).
Dedicated as an OS running directly on physical hardware.

Tools and outputs

fio: IOPS/BW and completion latency percentiles (clat), exported in JSON for aggregation.
sysbench: modular benchmark for CPU and memory.
iperf3: throughput (JSON output with -J).
pgbench: database-class benchmark (TPS and transaction latency statistics).
sysstat stack (sar/iostat/pidstat): telemetry and per-process attribution.

Mandatory contention metrics:

%steal (CPU time desired by the guest but not granted due to host scheduling).
%iowait (CPU idle while waiting for I/O completion).
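
Both metrics can be derived from two successive samples of the aggregate cpu line in /proc/stat. The sketch below is a minimal illustration, not part of the measured protocol. The function names cpu_shares and read_cpu_line are hypothetical, and the field order follows the documented Linux layout (user, nice, system, idle, iowait, irq, softirq, steal).

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def _parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest and guest_nice (the two trailing columns) are already included in
    # user and nice, so only the first eight counters are summed.
    return dict(zip(FIELDS, (int(v) for v in parts[1:9])))


def cpu_shares(before, after):
    """Return %steal and %iowait for the interval between two samples."""
    a, b = _parse_cpu_line(before), _parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {
        "steal_pct": 100.0 * delta["steal"] / total,
        "iowait_pct": 100.0 * delta["iowait"] / total,
    }


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate over all CPUs)."""
    with open(path) as fh:
        return fh.readline()
```

Calling read_cpu_line() twice, a few seconds apart, and passing both lines to cpu_shares gives interval percentages in the same spirit as the %steal and %iowait columns of vmstat and sar.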

Benchmark profiles and exact parameters

fio profiles (storage)

We use these fixed fio profiles (block size, queue depth, and sync semantics are explicit):

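randread: 4k, iodepth=32, direct=1, ioengine=libaio
randwrite: 4k, iodepth=32, direct=1, ioengine=libaio
randwrite + fsync: 4k, iodepth=1, direct=1, fsync=1

An asynchronous I/O engine is specified for the iodepth=32 profiles because fio's default synchronous engine does not keep more than one I/O in flight, which would make the stated queue depth ineffective.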

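Percentiles are reported as p50/p95/p99 of completion latency (clat). For the fsync profile, recent fio versions account for fsync latency separately from write clat, so a live study should state which of the two each reported percentile refers to.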
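
A sketch of how one fio JSON report could be reduced to the reported columns. It assumes the field names used by fio 3.x JSON output (jobs, jobname, iops, bw_bytes, and clat_ns.percentile with keys such as "99.000000") and fio's default percentile list. Both should be verified against the fio version fixed for the study. The helper name fio_row is hypothetical.

```python
import json


def fio_row(report, direction):
    """Reduce a parsed fio JSON report to one row of summary metrics.

    direction is "read" or "write". Field names are assumed from fio 3.x output.
    """
    job = report["jobs"][0]  # one entry with --group_reporting and numjobs=1
    io = job[direction]
    pct = io["clat_ns"]["percentile"]  # latencies in nanoseconds
    return {
        "job": job["jobname"],
        "iops": io["iops"],
        "bw_mib_s": io["bw_bytes"] / 2**20,
        "clat_p50_ms": pct["50.000000"] / 1_000_000,
        "clat_p95_ms": pct["95.000000"] / 1_000_000,
        "clat_p99_ms": pct["99.000000"] / 1_000_000,
    }


def load_fio_row(path, direction):
    """Load a fio JSON file from disk and summarize it."""
    with open(path) as fh:
        return fio_row(json.load(fh), direction)
```

One row per run can then be appended to a CSV file, with the median across runs taken afterwards.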

sysbench profiles (CPU/memory)

sysbench cpu
sysbench memory

Network profiles

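iperf3 -c <SERVER_IP> -t 60 -J on the client, with iperf3 -s on the server; throughput and retransmits are read from the JSON summary.
ping -i 0.2 -c 300 (300 probes over about 60 seconds); compute p50/p95/p99 RTT from the per-packet times in the raw output.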
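
The RTT percentiles can be computed from the raw ping log as sketched below. This is an illustration that assumes the Linux iputils output format ("time=0.612 ms" on each reply line) and a nearest-rank estimator. The function names are hypothetical, and a different quantile estimator would give slightly different values on small samples.

```python
import math
import re

# Matches the per-reply RTT, e.g. "time=0.612 ms"; summary lines contain no "time=".
RTT_PATTERN = re.compile(r"time=([0-9.]+) ms")


def rtt_samples(ping_output):
    """Extract per-packet RTTs (ms) from raw ping output."""
    return [float(m.group(1)) for m in RTT_PATTERN.finditer(ping_output)]


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def rtt_report(ping_output):
    """Return p50/p95/p99 RTT (ms) for one ping log."""
    rtts = rtt_samples(ping_output)
    return {f"p{p}": percentile(rtts, p) for p in (50, 95, 99)}
```

With only 300 probes per run, p99 rests on the three slowest replies, which is one reason the protocol repeats each run at least 10 times.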

Experimental design and telemetry

Short runs: 60 seconds, ≥10 repeats per test and configuration.
Long runs: 1–6 hours, recommended 4 hours, tracking time series of p99 and system metrics.
Optional warm-up exclusion: the first 5–10 seconds may be excluded (especially for TCP slow start in iperf3, where the -O/--omit option skips the first N seconds of the test).
Telemetry capture cadence (example): 1-second iostat/vmstat/sar sampling into log files.
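
For long runs, a p99 time series can be obtained by grouping per-operation latency samples into fixed windows and taking the percentile within each window. The sketch below assumes the samples are already available as (time in seconds, latency in ms) pairs. The 60-second default window and the nearest-rank estimator are illustrative choices, not prescribed by the protocol.

```python
import math
from collections import defaultdict


def windowed_p99(samples, window_s=60):
    """Return [(window_start_s, p99_ms), ...] sorted by time.

    samples: iterable of (t_seconds, latency_ms) pairs in any order.
    """
    buckets = defaultdict(list)
    for t, latency in samples:
        buckets[int(t // window_s)].append(latency)

    series = []
    for index in sorted(buckets):
        ordered = sorted(buckets[index])
        # Nearest-rank p99 within the window.
        rank = max(1, math.ceil(99 * len(ordered) / 100))
        series.append((index * window_s, ordered[rank - 1]))
    return series
```

Windows with few samples give an unstable p99, so the window length should be chosen so that each window holds at least a few hundred operations.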

flowchart LR
  A[Provision VPS/Dedicated host] --> B[Fix OS/kernel and CPUFreq/I-O scheduler settings]
  B --> C[Install tools: fio, sysbench, iperf3, pgbench, sysstat]
  C --> D["Short runs: 60s × 10 repeats (disk/network/CPU)"]
  C --> E["Long run: 1–6 hours (recommended 4h) with p99 time series"]
  D --> F[Collect telemetry: iostat/vmstat/sar/pidstat and %steal]
  E --> F
  F --> G[Parse JSON/logs into unified CSV]
  G --> H[Compute p50/p95/p99 + statistical analysis]
  H --> I[Publish tables + CDFs + time-series figures]

Results

Important note on data provenance

No live measurements were collected for this article, so the results below are simulated and serve as a publication-ready example of how to report and analyze VPS vs Dedicated. The simulation is constructed to reflect published qualitative patterns: heavier tails and “rare spikes” in shared environments, storage/network interference, and the diagnostic role of %steal.

Simulation model (explicit). To generate plausible tails and spikes:

Per-run latencies are drawn from a heavy-tailed distribution (e.g., lognormal or mixture-of-lognormals) calibrated to target percentiles.
Long-run p99 time series are produced by a baseline process plus rare spikes (e.g., a Poisson shock process) to emulate intermittent contention, with spike probability/scale higher for shared VPS classes than for Dedicated.
%steal is generated as a correlated variable that increases in spike windows to reflect host-level contention coupling.
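
A minimal sketch of such a generator, using only the Python standard library. All parameter names and values are hypothetical illustrations of the three ingredients above. They are not the settings behind the tables in this article.

```python
import math
import random


def simulate_long_run(n_windows, base_p99_ms, sigma, spike_rate,
                      spike_scale, base_steal, steal_gain, seed=0):
    """Return per-window dicts with p99 latency (ms), %steal, and a spike flag."""
    rng = random.Random(seed)              # fixed seed keeps the run reproducible
    p_spike = 1.0 - math.exp(-spike_rate)  # P(at least one Poisson shock per window)
    rows = []
    for _ in range(n_windows):
        # Baseline: lognormal noise whose median is the target p99.
        p99 = rng.lognormvariate(math.log(base_p99_ms), sigma)
        # Background steal with mild multiplicative noise.
        steal = base_steal * rng.lognormvariate(0.0, 0.3)
        spike = rng.random() < p_spike
        if spike:
            # Shock: inflate latency and raise steal in the same window (coupling).
            p99 *= 1.0 + rng.expovariate(1.0 / spike_scale)
            steal += steal_gain * rng.random()
        rows.append({"p99_ms": p99, "steal_pct": steal, "spike": spike})
    return rows


# Hypothetical settings: the shared VPS class gets more frequent, larger shocks.
vps = simulate_long_run(240, base_p99_ms=30.0, sigma=0.15, spike_rate=0.05,
                        spike_scale=2.0, base_steal=1.5, steal_gain=4.0, seed=1)
ded = simulate_long_run(240, base_p99_ms=2.5, sigma=0.05, spike_rate=0.005,
                        spike_scale=0.5, base_steal=0.05, steal_gain=0.2, seed=1)
```

Here 240 windows stand for one-minute windows over a 4-hour run. Publishing the seed and parameters alongside simulated figures keeps the demonstration itself reproducible.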

Disk results (fio) — simulated, medians across 10 runs

Table 1. fio profiles (simulated; medians across 10 runs). lat p50/p95/p99 are completion latency (clat) percentiles; a dash marks a value not reported for that profile.

Workload (fio)	Configuration	IOPS (median)	BW (MiB/s, median)	lat p50 (ms)	lat p95 (ms)	lat p99 (ms)	CPU steal (%, median)	CPU iowait (%, median)
randread 4k, iodepth=32	vps_small	17,800	69.5	0.352	1.630	3.821	3.884	—
randread 4k, iodepth=32	vps_medium	64,234	250.9	0.203	0.963	2.217	1.876	—
randread 4k, iodepth=32	vps_premium	90,128	352.1	0.151	0.631	1.311	0.623	—
randread 4k, iodepth=32	ded_small	156,443	611.1	0.121	0.465	0.879	0.043	—
randread 4k, iodepth=32	ded_large	261,811	1022.7	0.100	0.412	0.781	0.024	—
randwrite 4k, iodepth=32	vps_small	12,606	49.2	0.457	2.233	5.703	3.463	—
randwrite 4k, iodepth=32	vps_medium	45,396	177.3	0.283	1.318	3.091	1.837	—
randwrite 4k, iodepth=32	vps_premium	71,426	279.0	0.202	0.850	1.763	0.648	—
randwrite 4k, iodepth=32	ded_small	120,956	472.5	0.161	0.622	1.188	0.048	—
randwrite 4k, iodepth=32	ded_large	201,886	788.6	0.141	0.569	1.087	0.033	—
randwrite 4k + fsync, iodepth=1	vps_small	556	2.2	2.048	18.373	60.267	3.673	11.339
randwrite 4k + fsync, iodepth=1	vps_medium	880	3.4	1.021	9.378	34.889	1.746	7.361
randwrite 4k + fsync, iodepth=1	vps_premium	1,438	5.6	0.606	3.255	9.046	0.705	3.135
randwrite 4k + fsync, iodepth=1	ded_small	2,225	8.7	0.402	1.538	2.865	0.053	1.552
randwrite 4k + fsync, iodepth=1	ded_large	2,605	10.2	0.351	1.224	2.145	0.025	1.047

Interpretation: the fsync profile (bottom five rows) is the most revealing for shared environments, because synchronous writes expose tail latency due to queueing, background activity, and shared contention—consistent with the “rare spikes” thesis for service quality.

Figure 1 (simulated): CDF of disk latency (fsync write).

Figure 2 (simulated): 4-hour time series of p99 fsync-write latency.

Network results — simulated, medians across 10 runs

Table 2. RTT and iperf3 metrics (simulated; medians across 10 runs). Ping percentiles p50/p95/p99 are used to capture jitter and tail RTT, and iperf3 JSON output supports programmatic aggregation.

Metric	vps_small	vps_medium	vps_premium	ded_small	ded_large
ping RTT p50 (ms)	0.904	0.804	0.703	0.601	0.551
ping RTT p95 (ms)	2.651	2.084	1.543	1.222	1.010
ping RTT p99 (ms)	5.043	3.516	2.323	1.718	1.339
iperf3 TCP throughput (Gbps, median)	0.860	0.962	1.264	0.903	9.113
iperf3 TCP retransmits (median, per 60s)	161	119	76	60	27
CPU steal during network test (%, median)	3.227	2.139	0.662	0.052	0.036

Practical interpretation: in multi-tenant infrastructure, it is insufficient to report bandwidth alone; p95/p99 RTT and jitter (tail behavior) matter because they affect “chattiness” of distributed services (many short RPCs/requests).
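
For aggregation across runs, the iperf3 JSON summary can be reduced to the two reported numbers. The sketch below assumes the layout commonly produced by iperf3 TCP tests on Linux (end.sum_received.bits_per_second and end.sum_sent.retransmits). These field names should be checked against the iperf3 version in use, and the helper name iperf3_summary is hypothetical.

```python
import json


def iperf3_summary(report):
    """Extract receiver-side throughput (Gbps) and sender retransmits.

    Field names are assumed from typical iperf3 TCP JSON output.
    """
    end = report["end"]
    return {
        "throughput_gbps": end["sum_received"]["bits_per_second"] / 1e9,
        # Retransmit counts may be absent on some platforms, hence .get().
        "retransmits": end["sum_sent"].get("retransmits"),
    }


def load_iperf3_summary(path):
    """Load an iperf3 -J output file and summarize it."""
    with open(path) as fh:
        return iperf3_summary(json.load(fh))
```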

Figure 3 (simulated): CDF of ping RTT.

Statistical analysis

Metrics and formulas

Latency percentiles. For an observed latency sample \(\{x_1,\dots,x_n\}\), the \(p\)-th percentile \(Q_p\) is the value such that approximately \(p\%\) of observations are \(\le Q_p\). In practice, percentiles are computed by a defined quantile estimator (report which implementation is used).

IOPS and bandwidth relationship. For a fixed block size \(b\) (bytes), the relationship is:
\[ \text{BW} \approx \text{IOPS} \times b \]
or equivalently
\[ \text{IOPS} \approx \frac{\text{BW}}{b} \]
When BW is reported in MiB/s and \(b\) in bytes, unit conversion is required (1 MiB = \(2^{20}\) bytes).
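
The unit conversion can be made explicit in two small helpers. The function names are illustrative. The example value uses the first row of Table 1 (17,800 IOPS at 4 KiB blocks).

```python
MIB = 2**20  # bytes per MiB


def bw_mib_per_s(iops, block_bytes):
    """Bandwidth in MiB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / MIB


def iops_from_bw(bw_mib_s, block_bytes):
    """IOPS implied by a bandwidth figure (MiB/s) at a fixed block size."""
    return bw_mib_s * MIB / block_bytes


print(bw_mib_per_s(17_800, 4096))  # 69.53125, reported as 69.5 MiB/s in Table 1
```

For 4 KiB blocks the conversion reduces to dividing IOPS by 256, which is a quick consistency check on any fio table.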

Hypothesis testing methodology (bootstrapping + nonparametric test)

The protocol uses:

Percentile bootstrap for the difference of medians (e.g., \(\Delta = \text{median}(p99)_A - \text{median}(p99)_B\)).
- Resample each group with replacement \(B\) times (e.g., \(B = 10{,}000\)) and compute \(\Delta^{(b)}\) for each replicate.
- The 95% confidence interval is \([\Delta_{2.5\%}, \Delta_{97.5\%}]\), the 2.5th and 97.5th percentiles of the replicates.
Mann–Whitney U test (two-sample rank test) as a nonparametric test of distributional shift (“tends to be larger”). Report: p-value and a probabilistic dominance interpretation (MW parameter).
Effect size: Cliff’s delta (dominance). A convenient interpretation is:
\[ \delta = P(X>Y) - P(X<Y) \]
where \(X\) comes from group A and \(Y\) from group B.
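
The percentile bootstrap described above can be sketched with the standard library alone. The function name, the default seed, and the index-based percentile lookup are illustrative assumptions. A statistics package may use a slightly different quantile rule for the interval ends.

```python
import random
import statistics


def bootstrap_median_diff(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for median(a) - median(b).

    Returns (observed_difference, ci_low, ci_high).
    """
    a, b = list(a), list(b)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    observed = statistics.median(a) - statistics.median(b)

    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement, at its own size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resampled_a) - statistics.median(resampled_b))

    diffs.sort()
    low = diffs[int((alpha / 2) * n_boot)]
    high = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return observed, low, high
```

Each group here would hold the 10 per-run p99 values of one configuration. An interval that excludes zero indicates a difference in median p99 at the chosen level.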

Example statistical results (simulated)

Table 3. Comparison using 10 independent runs (simulated). Reported: Δ median, bootstrap CI, bootstrap p-value, Mann–Whitney p-value, MW probabilistic dominance \(\varphi\), and Cliff’s delta \(\delta\).

Comparison	Metric	median(A)	median(B)	Δ median(A−B)	95% bootstrap CI for Δ	p_bootstrap	p_MW	MW ϕ	Cliff’s δ
vps_medium vs ded_small	fsync lat p99 (ms)	34.889	2.865	32.025	[31.073; 32.558]	4.00e-05	1.83e-04	1.000	1.000
vps_premium vs ded_small	fsync lat p99 (ms)	9.046	2.865	6.181	[6.070; 6.934]	4.00e-05	1.83e-04	1.000	1.000
vps_medium vs ded_small	RTT p99 (ms)	3.516	1.718	1.798	[1.746; 1.843]	4.00e-05	1.83e-04	1.000	1.000
vps_medium vs vps_premium	CPU steal (%, fsync)	1.746	0.705	1.041	[0.751; 1.540]	4.00e-05	1.83e-04	1.000	1.000

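Interpretation of MW ϕ and Cliff’s δ: both are dominance-style measures on different scales (ϕ ranges from 0 to 1, with 0.5 meaning no dominance; δ ranges from −1 to 1, with 0 meaning no dominance). Values close to 1 indicate strong separation where group A tends to produce higher values than group B (here: higher latency or higher steal).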
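
Both dominance measures and an approximate Mann–Whitney p-value can be computed by direct pair counting, as sketched below. The sketch assumes ϕ is defined as P(X>Y) plus half the tie probability, and it uses the normal approximation with continuity correction and no tie correction of the variance. These choices are assumptions, not a statement of how the tabulated values were produced.

```python
import math


def dominance_stats(a, b):
    """Mann-Whitney U, dominance phi, Cliff's delta, and an asymptotic two-sided p."""
    n, m = len(a), len(b)
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    ties = n * m - greater - less

    u = greater + 0.5 * ties           # U statistic for group A
    phi = u / (n * m)                  # P(X > Y) + 0.5 * P(X = Y)
    delta = (greater - less) / (n * m) # Cliff's delta, equal to 2 * phi - 1

    # Normal approximation with continuity correction (variance not tie-corrected).
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    z = max(0.0, (abs(u - mean_u) - 0.5) / sd_u)
    p_two_sided = math.erfc(z / math.sqrt(2))

    return {"u": u, "phi": phi, "delta": delta, "p": p_two_sided}
```

With 10 runs per group and complete separation (every value in A above every value in B), this approximation gives p ≈ 1.83e-4, which coincides with the p_MW column of Table 3. The exact permutation p-value for that case is smaller (2 / C(20,10) ≈ 1.1e-5), so small-sample reports should state which variant was used.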

Discussion and threats to validity

Why tail latency matters more than averages

For many online services, “rare” slow events become a material fraction of user-visible requests at scale (tail-at-scale effect). Variability sources—queues, background activity, shared-resource contention—are difficult to eliminate entirely.

Why I/O often breaks predictability first

In multi-tenant infrastructure, shared storage and shared services are prone to interference and higher variability. Databases are particularly sensitive to disk behavior: interference and contention translate into application-level response-time variance and SLA degradation.

%steal as a VPS differentiator

Steal time is CPU time needed by the guest but not provided by the host because resources were allocated elsewhere; it is observable in /proc/stat and standard monitoring tools.
Operationally: if tail latency and %steal rise together while guest CPU load appears “normal,” the bottleneck is likely host-level CPU contention (typical under oversubscription). This is a concrete engineering input for choosing between shared VPS, “guaranteed” VPS, and Dedicated.
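
One way to quantify whether tail latency and %steal rise together is a rank correlation between the two time series, which is insensitive to the heavy-tailed scale of latency. The sketch below implements Spearman's coefficient with the standard library. The choice of Spearman and the function names are assumptions made for illustration.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Applied to per-window p99 latency and per-window mean %steal from the same long run, a coefficient near 1 supports the host-contention reading, while a coefficient near 0 points to another cause such as storage contention.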

Limitations and threats to validity

Simulated results vs live measurements. The Results values are not tied to any specific data center; they demonstrate reporting structure and typical effects from the literature (tail variability, multi-tenant interference, and %steal diagnostic value).

Instrumentation validity. fio percentile computation depends on internal accounting and can vary with logging/aggregation mode; therefore, live studies should fix fio version, output format (JSON), and preserve raw artifacts.

Construct validity (synthetic vs production). fio/sysbench/iperf3/pgbench capture important aspects but do not replace profiling of real applications; they remain useful as “minimal reproducible” tests for comparing infrastructure classes.

External validity (transferability). Results from one region/storage class/network may not transfer to others; multi-tenant background noise varies over time, so tests should be repeated in different time windows and reported via distributions/percentiles rather than single numbers.

Practical recommendations, decision checklist, conclusion, and appendix

Practical recommendations (SLO/risk language)

If the target service is IO-bound (databases, logs, queues with fsync, heavy writes), the key KPI is p99 latency under randwrite+fsync. For stable p99 requirements, Dedicated or premium VPS with explicit CPU/I/O guarantees is often justified.
If the service is CPU-bound (rendering, compilation, pure compute, high-QPS processing), diagnose %steal: persistent steal indicates shared-host CPU contention and supports choosing a guaranteed profile or Dedicated.
If the service is network-bound (proxy/edge, highly “chatty” microservices), publish p95/p99 RTT and jitter plus iperf3 throughput/retransmits because network tail latency degrades distributed-system behavior.

Decision checklist (engineering questions to ask)

CPU: oversubscription constraints/guarantees, CPU pinning/“guaranteed vCPU,” typical %steal for the SLA class.
Storage: local NVMe vs shared pool, QoS for IOPS/latency, availability of latency metrics/graphs.
Network: port speed (1G/10G), rate-limit/PPS policy, expected p95/p99 RTT intra-region, isolation guarantees.
Observability: permission to run sysstat tooling and collect iostat/sar/pidstat; constraints on benchmarking.

Conclusion

A journal-style VPS vs Dedicated publication should be centered on latency distributions and their sources: tail latency (p95/p99) and temporal variability are more informative than averages, while multi-tenant interference and CPU contention can be diagnosed via system metrics such as %steal.
The proposed protocol (repeats + long runs + telemetry + bootstrap/Mann–Whitney analysis) enables reproducible and comparable reporting across tariff classes without tying conclusions to specific providers.

Appendix: reproducibility pack (bash) and table templates with filled examples

Below is a minimal bash command set to reproduce the protocol on any Linux host; parameters are explicit. Percentile interpretation follows fio documentation and standard “X% of operations complete within ≤ value” semantics.

Environment preparation

bash
# OS: Ubuntu/Debian (example)
sudo apt-get update
sudo apt-get install -y fio sysbench iperf3 postgresql-contrib sysstat procps

# Record OS and tool versions for the protocol
uname -a
lsb_release -a || cat /etc/os-release
fio --version
sysbench --version
iperf3 --version
pgbench --version

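# CPU governor (cpufreq is often not exposed inside VPS guests; skip if absent)
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  [ -e "$c" ] || continue
  echo performance | sudo tee "$c" > /dev/null
done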

# I/O scheduler: inspect and (if needed) set
lsblk -d -o NAME,ROTA,TYPE,SIZE,MODEL
cat /sys/block/nvme0n1/queue/scheduler 2>/dev/null || true
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler 2>/dev/null || true

fio short runs (60s × 10 repeats)

bash
TARGET_DIR=/mnt/bench
mkdir -p "$TARGET_DIR"

# Repeat every profile 10 times; each run writes its own JSON file.
for run in $(seq 1 10); do
  # 1) randread 4k, iodepth=32
  # libaio is set explicitly: fio's default synchronous engine ignores iodepth > 1.
  fio --name=randread4k \
    --directory="$TARGET_DIR" --filename=fiofile \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
    --ioengine=libaio --direct=1 --size=4G --runtime=60 --time_based \
    --group_reporting --output-format=json \
    > "fio_randread4k_run${run}.json"

  # 2) randwrite 4k, iodepth=32
  fio --name=randwrite4k \
    --directory="$TARGET_DIR" --filename=fiofile \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --ioengine=libaio --direct=1 --size=4G --runtime=60 --time_based \
    --group_reporting --output-format=json \
    > "fio_randwrite4k_run${run}.json"

  # 3) randwrite 4k + fsync, iodepth=1 (default synchronous engine is sufficient)
  fio --name=fsyncwrite4k \
    --directory="$TARGET_DIR" --filename=fiofile \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --fsync=1 --size=2G --runtime=60 --time_based \
    --group_reporting --output-format=json \
    > "fio_fsyncwrite4k_run${run}.json"
done

Long run (recommended 4h) + telemetry

bash
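# 1-second telemetry (example); -t adds timestamps so samples can be aligned with fio
iostat -x -t 1 > iostat_1s.log &
IOSTAT_PID=$!
vmstat -t 1 > vmstat_1s.log &
VMSTAT_PID=$!
sar -u -r -d 1 > sar_1s.log &
SAR_PID=$!
pidstat -u -d 1 > pidstat_1s.log &
PIDSTAT_PID=$!

# 4 hours = 14400 seconds
# --write_lat_log keeps per-I/O latency samples, from which a p99 time series
# can be derived (the JSON summary alone only holds whole-run percentiles).
fio --name=fsyncwrite4k_long \
  --directory="$TARGET_DIR" --filename=fiofile \
  --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
  --direct=1 --fsync=1 --size=2G --runtime=14400 --time_based \
  --write_lat_log=fsyncwrite4k_4h \
  --group_reporting --output-format=json \
  > fio_fsyncwrite4k_4h.json

kill "$IOSTAT_PID" "$VMSTAT_PID" "$SAR_PID" "$PIDSTAT_PID"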

Network (iperf3 + ping)

bash
# On server:
iperf3 -s

# On client:
iperf3 -c <SERVER_IP> -t 60 -J > iperf3_tcp_60s.json
ping -i 0.2 -c 300 <TARGET_IP> > ping_300.log

pgbench (optional, database-class signal)

bash
createdb pgbench
pgbench -i -s 10 pgbench

pgbench -c 10 -j 4 -T 60 -P 10 -l pgbench > pgbench_run1.log
pgbench -c 10 -j 4 -T 60 -r pgbench > pgbench_run1_percmd.log

Table templates (populated with example values)

Table A1. Host specification report (example layout with the representative configurations used in this paper).

Parameter	small VPS	medium VPS	premium VPS	dedicated small	dedicated large
CPU	1 vCPU (shared)	2 vCPU (shared)	4 vCPU (guaranteed)	4 cores	8 cores
RAM	2 GB	4 GB	8 GB	16 GB	32 GB
Storage	SSD (SATA class)	NVMe	NVMe + QoS	1×NVMe	2×NVMe
Network	1G class	1G class	1G class	1G	10G
OS/kernel	Ubuntu 24.04 / 6.8.x	same	same	same	same
Governor	performance	performance	performance	performance	performance
I/O scheduler	mq-deadline	none	none	none	none

Table A2. fio reporting template (example populated with the fsync p99 line from Results).

Profile	Configuration	IOPS	BW (MiB/s)	clat p50 (ms)	clat p95 (ms)	clat p99 (ms)	%steal	%iowait
randwrite 4k + fsync qd1	vps_medium	880	3.4	1.021	9.378	34.889	1.746	7.361
randwrite 4k + fsync qd1	ded_small	2,225	8.7	0.402	1.538	2.865	0.053	1.552