Hello!
We have just set up a couple of clusters:
- One cluster for VMs: 7 nodes, 6x1Gbps network adapters (teamed), 160 shared disks in JBODs connected through an LSI SAS switch (disks are in mirrored Storage Spaces)
- One cluster for SoFS: 2 nodes, 2x1Gbps (teamed) + 2x10Gbps network adapters, 60 shared disks in a JBOD (disks are in mirrored Storage Spaces as well).
The OS on both hosts and guests is Windows Server 2012 R2. Guests are Generation 1 VMs.
Before using them in production and deciding which variant is better/cheaper/easier to deploy/etc., we want to test these scenarios:
- IO performance inside a VM when the virtual disk is located on a CSV, Direct mode
- IO performance inside a VM when the virtual disk is located on a CSV, Redirected mode
- IO performance inside a VM when the virtual disk is located on a SoFS share
We expected moderate or maybe even worse performance (especially in the last two scenarios), but unexpectedly it was blazing fast, at levels that should be theoretically unreachable. We have spent almost a week searching for an explanation, with no success :(
So, we're testing a virtual SCSI disk (500 GB, dynamic) with SQLIO.
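For reference, the exact command line isn't quoted above; a typical SQLIO 8 KB random-write run looks along these lines (the file path and the thread/queue-depth values here are illustrative, not necessarily the ones we used):

```shell
REM Hypothetical SQLIO invocation for an 8 KB random-write test.
REM -kW = write, -frandom = random access, -b8 = 8 KB blocks,
REM -t8 / -o8 = threads / outstanding IOs per thread (illustrative values),
REM -s60 = run for 60 seconds, -LS = report latency, -BN = no software buffering.
sqlio -kW -frandom -b8 -t8 -o8 -s60 -LS -BN E:\test\testfile.dat
```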
First, the disk on a CSV in Direct mode (the VM host and the owner of the CSV are the same node).
random write 8 KB IO:
IOs/sec: 239856.70
MBs/sec: 1873.88
Avg_Latency(ms): 0
This CSV is a two-way mirror created on a pool of 24x 2.5" 600 GB SAS disks. I just can't believe 250K random 8 KB write IOPS on this array. Other disk performance measurement tools report the same numbers.
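A back-of-envelope check of why that number is unbelievable (the per-spindle figure and the mirror write penalty below are assumptions based on typical 10k-rpm SAS drives, not measurements from this setup):

```python
# Rough expected random-write IOPS for the pool backing this CSV.
# Assumptions (not from the post itself): ~175 random write IOPS per
# 10k-rpm 2.5" SAS spindle, and a write penalty of 2 for a two-way
# mirror (every write lands on two disks).
SPINDLES = 24
IOPS_PER_SPINDLE = 175      # assumed, typical 10k SAS figure
MIRROR_WRITE_PENALTY = 2

expected_iops = SPINDLES * IOPS_PER_SPINDLE / MIRROR_WRITE_PENALTY
observed_iops = 239_856.70  # SQLIO result from inside the VM

print(f"expected ~{expected_iops:.0f} IOPS, observed {observed_iops:.0f}")
print(f"observed is ~{observed_iops / expected_iops:.0f}x the spindle ceiling")
```

Even with generous per-disk assumptions, the pool should top out in the low thousands of random write IOPS, two orders of magnitude below what SQLIO reports inside the VM.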
Then I moved the CSV to another node, leaving the VM itself where it was (as I understand it, this forces Redirected mode for operations on that CSV).
Almost nothing changes in the IOPS/throughput counters, but I do see network activity between the VM host and the CSV owner. However, it is just 60-80 Mbps!
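The mismatch is easy to quantify: in Redirected mode every write should cross the cluster network, so the SQLIO throughput implies a minimum wire rate (quick arithmetic on the numbers above):

```python
# If Redirected mode really pushed all writes over the cluster network,
# the wire rate needed to sustain the SQLIO result would be:
mb_per_s = 1873.88            # SQLIO MBs/sec reported inside the VM
required_mbps = mb_per_s * 8  # megabits per second on the wire
observed_mbps = 80            # upper end of what the NIC counters showed

print(f"required ~{required_mbps:.0f} Mbps, observed ~{observed_mbps} Mbps")
print(f"shortfall: ~{required_mbps / observed_mbps:.0f}x")
```

So the traffic that actually crosses the network is a tiny fraction of what the reported throughput would require.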
Finally, the SoFS.
random write 8 KB IO:
IOs/sec: 125737.93
MBs/sec: 982.32
Avg_Latency(ms): 0
SQLIO shows almost 1 GB/s of throughput, but network usage on the SoFS node is just 160-180 Mbps.
Running SQLIO with 256 KB sequential write IO shows 5829.57 MB/s, but just 1 Gbps of network activity (that's the limit, because the 10 Gbps adapters and SMB Multichannel were not set up).
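The same kind of check for the sequential run: a single 1 Gbps link carries at most about 119 MiB/s of payload, so 5829.57 MB/s cannot have gone over the wire at all:

```python
# Payload ceiling of a 1 Gbps link, ignoring protocol overhead
# (which only makes the gap bigger).
link_mib_per_s = 1000 / 8 * (1000**2) / (1024**2)  # ~119.2 MiB/s
reported_mb_per_s = 5829.57                         # SQLIO sequential result

print(f"link ceiling ~{link_mib_per_s:.1f} MiB/s")
print(f"reported rate is ~{reported_mb_per_s / link_mib_per_s:.0f}x the ceiling")
```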
I've tried testing the SoFS share and the CSV directly from the host and got more realistic results:
random write 8 KB IO on \\sofs1\share1\test\testfile.dat:
IOs/sec: 4931.03
MBs/sec: 38.52
Avg_Latency(ms): 18
random write 8 KB IO on C:\ClusterStorage\pool1vd1\test\testfile.dat:
IOs/sec: 2206.29
MBs/sec: 17.23
Avg_Latency(ms): 42
So, can anybody explain these strange results when performing IO inside a VM?
I can run any other test or post specific performance counters/screenshots. I would like to understand what's happening before moving to production.
Thanks in advance,
Best regards, Dmitry.