Hello Everyone,
You will have to excuse me, as I'm kind of Hyper-V stupid. I just started using Windows 8 Enterprise and playing with VMs myself, and I love it. On my primary machine I made a backup of a Hyper-V Windows 7 VM. Now I can surf around, do stupid things, and if my VM gets trashed, delete it and add the base back. That is my GOOD experience, where I have control over things. Now for the question from my BAD experience:
We moved our local servers to a cloud-based service company several months ago. The theory is that we are completely mobile now and users can connect from their laptops wherever they are. At first it worked great, but about 5 weeks ago we started seeing our system slow to a crawl one or more times a day. Our setup:
- 1 file, DNS, RRAS, AD server (4 cores/8GB)
- 2 dedicated workstations (2 cores/4GB each)
- 1 RDP server (8 cores/16GB)
Now what happens:
The systems run for a while and then get SLOW: 1 minute to click on START, 5 minutes to open a file, etc. I've checked the following:
- RAM - All systems have 50-90% available RAM
- CPU - CPUs idling at 0-15% (mostly from Task Manager & Resource Monitor)
- Processes - There are no bloated processes (biggest taking under 200MB)
- Processes - There are no processes pulling major CPU cycles
- Disk - Current Disk Queue Length ranges between 0.01 and 1. When it hits 1 or above and stays there, the system SLOWS down. I've seen it jump to 5 and 50 as well.
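To put numbers on "hits 1 or above and stays there", here is a minimal Python sketch that finds stretches where the queue length holds at or above a threshold. It assumes the counter has already been logged somewhere as (seconds, queue length) pairs. The function name `sustained_runs`, the 30-second minimum, and the sample values in the usage note are made up for illustration and are not measurements from these servers.

```python
from typing import List, Tuple


def sustained_runs(
    samples: List[Tuple[float, float]],
    threshold: float = 1.0,      # queue length that counts as "high"
    min_seconds: float = 30.0,   # hypothetical cutoff for "stays there"
) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs where the queue length stays
    at or above `threshold` for at least `min_seconds`.

    `samples` is a time-ordered list of (seconds, queue_length) pairs.
    """
    runs = []
    start = None  # time of the first high sample in the current run
    last = None   # time of the most recent high sample in the current run
    for t, q in samples:
        if q >= threshold:
            if start is None:
                start = t
            last = t
        else:
            # The run (if any) just ended; keep it only if it lasted long enough.
            if start is not None and last - start >= min_seconds:
                runs.append((start, last))
            start = None
    # Handle a run that is still open when the log ends.
    if start is not None and last - start >= min_seconds:
        runs.append((start, last))
    return runs
```

Called on a log sampled every 5 seconds, this returns only the stretches that lasted 30 seconds or more, so a single brief spike to 1 is ignored while a long plateau is reported with its start and end time.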
Rebooting the server sometimes helps, but recently things have been worse after a reboot than before it, and they stay that way for 30 minutes to over an hour.
A note, and I'm not sure how to describe this, so I'm going to tell you how I get there:
- Go into Resource Monitor
- Select the Disk tab
- Click on the Disk Activity section
- In the Disk Activity section, there are two colored boxes "Disk I/O" and "Highest Active Time"
When nothing is responding and the Disk Queue Length is stuck at 1 or above, Disk I/O sits at 0 for long periods of time. After multiple minutes at 0, it jumps up to multiple MB/sec and the disk queue length goes down for a couple of seconds; then the queue spikes back up and the Disk I/O box stays at 0 again for several minutes. On my home computer running Hyper-V, this disk activity box is constantly showing Disk I/O, and my queue length seldom spikes and stays at a high level.
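The pattern described here (requests waiting in the queue while throughput reads 0) could be counted with a sketch like the one below. It assumes each logged sample carries both the queue length and the total disk bytes per second, taken at a fixed interval. `DiskSample` and `stalled_seconds` are hypothetical names, and the thresholds are guesses rather than anything official.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DiskSample:
    seconds: float        # time since logging started
    queue_length: float   # Current Disk Queue Length at that moment
    bytes_per_sec: float  # total disk throughput at that moment


def stalled_seconds(
    samples: Iterable[DiskSample],
    interval: float,               # seconds between samples
    queue_threshold: float = 1.0,  # queue length that counts as "stuck"
    idle_bytes: float = 0.0,       # throughput at or below this counts as idle
) -> float:
    """Total time during which I/O was queued but nothing was moving."""
    return sum(
        interval
        for s in samples
        if s.queue_length >= queue_threshold and s.bytes_per_sec <= idle_bytes
    )
```

A large total here, next to low CPU and plenty of free RAM, would be a concrete figure to hand to the support guys instead of "it feels slow".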
Does anyone have any ideas? Anything I can ask the support guys to check into? Anything I can do to collect information? I'm stumped. I just found out today that they are running a Hyper-V-based system, and we have little or no access to configuration, etc. They are sure it's us. So far, I've:
- Removed anti-virus from our main server
- Disabled a service called PeachtreeSmartPosting because it was having errors (I think the errors are caused by the slowdown; they thought the slowdown was caused by the errors. I disabled the smartposting service and our server still went into its death spiral today)
- Removed Windows Search Service
- Rummaged through our Event Viewer. Some Disk errors under the System log: The driver disabled the write cache on device \Device\Harddisk0\DR0
and
some ESENT/Performance errors under the Application log: (x): A request to write to the file <y> at offset ... succeeded, but took an abnormally long time (<some large value> seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Here x is wuaueng.dll, svchost, Windows, and possibly others, and y is datastore.edb, j50.chk, j50tmp.log, dhcp.mdb, edbtmp.log, edb.log, j500134D.log (in dhcp\backup\temp), MSS.log, windows.edb, and others. I pulled these values from the last 20 or so of these errors. In addition, between every 20 or 30 of the above errors, I get something like this:
DFSRs (1928) \\.\C:\System Volume Information\DFSR\database_28A4_16A0_A416_7094\dfsr.db: A request to write to the file "\\.\C:\System Volume Information\DFSR\database_28A4_16A0_A416_7094\fsr.log"
at offset 4155904 (0x00000000003f6a00) for 262656 (0x00040200) bytes succeeded, but took an abnormally long time (64 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance
diagnosing the problem.
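One way to summarise these slow-write events is to pull the file path and the delay out of each message and keep the worst delay per file. The Python sketch below assumes the event messages have been copied out of Event Viewer as plain text, and that the file path appears in double quotes as in the DFSR example above (the other ESENT messages may be worded differently). `parse_slow_write` and `worst_delay_by_file` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Pattern modelled on the DFSR message quoted above; other events may differ.
SLOW_WRITE = re.compile(
    r'A request to write to the file "(?P<path>[^"]+)" '
    r'at offset (?P<offset>\d+) \(0x[0-9a-fA-F]+\) '
    r'for (?P<size>\d+) \(0x[0-9a-fA-F]+\) bytes succeeded, '
    r'but took an abnormally long time \((?P<seconds>\d+) seconds\)'
)


def parse_slow_write(message: str) -> Optional[dict]:
    """Extract path, byte count, and delay from one slow-write message."""
    # Collapse line wraps and repeated spaces so the pattern can use single spaces.
    flat = " ".join(message.split())
    m = SLOW_WRITE.search(flat)
    if m is None:
        return None
    return {
        "path": m["path"],
        "bytes": int(m["size"]),
        "seconds": int(m["seconds"]),
    }


def worst_delay_by_file(messages: Iterable[str]) -> Dict[str, int]:
    """Map each file path to the longest write delay seen for it."""
    worst: Dict[str, int] = defaultdict(int)
    for msg in messages:
        hit = parse_slow_write(msg)
        if hit is not None:
            worst[hit["path"]] = max(worst[hit["path"]], hit["seconds"])
    return dict(worst)
```

If the same long delays show up across unrelated files (DHCP, Windows Update, DFSR, search), that would point at the storage underneath the VM rather than at any one service.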
I've got no clue where to go next, and I think the support staff running this are just as baffled. Any thoughts, suggestions, etc. would be appreciated.
Thanks,
Jeff