Monday 9 September 2013

Fun with OpenSuse, VMWare and Firefox

Just rebuilt one of my VMWare workstation machines: AMD FX 8120, 16GB RAM, dual SSD boot drives etc. with OpenSuse 12.2. I always lag an OS version with VMWare as Workstation host OS support takes a little while to "bed in", in my experience.

All is well until I fire up Firefox and start to experience huge lag spikes where the whole UI seizes up for 20-30 seconds at a time. I can CTRL-ALT-Fn to another console and text mode is fine but X/KDE is locked solid.

Looking at top shows that the VMs, Firefox, kwin and khugepaged have pegged their respective CPU cores (or 4 cores in the case of the VMs) with little or no disk, swap or RAM activity. Killing Firefox drops everything back to normal, so I start to look online for reports of weird interactions between Firefox (or Flash/Java within Firefox) and OpenSuse or VMWare. Nothing.

Typing khugepaged into Google, however, was a bit of a revelation. Lots of reports of CPU stalls, 100% utilisation etc. on machines with high core/RAM counts. I wouldn't have called 8 cores/16GB high in this day and age - at work I use 48-core/256GB VM hosts and they're getting to the end of their support lifetime already. However, my previous 4-core/12GB box did not have this problem.

To cut a long story short, it appears to be a problem with khugepaged attempting to defrag RAM to make space for the huge pages. For now I have just disabled defragging with (as root):

echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo never > /sys/kernel/mm/transparent_hugepage/defrag

...why do they take completely different parameters when they have the same name? Logic, please.
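
For checking what is actually set, here is a minimal sketch. It assumes the usual sysfs layout, where the top-level files list every option with the active one in square brackets (something like "always madvise [never]") and the khugepaged file holds a plain 0 or 1. The helper name thp_active is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the option shown in [brackets] in a
# sysfs-style line such as "always madvise [never]".
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Assumed location of the transparent hugepage controls.
THP_DIR=/sys/kernel/mm/transparent_hugepage

# Top-level files: bracketed multiple-choice format.
for f in "$THP_DIR/enabled" "$THP_DIR/defrag"; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active "$(cat "$f")")"
    fi
done

# khugepaged file: plain 0/1 flag, so just print it as-is.
if [ -r "$THP_DIR/khugepaged/defrag" ]; then
    echo "$THP_DIR/khugepaged/defrag: $(cat "$THP_DIR/khugepaged/defrag")"
fi
```

Reading needs no special privileges, only the writes do. Note that values echoed into sysfs do not survive a reboot, so the two echo lines would need to go into a boot-time script to stick.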

...it seems that later kernel versions have this fixed, so I may have been bitten by my "lag an OS version" principle above. Ho hum.