Oracle Linux 7 – KVM Console is broken, OCFS2 kicks ass though

This post was originally written for Spiceworks, so ignore the formatting shortcomings and such. I have chosen to just throw it on my blog since I’ve found a valid workaround to the problem: use OL6.

I have a farm of KVM servers running CentOS 7.2. I have procured some hardware that is best utilized with a cluster-aware filesystem such as GFS2 or OCFS2. I started fiddling with GFS2, got a functional cluster on some spare hosts, and everything worked, just poorly. I was going from NFS on a LACP bond assembled from gigabit Ethernet to GFS2 on a functioning MPIO setup with 4 gigabit Ethernet links. I tested with XFS first and the performance was as expected: a nominal number of IOPS for the number of spindles and ~450MB/s of read and write throughput. Moving to GFS2, the IOPS and throughput were both there, but so were latency spikes of up to 30 seconds, which is obviously unacceptable. I tried tuning for a while and couldn’t massage out the bugs.

I scrapped that setup and installed Oracle Linux to try O2CB/OCFS2. Setup was great and performance looks vastly improved. The problem with this configuration is that KVM domains seem to be fully functional, but nothing I’ve tried will let me see the console of a running domain. I start a domain and look at its console (tried virt-viewer, virt-manager on Linux, and the built-in GNOME remote client) and all I see is a black screen. I also tried VNC as an alternative, something I’ve never done so I don’t know if that should *just work*, but it didn’t either. Lastly I tried a few different video models (qxl, vga, cirrus) and nothing changed. The only logs I know to look at are `/var/log/libvirt/qemu/domain.log`, which look like this when I try to connect to the domain:

    main_channel_link: add main channel client
    main_channel_handle_parsed: net test: latency 12.194000 ms, bitrate 146799512 bps (139.998924 Mbps)
    red_dispatcher_set_cursor_peer:
    inputs_connect: inputs channel client create

I compared that to a good connection on another machine – it looks like this:

    main_channel_link: add main channel client
    main_channel_handle_parsed: net test: latency 3.128000 ms, bitrate 69543957 bps (66.322286 Mbps)
    red_dispatcher_set_cursor_peer:
    inputs_connect: inputs channel client create
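
To make the comparison less eyeball-driven, here is a minimal Python sketch that masks the numbers in both excerpts and checks whether the remaining sequence of SPICE events is the same. The `normalize` helper and the `<n>` placeholder are my own inventions for illustration; the log text is copied from the two excerpts above.

```python
import re

# Matches integers and decimals such as 12.194000 or 146799512.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def normalize(log_text):
    """Mask every number so only the sequence of events is compared."""
    lines = log_text.strip().splitlines()
    return [NUMBER.sub("<n>", line.strip()) for line in lines]

# Excerpt from the host with the black console.
broken = """
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 12.194000 ms, bitrate 146799512 bps (139.998924 Mbps)
red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
"""

# Excerpt from the host where the console works.
working = """
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 3.128000 ms, bitrate 69543957 bps (66.322286 Mbps)
red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
"""

same_shape = normalize(broken) == normalize(working)
print("same sequence of SPICE events:", same_shape)
```

Apart from the latency and bitrate measurements, the two logs are line-for-line identical, so the qemu log gives no hint about why one console stays black.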

To complicate things in the hope of fixing them, I figured that reinstalling with CentOS 7.2, which I’m more familiar with and which obviously has all different packages, would solve the issue. I did just that: a fresh CentOS install with nothing added but UEK and the OCFS2 tools, so I could proceed with libvirt/KVM tools I knew worked while continuing to test the OCFS2 filesystem:

    [root@kvmhost images]# yum list installed | grep ol7
    kernel-uek.x86_64                     3.8.13-118.14.1.el7uek         @ol7_UEKR3
    kernel-uek-devel.x86_64               3.8.13-118.14.1.el7uek         @ol7_UEKR3
    kernel-uek-firmware.noarch            3.8.13-118.14.1.el7uek         @ol7_UEKR3
    libdtrace-ctf.x86_64                  0.5.0-2.el7                    @ol7_UEKR3
    ocfs2-tools.x86_64                    1.8.6-7.el7                    @ol7_latest
    ocfs2-tools-devel.x86_64              1.8.6-7.el7                    @ol7_latest

So the interesting takeaway from this configuration is that it works no differently when using UEK, but if I boot into the base CentOS kernel it works fine. To be clear: if I boot the system choosing `CentOS Linux (3.8.13-118.14.1.el7uek.x86_64) 7 (Core)` at the prompt, the problem persists, but if I reboot and choose `CentOS Linux (3.10.0-327.el7.x86_64) 7 (Core)`, things work properly.
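
When flipping between boot entries like this it is easy to lose track of which kernel a given test actually ran under. A small Python sketch for tagging each test with the running kernel might look like the following. The `kernel_flavor` helper is hypothetical, and treating any release string that contains `uek` as a UEK kernel is an assumption based only on the two release strings shown in this post.

```python
import platform

def kernel_flavor(release):
    """Classify a kernel release string as 'uek' or 'stock'.

    Assumption: UEK builds carry 'uek' in the release string, as in
    3.8.13-118.14.1.el7uek.x86_64 from this post.
    """
    return "uek" if "uek" in release.lower() else "stock"

# The two boot entries from this post.
print(kernel_flavor("3.8.13-118.14.1.el7uek.x86_64"))
print(kernel_flavor("3.10.0-327.el7.x86_64"))

# The kernel this script is running under right now.
running = platform.release()
print(running, "->", kernel_flavor(running))
```

Recording that label next to each console test makes it harder to mix up a "works" result from the stock kernel with one from UEK.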

Reading just that, one would assume the problem is simply that the userland components were compiled against different kernel libraries, and move on to a new problem. However, I have another system that was installed from an OL7 ISO, has never had anything but OL7 packages installed, and exhibits the exact same behavior.

The only other place I could think to look for anything useful was the virt-manager logs. I installed enough X11 components that I could run it graphically and generate some local logs from that app. Here is what came from that:

Not working:

    [Sat, 05 Nov 2016 14:55:04 virt-manager 15862] DEBUG (details:602) Showing VM details: 
    [Sat, 05 Nov 2016 14:55:04 virt-manager 15862] DEBUG (engine:357) window counter incremented to 2
    [Sat, 05 Nov 2016 14:55:04 virt-manager 15862] DEBUG (console:650) Starting connect process for proto=spice trans= connhost=127.0.0.1 connuser= connport= gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None
    [Sat, 05 Nov 2016 14:55:04 virt-manager 15862] DEBUG (console:771) Viewer connected

Working:

    [Sat, 05 Nov 2016 15:06:49 virt-manager 3917] DEBUG (details:602) Showing VM details: 
    [Sat, 05 Nov 2016 15:06:49 virt-manager 3917] DEBUG (engine:357) window counter incremented to 2
    [Sat, 05 Nov 2016 15:06:49 virt-manager 3917] DEBUG (console:650) Starting connect process for proto=spice trans= connhost=127.0.0.1 connuser= connport= gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None
    [Sat, 05 Nov 2016 15:06:49 virt-manager 3917] DEBUG (console:771) Viewer connected
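
For completeness, here is a rough Python sketch that pulls the `key=value` connection parameters out of the two "Starting connect process" lines and compares them. The `connect_params` helper is hypothetical, and splitting on whitespace assumes no parameter value contains a space, which holds for the two lines above.

```python
def connect_params(log_line):
    """Collect key=value tokens from a virt-manager connect log line."""
    params = {}
    for token in log_line.split():
        key, sep, value = token.partition("=")
        if sep:  # only tokens that actually contain '='
            params[key] = value
    return params

not_working = (
    "[Sat, 05 Nov 2016 14:55:04 virt-manager 15862] DEBUG (console:650) "
    "Starting connect process for proto=spice trans= connhost=127.0.0.1 "
    "connuser= connport= gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None"
)
working = (
    "[Sat, 05 Nov 2016 15:06:49 virt-manager 3917] DEBUG (console:650) "
    "Starting connect process for proto=spice trans= connhost=127.0.0.1 "
    "connuser= connport= gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None"
)

identical = connect_params(not_working) == connect_params(working)
print("identical connection parameters:", identical)
```

Both attempts use the same protocol, address, and port and both report "Viewer connected", so virt-manager sees no difference between the working and broken cases either.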

Googling ‘Spice blank screen UEK’ yields nothing helpful that I can see, so for posterity’s sake, and in the hope that someone else who hits this problem *CAN* indeed file a bug with Oracle, this was my experience.