Saturday, September 17, 2016

xorg - Ubuntu 18.04 error on waking up from sleep: Read-error on swap device




After the laptop has been in sleep mode for a couple of hours, when I try to resume my session I get the following error:




Read-error on swap device




After this happens, it takes about 30 seconds for the login screen to load. Once I log in, the screen blanks out once or twice for a second, and none of my programs are open anymore. I get a "System problem detected" prompt. When I click on "Send Report", another notification pops up saying:




Sorry the program "Xorg" closed unexpectedly. Your computer does not have enough free memory to automatically analyze the problem and send a report to the developers.





What I have tried so far is increasing the available swap space. It was around 2 GB initially, and I created another swap file of 9 GB. This hasn't helped. The occupied swap space (as reported by the swapon command) after the crash is always around 170 MB.



The dmesg output from when I resume my session, up to the read error on the swap device, is as follows:



    
[64046.474054] ACPI: Low-level resume complete
[64046.474162] ACPI: EC: EC started
[64046.474162] PM: Restoring platform NVS memory
[64046.475139] Enabling non-boot CPUs ...
[64046.475196] x86: Booting SMP configuration:
[64046.475196] smpboot: Booting Node 0 Processor 1 APIC 0x2
[64046.475663] cache: parent cpu1 should not be sleeping
[64046.475859] CPU1 is up
[64046.475910] smpboot: Booting Node 0 Processor 2 APIC 0x4
[64046.476330] cache: parent cpu2 should not be sleeping
[64046.476506] CPU2 is up
[64046.476539] smpboot: Booting Node 0 Processor 3 APIC 0x6
[64046.477071] cache: parent cpu3 should not be sleeping
[64046.477255] CPU3 is up
[64046.477274] smpboot: Booting Node 0 Processor 4 APIC 0x1
[64046.477721] cache: parent cpu4 should not be sleeping
[64046.477922] CPU4 is up
[64046.477947] smpboot: Booting Node 0 Processor 5 APIC 0x3
[64046.478371] cache: parent cpu5 should not be sleeping
[64046.478571] CPU5 is up
[64046.478591] smpboot: Booting Node 0 Processor 6 APIC 0x5
[64046.479018] cache: parent cpu6 should not be sleeping
[64046.479229] CPU6 is up
[64046.479247] smpboot: Booting Node 0 Processor 7 APIC 0x7
[64046.479675] cache: parent cpu7 should not be sleeping
[64046.479899] CPU7 is up
[64046.485913] ACPI: Waking up from system sleep state S3
[64046.639206] ACPI: EC: event unblocked
[64046.639711] sd 2:0:0:0: [sda] Starting disk
[64046.873289] usb 1-11: reset full-speed USB device number 2 using xhci_hcd
[64046.976869] ata4: SATA link down (SStatus 4 SControl 300)
[64046.976892] ata2: SATA link down (SStatus 4 SControl 300)
[64047.149289] usb 1-6: reset high-speed USB device number 40 using xhci_hcd
[64047.437370] psmouse serio1: synaptics: queried max coordinates: x [..5660], y [..4570]
[64047.476302] psmouse serio1: synaptics: queried min coordinates: x [1364..], y [1284..]
[64047.922603] OOM killer enabled.
[64047.922605] Restarting tasks ... done.
[64047.928727] thermal thermal_zone1: failed to read out thermal zone (-61)
[64047.930036] Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
[64047.935036] Bluetooth: hci0: Device revision is 5
[64047.935037] Bluetooth: hci0: Secure boot is enabled
[64047.935038] Bluetooth: hci0: OTP lock is enabled
[64047.935038] Bluetooth: hci0: API lock is enabled
[64047.935039] Bluetooth: hci0: Debug lock is disabled
[64047.935040] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[64047.935042] Bluetooth: hci0: Found device firmware: intel/ibt-11-5.sfi
[64047.944372] PM: suspend exit
[64048.050329] Read-error on swap-device (8:0:1543400288)
[64048.460888] [drm] RC6 on


Please let me know if any other information is needed.




The Ubuntu 18.04 kernel you are currently using is missing a rather important bug fix.



The fix is already present in upstream Linux kernel 4.16.8. (The suspend bug effectively started happening in kernel 4.15.) Ubuntu only needs to cherry-pick this small patch from upstream. The bug frequently causes Xorg to crash immediately after resuming from suspend, i.e. it crashes the whole graphical login session.
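As a rough illustration of the version ranges described above, here is a minimal Python sketch that classifies a kernel release string. The function names and the three status strings are my own, not part of any tool. It only compares upstream version numbers, so it cannot tell whether a distribution kernel has had the patch backported; treat its answer as a hint and check the package changelog.

```python
import re

# Version boundaries taken from the answer: the bug effectively
# started in 4.15, and the fix landed upstream in 4.16.8.
BUG_STARTS = (4, 15, 0)
FIXED_IN = (4, 16, 8)


def parse_release(release):
    """Turn a string such as '4.15.0-24-generic' into (4, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def status(release):
    """Classify a release by upstream version number only.

    Assumption: distro backports are ignored, so 'affected range'
    means 'affected unless the vendor cherry-picked the fix'.
    """
    version = parse_release(release)
    if version < BUG_STARTS:
        return "predates the bug"
    if version < FIXED_IN:
        return "affected range"
    return "fix included upstream"
```

To check the running system, pass in the string printed by `uname -r`, or call `status(platform.release())` after `import platform`.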



Note that this bug often happens without showing Read-error on swap device. Most of the time, there was no error in the kernel log. (A few times, it showed EXT4-fs error and Buffer I/O error instead.) Also, these error messages could be caused by a hardware failure instead. When diagnosing this problem, please focus on other, more distinctive details.
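If you do want to check a saved kernel log for these messages, here is a minimal Python sketch. It looks for the three error strings mentioned above shortly after a "PM: suspend exit" line, as in the dmesg excerpt in the question. The function name and the 5-second window are arbitrary choices of mine. As noted, an empty result does not rule the bug out, and a hit does not rule out failing hardware.

```python
import re

# Messages named in the answer as possible (not guaranteed) symptoms.
# The spelling follows the kernel log line quoted in the question.
ERROR_PATTERNS = (
    "Read-error on swap-device",
    "EXT4-fs error",
    "Buffer I/O error",
)

# Matches dmesg lines of the form "[64048.050329] message text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def errors_after_resume(log_text, window=5.0):
    """Return (timestamp, message) pairs for matching errors logged
    within `window` seconds after a 'PM: suspend exit' line.

    Assumption: `window` is an illustrative default, not a value
    taken from the kernel or from the bug report.
    """
    hits = []
    resume_time = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # blank line or a line without a timestamp
        stamp = float(match.group(1))
        message = match.group(2)
        if message.startswith("PM: suspend exit"):
            resume_time = stamp
        elif resume_time is not None and stamp - resume_time <= window:
            if any(pattern in message for pattern in ERROR_PATTERNS):
                hits.append((stamp, message))
    return hits
```

Feed it the text of `dmesg` saved to a file. Errors logged before the "PM: suspend exit" line are ignored by this sketch.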



A test kernel is available at the end of this Ubuntu bug, i.e. in this comment: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776887/comments/5



So far no one has reported their results from suspending with the Ubuntu test kernel. If someone can report success, it might encourage the Ubuntu developer to finally include the bug fix. I could be wrong though; I'm not 100% sure what's holding this up.




There is also a known workaround. You can avoid the crash if you configure the kernel command line to include the option scsi_mod.scan=sync.
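For illustration, here is a minimal Python sketch of the two steps involved: checking whether the option is already on the kernel command line, and adding it to the GRUB defaults. It assumes the usual Ubuntu layout, where /etc/default/grub contains a GRUB_CMDLINE_LINUX_DEFAULT line; that assumption and the function names are mine, not from the bug report. The functions only transform text; they do not write to any file.

```python
OPTION = "scsi_mod.scan=sync"
GRUB_KEY = "GRUB_CMDLINE_LINUX_DEFAULT="


def has_option(cmdline, option=OPTION):
    """True if `option` appears as a whole word on a kernel command line."""
    return option in cmdline.split()


def add_to_grub_default(grub_text, option=OPTION):
    """Return the GRUB defaults text with `option` appended to
    GRUB_CMDLINE_LINUX_DEFAULT; all other lines are left untouched.

    Assumption: the variable is set on a single line. If the line is
    missing, the text is returned unchanged.
    """
    out = []
    for line in grub_text.splitlines():
        if line.startswith(GRUB_KEY):
            # Drop surrounding quotes, then split into separate options.
            value = line[len(GRUB_KEY):].strip().strip("\"'")
            words = value.split()
            if option not in words:
                words.append(option)
            line = '%s"%s"' % (GRUB_KEY, " ".join(words))
        out.append(line)
    return "\n".join(out) + "\n"
```

The running kernel's command line can be read from /proc/cmdline and passed to `has_option`. After changing /etc/default/grub on Ubuntu, run `sudo update-grub` and reboot for the option to take effect.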



https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776887







This upstream bug has been confirmed to affect Ubuntu users[1]. As per
the fix commit (below), the most frequent symptom is a crash of
Xorg/Xwayland, i.e. killing the entire GUI, when a laptop is woken
from system sleep. Frequency of the bug is described as once every few
days[2].



[1] E.g. this user confirms the bug & very specific workaround:
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11



[2] E.g. this log of crashes:
https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23



This is a bug in blk-core.c. It is not specific to any one hardware
driver. Technically the suspend bug is triggered by the SCSI core,
which is used by all SATA devices.



The commit also includes a test which quickly and reliably proves the
existence of a horrifying bug.



I guess you might avoid this bug only if you have root on NVMe. The
other way to not hit the Xorg crash is if you don't use all your RAM,
so there's no pressure that leads to cold pages of Xorg being swapped.
Also, you won't reproduce the Xorg crash if you suspend+resume
immediately. (This frustrated my tests at one point; it only triggered
after I left the system suspended over lunch :).



Fix: "block: do not use interruptible wait anywhere"



in kernel 4.17:
https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428



in kernel 4.16.8:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.16.y&id=7859056bc73dea2c3714b00c83b253d4c22bf7b6




lack of fix in 4.15.0-24.26 (Ubuntu 18.04):
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/block/blk-core.c?id=Ubuntu-4.15.0-24.26#n856



I.e., this bug is still present in Ubuntu source package
linux-4.15.0-24.26 (and 4.15.0-23.25). I attach hardware details
(lspci-vnvn.log) of a system where this bug is known to happen.



Regards Alan




WORKAROUND: Use kernel parameter: scsi_mod.scan=sync


