Linux Mint crashing often; need help with how to debug it

Hello,

I'm a software developer who does not have a background in Linux. I have been using a few flavors of Linux for a few years - but as a user, not a coder.

I am running Linux Mint 19.1 Cinnamon. I partially built a new machine about a year ago, and for a while it was working great. A few months ago, I started having many, many crashes. I blamed Firefox, because that was about all I was using. However, as time has passed and Firefox has been upgraded several times, I now think the problem is deeper than that. Please don't focus on Firefox first; it's just the only thing that's running.

Crash frequency: anywhere from literally one minute after a reboot to about three days.
Symptoms:
1) Sound might stop working before a crash.
2) Firefox tabs start crashing, but the program can still be used.
3) No other program can be launched. Firefox might still be usable, but I can't launch System Monitor, Terminal, or anything else; nothing happens.
4) Firefox may become completely unresponsive.
5) Any program may get an error accessing files, as if file access has somehow been revoked.
6) Firefox cannot quit successfully. The interface goes away, but if I try to relaunch it, it is always reported as still running.
7) The computer cannot restart. Using the menu to restart the system turns the screen dark, but the machine never shuts down unless I hold down the power button.

As I said, I do run Firefox a lot, and I have an enormous number of tabs open. But if it is Firefox, I have two questions. First, why does Firefox never know that it has crashed? It keeps a list of its own crashes and tries to report them, but there aren't any. Second, how does a single program take out the entire OS?

Isn't Linux supposed to be a modern operating system? :)

I don't have the first clue about how to start tracking this down - can anyone suggest tools I might use, or a course of action that might help figure it out?

If this post is more appropriate to another forum, please let me know.

Thanks in advance for your help. Below is the output of System Info.

MisterAcoustic

=====================================

System Info:

System:    Host: Corei7 Kernel: 4.15.0-20-generic x86_64 bits: 64 compiler: gcc v: 7.3.0 
           Desktop: Cinnamon 4.0.10 wm: muffin dm: LightDM Distro: Linux Mint 19.1 Tessa 
           base: Ubuntu 18.04 bionic 
Machine:   Type: Desktop Mobo: ASRock model: B450M Pro4 serial: <filter> 
           UEFI [Legacy]: American Megatrends v: P3.30 date: 05/10/2019 
CPU:       Topology: 8-Core model: AMD Ryzen 7 3700X bits: 64 type: MT MCP arch: Zen 
           L2 cache: 4096 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 114982 
           Speed: 2195 MHz min/max: 2200/3600 MHz Core speeds (MHz): 1: 2197 2: 2196 3: 2191 
           4: 2193 5: 2179 6: 2195 7: 2196 8: 2195 9: 2195 10: 2192 11: 2194 12: 2193 13: 2196 
           14: 2196 15: 2196 16: 2196 
Graphics:  Device-1: NVIDIA GP108 [GeForce GT 1030] vendor: Gigabyte driver: nvidia v: 390.138 
           bus ID: 06:00.0 chip ID: 10de:1d01 
           Display: x11 server: X.Org 1.19.6 driver: nvidia 
           unloaded: fbdev,modesetting,nouveau,vesa resolution: 3840x2160~30Hz 
           OpenGL: renderer: GeForce GT 1030/PCIe/SSE2 v: 4.6.0 NVIDIA 390.138 direct render: Yes 
Audio:     Device-1: NVIDIA GP108 High Definition Audio vendor: Gigabyte driver: snd_hda_intel 
           v: kernel bus ID: 06:00.1 chip ID: 10de:0fb8 
           Device-2: AMD vendor: ASRock driver: snd_hda_intel v: kernel bus ID: 08:00.4 
           chip ID: 1022:1487 
           Sound Server: ALSA v: k4.15.0-20-generic 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock 
           driver: r8169 v: 2.3LK-NAPI port: f000 bus ID: 04:00.0 chip ID: 10ec:8168 
           IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 5.57 TiB used: 1.57 TiB (28.3%) 
           ID-1: /dev/sda model: SATA SSD size: 447.13 GiB speed: 6.0 Gb/s serial: <filter> 
           ID-2: /dev/sdb type: USB vendor: Seagate model: Backup+ Desk size: 4.55 TiB 
           serial: <filter> 
           ID-3: /dev/sdc vendor: Western Digital model: WD6400AAKS-75A7B2 size: 596.17 GiB 
           speed: 3.0 Gb/s serial: <filter> 
Partition: ID-1: / size: 425.65 GiB used: 73.30 GiB (17.2%) fs: ext4 dev: /dev/sda1 
           ID-2: swap-1 size: 13.67 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda2 
Sensors:   Message: No sensors data was found. Is sensors configured? 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/freecad-maintainers-freecad-stable-bionic.list 
           (removed because the forum software thought they were links)
Info:      Processes: 309 Uptime: 3h 35m Memory: 15.66 GiB used: 2.29 GiB (14.6%) Init: systemd 
           v: 237 runlevel: 5 Compilers: gcc: 7.5.0 alt: 7 Client: Unknown python3.6 client 
           inxi: 3.0.27

Phew... this one sounds complicated, since it can crash in all sorts of different situations.

I'd say this could be a hardware fault or an O/S misconfiguration/corruption.

The first thing to do is to get confidence in the RAM. Download a full memory test ISO, burn it to CD/DVD, boot from it, and run full pattern tests on your RAM. It will test (pretty much) every memory location, reading and writing different patterns. Let it run for a few hours. Once that test passes, confidence in the RAM will be high.

Second, download and burn a Mint 'live' CD/DVD (or a close relative of Mint will do) and boot from that. That lets you test everything from a different boot source that does not depend on your hard disk root filesystem. Use a browser (probably Firefox) for a few hours and see if it falls over. If not, the hardware is probably OK and compatible with Mint.

If that works, then I reckon you're looking for a configuration/corruption issue. But given the number of crashes you are listing, I wouldn't be surprised at all to find that you have faulty memory. Perhaps re-seat all the memory sticks to ensure the contacts are good.

Anyway, let's not jump the gun here.

Hello hicksd8,

Thanks for your response. I will run a RAM test and post the results. I have tried to run the built-in diagnostics a couple of times, but I always end up needing the machine for something and don't have the patience to let it run for a long time. I'll set aside some time to get a good test in.

One step at a time, here we go :).

Thanks again,
MisterAcoustic

Thanks, but that doesn't really address my question. I've been using Mint Cinnamon for around a year and a half, after switching away from Ubuntu Mate. Until recently, it's been fantastic. Cinnamon suits me much better than Mate. Besides, there is currently no evidence that Mint has anything to do with it; that's yet to be determined.

If the answer to my question is 'install another distro', well, suffice it to say I hear echoes of people saying "reinstall Windows..."

MisterAcoustic

Hi @MisterAcoustic

When I read about your problem, I find your use of the word "crash" to be the first barrier to solving it.

"Crash" means what exactly?

It seems to me you are describing problems with your desktop GUI not working and not the underlying OS "crashing".

I am pretty sure, from your description, that you can reboot from a generic terminal command line instead of trying to reboot from your GUI.

In fact, what you describe are all GUI problems, not underlying OS problems, if I understand your post correctly and have not misread something in haste on mobile.

Why would a GUI start having problems?

Typically, the first place to look is your total RAM and what percentage of RAM is being used by the total of all your GUI processes.

I am replying from my mobile and could not see any description about your RAM. Kindly provide RAM details and some stats on RAM allocations and cache management, etc.
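For the RAM and cache figures asked for above, here is a minimal Python sketch, assuming a Linux system with the standard /proc/meminfo file (the fields MemTotal, MemAvailable, Cached, Buffers, SwapTotal and SwapFree, all reported in kB). It prints memory not available to new programs, page/buffer cache, and swap in use. It is a rough snapshot, not a replacement for `free -h` or System Monitor; the function names are made up for illustration, and the syntax sticks to Python 3.6 since that is the version in the inxi output.

```python
"""Summarise RAM, cache and swap usage from /proc/meminfo (Linux only).

Sketch only: all /proc/meminfo values used here are reported in kB.
"""
from pathlib import Path
from typing import Dict

MEMINFO = Path("/proc/meminfo")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Turn lines like 'MemTotal:  16384000 kB' into {'MemTotal': 16384000}."""
    values = {}  # type: Dict[str, int]
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def summarise(info: Dict[str, int]) -> Dict[str, float]:
    """Convert the raw kB figures into GiB plus a 'used' percentage."""
    kib_per_gib = 1024 * 1024
    total = info["MemTotal"]
    # MemAvailable estimates memory that can be given to new programs
    # without swapping; very old kernels lack it, so fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    cache = info.get("Cached", 0) + info.get("Buffers", 0)
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "total_gib": total / kib_per_gib,
        "used_gib": (total - available) / kib_per_gib,
        "used_pct": 100.0 * (total - available) / total,
        "cache_gib": cache / kib_per_gib,
        "swap_used_gib": swap_used / kib_per_gib,
    }


if __name__ == "__main__" and MEMINFO.exists():
    stats = summarise(parse_meminfo(MEMINFO.read_text()))
    for name, value in stats.items():
        print("{:>14}: {:8.2f}".format(name, value))
```

Running it from a console every few minutes (for example with `watch -n 60 python3 meminfo_summary.py`, where the file name is hypothetical) would show whether used memory or swap climbs before the trouble starts.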

Thanks!

Hi Neo,

Well, 'crash' might be too harsh a description. How about 'unrecoverable errors'?

I cannot use a terminal for anything, because once I start noticing any of the symptoms mentioned, usually no other program can be launched. In other words, I can't open a terminal window to do anything, nor can I open System Monitor to see whether memory usage has changed.

Under normal circumstances, with just the browser running, System Monitor reports that I am using about 3.5 gigabytes out of a total of 16 (listed as 15.7). I believe other times I have looked, possibly with more browser windows and tabs open, I may have seen 4 gig used.

Hopefully, what I mentioned above is what you meant by the ram stats, but I don't know what you mean regarding cache management - what kind of command or tool would I use to find that information?

I have not yet been able to run the memory test suggested earlier - but I am working on it.

Thank you for replying.

MisterAcoustic

PS: I believe I have tried leaving a terminal window open all the time. To the best of my recollection, when I tried to use it after the problems began, my commands ran into errors similar to the 'file access denied' type. Don't hold me to that, though; it's been a while.

Hi @MisterAcoustic

I advise you to get out of "GUI MODE" and work from only the console at this point in your investigation.

If you do not know how to boot and run your Linux OS in console mode, might I kindly suggest that this is a good time to learn :)

FWIW, I never debug Linux systems from a GUI or desktop app; I always, without exception, debug from the console.

I have now run one complete pass of Memtest86+ 5.01, with no errors. Second pass is underway.

Once I complete a second pass, I think I will first try the suggestion to run from a Mint live CD. At least I may see the problem occur if it is the GUI layer causing the problem.

I could try to run from a terminal only, but I don't know what I will learn if the problem doesn't occur.

Still having no idea what might be happening, I wanted to mention that I have tried to set up various remote/vnc type software in the past, and that may have messed with my graphics/video setup. For example, I'm not even sure that having 'x11 server' is normal for my setup. Whatever I did, I was just trying to make the remote software work. Let me know if this is an area we should look at.

It may take several days to see an issue crop up once I'm on a live CD, so I will post as soon as I know anything.

Thanks for the help so far.

MisterAcoustic

It sounds like you are progressing. I would agree that if you can do two full passes of memory test without error then the probability is that RAM is not the problem.

Next, the principle is to test the other hardware in the box, for example by making the CPU work harder and exercising the graphics card, network interface, et al., using a known OS setup (i.e. the 'live' distribution). If the problem is hardware, which, as you say, can fall over within one minute of booting, then using a 'live' DVD for an hour or so should make the problem appear. The same is true if you have some BIOS setting set wrong for Mint. If Firefox has been a catalyst for this problem, use it on the 'live' system as much as you can to try to make the problem happen.
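To make the CPU work harder as described above, one crude option is a small Python load generator like the sketch below. It is only a stand-in for dedicated stress-testing tools: each worker process (by default one per logical CPU) spins on integer arithmetic until a deadline and reports how many loops it completed. The script name and the duration passed on the command line are hypothetical, and it exercises only the CPU, not the graphics card or network interface.

```python
"""Keep every logical CPU busy for a fixed number of seconds.

Crude load generator for illustration; dedicated stress tools do more.
Usage: python3 cpu_burn.py SECONDS
"""
import multiprocessing as mp
import os
import sys
import time


def burn(seconds: float) -> int:
    """Spin on integer arithmetic until `seconds` have elapsed; return loop count."""
    deadline = time.monotonic() + seconds
    loops = 0
    x = 1
    while time.monotonic() < deadline:
        # A linear congruential step: cheap, CPU-bound, no growing memory use.
        for _ in range(10000):
            x = (x * 1103515245 + 12345) % 2147483648
        loops += 1
    return loops


def stress(seconds: float, workers: int = 0) -> list:
    """Run `burn` in `workers` processes (default: one per logical CPU)."""
    count = workers or os.cpu_count() or 1
    with mp.Pool(processes=count) as pool:
        return pool.map(burn, [seconds] * count)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        results = stress(float(sys.argv[1]))
        print("{} workers finished; loops per worker: {}".format(len(results), results))
    else:
        print("usage: python3 cpu_burn.py SECONDS")
```

On the 16-thread Ryzen 7 3700X listed above, the default should start 16 workers. Running it for an hour or so from the 'live' session while watching for the symptoms would be one way to apply the load described.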

If trouble occurs on 'live' then we need to track it down. If it doesn't then we're looking at the hard disk and/or the hard disk root filesystem. Have you checked the root filesystem recently?
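One quick check alongside a full filesystem check is to look for disk and filesystem errors in the kernel log of the boot that went wrong. The Python sketch below assumes that log has been saved with `journalctl -k -b -1 > prev-boot.txt` (kernel messages from the previous boot, available only if the journal is kept across reboots) and flags lines matching a few common ext4 and ATA/block-layer error phrasings. The pattern list and file names are hypothetical and not exhaustive. Also, if the root filesystem was remounted read-only, the final messages may never have reached the disk, so a clean result does not prove the disk is healthy.

```python
"""Flag disk and filesystem error lines in a saved kernel log.

Usage: python3 scan_kernel_log.py prev-boot.txt
"""
import re
import sys

# Hypothetical, non-exhaustive selection of ext4 / ATA / block-layer phrasings.
PATTERNS = [
    re.compile(r"EXT4-fs error", re.IGNORECASE),
    re.compile(r"remounting filesystem read-only", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"ata\d+(\.\d+)?: (exception|failed command|hard resetting link)",
               re.IGNORECASE),
]


def suspicious_lines(text: str) -> list:
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            found = suspicious_lines(log.read())
        for number, line in found:
            print("{:6}: {}".format(number, line))
        print("{} suspicious line(s) found".format(len(found)))
    else:
        print("usage: python3 scan_kernel_log.py LOGFILE")
```

Because the patterns are plain regular expressions, adding another phrasing seen in the log is a one-line change to `PATTERNS`.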

Do post back your results and, by the way, welcome to this community.

Hi hicksd8,

The shortest time that it's ever happened in is one minute - but it may be once a day or not for several days. So I'm going to be stuck on the disk-based version for a while to see if it will happen. Perversely, in this case, it's not a good thing that it doesn't fail quickly :).

I think I tried to find and run something to look at my SSD, but I don't remember what, and I wasn't convinced it was testing anything useful - if you have a recommendation for how to test the disk and file system, I'd like to hear it.

Thanks for the welcome - it's nice to find a place where people are willing and able to help.

MisterAcoustic