Fixing a broken Linux upgrade without a full, current backup

Hi,

I was recently tasked with recovering a botched Debian 9 upgrade.
Is there a quicker way of getting access to libraries and binaries from a specific release than performing an install on another machine and copying them over to the target machine?
I know that the quickest way is probably to reinstall, but the question is more academic in nature.

Also, are binaries and libraries compatible throughout a major version?
For example, will libs from Debian 9.3 work with Debian 9.x?

You may be better off just installing your Linux distribution on a new machine and then moving the applications over to it.

This is how I do major upgrades, BTW.

I do not "mess with" a working production system by running a major distribution upgrade on it. Instead, I build a new system from scratch, migrate the production apps to the new box, and test it before cutting over to replace the older system.

HTH

Didn't someone take a full backup immediately before pressing the button on the O/S upgrade?

Hi,

Your best bet, if you need to replace particular binaries and/or libraries and you know specifically which ones you need to replace, would probably be to boot from a Debian 9 Live CD/USB, mount your local storage, and copy over whatever you need to replace from the live system to your local storage.
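As a rough illustration of that copy step, here is a minimal Python sketch. It assumes the live system's root is readable at one path and the broken installation is mounted at another; the function name `replace_files`, the `.broken` suffix, and the example paths in the comment are all hypothetical, and you would need to run it as root. Treat it as a sketch of the idea, not a tested recovery tool.

```python
import shutil
from pathlib import Path


def replace_files(live_root, target_root, rel_paths, backup_suffix=".broken"):
    """Copy known-good files from live_root over target_root.

    rel_paths must be relative paths (e.g. "usr/bin/foo"), so that they can
    be resolved under both roots. Returns the list of paths actually replaced.
    """
    live_root, target_root = Path(live_root), Path(target_root)
    replaced = []
    for rel in rel_paths:
        src = live_root / rel
        dst = target_root / rel
        if not src.is_file():
            # Skip anything the live system does not provide.
            continue
        if dst.exists():
            # Keep the suspect file so the change can be undone later.
            shutil.copy2(dst, dst.with_name(dst.name + backup_suffix))
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 keeps mode and timestamps, but not ownership, and it copies
        # the content a symlink points to rather than the link itself.
        shutil.copy2(src, dst)
        replaced.append(rel)
    return replaced


# Hypothetical usage, with the broken root filesystem mounted at /mnt/target:
# replace_files("/", "/mnt/target", ["usr/bin/foo", "usr/lib/libfoo.so.1"])
```

Keeping the overwritten files next to the replacements at least makes it possible to see afterwards what was changed by hand.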

But really, this is one of those "I wouldn't start from here" kind of problems. You might be able to patch together a broken system into something that works by doing this, but there's always the chance (and it's a pretty high chance in this situation) that there are any number of not-immediately-obvious problems and issues just waiting to bite you further down the road.

Now that the system is in a broken state, restoring it from its last-known-good backup would be the best thing you could do. That would get you back to a point in time where you know the system works and is stable, and you could then re-attempt the upgrade after addressing whatever issues caused the upgrade to fail in the first place. If you don't have backups, then...well, all you can really do is re-install (and I guess chalk this one up to experience).

These days, in-place upgrades are not generally the best way to upgrade a system - things are just too complex. Similarly to @Neo, I tend to "upgrade" a system by building a new one, and migrating all applications and data over from the soon-to-be-retired system to its new replacement. And if you absolutely must do an in-place upgrade for some reason (lack of replacement hardware or an inability to create a new VM or container, for instance), then having known-good backups that you are 100% certain you can restore from is a pre-requisite to even attempting such a thing.
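One way to gain some confidence in a backup is to restore it to a scratch location and compare the result with the original. The Python sketch below does that by comparing SHA-256 digests of regular files; the function names are hypothetical, and it deliberately ignores symlinks, ownership, and permissions, so a clean result is evidence rather than proof.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file under root (relative path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    }


def restore_mismatches(original_root, restored_root):
    """Return sorted relative paths that are missing or differ after a test restore."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    return sorted(
        rel for rel, digest in original.items() if restored.get(rel) != digest
    )
```

An empty list from `restore_mismatches` means every regular file in the original tree came back byte-for-byte identical in the test restore.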

Unfortunately, it is unlikely that @regexp made a current backup before the failed upgrade.

As mentioned, for major Linux upgrades it is better to build a new system from scratch and migrate the apps over to the new system and test.

It is not wise to perform live major upgrades on a production system; that is not how you avoid failures and downtime.

It is easier to test a staged new system than risk failures on live production systems.

And, IMHO, despite the mess, take a backup of the system now and at least get a copy of all the user data at risk. You can do that using a 'live' CD/DVD if the system won't boot.
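For what it's worth, a minimal Python sketch of grabbing that copy from a live session might look like the following. It assumes the broken disk is already mounted and that the archive is written somewhere outside the directory being archived (an external drive, say); the function name and the paths in the comment are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_user_data(source_dir, archive_path):
    """Write source_dir into a gzip-compressed tar archive.

    archive_path must live outside source_dir, otherwise the archive would
    try to include itself. Returns the member names as a quick sanity check.
    """
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the mount point.
        tar.add(source_dir, arcname=source_dir.name)
    # Re-open the archive to confirm it is readable and list what it holds.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Hypothetical usage from a live session, with the old disk at /mnt/target:
# archive_user_data("/mnt/target/home", "/media/usb/home-backup.tar.gz")
```

Reading the archive back immediately is a cheap check that it was written completely before anything destructive is done to the disk.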

The user data was backed up. So it was mostly out of curiosity that I wanted to see if the system was salvageable...

https://wiki.debian.org/SystemDowngrade

Hmm.

Let me explain something important to you @regexp

First of all, not all systems are the same and so the system management (and risk management) is different. You need to share with us the exact application or applications running on this computer system. That information determines the correct approach to recovering from a breach of system integrity. So please answer this before we continue (and please everyone, stop replying until we get this piece of critical information).

Second, let's briefly touch on cybersecurity. IT security, including cybersecurity, is composed of three elements:

  1. Integrity
  2. Availability
  3. Confidentiality

Assuming your system is a production system with critical assets (please answer the question above about what exact application is running), your system is now in an unknown state because you attempted a major system upgrade and it failed and you did not perform a full system backup before you attempted the upgrade. This was a mistake. Just to let you know, I never do what you have done. For example, even on my desktop macOS workstation, I perform a full system backup before any OS upgrade because upgrades can and do fail and that is what backups are for. You made a mistake. Others make similar mistakes.

So, the system integrity has now been breached, and your system is in an unknown state. This is not good, obviously. @hicksd8 has suggested that you attempt a downgrade, but I completely disagree (sorry Dennis). When you take a system which is in an unknown state and then run some automated process to downgrade it, you are very likely to increase the uncertainty of the system. This approach generally does not improve system integrity; it generally degrades it. By taking a system in an unknown state and attempting to use a third-party software process to downgrade the system to a prior version, you are increasing the uncertainty, not reducing it.

To maintain system integrity, it is faster, better, cheaper and more secure to simply build a new box with the upgraded OS you desire, migrate the application to the new OS, and test it. This approach puts the system in a state of certainty and maintains system integrity. Downgrading a broken upgrade does not put the system in a state of certainty; it lowers system integrity.

So, of course the proper upgrade approach @regexp would have been to ensure you (using "you" generically, not personally) had a full and current backup before you attempted an upgrade, but "you" made the mistake of only backing up "user data" (whatever that means, please explain this before we proceed).

Finally, before dropping off so you can reply to my request to describe your application exactly, let me say that when you build a production application, you should segment the application from the operating system as much as possible. This means you, as the system admin, must build the system so that it is easy to move the application from one system to another, by keeping the application in its own directory (location in the file system) to the extent possible. Then, when the system fails (as systems eventually do), you can boot up a new system, move the app and user data over, test the warm standby, and then cut over to the new system after testing.

So, before anyone jumps in with guesses and suggestions, please @regexp be so helpful as to describe exactly what application(s) are running on this system, where you have breached the system integrity via a failed Linux distribution upgrade without a full backup in place.

Thanks!

Note:

I'm not picking on you @regexp, I'm just being direct and trying to get to the root of this situation so we can help you before this evolves into a 100 reply discussion where everyone guesses at your solution.
According to our site, you have been a member here over 10 years, so you should know all these things I have stated above about full backups before system upgrades, the importance of file-system integrity, clearly stating what application is running on a host before seeking a solution, etc.

So, again, provide us with the application running on this host where you have breached the system integrity by performing a failed upgrade without a full backup in place.

Thanks again. We are waiting for your reply @regexp

Yes, that is what you should do, as stated many times. Build a new system, migrate over the application (which you have not shared with us, please do) and test it before cutting it over.

This is the purpose of testing on a non-production system. You build a new system and then install and test the application to answer these types of questions! No one here can know the answer, with any degree of certainty, to this question! Shortcuts do not work. As a sys admin, your main goal is to ensure file system integrity, not just "make it work".

Also note: When I use the pronoun "you", this is "you" generically speaking. So please do not take the use of this pronoun literally or defensively.

I appreciate the efforts of everybody to help, but you make this more serious than it is.
The server wasn't running anything critical, and the original plan was just to reinstall and be done with it.
As I stated before, it was purely out of curiosity that I wanted to check whether I would be able to recover the system to an operational state by accessing older libraries and binaries and putting them on there.

... and how can we know that, since you did not state these important details in your original question?

Yes, that is the correct approach to maintain system certainty and integrity.

This approach does not help maintain system integrity; it degrades it. So, now you know the answer to your question :slight_smile:

... and how can we know that, since you did not state these important details in your original question?

Maybe I should have, my bad.

Of course you should have. Technology, especially software, is about details, not guesswork. First, we define the problem or describe the situation, then we attempt to address the problem. Defining or describing the problem fully is normally around 80% of the solution, in most cases, at least in my experience over many decades.

The approach to maintaining system integrity for computer operating systems is based on risk management. If your system is not critical and is not a production system, then the approach is different from the one you would take for a back-office banking application. (... and I would not have replied, to be honest, as I tend to focus on critical systems and cybersecurity).

How can we know this if you do not state the system details? When people post questions without the basic key details, discussions just go around in circles with people guessing at answers and solutions to a problem which is in the dark, so to speak.

This is a recurring problem here, BTW. We have a lot of great, kind, experienced and helpful people here who like to answer questions without knowing the problem clearly first. It's been like this since the beginning of unix.com but it seems to have gotten worse over the years.

... and you still did not state the application @regexp, you just said "it's not critical", but that's not really very informative... :slight_smile: Details matter.