[Cialug] Suse upgrades and grub issues

Matt Patterson matt at usrlocal.com
Thu Jul 24 12:28:00 CDT 2008


For the most part, we try to keep our systems up to date, but with the 
rash of kernel update issues that we have been having, we've fallen behind 
on a few boxes, which is something we want to resolve here shortly.

We have AutoYaST files for most of the servers so we can quickly rebuild 
them if needed, and we have backups on all of our servers to restore 
configuration files in a timely fashion.  However, I would rather not waste 
a couple of hours rebuilding a box for an update that should just work.

After digging into some more of the logs, I found the following in 
/var/log/YaST2/y2log_bootloader from back in Feb 2008, when I had my 
first stage2 error with this server.

# less y2log_bootloader


     GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

  [ Minimal BASH-like line editing is supported.  For the first word, TAB
    lists possible command completions.  Anywhere else TAB lists the possible
    completions of a device/filename. ]
grub> setup --stage2=/boot/grub/stage2 (hd0) (hd0,1)

Error 21: Selected disk does not exist
grub> quit


I'm leaning more towards YaST/YOU failing to notice this error.
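
To illustrate the kind of check I would expect here, below is a rough 
Python sketch that scans bootloader log text for GRUB error lines.  It 
assumes errors always show up as "Error <number>: <message>", as in the 
log above.  The function name find_grub_errors is made up for this 
example, and this is not how YaST actually processes the log.

```python
import re

# Assumption: GRUB reports failures as "Error <number>: <message>",
# matching the line seen in y2log_bootloader above.
GRUB_ERROR = re.compile(r"^\s*Error (\d+): (.+?)\s*$")


def find_grub_errors(log_text):
    """Return (line_number, error_code, message) for each GRUB error line."""
    errors = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = GRUB_ERROR.match(line)
        if match:
            errors.append((lineno, int(match.group(1)), match.group(2)))
    return errors


# Sample text copied from the failing run above.
sample = """grub> setup --stage2=/boot/grub/stage2 (hd0) (hd0,1)

Error 21: Selected disk does not exist
grub> quit
"""

errors = find_grub_errors(sample)
for lineno, code, message in errors:
    print(f"line {lineno}: GRUB error {code}: {message}")
```

Run against the real log after an update, a non-empty result would be a 
reason not to reboot until the bootloader has been checked by hand.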

Now, when I have a "good" stage2 file, I get the following output:

grub> setup --stage2=/boot/grub/stage2 (hd0) (hd0,1)
  Checking if "/boot/grub/stage1" exists... yes
  Checking if "/boot/grub/stage2" exists... yes
  Checking if "/boot/grub/xfs_stage1_5" exists... yes
  Running "embed /boot/grub/xfs_stage1_5 (hd0)"...  18 sectors are embedded.
succeeded
  Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+18 p (hd0,1)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
Done.
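
The difference between the two runs is easy to test for.  As a minimal 
sketch (my own assumption about what counts as success, not something 
taken from YaST), a setup transcript could be treated as good only if it 
contains a "Done." line and no "Error" line.  The name setup_succeeded is 
hypothetical, and the good sample is abbreviated from the output above.

```python
def setup_succeeded(transcript):
    """True only if the transcript has a 'Done.' line and no GRUB error line."""
    lines = [line.strip() for line in transcript.splitlines()]
    has_error = any(line.startswith("Error ") for line in lines)
    has_done = "Done." in lines
    return has_done and not has_error


# Abbreviated from the successful run above.
good = """grub> setup --stage2=/boot/grub/stage2 (hd0) (hd0,1)
  Checking if "/boot/grub/stage1" exists... yes
  Checking if "/boot/grub/stage2" exists... yes
succeeded
Done.
"""

# Copied from the failing run in y2log_bootloader.
bad = """grub> setup --stage2=/boot/grub/stage2 (hd0) (hd0,1)

Error 21: Selected disk does not exist
grub> quit
"""

print("good run ok:", setup_succeeded(good))
print("bad run ok:", setup_succeeded(bad))
```

Requiring the "Done." line as well as the absence of an error means an 
empty or truncated transcript is also treated as a failure.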


I'm digging into the logs more, and if I can find something, hopefully 
the Novell/SUSE community will fix this oversight, or at least give me a 
better warning about the error so I can manually fix it before I shoot 
myself in the foot.

-Matt


  On Thu, 24 Jul 2008, Ken MacLeod wrote:

> On Thu, Jul 24, 2008 at 10:58 AM, Matt Patterson <matt at usrlocal.com> wrote:
>> I'm not sure how many people run SuSE on here but I seem to have a recurring
>> issue that is really starting to irk me.
>
> I don't run Open/SuSE so take this as more general info.
>
>> The issue happens when installing the kernel patches.  I would say that I
>> have a 90% failure rate with this.  Not with the installation of the kernel
>> itself, but the grub updates that happen in relation to the kernel updates.
>>  I have had issues where the menu.lst file is corrupted or the bigger, more
>> popular error is that the stage2 file becomes corrupt and when the server
>> reboots, it will go into a reboot loop.
>
> The biggest outstanding problem with package updates is the
> interaction between different packages during an update.  Between the
> kernel, grub, and the stage2 builder in this case, made worse because
> of how important it is to booting.
>
>> I've opened a ticket before with Novell for one of the SLES servers and
>> provided a b0rked stage2 file for them to look at.  Nothing really ever came
>> out of this incident since I was able to get the system up on my own.  For
>> the most part, I have been able to restore the servers by booting a CD in
>> rescue mode and untarring a backup of the /boot directory, after which
>> everything is fine in the world.
>
> Yes, you shouldn't let Novell slip out of this because you've found a
> workaround, and don't let them get away with only giving you a manual
> workaround as a final solution.  Package updates should "just work".
>
>> Anyone have any ideas here?   I'm honestly thinking about removing grub from
>> all the servers and installing lilo and hoping that it resolves this issue.
>
> If your short term solution is to continue manually maintaining the
> bootloader, and you're more comfortable with lilo, then that might be
> a good solution for you.  I doubt YaST/package updates will be better
> with lilo since as I recall, Open/SuSE now uses grub by default so
> lilo wouldn't be as well maintained either.
>
>>  Though I think that the real underlying issue is just that YaST is not
>> handling an error condition correctly and moving on like nothing has
>> happened.
>
>>  But this is rather annoying when you are trying to patch
>> servers that are hundreds of miles away.
>
> The solution I've found best is to make sure all the systems are as
> consistent with each other as possible and you have a test server you
> can easily revert back to a known state at any time.  This way, when I
> run into these problems on my test server locally, then debug, revert
> to earlier state, and re-test, I'm more confident it'll work when I
> roll-out the change to more servers.
>
> In this case, you're probably looking at overwriting the menu.lst and
> stage2 with known good copies after you've done the update but before
> you reboot.
>
> Are you/can you use AutoYaST on the remote systems?  Re-autoinstalling
> is one of the simplest ways I've found to maintain lots of systems,
> along with change management tools like cfengine/puppet to maintain
> small ongoing changes.
>
> http://c2.com/cgi/wiki?SystemsAdministrationPractices
>
>  -- Ken

