[Cialug] Replacing failed RAID 1 drive

Rob Cook rdjcook at gmail.com
Thu Oct 2 14:26:24 CDT 2014


I licked it clean; it is 'pixie dust' after all...

On Thu, Oct 2, 2014 at 2:14 PM, David Champion <dchamp1337 at gmail.com> wrote:

> That'll buff right out.
>
> -dc
>
> On Wed, Oct 1, 2014 at 5:46 PM, Rob Cook <rdjcook at gmail.com> wrote:
>
> > That's a failed drive right there... when I got home tonight it was making
> > an unholy screaming racket. The last pic looks to be where the head had been
> > machining a groove in the platter by the spindle. There are chunks on the
> > platter and the entire drive is dirty.
> >
> > Popped in the new drive, booted up fine, and rejoined the disk to the RAID
> > via mdadm after copying the partition scheme over from the existing drive.
> > Sometime in the next 4-6 hours the array will be rebuilt. Next month I'll
> > get another new drive (I went from 1.5 TB to 3 TB on the replacement),
> > fail out the old drive, and replace it with another 3 TB. Then expand the
> > LVM and *magic* twice the space... I hope ;)
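> >
> > Roughly, the sequence looks something like this (just a sketch; /dev/sda is
> > the surviving disk, /dev/sdb the replacement, and md0/vg0/lv_data stand in
> > for the real names on this box):
> >
> >         # copy the partition layout from the surviving disk to the new one
> >         # (for GPT disks, sgdisk rather than sfdisk would be needed)
> >         sfdisk -d /dev/sda | sfdisk /dev/sdb
> >         # add the new member back into the array and let it resync
> >         mdadm --manage /dev/md0 --add /dev/sdb1
> >         cat /proc/mdstat        # watch the rebuild progress
> >
> >         # later, once both members are 3 TB disks (and the md member
> >         # partitions have been enlarged to match), grow the array, then
> >         # the LVM physical volume, then the logical volume and filesystem
> >         mdadm --grow /dev/md0 --size=max
> >         pvresize /dev/md0
> >         lvextend -r -l +100%FREE /dev/vg0/lv_data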
> >
> > On Wed, Oct 1, 2014 at 10:38 AM, Rob Cook <rdjcook at gmail.com> wrote:
> >
> > > Well, after today I may be able to give that demo. Let's hope it all
> > > works properly.
> > >
> > > On Wed, Oct 1, 2014 at 8:39 AM, Matthew Nuzum <newz at bearfruit.org> wrote:
> > >
> > >> This would make a great topic for a LUG meeting sometime. Bring a box in,
> > >> set it up with RAID, then fail and swap a drive.
> > >>
> > >> It's one of those things we all know we should do, and many of us do it,
> > >> but the actual hands-on experience of dealing with a failed drive is not
> > >> nearly as common. (I've never done it with software RAID, only a
> > >> cold swap with a hardware RAID controller.)
> > >>
> > >> On Tue, Sep 30, 2014 at 5:35 PM, Rob Cook <rdjcook at gmail.com> wrote:
> > >>
> > >> > "When you say “LVM on top of RAID” I assume you mean something like
> > >> this:
> > >> >         /dev/sd* (physical block devices) => md0 (mdadm array) =>
> pv1
> > >> (LVM
> > >> > “physical” volume) => vg1 (LVM volume group) => lv1 (LVM logical
> > >> volume) =>
> > >> > /mnt/foo (filesystem)"
> > >> >
> > >> > Yes, like that exactly.
> > >> >
> > >> > Ok, off to buy a new drive.
> > >> >
> > >> > On Tue, Sep 30, 2014 at 5:25 PM, Zachary Kotlarek <zach at kotlarek.com> wrote:
> > >> >
> > >> > >
> > >> > > On Sep 30, 2014, at 3:07 PM, Rob Cook <rdjcook at gmail.com> wrote:
> > >> > >
> > >> > > > I have a CentOS 6.5 box with two 1.5 TB drives in a RAID 1 with LVM
> > >> > > > partitions on top of that. One of the drives, /dev/sdb, has failed.
> > >> > > >
> > >> > > > I've been googling quite a bit and I think I should be OK following
> > >> > > > this guide:
> > >> > > >
> > >> > > > http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
> > >> > > >
> > >> > > > Fail then remove the drive from the array, replace it with a similar
> > >> > > > or larger one, then recreate. The one question I have is what to do
> > >> > > > with the LVM partitions? Naively, since this is RAID 1 and the same
> > >> > > > data is on both drives, they should recreate themselves and I
> > >> > > > shouldn't have to worry. Or is that too simplistic a view?
> > >> > >
> > >> > >
> > >> > > When you say “LVM on top of RAID” I assume you mean something like this:
> > >> > >         /dev/sd* (physical block devices) => md0 (mdadm array) =>
> > >> > >         pv1 (LVM “physical” volume) => vg1 (LVM volume group) =>
> > >> > >         lv1 (LVM logical volume) => /mnt/foo (filesystem)
> > >> > >
> > >> > > If that’s the case then the LVM physical volume and everything higher
> > >> > > in the stack has no idea that you’re swapping disks and doesn’t need
> > >> > > to be told anything.
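> > >> > >
> > >> > > A quick way to confirm that layering on a live box (a sketch; md0 and
> > >> > > the volume group name will differ on your system):
> > >> > >
> > >> > >         cat /proc/mdstat    # e.g. "md0 : active raid1 sda1[0] sdb1[1]"
> > >> > >         pvs                 # the only PV listed is /dev/md0, not sda/sdb
> > >> > >         vgs                 # the volume group sitting on that PV
> > >> > >         lvs                 # the logical volumes carrying the filesystems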
> > >> > >
> > >> > > —
> > >> > >
> > >> > > On a related note, sometimes mdadm commands that reference physical
> > >> > > devices, like this:
> > >> > >         mdadm --manage /dev/md1 --fail /dev/sdb2
> > >> > > will fail with an error like:
> > >> > >         No such device: /dev/sdb2
> > >> > > because the file /dev/sdb2 no longer exists (because the disk is dead
> > >> > > or pulled).
> > >> > >
> > >> > > But you still need to tell mdadm about it so it can update the array.
> > >> > > Instead you should use the short name:
> > >> > >         mdadm --manage /dev/md1 --fail sdb2
> > >> > > or whatever other device name shows up when you ask mdadm about the
> > >> > > array or look at /proc/mdstat. That bypasses any device-file lookup
> > >> > > and uses the references that mdadm tracks internally.
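> > >> > >
> > >> > > To tie that together, a sketch of the removal side (md1 and sdb2 are
> > >> > > just the names from the example above):
> > >> > >
> > >> > >         cat /proc/mdstat            # a failed member shows up as e.g. sdb2[1](F)
> > >> > >         mdadm --detail /dev/md1     # shows the array's own view of its members
> > >> > >         mdadm --manage /dev/md1 --fail sdb2
> > >> > >         mdadm --manage /dev/md1 --remove failed   # "detached" also works for a
> > >> > >                                                   # disk that has vanished entirely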
> > >> > >
> > >> > >         Zach
> > >> > >
> > >> > >
> > >>
> > >>
> > >>
> > >> --
> > >> Matthew Nuzum
> > >> newz2000 on freenode, skype, linkedin and twitter
> > >>
> > >> ♫ You're never fully dressed without a smile! ♫
> > >>
> > >
> > >
> >
> >
> >
>

