
Re: [ProgSoc] Programming activity - raidreconf



On Tuesday 21 February 2006 19:40, Nigel Sheridan-Smith wrote:
 ] Okay -- from what I can ascertain from trawling the source for only about
 ] 1/2 an hour:

 Nigel,

 Very much appreciate you having a look through the source.  As I say,
 it's more than my ageing brain can deal with comfortably.

 Also, as you say elsewhere, an understanding of how RAID5 arrays
 are striped would be a useful thing to have.  As is, apparently, a
 finer understanding of some of the optimisations they used when
 re-striping an array of x disks onto an array of x+y disks.
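
 For anyone following along, here is a rough sketch of how a RAID5
 array maps logical chunks onto disks.  It assumes the left-symmetric
 layout that Linux md uses by default; whether this particular array
 (or raidreconf internally) used that layout is an assumption, and
 the function name raid5_locate is made up for illustration.

```python
def raid5_locate(chunk, n_disks):
    """Map a logical chunk number to (disk, stripe, parity_disk).

    Assumes the left-symmetric RAID5 layout: parity rotates backwards
    one disk per stripe, and data starts on the disk after parity.
    """
    data_disks = n_disks - 1                 # one chunk per stripe holds parity
    stripe, d = divmod(chunk, data_disks)    # d = position among the data chunks
    parity_disk = data_disks - (stripe % n_disks)
    disk = (parity_disk + 1 + d) % n_disks   # wrap around past the parity disk
    return disk, stripe, parity_disk


if __name__ == "__main__":
    # Compare where the first few chunks live on 4 disks versus 5 disks.
    for chunk in range(8):
        print(chunk, raid5_locate(chunk, 4), raid5_locate(chunk, 5))
```

 Under these assumptions chunk 3 lives on disk 3 in stripe 1 of the
 4-disk layout, but on disk 3 in stripe 0 of the 5-disk layout, which
 is why almost every chunk has to move during a re-stripe.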

 I've attached a response I got from Jakob (the original author of
 the raidreconf application) for your erudition, below.

 I'm sure that the gifts could be recovered (from the 280MB core file)
 and the residual blocks that are yet to be ordered *should* be
 recoverable .. but maybe not, if there's no way to track how the
 app would have approached the original 4-disk array layout -- since
 that layout simply doesn't exist anymore, and you can't programmatically
 reverse-engineer the 5-disk array layout (partial) as it stands now,
 because the layout of it is  .. gawd, the phrase has gone.  Same
 term used to describe crypt() algorithms -- they only work in one
 direction, but I'm sure there was a fancier word for it.  {shrug}

 In any case .. more and more it's looking like the best bet is to
 force a re-create using the 5-disk layout config file, then run
 whatever reiserfsck util I can get my hands on, and then copy
 everything I can onto (well, therein lies a small problem) and then
 erase the contents of those disks and start again.

 Working out the integrity of several hundred AVIs is going to
 take some time.  I'm sure there are some utilities out there that
 can go part way to doing that, at least.
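
 As a first-pass triage, something like the sketch below could sort
 the files into rough piles.  It only checks the 12-byte RIFF header
 and compares the declared chunk size against the real file size, so
 it will not notice a misplaced block in the middle of a file.  The
 function name and the three status strings are my own invention.

```python
import struct


def riff_avi_status(path):
    """Return 'bad header', 'truncated' or 'header ok' for one file."""
    with open(path, "rb") as f:
        header = f.read(12)
        f.seek(0, 2)            # seek to end of file to learn its size
        actual = f.tell()
    # An AVI file starts with 'RIFF', a 4-byte size, then the form type 'AVI '.
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"AVI ":
        return "bad header"
    # The size field is little-endian and excludes the first 8 bytes.
    (declared,) = struct.unpack("<I", header[4:8])
    if declared + 8 > actual:
        return "truncated"
    return "header ok"
```

 Files that come back "header ok" would still need a proper pass
 through a player or decoder before they could be trusted.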


 ----------->>>>>>>>>> begin forwarded message <<<<<<<<------------
From: Jakob
To: Jedd

>  Cool.  I was working towards that thought (I've done nothing
>  with the array as of yet - happily there's no big rush).  My
>  only concern there is all that stuff that raidreconf reports
>  after the re-striping is complete  - flushing, superblock updating,
>  fixing up of gifts/wishes/friends (?), more superblock updating,
>  and then updating it to the kernel .. none of that got a chance
>  to run, obviously, because of the core-dump.
> 
>  Is there a way to force that part of the process to happen, or
>  will mdadm do bits of that when it tries to start /dev/md0 ?

raidreconf has some data blocks (gifts) in cache (thus the size of the
core file) - those are lost. And there are some blocks on your array
that are still ordered for the old setup.  This is lost information.
Hopefully it will not cause too much filesystem corruption.

The superblocks are not updated. But if you (re-)create the array with
the new setup, the superblocks will be re-written and the kernel side of
things will then be ok.  What's missing is the data blocks that didn't
get moved, and I see no simple way of fixing that.

You would be well advised to run a fsck on the new array before mounting
it :)   We can only hope that fsck will fix up some filesystem meta-data
corruption, drop a few files, and that the rest of the data will be ok.
This depends a lot on the "robustness" of the filesystem and on fsck -
you said that raidreconf converted most of your blocks, so hopefully
most of your filesystem will be ok too. But there are no guarantees as
to where the fs chose to put your files...

> 
>  I'm going through the source, but it's fair to say that much of
>  it is beyond me.  It'd be nice to have a feature in there where you
>  could say 'start the process from block X' .. of course, this is the
>  same feature you've mentioned elsewhere, about having a log file
>  for resuming from crashes etc.

The problem is that there is no simple linear conversion order in
raidreconf - you can't really "start from block X" because frequently
blocks are converted a little bit out-of-order. This is the way
raidreconf is implemented, because it allows for some rather funky
conversions where a linear conversion would require overwriting a block
that was not yet moved.

The downside, of course, is the difficulty of re-starting a conversion.
Ideally raidreconf would use a write-ahead log on another disk to keep
track of things and be able to recover from a power cycle etc.

 ----------->>>>>>>>>> end forwarded message <<<<<<<<------------
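
 The write-ahead log Jakob mentions could look roughly like the sketch
 below.  This is not how raidreconf works today, and every name in it
 (MoveLog, move_block, recover) is hypothetical.  The idea is that a
 block's data is made durable in a log on a separate disk before its
 destination is overwritten, so a crash loses nothing that was only
 held in memory.  The "array" here is just a Python list standing in
 for the real disks.

```python
import json
import os


class MoveLog:
    """Append-only log of block moves, kept on a disk outside the array."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())    # force the record to disk before continuing

    def log_intent(self, dst, data):
        self._append({"op": "intent", "dst": dst, "data": data.hex()})

    def log_done(self, dst):
        self._append({"op": "done", "dst": dst})

    def pending(self):
        """Return {dst: data} for every intent with no later 'done' record."""
        todo = {}
        if not os.path.exists(self.path):
            return todo
        with open(self.path) as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except ValueError:
                    break           # torn final record from a crash: stop here
                if rec["op"] == "intent":
                    todo[rec["dst"]] = bytes.fromhex(rec["data"])
                else:
                    todo.pop(rec["dst"], None)
        return todo


def move_block(array, log, src, dst):
    """Copy array[src] to array[dst], logging the data first."""
    data = bytes(array[src])
    log.log_intent(dst, data)       # durable copy exists before dst is touched
    array[dst] = data
    log.log_done(dst)


def recover(array, log):
    """After a crash, redo every move that was logged but not finished."""
    for dst, data in log.pending().items():
        array[dst] = data
        log.log_done(dst)
```

 On restart, recover() replays whatever was in flight, and because
 each move is logged individually the out-of-order conversion Jakob
 describes is not a problem for it.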

