Discussion:
Possible bug in mkfs.ext3
jd1008
2014-09-20 01:56:37 UTC
I am reporting this on the advice of a member of the Fedora Users
Mailing List.

This is the mailing list exchange outlining the problem with specifying
-S to mkfs, and its subsequent consequences when fsck is run.

Is there any other tool that can extract files from a partition that
seems to have corrupted superblocks?
I tried dumpe2fs and fsck -b <blockNumber>
to no avail. I tried all of the block numbers that were listed
when the original mkfs was done, since its output was saved.
None of the blocks seem to work; all of them have an invalid magic number.

Verify that the partition table still appears to be correct. If it
is pointing to the wrong starting location, none of the super blocks
will appear in the expected places. You might see if /testdisk/ can
find any intact super blocks.
Consider using a hex editor to look at some of the super blocks.
They should contain the same data. The data that actually appears
there might give some clue as to what happened.
As a last ditch recovery effort, run mke2fs/mke3fs with the "-S"
option to initialize the super blocks and group descriptors only.
Do this only with (or on) a backup copy of the partition, since
it is potentially destructive. Then see if /debugfs/ can make
sense of the filesystem, and if so, run /fsck/ with the "-f"
option to repair the metadata.

I ran mkfs.ext3 -S /dev/sdc7
then ran fsck.ext3 -y /dev/sdc7
it blew away EVERYTHING :)
Back to square one and re-dd original to test drive
and start over.

Ouch! That _used_ to work. Trying it just now, "mke3fs -S" seems
to clear a substantial portion of the inodes, which the manpage
specifically says it should _not_ do, and then /fsck/ completes the
destruction by moving all of the remaining inodes to lost+found.
Sorry about that.

Can I raise a bug against it?

Chances are this is an upstream bug, or a misunderstanding. You should post your reproduction steps to the ext4 list, along with what you expect to happen based on the man page and what actually happens.
http://vger.kernel.org/vger-lists.html#linux-ext4
Chris Murphy
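
The hex-editor check suggested above can also be done from the shell. The following is only a minimal sketch and was not part of the exchange: it assumes the partition has been copied to an image file (the name sdc7.img is hypothetical), and it relies on the ext2/3/4 on-disk layout, in which the primary superblock starts 1024 bytes into the filesystem and its 16-bit magic number 0xEF53 is stored little-endian at byte 56 of the superblock. The function name sb_magic is made up for illustration.

```shell
# sb_magic IMAGE [SB_START]
# Print the two bytes where an ext2/3/4 superblock keeps its magic number.
#   IMAGE    : a copy of the partition (hypothetical name, e.g. sdc7.img)
#   SB_START : byte offset where the superblock should begin (default 1024,
#              the location of the primary superblock)
sb_magic() {
    img=$1
    sb_start=${2:-1024}
    # s_magic lives 56 bytes into the superblock and is little-endian,
    # so an intact superblock shows the bytes 53 ef (0xEF53).
    od -An -tx1 -j "$((sb_start + 56))" -N 2 "$img" | tr -d ' \n'
    echo
}
```

Running sb_magic sdc7.img 1024 should print 53ef for an intact primary superblock. Anything else at that offset suggests either damage or a filesystem that does not start where the partition table says it does.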
Andreas Dilger
2014-09-20 06:07:30 UTC
I would say that "mke2fs -S" is going to lead to worse corruption rather
than improving the situation 999 times out of 1000. It should only be
used by someone who knows very specific details of the filesystem and
how it was corrupted. I'm tempted to make it an "undocumented" feature,
since I suspect it will do more harm than good in most cases. "-S"
should at least call check_plausibility() and proceed_question() before
clobbering the filesystem.

Better would be something like the "findsuper" utility in the e2fsprogs
sources (attached here for your convenience). Usually in cases like this
the problem is actually something with the partition table, and not that
all of your backup superblocks have mysteriously been corrupted at the
same time.

Cheers, Andreas
jd1008
2014-09-21 00:01:35 UTC
Update:
Since I had believed, and it has been mentioned, that the partition
table itself may have been clobbered in a way that changed the real
starting address of the partition, I decided to search for possible
superblock candidates by running fsck from a shell script as follows:

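sb=0; while [ $sb -lt 429496729600 ]; do
    # Capture any "unreadable" or "bad magic" complaint; an empty result
    # means e2fsck did not reject this block outright.
    possibleSB=$(e2fsck -n -b $sb /dev/sdc7 2>&1 | grep -E 'The superblock could not be read|Bad magic number')
    [ "x$possibleSB" = "x" ] && echo $sb - "$possibleSB"
    sb=$((sb + 512))
done > possibleSB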

I found these blocks, the value of which might hint that the current
start block of the partition may have been altered:
1325056
2373632
3815424
6553600
13376000
13391872
13407744
13423616
13439488

Running fsck /dev/sdc7:

for sb in 1325056 2373632 3815424 6553600 13376000 13391872 13407744 13423616 13439488 ; do
    echo fsck -b $sb /dev/sdc7
    fsck -b $sb /dev/sdc7
    echo =================
done

=================================================================================
fsck -b 1325056 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ***** FILE SYSTEM WAS MODIFIED *****

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 2373632 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ***** FILE SYSTEM WAS MODIFIED *****

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 3815424 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ***** FILE SYSTEM WAS MODIFIED *****

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 6553600 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
fsck.ext2: Attempt to read block from filesystem resulted in short read
while trying to open /dev/sdc7
Could this be a zero-length partition?
=================
fsck -b 13376000 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 13391872 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 13407744 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 13423616 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================
fsck -b 13439488 /dev/sdc7
fsck from util-linux 2.24.2
e2fsck 1.42.8 (20-Jun-2013)
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext2: Illegal inode number while checking ext3 journal for /dev/sdc7

/dev/sdc7: ********** WARNING: Filesystem still has errors **********

=================


So, of the possible sb's above, which one looks promising?
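
The loop above asks e2fsck to judge every step. A lighter, read-only way to collect candidates is to look for the magic bytes directly, which is similar in spirit to (but much cruder than) the findsuper utility mentioned earlier in the thread. This is a sketch under stated assumptions: the function name scan_magic is hypothetical, it is meant to run against an image copy of the disk or partition, and it only checks the two magic bytes, so a hit is a candidate to inspect further and not proof of a superblock.

```shell
# scan_magic IMAGE START END [STEP]
# Print every byte offset in [START, END), stepping by STEP (default 512),
# at which an ext2/3/4 superblock could begin, i.e. where the two bytes
# 56 bytes past the offset read 53 ef (magic 0xEF53, little-endian).
scan_magic() {
    img=$1; off=$2; end=$3; step=${4:-512}
    while [ "$off" -lt "$end" ]; do
        magic=$(od -An -tx1 -j "$((off + 56))" -N 2 "$img" 2>/dev/null | tr -d ' \n')
        [ "$magic" = "53ef" ] && echo "$off"
        off=$((off + step))
    done
    return 0
}
```

Because the offsets are plain byte positions in the image, they do not depend on the partition table being right. Spawning od once per step is slow on a 400GiB image, so in practice the range would be narrowed to the regions of interest.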
Theodore Ts'o
2014-09-21 01:09:42 UTC
Since I had believed, and it has been mentioned, that there is a possibility
that the partition
table itself may have been clobbered in a way to change the real starting
address of the partition,
I decided to search for possible candidates for a superblock using fsck in a
shell script as follows....
Um.... if the starting location of the partition is wrong, you do
***not*** want to run fsck until you fix the partition boundaries. Using
fsck -b <superblock number> when the partition boundaries are wrong
will ***not*** work.

Try using a tool like gpart to determine the partition boundaries, and
fix the partition boundaries *first*.
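
The relationship between a superblock and the partition boundary can be made concrete with a little arithmetic. This is only a sketch, and it does not replace a tool like gpart: it assumes a scan of the whole disk (not the partition) has turned up a primary superblock at a known byte offset, and that partitions are aligned to 512-byte sectors. The function name part_start_sector and the example offset are hypothetical.

```shell
# part_start_sector SB_OFFSET [SECTOR_SIZE]
# SB_OFFSET is the byte offset, counted from the start of the whole disk,
# at which a primary superblock was found. The primary superblock sits
# 1024 bytes into the filesystem, so the filesystem begins 1024 bytes
# earlier. Prints the implied starting sector, or fails if the result
# would not be sector-aligned.
part_start_sector() {
    sb_off=$1; sector=${2:-512}
    fs_start=$((sb_off - 1024))
    if [ "$fs_start" -lt 0 ] || [ $((fs_start % sector)) -ne 0 ]; then
        echo "offset $sb_off does not fit a sector-aligned primary superblock" >&2
        return 1
    fi
    echo $((fs_start / sector))
}
```

For instance, part_start_sector 1049600 prints 2048. If the hit is actually a backup superblock, the distance of that backup from the start of the filesystem has to be subtracted as well, so the result is only a starting point for checking the partition table.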

- Ted

Theodore Ts'o
2014-09-20 18:14:32 UTC
If none of the possible superblocks are valid when using e2fsck -b
<NNN>, there's a good chance that your partition table (or LVM
metadata) has gotten corrupted. You should definitely check to make
sure the partition setup is sane before trying to use mke2fs -S.
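
For reference, the block numbers worth trying with -b can be derived when the original mkfs output is not at hand. The sketch below rests on assumptions not stated in the thread: the filesystem was made with the sparse_super feature (backups in group 1 and in groups that are powers of 3, 5, and 7) and with the default of 8 * block-size blocks per group. The function names are hypothetical.

```shell
# is_power_of N BASE: succeed if N equals BASE^k for some k >= 0
is_power_of() {
    n=$1; base=$2
    while [ "$n" -gt 1 ] && [ $((n % base)) -eq 0 ]; do
        n=$((n / base))
    done
    [ "$n" -eq 1 ]
}

# backup_sb_blocks BLOCK_SIZE NGROUPS
# Print the filesystem block number of each backup superblock, assuming
# sparse_super and the default blocks-per-group (8 * BLOCK_SIZE).
backup_sb_blocks() {
    bs=$1; ngroups=$2
    per_group=$((bs * 8))
    first=0
    # With 1 KiB blocks the first data block is block 1, not block 0.
    if [ "$bs" -eq 1024 ]; then first=1; fi
    g=1
    while [ "$g" -lt "$ngroups" ]; do
        if is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            echo $((g * per_group + first))
        fi
        g=$((g + 1))
    done
}
```

With 4 KiB blocks, backup_sb_blocks 4096 10 lists 32768, 98304, 163840, 229376, and 294912. These numbers are relative to the start of the partition, which is why every one of them misses when the partition boundary itself is wrong.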

It's also true, as Andreas has stated, that with the large number of
new file system options and layouts with ext4, mke2fs -S is much more
hazardous unless you __really__ know what you are doing. It would
probably be a good idea to have some warning messages to that effect
in the man page.

- Ted
jd1008
2014-09-20 18:38:18 UTC
The -S was advised by a member of the Fedora Users Mailing List,
and I thought I would try it.
I still have the original disk, so no permanent harm.
I keep trying what is suggested on a copy of the partition.

Only drag is the copying :)
It is a 400GiB partition (400 * 1024^3).

I am currently scanning the partition for a superblock, starting at -b 0,
and keep incrementing by 512, until I find what "might be" a superblock;
i.e. fsck does not say "Bad magic number". Of course, that's no guarantee
it is a superblock, but it gives me an opportunity to examine the superblock
at that offset.
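
For scale, the arithmetic behind the partition size can be spelled out. This is a small sketch; the block sizes shown are assumptions, since the thread does not say which block size the filesystem used, and the helper name units_in is made up for illustration.

```shell
# Size of the partition in bytes: 400 GiB
part_bytes=$((400 * 1024 * 1024 * 1024))

# units_in BYTES UNIT: how many whole UNIT-sized pieces fit in BYTES
units_in() {
    echo $(( $1 / $2 ))
}

echo "bytes:          $part_bytes"                      # 429496729600
echo "512-byte steps: $(units_in "$part_bytes" 512)"    # 838860800
echo "1 KiB blocks:   $(units_in "$part_bytes" 1024)"   # 419430400
echo "4 KiB blocks:   $(units_in "$part_bytes" 4096)"   # 104857600
```

The -b option of e2fsck takes a filesystem block number, not a byte offset, so the range of meaningful values depends on the block size. If the filesystem used 4 KiB blocks, for example, any value above 104857600 would point past the end of a 400GiB partition.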

Regards,

JD