Discussion: Very long delay for first write to big filesystem
Ken Bass
2014-01-17 16:32:48 UTC
I asked about this a while back. It seems that this problem is getting much
worse.

The problem/issue: there is a very long delay when my system does a
write to the filesystem. The delay is now over 5 minutes (yes:
minutes). This only happens on the first write after booting up the
system, and only for large files - 1 GB or more. This is a serious
problem, since all access to any hard disk is blocked and hangs until
that first write actually begins.

The prevailing thought at the time was that this was associated with
loading the directory information into memory while looking for free
space, which I am now inclined to believe.

The filesystem in question is 7.5TB, with about 4TB used. There are over
250,000 files. I also have another system with 1TB total and 400GB used,
with 65,000 files. This system, the smaller one, is beginning to show
delays as well, although only a few seconds.

This problem seems to involve several factors: the total size of the
filesystem; its current "fragmentation"; and finally the amount of
physical memory available.

As to the last factor, the 7.5TB system has only 2GB of memory (I didn't
think that it would need a lot since it is mostly being used as a file
server). The "fragmentation" factor (I am only guessing here) occurs with
having many files written and deleted over time.

So my questions are: is there a solution or workaround for this? And
is this a bug, or perhaps an undesirable feature? If the latter,
should this be reported (somewhere)?

Any suggestions, tips, etc. greatly appreciated.

TIA

ken
Andreas Dilger
2014-01-18 17:09:20 UTC
Post by Ken Bass
The problem/issue: there is a very long delay when my system does a
write to the filesystem. The delay is now over 5 minutes (yes:
minutes). This only happens on the first write after booting up the
system, and only for large files - 1 GB or more. This is a serious
problem, since all access to any hard disk is blocked and hangs until
that first write actually begins.
Post by Ken Bass
The prevailing thought at the time was that this was associated with
loading the directory information into memory while looking for free
space, which I am now inclined to believe.
It isn't actually directory information that is being loaded, but
rather the block bitmaps from each group, and each one needs a seek
to read. This will take up to 7.5 TB / 128 MB/group / 100 seeks/sec
= 600 s if the filesystem is nearly full. After this point, the
bitmaps are cached in memory and allocation is faster.
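Spelling that arithmetic out (the 100 seeks/sec rate is just a
ballpark figure for a single 7200 RPM disk):

  # one seek per 128 MB block group, ~100 seeks/sec assumed
  $ echo $(( 7500 * 1024 / 128 )) groups
  60000 groups
  $ echo $(( 7500 * 1024 / 128 / 100 )) seconds
  600 seconds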
Post by Ken Bass
The filesystem in question is 7.5TB, with about 4TB used. There are
over 250,000 files. I also have another system with 1TB total and
400GB used, with 65,000 files. This system, the smaller one, is
beginning to show delays as well, although only a few seconds.
Post by Ken Bass
This problem seems to involve several factors: the total size of the
filesystem; its current "fragmentation"; and finally the amount of
physical memory available.
Post by Ken Bass
As to the last factor, the 7.5TB system has only 2GB of memory (I
didn't think that it would need a lot since it is mostly being used
as a file server). The "fragmentation" factor (I am only guessing
here) occurs with having many files written and deleted over time.
Post by Ken Bass
So my questions are: is there a solution or workaround for this? And
is this a bug, or perhaps an undesirable feature? If the latter,
should this be reported (somewhere)?
You might consider mounting the filesystem as ext4 instead of ext3.
It will do a slightly better job of finding contiguous free space
and avoid loading bitmaps that do not have enough space, but the
physics of seeking to read bitmaps is still the same.
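For example, something like this (device and mount point are
placeholders for your actual setup):

  # mount the existing ext3 filesystem with the ext4 driver
  mount -t ext4 /dev/sdb1 /srv/data

  # or persistently, via /etc/fstab:
  /dev/sdb1  /srv/data  ext4  defaults  0 2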

If you format a new filesystem as ext4 (as opposed to just mounting the
existing filesystem as ext4) you can use a new feature "flex_bg" that
locates the block and inode bitmaps together so that they can be read
without so much seeking. You'd need a spare disk to format and copy
the data over to.
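A rough sketch of that migration (device names and paths are
placeholders, and the rsync flags are just one sensible choice):

  # format the spare disk as ext4 (flex_bg is enabled by default
  # with a reasonably recent e2fsprogs), then copy everything over
  mke2fs -t ext4 /dev/sdc1
  mkdir -p /mnt/new
  mount /dev/sdc1 /mnt/new
  rsync -aHAX /srv/data/ /mnt/new/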

ext4 itself is also more resistant to fragmentation over time.

Cheers, Andreas
_______________________________________________
Ext3-users mailing list
https://www.redhat.com/mailman/listinfo/ext3-users
Ken Bass
2014-01-20 02:07:54 UTC
Thx Andreas.

re: block bitmaps - yes, that is what I really meant. My experience
with filesystems is mainly from CP/M's BDOS, where the "directory"
and the block map are essentially synonymous.

And now I understand about the timing. Makes sense when you describe it
that way.

My system is ext4, although I doubt that I used the "flex_bg" option,
since the filesystem was first created a while back. I did try to run
e4defrag; it simply said that no defrag was needed.

So now I'm left needing a workaround. Perhaps there is a way to have
the system load the bitmaps at boot time in the background? It would
need to be done in such a way that it would not block any other
access to that filesystem. Or, is there a better filesystem format
that would not have this problem? (Not a really great solution, since
I would need to somehow/somewhere back up my 7.5TB system first.)

It does seem strange that this hasn't become a more serious issue, as
typical filesystems are getting bigger now. And I can't imagine a really
large network server (10TB+) having to deal with this.

Again, thx for the response.

ken
Andreas Dilger
2014-01-20 21:46:01 UTC
Post by Ken Bass
re: block bitmaps - yes, that is what I really meant. My experience
with filesystems is mainly from CP/M's BDOS, where the "directory"
and the block map are essentially synonymous.
Post by Ken Bass
And now I understand about the timing. Makes sense when you describe
it that way.
Post by Ken Bass
My system is ext4, although I doubt that I used the "flex_bg" option,
since the filesystem was first created a while back. I did try to run
e4defrag; it simply said that no defrag was needed.
Use "dumpe2fs -h /dev/XXX | grep feature" to see if it is listed.
Post by Ken Bass
So now I'm left needing a workaround. Perhaps there is a way to have
the system load the bitmaps at boot time in the background? It would
need to be done in such a way that it would not block any other
access to that filesystem.
We had a similar problem in the past. Run "dumpe2fs /dev/XXX > /dev/null"
at startup time (can be before or after mount) to start it reading the
block and inode allocation bitmaps.
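As a minimal sketch (assuming something like /etc/rc.local runs at
boot; the device name is a placeholder):

  # warm the block/inode bitmaps in the page cache so the first
  # large write doesn't stall; backgrounded so boot isn't delayed
  dumpe2fs /dev/sdb1 > /dev/null 2>&1 &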
Post by Ken Bass
Or, is there a better filesystem format that would not have this
problem? (Not a really great solution, since I would need to
somehow/somewhere back up my 7.5TB system first.)
Yes, formatting with "mke2fs -t ext4" should enable flex_bg by default.
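If you want to control how aggressively the bitmaps are grouped,
mke2fs also takes a -G option (the default is 16 groups per flex
group; the device is again a placeholder):

  # pack 32 block groups' bitmaps together instead of the default 16
  mke2fs -t ext4 -G 32 /dev/sdc1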
Post by Ken Bass
It does seem strange that this hasn't become a more serious issue, as
typical filesystems are getting bigger now. And I can't imagine a
really large network server (10TB+) having to deal with this.
That's why the flex_bg feature was added to ext4 in the first place.

Cheers, Andreas