filesystem performance with lots of files

From: David Lang <dlang(at)invendra(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: filesystem performance with lots of files
Date: 2005-12-01 15:09:02
Message-ID: Pine.LNX.4.62.0512010616200.2807@qnivq.ynat.uz
Lists: pgsql-performance

this subject has come up a couple times just today (and it looks like one
that keeps popping up).

under linux, ext2/3 have two known weaknesses (or rather one weakness with
two manifestations): searching through large objects on disk is slow. this
applies both to directories (creating, opening, or deleting files when a
directory has, or has ever had, lots of files in it) and to files (seeking
to the right place in a file).

the rule of thumb that I have used for years is that once files get over a
few tens of megs or directories get over a couple of thousand entries, you
will start slowing down.
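
as a rough illustration of the directory side, a sketch of the kind of test
I mean (untested, the path and file count are made up; run it on a scratch
directory on the filesystem you actually care about):

import os, time

SCRATCH = "/tmp/dirtest"   # made-up scratch path, put it on the fs you care about
COUNT = 100000             # arbitrary file count, tune to taste

os.makedirs(SCRATCH, exist_ok=True)

# create COUNT empty files, reporting every 10000 so you can watch the
# per-file cost climb as the directory grows
start = time.time()
for i in range(COUNT):
    open(os.path.join(SCRATCH, "f%07d" % i), "w").close()
    if (i + 1) % 10000 == 0:
        now = time.time()
        print("%7d files, last 10000 took %.2fs" % (i + 1, now - start))
        start = now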

common places you can see this (outside of postgres):

1. directories: mail or news storage.
if you let your /var/spool/mqueue directory get large (for example, on a
server that can't send mail for a while, or one where mail is misconfigured),
there may only be a few files left in it after the problem is fixed, but if
the directory was once large, just doing an ls on it will be slow.
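
to make the "once large" part concrete, a sketch (untested, made-up path,
nothing to do with a real mail queue): fill a directory, empty it again,
and look at what is left behind. on ext2/3 the directory file itself never
shrinks, so its on-disk size and scan time reflect the high-water mark, not
the current contents.

import os, time

SPOOL = "/tmp/bloat-demo"       # made-up directory, not a real mail queue

os.makedirs(SPOOL, exist_ok=True)
for i in range(200000):
    open(os.path.join(SPOOL, "m%d" % i), "w").close()
for name in os.listdir(SPOOL):
    os.unlink(os.path.join(SPOOL, name))

# the directory is empty again, but its size and scan time are not back to normal
print("directory size on disk:", os.stat(SPOOL).st_size, "bytes")
start = time.time()
os.listdir(SPOOL)               # roughly what 'ls' has to do
print("listing the empty directory took %.3fs" % (time.time() - start))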

news servers that store each message as a separate file suffer from this
as well; they work around it by using multiple layers of nested
directories so that no directory has too many files in it (navigating the
layers of directories costs as well, it's all about the tradeoffs). Mail
servers that use maildir (and Cyrus, which uses a similar scheme) have the
same problem.
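
the layering trick itself is simple. a sketch of one way to do it (untested;
the base path, hash choice, and two-level fan-out are made up for
illustration, not how any particular news or mail server actually lays
things out):

import hashlib, os

def spool_path(base, message_id):
    # hash the message id and use the first two bytes as two levels of
    # subdirectory, so each directory only holds a slice of the files
    h = hashlib.md5(message_id.encode()).hexdigest()
    return os.path.join(base, h[0:2], h[2:4], message_id)

path = spool_path("/tmp/newsspool", "12345@example.com")
os.makedirs(os.path.dirname(path), exist_ok=True)
open(path, "w").close()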

to fix this you have to create a new directory, move the files to that
directory, and then rename the new directory to the old name.
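
something along these lines (untested sketch, the paths are just
placeholders; with a real mail queue you would stop the daemon first):

import os, shutil

OLD = "/var/spool/mqueue"        # the bloated directory (placeholder path)
NEW = OLD + ".new"               # fresh directory on the same filesystem

os.mkdir(NEW)
for name in os.listdir(OLD):
    # rename() is cheap and keeps the files on the same filesystem
    os.rename(os.path.join(OLD, name), os.path.join(NEW, name))

# swap the fresh directory into place, then throw the bloated one away
os.rename(OLD, OLD + ".old")
os.rename(NEW, OLD)
shutil.rmtree(OLD + ".old")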

ext3 has an option to make searching directories faster (htree), but
enabling it kills performance when you create files. And this doesn't help
with large files.

2. files: mbox-formatted mail files and log files
as these files get large, the process of appending to them takes more
time. syslog makes this very easy to test. On a box that does synchronous
syslog writing (the default for most systems using standard syslog; on
linux, make sure there is not a - in front of the logfile name), time how
long it takes to write a bunch of syslog messages, then make the log file
large and time it again.
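
if you don't want to touch syslog itself, a rough stand-in for the same
test (untested, made-up paths and sizes; the fsync after each write is
playing the part of synchronous syslog):

import os, time

def timed_appends(path, n=2000):
    # append n small records with an fsync after each one, the way
    # synchronous syslog does, and report the total time
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    start = time.time()
    for _ in range(n):
        os.write(fd, b"Dec  1 15:09:02 host test: hello world\n")
        os.fsync(fd)
    os.close(fd)
    return time.time() - start

print("small file: %.2fs" % timed_appends("/tmp/log-small"))

# pre-grow a second file with real data (not a sparse hole), then append to it
with open("/tmp/log-big", "wb") as f:
    chunk = b"x" * (1024 * 1024)
    for _ in range(300):         # roughly 300MB
        f.write(chunk)
print("large file: %.2fs" % timed_appends("/tmp/log-big"))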

a few weeks ago I did a series of tests to compare different filesystems.
the test was for a different purpose, so the particulars are not what I
would do for testing aimed at postgres, but I think the data is relevant.
I saw major differences between different filesystems; I'll see about
re-running the tests to get a complete set of benchmarks in the next few
days. My tests had their times vary from 4 min to 80 min depending on the
filesystem in use (ext3 with hash_dir posted the worst case). what testing
have other people done with different filesystems?

David Lang
