Re: filesystem performance with lots of files

From: David Lang <dlang(at)invendra(dot)net>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: filesystem performance with lots of files
Date: 2005-12-02 07:07:56
Message-ID: Pine.LNX.4.62.0512012258430.2807@qnivq.ynat.uz
Lists: pgsql-performance

On Thu, 1 Dec 2005, Qingqing Zhou wrote:

> "David Lang" <dlang(at)invendra(dot)net> wrote
>>
>> a few weeks ago I did a series of tests to compare different filesystems
>> (the test was for a different purpose, so the particulars are not what I
>> would do for testing aimed at postgres, but I think the data is relevant)
>> and I saw major differences between them. I'll see about
>> re-running the tests to get a complete set of benchmarks in the next few
>> days. My tests had their times vary from 4 min to 80 min depending on the
>> filesystem in use (ext3 with hash_dir posted the worst case). what testing
>> have other people done with different filesystems?
>>
>
> That's good ... what benchmarks did you use?

I was doing testing in the context of a requirement to sync over a million
small files from one machine to another. rsync would take >10 hours to do
this over a 100Mb network, so I started with the question 'how long would
it take to do a tar-ftp-untar cycle with no smarts?' I created 1M x 1K
files in a three-deep directory tree (10d/10d/10d/1000files) and did
simple timing tests: 'time to copy tree', 'time to create tar', 'time to
extract from tar', and 'time to copy tarfile' (a 1.6G file). I flushed the
memory between each test with cat largefile >/dev/null (I know now that I
should have unmounted and remounted between each test). The source and
destination were on different IDE controllers.
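
For anyone who wants to reproduce the layout, the tree described above could be generated with something like the following Python sketch. This is not the script used for the original tests; the function name build_tree, its parameters, and the d0/f0000 naming are all made up for illustration. The defaults give 10 x 10 x 10 directories with 1000 files of 1K each, i.e. 1,000,000 files.

```python
import os


def build_tree(root, fanout=10, depth=3, files_per_dir=1000, file_size=1024):
    """Create a directory tree `depth` levels deep with `fanout` subdirectories
    per level, and write `files_per_dir` files of `file_size` bytes into each
    leaf directory. Returns the number of files written.

    All names and defaults here are illustrative assumptions.
    """
    payload = b"x" * file_size
    count = 0

    def recurse(path, level):
        nonlocal count
        if level == depth:
            # Leaf directory: write the small files.
            for i in range(files_per_dir):
                with open(os.path.join(path, f"f{i:04d}"), "wb") as fh:
                    fh.write(payload)
            count += files_per_dir
            return
        # Interior directory: create the next level down.
        for d in range(fanout):
            sub = os.path.join(path, f"d{d}")
            os.mkdir(sub)
            recurse(sub, level + 1)

    os.makedirs(root, exist_ok=True)
    recurse(root, 0)
    return count
```

Calling build_tree on a directory in the filesystem under test with the defaults writes the full million-file set; smaller values of fanout and files_per_dir are handy for a quick dry run first.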

I don't have all the numbers readily available (and I didn't do all the
tests on every filesystem), but I found that even with only 1000
files/directory ext3 had some problems. If you enabled dir_hash (hashed
directory indexes, which ext3 calls dir_index) some functions would speed
up, but writing lots of files would just collapse (that was the 80 min
run).

I'll have to script it and re-do the tests (and when I do this I'll also
set it up to run a test with far fewer, far larger files).
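
A scripted version of the four timings might look roughly like the sketch below. It is only an outline under stated assumptions: run_tests and timed are hypothetical names, the tar and copy steps use Python's tarfile and shutil modules rather than the tar and cp binaries, and nothing here flushes the page cache, so the unmount/remount between steps would still have to be done outside the script.

```python
import os
import shutil
import tarfile
import time


def timed(label, func, results):
    """Run func() and store its wall-clock duration in seconds under label."""
    start = time.perf_counter()
    func()
    results[label] = time.perf_counter() - start


def run_tests(src, work):
    """Time copy-tree, create-tar, extract-tar and copy-tarfile.

    src is an existing source tree; work is an empty scratch directory on
    the destination filesystem. Both names are illustrative assumptions.
    """
    results = {}
    tar_path = os.path.join(work, "tree.tar")
    extract_dir = os.path.join(work, "extracted")
    os.makedirs(extract_dir, exist_ok=True)

    def make_tar():
        # Uncompressed tar, so the time reflects file I/O rather than CPU.
        with tarfile.open(tar_path, "w") as tar:
            tar.add(src, arcname="tree")

    def extract_tar():
        with tarfile.open(tar_path) as tar:
            tar.extractall(extract_dir)

    timed("copy tree",
          lambda: shutil.copytree(src, os.path.join(work, "copy")), results)
    timed("create tar", make_tar, results)
    timed("extract tar", extract_tar, results)
    timed("copy tarfile",
          lambda: shutil.copy(tar_path, os.path.join(work, "tree-copy.tar")),
          results)
    return results
```

The same function can be pointed at a tree holding a handful of large files to get the "far fewer, far larger files" comparison, since it makes no assumption about what is under src.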

David Lang
