Re: Large number of open(2) calls with bulk INSERT into empty table

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Florian Weimer <fweimer(at)bfk(dot)de>
Subject: Re: Large number of open(2) calls with bulk INSERT into empty table
Date: 2011-12-07 03:03:31
Message-ID: CA+TgmobOxE3NHyFYQU9GEpes8zocGz2PVNuQfxfZ8_2j0kQAzA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 6, 2011 at 8:12 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Tuesday, December 06, 2011 08:53:42 PM Robert Haas wrote:
>> On Tue, Dec 6, 2011 at 7:12 AM, Florian Weimer <fweimer(at)bfk(dot)de> wrote:
>> > * Robert Haas:
>> >> I tried whacking out the call to GetPageWithFreeSpace() in
>> >> RelationGetBufferForTuple(), and also with the unpatched code, but the
>> >> run-to-run randomness was way more than any difference the change
>> >> made.  Is there a better test case?
>> >
>> > I think that if you want to exercise file system lookup performance, you
>> > need a larger directory, which presumably means a large number of
>> > tables.
>>
>> OK.  I created 100,000 dummy tables, 10,000 at a time, to avoid blowing up
>> the lock manager.  I then repeated my previous tests, and I still
>> can't see any meaningful difference (on my MacBook Pro, running MacOS
>> X v10.6.8).  So at least on this OS, it doesn't seem to matter much.
>> I'm inclined to defer putting any more work into it until such time as
>> someone can demonstrate that it actually causes a problem and provides
>> a reproducible test case.  I don't deny that there's probably an
>> effect and it would be nice to improve this, but it doesn't seem worth
>> spending a lot of time on until we can find a case where the effect is
>> measurable.
> I think if at all, you're going to notice differences at high concurrency,
> because you would then start to hit the price of synchronizing the dcache
> between CPU cores in the kernel.

Well, if the premise is that the table has been truncated in the same
transaction, then it's going to be tough to get high concurrency.
Maybe you could do it with multiple tables or without truncation,
but either way I think you're going to be primarily limited by I/O
bandwidth or WALInsertLock contention, not kernel dcache
synchronization. I might be wrong, of course, but that's what I
think. I'm not saying this isn't worth improving, just that I don't
see it as a priority for me personally to spend time on right now. If
you or someone else wants to beat on it, or even just come up with a
test case, great!
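For anyone who wants to reproduce the large-directory setup described upthread, here is a minimal sketch of generating the 100,000 dummy tables in batches of 10,000. Each CREATE TABLE takes a lock that is held until commit, and the shared lock table is sized from max_locks_per_transaction, so committing every batch keeps the number of simultaneously held locks bounded. The table names (dummy_0, dummy_1, ...), the single int column, and piping the output into psql are all assumptions for illustration, not what was actually run.

```python
import sys


def dummy_table_batches(total, batch_size, prefix="dummy_"):
    """Yield one SQL script per batch; each script creates up to
    batch_size tables inside a single transaction, then commits so
    that the locks taken by CREATE TABLE are released.
    The prefix and column definition are hypothetical placeholders."""
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        stmts = ["BEGIN;"]
        stmts.extend(f"CREATE TABLE {prefix}{i} (a int);" for i in range(start, end))
        stmts.append("COMMIT;")
        yield "\n".join(stmts)


if __name__ == "__main__":
    # Assumed usage: python make_dummy_tables.py | psql yourdb
    for script in dummy_table_batches(100_000, 10_000):
        sys.stdout.write(script + "\n")
```

Committing per batch trades a few extra round trips for staying well under the lock table limit; the batch size can be lowered if max_locks_per_transaction is left at a small value.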

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
