Re: mosbench revisited

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: mosbench revisited
Date: 2011-08-08 13:29:39
Message-ID: CA+TgmoYN2hs4ewHjhcRUH5cCey-OBDSNX9KiEEsUwtEEiVwA-g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 6, 2011 at 2:16 PM, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It would be nice if the Linux guys would fix this problem for us, but
>> I'm not sure whether they will.  For those who may be curious, the
>> problem is in generic_file_llseek() in fs/read-write.c.  On a platform
>> with 8-byte atomic reads, it seems like it ought to be very possible
>> to read inode->i_size without taking a spinlock.  A little Googling
>> around suggests that some patches along these lines have been proposed
>> and - for reasons that I don't fully understand - rejected.  That now
>> seems unfortunate.  Barring a kernel-level fix, we could try to
>> implement our own cache to work around this problem.  However, any
>> such cache would need to be darn cheap to check and update (since we
>> can't assume that relation extension is an infrequent event) and must
> somehow avoid having the same sort of mutex contention that's killing the
>> kernel in this workload.
>
> What about making the relation extension much less frequent?  It's been
> talked about before here, that instead of extending 8kB at a time we
> could (should) extend by much larger chunks.  I would go as far as
> preallocating the whole next segment (1GB) (in the background) as soon
> as the current is more than half full, or such a policy.
>
> Then you have the problem that you can't really use lseek() anymore to
> guess'timate a relation size, but Tom said in this thread that the
> planner certainly doesn't need something that accurate.  Maybe the
> reltuples would do?  If not, it could be that some adapting of its
> accuracy could be done?

I think that pre-extending relations or extending them in larger
increments is probably a good idea, although I think the amount of
preallocation you just proposed would be severe overkill. If we
extended the relation in 1MB chunks, we'd reduce the number of
relation extensions by more than 99%, and with far less space wastage
than the approach you are proposing.
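
The arithmetic behind the "more than 99%" figure can be checked with a
small sketch. It assumes the 8kB block size mentioned earlier in the
thread; the function names are hypothetical and chosen only for
illustration.

```c
#include <assert.h>

/* Number of blocks added by one extension of chunk_bytes. */
long blocks_per_extension(long block_bytes, long chunk_bytes)
{
    assert(block_bytes > 0 && chunk_bytes >= block_bytes);
    return chunk_bytes / block_bytes;
}

/*
 * Fraction of extension operations avoided, compared with extending
 * one block at a time.
 */
double extension_reduction(long block_bytes, long chunk_bytes)
{
    return 1.0 - 1.0 / (double) blocks_per_extension(block_bytes, chunk_bytes);
}
```

With 8192-byte blocks, a 1MB chunk covers 128 blocks, so one extension
replaces 128 and the count drops by about 99.2%. A full 1GB segment
would cover 131072 blocks per extension, which shows how little extra
reduction the larger preallocation buys.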

However, it doesn't really do anything to solve this problem. The
problem here is not that the size of the relation is changing too
frequently - indeed, it's not changing at all in this test case. The
problem is rather that testing whether or not the size has in fact
changed is costing too much.
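
For readers unfamiliar with the check being discussed, here is a
minimal sketch of an lseek()-based size test. The helper name
file_blocks is hypothetical and this is not the actual PostgreSQL
code; it only illustrates that every check is a system call, even when
the answer has not changed.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the number of whole blocks in the file behind fd, or -1 if
 * the descriptor is not seekable.  Note that this moves the file
 * position to end of file as a side effect.
 */
long file_blocks(int fd, long block_bytes)
{
    assert(block_bytes > 0);

    off_t end = lseek(fd, 0, SEEK_END);   /* one system call per check */
    if (end < 0)
        return -1;
    return (long) (end / block_bytes);
}
```

Each call enters the kernel's llseek path, which is where the
contention described earlier in the thread shows up under many
concurrent backends.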

The reason why we are testing the size of the relation here rather
than just using reltuples is because the relation might have been
extended since it was last analyzed. We can't count on analyze to run
often enough to avoid looking at the actual file size. If the file's
grown, we have to scale the number of tuples up in proportion to the
growth in relpages.
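
The scaling step can be sketched as follows. This is an illustration
under stated assumptions, not the planner's actual code: the function
name is hypothetical, reltuples and relpages stand for the values
recorded at the last analyze, and curpages is the page count derived
from the current file size.

```c
#include <assert.h>

/*
 * Scale the tuple count from the last analyze by the growth in pages.
 * If no page count was recorded (relpages == 0), this sketch simply
 * returns reltuples unchanged; a real planner needs a better fallback.
 */
double scale_reltuples(double reltuples, long relpages, long curpages)
{
    assert(reltuples >= 0.0 && relpages >= 0 && curpages >= 0);

    if (relpages == 0)
        return reltuples;

    double density = reltuples / (double) relpages;   /* tuples per page */
    return density * (double) curpages;
}
```

For example, a table analyzed at 1000 tuples in 10 pages that has
since grown to 20 pages would be estimated at 2000 tuples.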

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
