Re: Sequential scans

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequential scans
Date: 2007-05-02 22:37:01
Message-ID: 1178145421.28383.189.camel@dogma.v10.wvs
Lists: pgsql-hackers

On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > What should be the maximum size of this hash table?
>
> Good question. And also, how do you remove entries from it?
>
> I guess the size should somehow be related to number of backends. Each
> backend will realistically be doing just 1 or at most 2 seq scans at a time.
> It also depends on the number of large tables in the databases, but we
> don't have that information easily available. How about using just
> NBackends? That should be plenty, but wasting a few hundred bytes of
> memory won't hurt anyone.

One entry per relation, not per backend, is my current design.

> I think you're going to need an LRU list and counter of used entries in
> addition to the hash table, and when all entries are in use, remove the
> least recently used one.
>
> The thing to keep an eye on is that it doesn't add too much overhead or
> lock contention in the typical case when there are no concurrent scans.
>
> For the locking, use an LWLock.
>

Ok. What would be the potential lock contention in the case of no
concurrent scans?

Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this will
be obvious.

> No, not the segment. RelFileNode consists of tablespace oid, database
> oid and relation oid. You can find it in scan->rs_rd->rd_node. The
> segmentation works at a lower level.

Ok, will do.
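
To make the discussion so far concrete, here is a minimal sketch of the structure being described: one entry per relation, keyed by the three OIDs, with a counter of used entries and least-recently-used eviction once the table is full. It is not the patch's code. The names (RelKey, ScanTable, scan_table_report, scan_table_lookup) and the capacity are hypothetical, a linear search over a small array stands in for a real dynahash table, and the LWLock that would guard each call is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the thread suggests sizing it from the number of backends. */
#define SCAN_TABLE_SIZE 4

/* Mirrors the three OIDs named in the thread; not PostgreSQL's actual struct. */
typedef struct RelKey {
    uint32_t spc_oid;   /* tablespace oid */
    uint32_t db_oid;    /* database oid */
    uint32_t rel_oid;   /* relation oid */
} RelKey;

typedef struct ScanEntry {
    RelKey   key;
    uint32_t block;      /* last block reported by a scan of this relation */
    uint64_t last_used;  /* logical clock value, used for LRU eviction */
} ScanEntry;

typedef struct ScanTable {
    ScanEntry entries[SCAN_TABLE_SIZE];
    int       used;      /* counter of used entries */
    uint64_t  clock;     /* incremented on every access */
} ScanTable;

static int key_eq(const RelKey *a, const RelKey *b)
{
    return a->spc_oid == b->spc_oid &&
           a->db_oid == b->db_oid &&
           a->rel_oid == b->rel_oid;
}

/* Linear search stands in for the hash lookup. */
static ScanEntry *scan_table_find(ScanTable *t, const RelKey *key)
{
    for (int i = 0; i < t->used; i++)
        if (key_eq(&t->entries[i].key, key))
            return &t->entries[i];
    return NULL;
}

/* Record the current block of a scan; evict the LRU entry if the table is full. */
void scan_table_report(ScanTable *t, RelKey key, uint32_t block)
{
    ScanEntry *e = scan_table_find(t, &key);

    if (e == NULL) {
        if (t->used < SCAN_TABLE_SIZE) {
            /* Entries are never deleted, so the next free slot is at index 'used'. */
            e = &t->entries[t->used++];
        } else {
            /* All entries in use: reuse the least recently used one. */
            e = &t->entries[0];
            for (int i = 1; i < SCAN_TABLE_SIZE; i++)
                if (t->entries[i].last_used < e->last_used)
                    e = &t->entries[i];
        }
        e->key = key;
    }
    e->block = block;
    e->last_used = ++t->clock;
}

/* Returns 1 and sets *block_out if the relation has an entry, else returns 0. */
int scan_table_lookup(ScanTable *t, RelKey key, uint32_t *block_out)
{
    ScanEntry *e = scan_table_find(t, &key);

    if (e == NULL)
        return 0;
    e->last_used = ++t->clock;
    *block_out = e->block;
    return 1;
}
```

A new scan would call scan_table_lookup() to pick its starting block and scan_table_report() as it advances. With no concurrent scans the cost is one lock acquisition plus one lookup per report, so how often a scan reports is probably the thing to tune.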

> Hmm. Should we care then? CFQ is the default on Linux, and an average
> sysadmin is unlikely to change it.
>

Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.

> - when ReadBuffer is called, let the caller know if the read did
> physical I/O.
> - when the previous ReadBuffer didn't result in physical I/O, assume
> that we're not the pack leader. If the next buffer isn't already in
> cache, wait a few milliseconds before initiating the read, giving the
> pack leader a chance to do it instead.
>
> Needs testing, of course..
>

An interesting idea. Of the ideas for maintaining a "pack leader", I
like that one the most. It's very similar to what the Linux
anticipatory scheduler does for us.
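
A rough sketch of the decision rule in that proposal, to make sure I read it correctly. Everything here is hypothetical: ReadBuffer does not report physical I/O to its caller today, and the struct, the function names, and the 5 ms wait are placeholders I made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical wait; the proposal only says "a few milliseconds". */
#define FOLLOWER_WAIT_MS 5

/* Per-scan state: did our previous read have to go to disk? */
typedef struct PackState {
    bool last_read_did_io;
} PackState;

/* A fresh scan has no history, so treat it as a leader and never delay it. */
void pack_state_init(PackState *s)
{
    s->last_read_did_io = true;
}

/* Called after each read with what the (modified) ReadBuffer reported. */
void pack_state_record(PackState *s, bool did_physical_io)
{
    s->last_read_did_io = did_physical_io;
}

/*
 * How long to wait before initiating the read of the next block.
 * If the block is already cached there is nothing to wait for. If our
 * previous read did physical I/O we are probably the pack leader and
 * should keep going. Otherwise we were following someone, so give the
 * leader a chance to issue the read first.
 */
int pack_delay_ms(const PackState *s, bool next_block_cached)
{
    if (next_block_cached)
        return 0;
    if (s->last_read_did_io)
        return 0;
    return FOLLOWER_WAIT_MS;
}
```

The caller would sleep for pack_delay_ms() and then check the cache again before reading. If the leader has stalled or gone away, the follower's own read does the physical I/O, the flag flips, and it becomes the leader on the next block.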

> >> 4. It fails regression tests. You get an assertion failure on the portal
> >> test. I believe that changing the direction of a scan isn't handled
> >> properly; it's probably pretty easy to fix.
> >>
> >
> > I will examine the code more carefully. As a first guess, is it possible
> > that test is failing because of the non-deterministic order in which
> > tuples are returned?
>
> No, it's an assertion failure, not just different output than expected.
> But it's probably quite simple to fix..
>

Ok, I'll find and correct it then.

Regards,
Jeff Davis
