Re: Sync Scan update

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <llonergan(at)greenplum(dot)com>
Subject: Re: Sync Scan update
Date: 2006-12-19 18:37:21
Message-ID: 1166553441.24294.30.camel@dogma.v10.wvs
Lists: pgsql-hackers

On Tue, 2006-12-19 at 18:05 +0000, Gregory Stark wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>
> > Like to see some tests with 2 parallel threads, since that is the most
> > common case. I'd also like to see some tests with varying queries,
> > rather than all use select count(*). My worry is that these tests all
> > progress along their scans at exactly the same rate, so are likely to
> > stay in touch. What happens when we have significantly more CPU work to
> > do on one scan - does it fall behind??
>
> If it's just CPU then I would expect the cache to help the followers keep up
> pretty easily. What concerns me is queries that involve more I/O. For example
> if the leader is doing a straight sequential scan and the follower is doing a
> nested loop join driven by the sequential scan. Or worse, what happens if the

That would be one painful query: scanning two tables in a nested loop,
neither of which fits into physical memory! ;)

If one table does fit into memory, it's likely to stay there since a
nested loop will keep the pages so hot.

I can't think of a way to test two big tables in a nested loop because
it would take so long. However, it would be worth trying it with an
index, because that would cause random I/O during the scan.

> leader is doing a nested loop and the follower which is just doing a straight
> sequential scan is being held back?
>

The follower will never be held back in my current implementation.

My current implementation relies on the scans to stay close together
once they start close together. If one falls seriously behind, it will
fall outside of the main "cache trail" and cause the performance to
degrade due to disk seeking and lower cache efficiency.
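
To make the "cache trail" idea concrete, here is a toy simulation. It is
only a sketch under stated assumptions, not the patch itself: the cache
is modeled as a plain LRU of cache_size pages, both scans start at the
same page, and each scan reads a fixed number of pages per tick. The
names simulate, leader_rate and follower_rate are made up for
illustration.

```python
from collections import OrderedDict


def simulate(n_pages, cache_size, leader_rate, follower_rate):
    """Return the follower's cache hit ratio for one pass over the table.

    Toy model (assumption): a single shared LRU cache, two scans that
    both start at page 0, and fixed per-tick read rates.
    """
    cache = OrderedDict()  # page number -> None, oldest entry first

    def read(page):
        """Read one page through the cache; return True on a cache hit."""
        hit = page in cache
        if hit:
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
        return hit

    leader_pos = follower_pos = 0
    hits = reads = 0
    while follower_pos < n_pages:
        # The leader advances first and leaves a trail of cached pages.
        for _ in range(leader_rate):
            if leader_pos < n_pages:
                read(leader_pos)
                leader_pos += 1
        # The follower benefits only while it stays inside that trail.
        for _ in range(follower_rate):
            if follower_pos < n_pages:
                hits += read(follower_pos)
                reads += 1
                follower_pos += 1
    return hits / reads


if __name__ == "__main__":
    # Same rate: the follower stays in the trail and always hits.
    print(simulate(10_000, 100, leader_rate=4, follower_rate=4))
    # Slower follower: it drops out of the trail and mostly misses.
    print(simulate(10_000, 100, leader_rate=4, follower_rate=1))
```

In this model the gap grows by (leader_rate - follower_rate) pages per
tick, so once it exceeds cache_size the follower is reading from disk
again for the rest of the scan.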

I think Simon is concerned about CPU because that will be a common case:
if one scan is CPU bound and another is I/O bound, they will progress at
different rates. That's bound to cause seeking and poor cache
efficiency.

Although I don't think either of these cases will be worse than current
behavior, it warrants more testing.

Regards,
Jeff Davis
