Re: Page access pattern in query plan using index scan

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Page access pattern in query plan using index scan
Date: 2004-06-03 11:06:55
Message-ID: 87k6yo529c.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Jack Orenstein <jao(at)geophile(dot)com> writes:

> Alvaro Herrera wrote:
>
> > On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:
> >
> > >What is the pattern of access to data pages? I can think of two likely
> > >answers:
> > >
> > >1) The index is scanned for ages 30 through 40. As each index entry is
> > >scanned, a row is retrieved.
> >
> > This one. There have been noises about doing the second, but it's non
> > trivial and there's no hacker currently working on it. Not to be
> > expected on the next version, I'd think.

I think the main problem is dealing with locking issues. As long as the index
scan is using the tuples it sees right away it doesn't care if someone else
changes some tuple it hasn't gotten to yet or has already seen. If you read
them all and then sort them it's possible that while you're dealing with your
local copy that someone else comes along and does something that you would
want to know about.

Actually I'm not exactly clear on what types of changes would cause problems,
and thinking about it now I don't see how this is any different than a
materialize or other plan that stores intermediate results. So there's
something I'm missing.

> Can you recommend an application-level workaround that will access pages
> in the right order?

Well you could try CLUSTER-ing your table on that index.

Alternatively you could try partitioning your table so the optimizer can use a
sequential scan to efficiently get the records you want. But there's no native
support for partitioning so you would have to roll your own entirely manually.
That would mean e.g. having tables cheques_jan, cheques_feb, cheques_mar,...
And a union of all the months. Any query that only spanned one month would use
the monthly table so it could do a sequential scan of just that month. I don't
recommend this unless you have a _lot_ of data that you often deal with in
particular chunks. Even then unless you're purging and archiving data in those
chunks it's probably still not worth it.

--
greg

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Hubert Fröhlich 2004-06-03 12:02:35 again, invalid page header in block ...
Previous Message Bambero 2004-06-03 10:11:40 row access