Re: not exactly a bug report, but surprising behaviour

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: not exactly a bug report, but surprising behaviour
Date: 2003-02-05 05:37:20
Message-ID: 87vfzz5lv3.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:

> > This would have a couple big advantages:
> >
> > 1) If columns were excluded from the results from limit/offset clauses then
> > possibly expensive functions in the select list wouldn't have to be
> > calculated. It also means if the sort is quick but the functions slow that
> > the first rows would be returned quickly to the application even if the
> > total time was the same.
>
> That's a useful advantage. You can probably get this effect from pushing
> clauses into from list subselects as a current optimization, but it'd be
> nice since that's not an obvious conversion.

Not if the clause depends on the sort results such as limit/offset or having
clauses.

> > 2) The sort could be done on pointers to the tuples rather than pushing the
> > data in the tuple around in the sort. Obviously the transaction integrity
> > issues are tricky with this though. But it could save a lot of memory
> > pressure from sorts.
>
> This'd be an advantage in the case of a single table. Unless you can
> guarantee the join keeps the ordering properties, I don't think this'd
> help with anything that's doing joins or anything more complicated.

Joins shouldn't be a problem, just sort a list of pointers to records and have
a structure on the side saying which tables each pointer corresponds to and
which columns and expressions are requested from each. I'm not sure how to
retain transactional integrity with this though.

The tricky part would be evaluating when it would help. When sorting many
records with a narrow result set it wouldn't pay to generate pointers that
would be just as large as the original data and possibly push the original
data out of cache. But for sorting a limited number of records of many columns
each it would be way more efficient.

I'm watching this query I'm working on now drive the cpu to 100% for 6+ hours
with virtually no I/O. And I think it's the best I can do. All the times is
being spent moving bits around from one place to another. I'm entirely blocked
on memory throughput right now. The fewer data are actually pushed around the
faster sorts will be.

It occurs to me that it's possible postgres is doing this already at a lower
level of abstraction. The sort engine could load all the records to sort into
an array and generate a list of indexes into that array and sort the indexes.
But I get the impression from the previous conversation that that's not what's
happening.

I'm constantly surprised by how much work there is to build a good database
engine. But I'm also constantly amazed by how much postgres is already doing.
It's really quite remarkable how complete it is.

--
greg

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Stark 2003-02-05 05:42:55 Re: Tuning Question sort_mem vs pgsql_tmp
Previous Message Stephan Szabo 2003-02-05 02:25:33 Re: UPDATE slow