Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From: Hannu Krosing <hannu(at)trust(dot)ee>
To: pgsql-hackers(at)postgresql(dot)org
Cc: scrappy(at)hub(dot)org, honza(at)ied(dot)com, vadim(at)sable(dot)krasnoyarsk(dot)su
Subject: Re: Browsing the tables / why pgsql does not perform well (with temp fix)
Date: 1998-01-24 09:21:39
Message-ID: 34C9B2A3.3142005B@sid.trust.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote :

> The Hermit Hacker wrote:
> > Jan Vicherek wrote:
> > > Please please help me solve this or make workarounds or anything. I
> > > would *really* like to see PosgreSQL to be playing against the Big
> > > (commercial) Boys !
> >
> > I'm curious, but can the "Big (commercial) Boys" do this? If so,
> > can you please provide an example of which and how?
>
They optimise sorting (use indexes) as well and so make it doable using
cursors

I even think that Oracle can use indexes for some cases of count(*) and
min() and max().

> Most of us here have
> > access to an one or the other (me, Oracle) to use as a sample system...if
> > we can prove that it does work on another system, then we have something
> > to work with, but right now all I've seen is "I wish I could do this", and
> > several examples on how to accomplish it using PostgreSQL, but that's
> > it...
> >
> The main problem is that PostgreSQL does not use index for sorting and
> thus really does a
> "copy" of the whole table and then sorts it before it can use a few rows
> from the beginning.
>
> Using indexes for sorting as well as selecting is on the TODO list, but
> seems to be not very high priority.
>
> >
> > > 0. having a value of a field on which there is an index, how can
> > I do :
> > > a) current_pointer = some_function("value_I_have");
> > > b) next_pointer = some_other_function(current_pointer);
> > > c) one_tupple = yet_another_function(next_pointer);
> > > If I can accomplish a,b,c, then I win and I don't have to do
> > questions
> > > 1..5 below.
> >
> > Why not put a sequence field on the table so that you can do:
> >
> > select * from table where rowid = n; -or-
> > select * from table where rowid = n - 1; -or-
> > select * from table where rowid = n + 1; -or-
> > select * from table where rowid >= n and rowid <= n+x;
> >
> > And create the index on rowid?
>
> It works no better than any other indexed field unless you disallow
> deletes.
>
> if aggregates were able to use indexes you could do:
>
> select min(n) from table where rowid >n;
>
> and then
>
> select * from table where n = n_found by_last_previous_select;
>
> but as they don't you would get very poor performance from the first
> select;
>
> This could be simulated by fetching only the first row from a cursor
> sorted on the field.
>
> So the real solution would be to use indexes for sorting, maybe at first
> for single field sorts.
>
> Then one could just do:
>
> --8<---------
> begin;
> declare cursor part_cursor for
> select * from part_table
> where indexed_field > 'last_value'
> order by indexed_field ;
>
> fetch 10 from part_cursor;
>
> close part_cursor;
> end;
> --8<---------
>
> for moving backwards you would of course use '<' and 'desc' in the
> select clause.
>
> Unfortunately it does not work nearly fast enough for big tables as
> often almost the whole table is copied and then sorted before you get
> your few rows.
>
It actually works for forward queries if you have a b-tree index on
indexed_field and if you omit the 'order by'-clause, but for all the
wrond reasons ;), i.e. the backend performes an index scan and so does
the right thing.

I just tried it on a 2 780 000 record db (web access log, table size
500MB) and both the cursor was created and data was returned
immediatedly (fetch 3 in my_cursor).

I did:

begin;

declare my_cursor cursor for select * from access_log where adate >
'28/12/97';

fetch 3 in mu_cursor;

end;

It does not work for moving backwards as the backend does the index scan
in forward direction regardless of the comparison being '>' or '<';

I then tried adding the 'order by adate desc' to cursor definition. The
result was a complete disaster (barely avoided by manually killing the
postgres processes), as 'fetch 3 in mycursor' run for about 10 min and
in the process exhausted 2.3GB of disc space for pg_sort files.

> Even more unfortunately client side development tools like Access or
> Delphi seem to rely on sorted queries using indexes for sorting and as a
> result perform very poorly with PostgreSQL in their default modes.
>
> OTOH, it usually shows poor database design if you can't specify your
> seach criteria precisely enough to limit the number of rows to some
> manageable level in interactive applications. It really is the task of
> the database server to look up things. The poor user should not have to
> wade through zillions of records in a looking for the one she wants
> (even tho the you cand do it quite effectively using ISAM).
>
> OTOOH, it would be very hard for general client side tools to do without
> keyed access, so addind using indexes for sorting should be given at
> least some priority.
>
Maybe an quicker and easier but not 'future proof' (as it uses
undocumented and possibly soon-to-change features) fix would be to
reverse the index scan direction for '<' operator?

Vadim: how hard would implementing this proposal be?

(as I think that making the optimiser also optimize ordering would be
quite a big undertaking)

-----------------------
Hannu Krosing
Trust-O-Matic OÜ

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maarten Boekhold 1998-01-24 10:32:46 Re: Copyright question: GPL patches for non-GPL packages (fwd)
Previous Message Hannu Krosing 1998-01-24 07:36:41 Re: Browsing the tables and why pgsql does not perform well