Re: plperl (7.5)

From: elein <elein(at)varlena(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: plperl (7.5)
Date: 2004-07-11 23:15:45
Message-ID: 20040711161545.H6381@cookie.varlena.com
Lists: pgsql-hackers

I'll try these in plperl when I get a chance, but there is still
a scalability problem.
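(For anyone following along: the plperl style under discussion builds the whole result as a Perl array before anything is returned. A minimal sketch of the shape, untested, with names of my own invention:

```sql
-- Hypothetical sketch: a plperl set-returning function that
-- accumulates every row in a Perl array, then returns the array
-- reference -- so the entire set sits in memory at once.
CREATE FUNCTION perl_seq(integer) RETURNS SETOF integer AS '
    my ($n) = @_;
    my @rows;
    push @rows, $_ for (1 .. $n);
    return \@rows;   # nothing reaches the caller until this point
' LANGUAGE plperl;
```

That all-at-once return is exactly where the scalability worry comes from.)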

There was some work done at UCB a while back (post-postgres)
to try to speed up queries by making the first n rows available
quickly. This is based on the googlish idea that people want
results fast and sometimes only want to see the first few results.

(I saw the return-n-rows-fast idea some years before I heard of
the Google algorithm, though.) (It may have been part of Hellerstein
et al.'s estimated-aggregate work; it was around that time, anyway.)

If any function is going to return a gazillion rows
there ought to be a way of doing that block by block.

When a query uses a cursor, is the entire result set always
materialized? If so, perhaps this is just the way it is for
Postgres. If not, perhaps in the future there can be a way to
return results incrementally. I do not know that part of the
source, however, and I suspect any sorted query would need to
be materialized anyway.
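(For comparison, a cursor over a plain query can already be consumed
block by block from the client side; a sketch, where big_table is a
made-up name:

```sql
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM big_table;
FETCH FORWARD 1000 FROM c;   -- first block of rows
FETCH FORWARD 1000 FROM c;   -- next block, on demand
CLOSE c;
COMMIT;
```

The question above is whether the backend actually produces those rows
lazily, or materializes the whole set first and merely doles it out.)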

The other piece of the scalability/speed problem is copying
the data from the materialized result set into the return
structure. In theory that requires twice the memory of the
result set. Is that actually true in practice?

Obviously these comments are not for 7.5 nor are they
make or break issues.

--elein

On Sun, Jul 11, 2004 at 11:17:19AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > On Sat, Jul 10, 2004 at 09:18:28PM -0700, elein wrote:
> >> The new plperl returns sets by having
> >> the function return an array.
>
> > I think RETURN NEXT does the same thing anyway ... they just store
> > tuples in a Tuplestore and then the whole thing is returned. So the
> > function actually doesn't return until the whole function is done.
>
> However, it's likely that the tuplestore infrastructure can deal
> comfortably with sets far larger than a Perl array can. (For one thing,
> it will swap tuples out to a temp file on disk once the set size exceeds
> work_mem.) I think elein's concern is justified, unless someone can
> produce a test case showing that plperl actually performs OK with a
> large result set.
>
> As a simple test for plpgsql's speed with such things, I tried
>
> create function seq(int) returns setof int as '
> begin
> for i in 1..$1 loop
> return next i;
> end loop;
> return;
> end' language plpgsql;
>
> regression=# \timing
> Timing is on.
> regression=# select count(*) from seq(100000);
> count
> --------
> 100000
> (1 row)
>
> Time: 396.524 ms
> regression=# select count(*) from seq(1000000);
> count
> ---------
> 1000000
> (1 row)
>
> Time: 3615.115 ms
> regression=# select count(*) from seq(10000000);
> count
> ----------
> 10000000
> (1 row)
>
> Time: 40356.972 ms
>
> My Perl is too rusty to immediately whip out the equivalent incantation
> in plperl; would someone like to compare the timings on their own machine?
>
> regards, tom lane
>
