Re: tuplestore API problem

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tuplestore API problem
Date: 2009-03-28 06:31:17
Message-ID: e08cc0400903272331q4e3ab3a7m60e05d605c7c6944@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/3/28 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> 2009/3/27 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>>> 2009/3/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>>> A brute-force solution is to change tuplestore_gettupleslot() so that it
>>>> always copies the tuple, but this would be wasted cycles for most uses
>>>> of tuplestores.  I'm thinking of changing tuplestore_gettupleslot's API
>>>> to add a bool parameter specifying whether the caller wants to force
>>>> a copy.
>
>> Here's the patch. Hope there are no more on the same reason. It seems
>> that we'd need to implement something like garbage collector in
>> tuplestore, marking and tracing each row references, if the complete
>> solution is required.
>
> I don't like this; I'm planning to go with the aforementioned API
> change instead.  The way you have it guarantees an extra copy cycle
> even when tuplestore is already making a copy internally; and it doesn't
> help if we find similar problems elsewhere.  (While I'm making the
> API change I'll take a close look at each call site to see if it has
> any similar risk.)

You're right. It kills performance even after dumptuples(). Thinking
more, I found the cause is only around dumptuples(). If you can trace
TupleTableSlots that points to memtuples inside tuplestore, you can
materialize them just before WRITETUP() in dumptuples().
So I tried pass EState.es_tupleTables to tuplestore_begin_heap() to
trace those TupleTableSlots. Note that if you pass NULL the behavior
is as before so nothing's broken. Regression passes but I'm not quite
sure EState.es_tupleTable is only place that holds TupleTableSlots
passed to tuplestore...

I know you propose should_copy boolean parameter would be added to
tuplestore_gettupleslot(). That always adds overhead even if
tuplestore *doesn't* dump tuples. That case doesn't need copy tuples,
I guess.

Regards,

--
Hitoshi Harada

Attachment Content-Type Size
tuplestore_apichange.20090328.patch application/octet-stream 12.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-03-28 07:20:30 Re: Should SET ROLE inherit config params?
Previous Message Euler Taveira de Oliveira 2009-03-28 05:25:52 Re: psql \d* and system objects