Re: Expression Pruning in postgress

From: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-11 01:35:37
Message-ID: CALLwk6tSx3_rmBpLwYViGhBBeidy2n64gPZaPDcwKELNC1iQmA@mail.gmail.com
Lists: pgsql-hackers

Hi Tom,

Thanks for your input. I appreciate you taking the time to respond. Just
some comments.

1. Maybe I am mistaken; kindly help me understand a bit more. I do agree
that passing datums up the node chain helps, but consider the case when
either a Sort or a Hash Join spills to disk. Large columns that get written
to disk will still cause a lot of performance issues (since a sort spill
will detoast them), and a lot of unnecessary columns will cause a lot of
I/O. With 1024 varchars and a lot of rows, you can see that the serial case
deteriorates because of this.
2. The parallel case works: the parallel nodes inherit the target list
of the underlying nodes. But in my case the problem of unpruned columns
gets worse, because they also add to the network payload.
3. Now, coming to your point about detoasting: I have to do that at
parallel node boundaries, because the data flow operators (delimited by
parallel operators) run on different machines and hence have to pass data
by value.
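
A rough sketch of the arithmetic behind point 1, in Python. The column
names, widths, and row count are made-up numbers for illustration only, and
the estimate ignores tuple headers and any compression; it just multiplies
rows by the bytes carried per row.

```python
def spill_bytes(row_count, column_widths, kept_columns):
    """Estimate bytes written to a spill file if each spilled tuple
    carries only kept_columns (payload only, no per-tuple overhead)."""
    return row_count * sum(column_widths[c] for c in kept_columns)


# Hypothetical widths in bytes and a hypothetical row count.
widths = {"id": 8, "sort_key": 8, "wide_text": 1024}
rows = 1_000_000

# The spill carries the wide column even though nothing above needs it.
unpruned = spill_bytes(rows, widths, ["id", "sort_key", "wide_text"])
# The spill carries only what the nodes above actually use.
pruned = spill_bytes(rows, widths, ["id", "sort_key"])

print(unpruned, pruned)
```

With these assumed numbers, the unpruned spill is 65 times larger than the
pruned one, and the same factor would apply to the network payload in
point 2.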

I did make a fix in the optimizer to at least alleviate this case. But I am
going to work on a more general approach of expression pruning based on the
lifetime of an expression. Basically, each node either references or
generates an expression. Any expression that is generated and is not
referenced by any node above it will be eliminated.
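
The lifetime idea can be sketched roughly as follows. This is only an
illustration in Python, not the actual change to the Postgres planner:
PlanNode, prune_target_lists, and the column names are hypothetical, and it
assumes a single linear chain of nodes rather than a full plan tree.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    # Expressions this node computes, mapped to the inputs each one reads.
    generates: dict = field(default_factory=dict)
    # Expressions the node needs for its own work (sort keys, join keys).
    references: set = field(default_factory=set)


def prune_target_lists(chain, final_targets):
    """chain is ordered bottom (scan) to top.

    Walks from the top node down, keeping in each node's output only the
    expressions that some node above it (or the final result) still uses.
    Returns {node name: pruned output list}.
    """
    pruned = {}
    needed_above = set(final_targets)
    for node in reversed(chain):
        pruned[node.name] = sorted(needed_above)
        needed_below = set(node.references)
        for expr in needed_above:
            if expr in node.generates:
                # Generated here: only its inputs must come from below.
                needed_below |= node.generates[expr]
            else:
                # Not generated here: it must be passed up from below.
                needed_below.add(expr)
        needed_above = needed_below
    return pruned


chain = [
    PlanNode("scan", generates={"a": set(), "big_col": set()}),
    PlanNode("eval", generates={"ph_len": {"big_col"}}),
    PlanNode("sort", references={"a"}),
    PlanNode("gather"),
]
pruned = prune_target_lists(chain, ["a", "ph_len"])
for name, targets in pruned.items():
    print(name, targets)
```

In this sketch big_col stops at the eval node, which is the last node that
references it, so neither the sort spill nor the gather boundary carries it.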

Regards
Harmeek

On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
> > Thanks Tom. Here is a example. Just a background of things . I have made
> > changes in postgress execution and storage engine to make it a MPP style
> > engine - keeping all optimizer intact. Basically take pgress serial plan and
> > construct a parallel plan. The query I am running is below.
>
> The output lists for the parallel nodes look pretty broken, but I guess
> you weren't asking about those. As near as I can tell, what you're
> unhappy about is that it's passing up both raw column values and
> pre-evaluated placeholder expressions using those values, when only the
> placeholders are really going to be needed. Yeah, that's probably true,
> because the placeholder mechanism isn't (yet) taken into account by the
> code that determines how far up a column value will be needed.
>
> In standard Postgres this isn't much of an issue because passing up
> by-reference Datums is really quite cheap ... it's only a pointer copy
> in many cases, and even where it's not, it's probably just a
> toast-pointer copy. I suspect it's costing you more because your
> "parallel" nodes have to instantiate the tuples instead of just passing
> virtual slots around ... but it's still not clear to me why you're
> passing more than a toast pointer for big values. Maybe you're being
> too enthusiastic about detoasting pointers early?
>
> regards, tom lane
>
