Re: DML and column cound in aggregated subqueries

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: DML and column cound in aggregated subqueries
Date: 2016-10-31 13:53:26
Message-ID: 20161031135326.xwwqpnoxdupegi6y@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-10-31 09:35:57 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > this doesn't look right. The ctid shouldn't be in the aggregate output -
> > after all it's pretty much meaningless here.
>
> I suspect it's being added to support EvalPlanQual row re-fetches.

Hm, that doesn't seem particularly meaningful though? I wonder if it's
not actually more an accident than anything. Looks like
preprocess_rowmarks() adds them pretty unconditionally to everything for DML.

> > Casting a wider net: find_hash_columns() and it's subroutines seem like
> > pretty much dead code, including outdated comments (look for "SQL99
> > semantics").
>
> Hm, maybe. In principle the planner could do that instead, but it doesn't
> look like it actually does at the moment; I'm not seeing any distinction
> in tlist-building for AGG_HASHED vs other cases in create_agg_plan().

Isn't the important part what's inside Agg->numCols/grpColIdx/grpOperators?
Those should Because those are the ones that are minimal, and it looks
we do the right thing there already (and if not, we'd be in trouble
anyways afaics).

We already only store columns that find_hash_columns() figures out to be
required in the hashtable, but afaics we can instead just iterate over
grpColIdx[] and just store the ones in there.

The reason I'm looking into this in the first place is that the slot
access inside execGrouping turns out to be pretty expensive, especially
because all the tuples have nulls, so we can't even skip columns and
such. I wrote code to create a new tupledesc which only containing the
columns referenced by grpColIdx, and asserted that it's the same set
that find_hash_columns() - but that's not always the case, but afaics
spuriously.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-10-31 13:54:14 Re: Dumb mistakes in WalSndWriteData()
Previous Message Amit Kapila 2016-10-31 13:51:52 Re: Speed up Clog Access by increasing CLOG buffers