Re: On columnar storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-14 17:26:54
Message-ID: 11572.1434302814@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote:
>> Parsing occurs as currently. During query rewrite, specifically at the
>> bottom of the per-relation loop in fireRIRrules(), we will modify the
>> query tree: each relation RTE containing a colstore will be replaced
>> with a JoinExpr containing the relation as left child and the colstore
>> as right child (1). The colstore RTE will be of a new RTEKind. For
>> each such change, all Var nodes that point to attnums stored in the
>> colstore will be modified so that they reference the RTE of the colstore
>> instead (2).

> FWIW, I think this is a pretty bad place to tackle this. For one I think
> we shouldn't add more stuff using the rewriter unless it's clearly the
> best interface. For another, doing things in the rewriter will make
> optimizing things much harder - the planner will have to reconstruct
> knowledge of which joins are column store joins and such.

As a comparison point, one of my Salesforce colleagues just put in
a somewhat similar though single-purpose thing, to expand what originally
is a simple table reference into a join (against a system catalog that's
nowhere mentioned in the original query). In our case, we wanted to force
a scan on a large table to have a constraint on the leading primary key
column; if the query has such a constraint already, then fine, else create
one by joining to a catalog that lists the allowed values of that column.
We started out by trying to do it in the rewriter, and that didn't work
well at all. We ended up actually doing it at createplan.c time, which
is conceptually ugly, but there was no good place to do it earlier without
duplicating a lot of indexqual analysis. But the thing that made that
painful was that the transformation was optional, and indeed might happen
or not happen for a given query depending on the selected plan shape.
AFAICT the transformation Alvaro is proposing is unconditional, so
it might be all right to do it in the rewriter. As you say, if the
planner needs to reconstruct what happened, that would be a strike
against this way, but it's not clear from here whether any additional
info is needed beyond the already-suggested extra RTEKind.
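
To make the transformation under discussion concrete, here is a minimal
sketch in Python. It is a toy model, not PostgreSQL source: the classes
Var, RTE and Join, the function expand_colstores(), and the assumption
that a column keeps its attnum inside the colstore are all hypothetical
stand-ins for the real parse-tree structures. It only shows the two
steps from the proposal: (1) add a colstore RTE joined to its relation,
and (2) redirect the affected Vars.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Var:
    varno: int     # 1-based index into the range table
    varattno: int  # attribute number within that range-table entry


@dataclass(frozen=True)
class RTE:
    kind: str  # "relation", or the proposed new kind "colstore"
    name: str


@dataclass(frozen=True)
class Join:
    left: int   # range-table index of the heap relation
    right: int  # range-table index of its colstore


def expand_colstores(
    rtable: List[RTE],
    query_vars: List[Var],
    colstore_attrs: Dict[int, Set[int]],
) -> Tuple[List[RTE], List[Join], List[Var]]:
    """Unconditionally expand every relation that has a colstore.

    colstore_attrs maps a relation's range-table index to the set of
    attnums stored in its colstore (hypothetical catalog lookup result).
    """
    new_rtable = list(rtable)
    joins: List[Join] = []
    redirect: Dict[Tuple[int, int], int] = {}

    for rtindex, attnums in sorted(colstore_attrs.items()):
        parent = new_rtable[rtindex - 1]
        # Step (1): append a colstore RTE and join it to the relation.
        new_rtable.append(RTE("colstore", parent.name + "_colstore"))
        cs_index = len(new_rtable)
        joins.append(Join(rtindex, cs_index))
        for attno in attnums:
            redirect[(rtindex, attno)] = cs_index

    # Step (2): Vars for colstore-resident columns now point at the colstore.
    # Assumption: the column keeps the same attnum in the colstore RTE.
    new_vars = [
        Var(redirect.get((v.varno, v.varattno), v.varno), v.varattno)
        for v in query_vars
    ]
    return new_rtable, joins, new_vars


if __name__ == "__main__":
    rt, js, vs = expand_colstores(
        [RTE("relation", "t")], [Var(1, 1), Var(1, 3)], {1: {3}}
    )
    print(rt, js, vs)
```

Because nothing in this sketch depends on the chosen plan shape, it can
run once per query, which is the property that distinguishes it from the
optional transformation described above.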

Another model that could be followed is expansion of inheritance-tree
references, which happens early in the planner. In that case the
planner does keep additional information about what it did (the appendrel
data structures), so that could be a good model if this code needs to
do likewise.
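
A rough sketch of what "keeping additional information" could look like,
again as a Python toy model rather than planner code. The real appendrel
bookkeeping records, per expanded child, which parent it came from;
the ColStoreInfo record and find_colstore_info() below are hypothetical
names invented here to show the same idea applied to colstore joins, so
that later planning steps can look the answer up instead of
reconstructing it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ColStoreInfo:
    """Hypothetical side record written when a relation is expanded."""
    parent_relid: int         # range-table index of the heap relation
    colstore_relid: int       # range-table index of the colstore RTE
    attnums: FrozenSet[int]   # columns that live in the colstore


def find_colstore_info(
    infos: List[ColStoreInfo], relid_a: int, relid_b: int
) -> Optional[ColStoreInfo]:
    """Return the record if joining relid_a and relid_b is a colstore join."""
    for info in infos:
        if {info.parent_relid, info.colstore_relid} == {relid_a, relid_b}:
            return info
    return None


if __name__ == "__main__":
    infos = [ColStoreInfo(1, 3, frozenset({2, 4}))]
    print(find_colstore_info(infos, 3, 1))
    print(find_colstore_info(infos, 1, 2))
```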

The existing join-alias-var flattening logic in the planner might be of
interest as well for the variable-substitution business, which I suspect
is the main reason Alvaro is proposing doing it in the rewriter.
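
For readers unfamiliar with that logic, the idea can be sketched as
follows. This is a simplified Python model under stated assumptions, not
the planner's implementation: join output columns are represented here
as plain Vars only (real join alias lists can also hold expressions),
and the names Var and flatten_join_alias_var() are illustrative. A Var
that refers to a join's output column is replaced by the underlying
column it stands for, repeating until a non-join RTE is reached.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Var:
    varno: int     # 1-based index into the range table
    varattno: int  # 1-based column number within that entry


def flatten_join_alias_var(var: Var, joinaliasvars: Dict[int, List[Var]]) -> Var:
    """Resolve a Var through any join RTEs down to a base-relation Var.

    joinaliasvars maps the range-table index of each join RTE to the list
    of Vars that its output columns are aliases for.
    """
    while var.varno in joinaliasvars:
        var = joinaliasvars[var.varno][var.varattno - 1]
    return var


if __name__ == "__main__":
    # RTE 3 is a join of RTEs 1 and 2; RTE 5 is a join of RTEs 3 and 4.
    aliases = {
        3: [Var(1, 1), Var(1, 2), Var(2, 1)],
        5: [Var(3, 3), Var(4, 1)],
    }
    print(flatten_join_alias_var(Var(5, 1), aliases))
```

The same substitution pattern would cover the Var redirection in the
colstore proposal, which is why doing it in the planner rather than the
rewriter seems feasible.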

regards, tom lane
