Re: How to know referenced sub-fields of a composite type?

From: Kohei KaiGai <kaigai(at)heterodb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to know referenced sub-fields of a composite type?
Date: 2019-05-30 23:14:21
Message-ID: 29E07F2C-6257-4537-8DC1-B952BF08448B@heterodb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2019/05/30 16:33、Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>のメール:

>> On 2019/05/29 15:50, Kohei KaiGai wrote:
>> 2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>:
>>>> It means we can skip to load the sub-fields unreferenced, if
>>>> query-planner can handle
>>>> referenced and unreferenced sub-fields correctly.
>>>> On the other hands, it looks to me RelOptInfo or other optimizer
>>>> related structure don't have
>>>> this kind of information. RelOptInfo->attr_needed tells extension
>>>> which attributes are referenced
>>>> by other relation, however, its granularity is not sufficient for sub-fields.
>>>
>>> Isn't that true for some other cases as well, like when a query accesses
>>> only some sub-fields of a json(b) column? In that case too, planner
>>> itself can't optimize away access to other sub-fields. What it can do
>>> though is match a suitable index to the operator used to access the
>>> individual sub-fields, so that the index (if one is matched and chosen)
>>> can optimize away accessing unnecessary sub-fields. IOW, it seems to me
>>> that the optimizer leaves it up to the indexes (and plan nodes) to further
>>> optimize access to within a field. How is this case any different?
>>
>> I think it is a little bit different scenario.
>> Even if an index on sub-fields can indicate the tuples to be fetched,
>> the fetched tuple contains all the sub-fields because heaptuple is
>> row-oriented data.
>>
>> For example, if WHERE-clause checks a sub-field: "x" then aggregate
>> function references other sub-field "y", Scan/Join node has to return
>> a tuple that contains both "x" and "y". IndexScan also pops up a tuple
>> with a full composite type, so here is no problem if we cannot know
>> which sub-fields are referenced in the later stage.
>> Maybe, if IndexOnlyScan supports to return a partial composite type,
>> it needs similar infrastructure that can be used for a better composite
>> type support on columnar storage.
>
> Ah indeed. I think I had misunderstood your intent. Indexes have to do
> with optimizing the "filtering" of complex/nested type (json, Arrow
> Struct, etc.) values, where unnecessary sub-fields need not be read before
> filtering, whereas you're interested in optimizing "projections" of
> complex types, where sub-fields that are not used anywhere in the query
> need not be read from the stored values.
>
>>>> Probably, all we can do right now is walk-on the RelOptInfo list to
>>>> lookup FieldSelect node
>>>> to see the referenced sub-fields. Do we have a good idea instead of
>>>> this expensive way?
>>>> # Right now, PG-Strom loads all the sub-fields of Struct column from
>>>> arrow_fdw foreign-table
>>>> # regardless of referenced / unreferenced sub-fields. Just a second best.
>>>
>>> I'm missing something, but if PG-Strom/arrow_fdw does look at the
>>> FieldSelect nodes to see which sub-fields are referenced, why doesn't it
>>> generate a plan that will only access those sub-fields or why can't it?
>>>
>> Likely, it is not a technical problem but not a smart implementation.
>> If I missed some existing infrastructure we can apply, it may be more suitable
>> than query/expression tree walking.
>
> There is no infrastructure for this as far as I know. Maybe, some will be
> built in the future now that storage format is pluggable.

If we design a common infrastructure for both of built-in and extension features, it makes sense for the kinds of storage system.
IndexOnlyScan is one of the built-in feature that is beneficial by the information of projection. Currently, we always don’t choose IndexOnlyScan if index is on sub-field of composite.

Best regards,

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-05-31 00:26:32 Re: New committer: David Rowley
Previous Message Michael Paquier 2019-05-30 22:15:15 Re: incorrect xlog.c coverage report