Re: How to know referenced sub-fields of a composite type?

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)heterodb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to know referenced sub-fields of a composite type?
Date: 2019-05-29 08:44:42
Message-ID: CAJrrPGcEnvK86cLsPmE5Eqw-2t9HG-azBUqCqM-f=E+qn2UcxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 29, 2019 at 4:51 PM Kohei KaiGai <kaigai(at)heterodb(dot)com> wrote:

> Hi Amit,
>
> 2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>:
> >
> > Kaigai-san,
> >
> > On 2019/05/29 12:13, Kohei KaiGai wrote:
> > > One interesting data type in Apache Arrow is "Struct" data type. It is
> > > equivalent to composite
> > > type in PostgreSQL. The "Struct" type has sub-fields, and individual
> > > sub-fields have its own
> > > values array for each.
> > >
> > > It means we can skip to load the sub-fields unreferenced, if
> > > query-planner can handle
> > > referenced and unreferenced sub-fields correctly.
> > > On the other hands, it looks to me RelOptInfo or other optimizer
> > > related structure don't have
> > > this kind of information. RelOptInfo->attr_needed tells extension
> > > which attributes are referenced
> > > by other relation, however, its granularity is not sufficient for
> sub-fields.
> >
> > Isn't that true for some other cases as well, like when a query accesses
> > only some sub-fields of a json(b) column? In that case too, planner
> > itself can't optimize away access to other sub-fields. What it can do
> > though is match a suitable index to the operator used to access the
> > individual sub-fields, so that the index (if one is matched and chosen)
> > can optimize away accessing unnecessary sub-fields. IOW, it seems to me
> > that the optimizer leaves it up to the indexes (and plan nodes) to
> further
> > optimize access to within a field. How is this case any different?
> >
> I think it is a little bit different scenario.
> Even if an index on sub-fields can indicate the tuples to be fetched,
> the fetched tuple contains all the sub-fields because heaptuple is
> row-oriented data.
> For example, if WHERE-clause checks a sub-field: "x" then aggregate
> function references other sub-field "y", Scan/Join node has to return
> a tuple that contains both "x" and "y". IndexScan also pops up a tuple
> with a full composite type, so here is no problem if we cannot know
> which sub-fields are referenced in the later stage.
> Maybe, if IndexOnlyScan supports to return a partial composite type,
> it needs similar infrastructure that can be used for a better composite
> type support on columnar storage.
>

There is another issue related to the columnar store that needs targeted
columns for projection from the scan is discussed in zedstore [1].
Projecting all columns from a columnar store is quite expensive than
the row store.

[1] -
https://www.postgresql.org/message-id/CALfoeivu-n5o8Juz9wW%2BkTjnis6_%2BrfMf%2BzOTky1LiTVk-ZFjA%40mail.gmail.com

Regards,
Haribabu Kommi
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2019-05-29 12:20:35 Server crash due to assertion failure in CheckOpSlotCompatibility()
Previous Message Haribabu Kommi 2019-05-29 08:30:20 Re: MSVC Build support with visual studio 2019