Re: RFC: Table access methods and scans

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFC: Table access methods and scans
Date: 2021-06-04 06:23:37
Message-ID: CA+14427vcgc+K3GVuGFuZzvN7u8DbB170gijVSutoeBrZmBe8w@mail.gmail.com
Lists: pgsql-hackers

Hi Jeff,

On Fri, Jun 4, 2021 at 2:52 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> Hi,
>
> On Wed, 2021-03-31 at 22:10 +0200, Mats Kindahl wrote:
> > As an example of how this is useful, I noticed the work by Heikki and
> > Ashwin [1], where they return a `TableScanDesc` that contains
> > information about what columns to scan, which looks very useful.
> > Since the function `table_beginscan` in `src/include/access/tableam.h`
> > accepts a `ScanKey` as input, this is (AFAICT) what Heikki and Ashwin
> > were exploiting to create a specialized scan for a columnar store.
>
> I don't think ScanKeys are the right place to store information about
> what columns would be useful. See another thread[2] about that topic.
>

Yeah, it is not a good example. The ones below are better.
The scan keys are not sufficient to get all the columns, but AFAICT, it is
this callback that is exploited in the patch.

>
> > Another example of where this can be useful is to optimize access
> > during a sequential scan when you can handle some specific scans very
> > efficiently and can "skip ahead" many tuples if you know what is
> > being looked for instead of filtering "late". Two examples of where
> > this could be useful are:
> >
> > - An access method that reads data from a remote system and doesn't
> >   want to transfer all tuples unless necessary.
> > - Some sort of log-structured storage with Bloom filters that allows
> >   you to quickly skip suites that do not have a key.
>
> I agree that would be very convenient for non-heap AMs. There's a very
> old commit[3] that says:
>
> + /*
> + * Note that unlike IndexScan, SeqScan never use keys
> + * in heap_beginscan (and this is very bad) - so, here
> + * we have not check are keys ok or not.
> + */
>
> and that language has just been carried forward for decades. I wonder
> if there's any major reason this hasn't been done yet. Does it just not
> improve performance for a heap, or is there some other reason?
>

That is basically the question. I'm prepared to take a shot at it unless
there is a good reason not to.
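
As a rough illustration of the "skip ahead" idea from the Bloom filter
example above, here is a minimal, self-contained C sketch. It is a toy and
not PostgreSQL code: `Segment`, `segment_add`, `segment_may_contain`, and
`scan_with_key` are hypothetical names, the hash functions are arbitrary,
and a single equality key stands in for a `ScanKey`. The point is only
that a scan which knows the key up front can skip whole segments instead
of reading every tuple and filtering late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEGMENT_SIZE 4

/* Hypothetical storage segment: a few keys plus a 64-bit Bloom filter. */
typedef struct Segment
{
    int32_t  keys[SEGMENT_SIZE];
    size_t   nkeys;
    uint64_t bloom;
} Segment;

/* Two arbitrary hash functions, each mapping a key to a bit in 0..63. */
static unsigned hash1(int32_t key)
{
    return (unsigned) (((uint32_t) key * UINT32_C(2654435761)) >> 26);
}

static unsigned hash2(int32_t key)
{
    return (unsigned) ((((uint32_t) key ^ UINT32_C(0x9e3779b9)) * UINT32_C(40503)) >> 26);
}

/* Store a key in the segment and record it in the Bloom filter. */
static void segment_add(Segment *seg, int32_t key)
{
    assert(seg->nkeys < SEGMENT_SIZE);
    seg->keys[seg->nkeys++] = key;
    seg->bloom |= (UINT64_C(1) << hash1(key)) | (UINT64_C(1) << hash2(key));
}

/* False means the key is definitely absent; true means it may be present. */
static bool segment_may_contain(const Segment *seg, int32_t key)
{
    uint64_t mask = (UINT64_C(1) << hash1(key)) | (UINT64_C(1) << hash2(key));

    return (seg->bloom & mask) == mask;
}

/*
 * Scan with the equality key pushed down into the scan itself.
 * Returns the number of matching keys and reports how many segments
 * actually had to be read.
 */
static size_t scan_with_key(const Segment *segs, size_t nsegs,
                            int32_t key, size_t *segments_read)
{
    size_t matches = 0;

    *segments_read = 0;
    for (size_t i = 0; i < nsegs; i++)
    {
        if (!segment_may_contain(&segs[i], key))
            continue;           /* skip ahead: no tuple here can match */

        (*segments_read)++;
        for (size_t j = 0; j < segs[i].nkeys; j++)
            if (segs[i].keys[j] == key)
                matches++;
    }
    return matches;
}
```

A caller would fill a few segments with `segment_add` and then call
`scan_with_key`. Because a Bloom filter can give false positives but never
false negatives, the result is the same as a full scan with a late filter;
only the number of segments read can shrink.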

Best wishes,
Mats Kindahl

>
> Regards,
> Jeff Davis
>
> [2] https://www.postgresql.org/message-id/CAE-ML+9RmTNzKCNTZPQf8O3b-UjHWGFbSoXpQa3Wvuc8YBbEQw@mail.gmail.com
>
> [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e3a1ab764ef2
