Re: Table AM Interface Enhancements

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: Table AM Interface Enhancements
Date: 2024-04-11 17:48:25
Message-ID: CAAKRu_YkphAPNbBR2jcLqnxGhDEWTKhYfLFY=0R_oG5LHBH7Gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 11, 2024 at 12:19 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> > > This brings up a question about the prefetching. We never had to have
> > > this discussion for sequential scan streaming read because it didn't
> > > (and still doesn't) do prefetching. But, if we push the streaming read
> > > code down into the heap AM layer, it will be doing the prefetching.
> > > So, do we remove the prefetching from acquire_sample_rows() and expect
> > > other table AMs to implement it themselves or use the streaming read
> > > API?
> >
> > The prefetching added to acquire_sample_rows was quite narrowly tailored to
> > something heap-like - it pretty much required that block numbers to be 1:1
> > with the actual physical on-disk location for the specific AM. So I think
> > it's pretty much required for this to be pushed down.
> >
> > Using a read stream is a few lines for something like this, so I'm not worried
> > about it. I guess we could have a default implementation for block based AMs,
> > similar what we have around table_block_parallelscan_*, but not sure it's
> > worth doing that, the complexity is much lower than in the
> > table_block_parallelscan_ case.
>
> This makes sense.
>
> I am working on pushing streaming ANALYZE into heap AM code, and I ran
> into a few roadblocks.
>
> If we want ANALYZE to make the ReadStream object in heap_beginscan()
> (like the read stream implementation of heap sequential and TID range
> scans do), I don't see any way around changing the scan_begin table AM
> callback to take a BufferAccessStrategy at the least (and perhaps also
> the BlockSamplerData).

I will also say that, had this been 6 months ago, I would probably
suggest we restructure ANALYZE's table AM interface to accommodate
read stream setup and to address a few other things I find odd about
the current code. For example, I think creating a scan descriptor for
the analyze scan in acquire_sample_rows() is quite odd. It seems like
it would be better done in the relation_analyze callback. The
relation_analyze callback saves some state like the callbacks for
acquire_sample_rows() and the Buffer Access Strategy. But at least in
the heap implementation, it just saves them in static variables in
analyze.c. It seems like it would be better to save them in a useful
data structure that could be accessed later. We have access to pretty
much everything we need at that point (in the relation_analyze
callback). I also think heap's implementation of
table_beginscan_analyze() doesn't need most of
heap_beginscan()/initscan(), so doing this instead of something
ANALYZE specific seems more confusing than helpful.

- Melanie

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-04-11 17:52:26 Re: RFC: Additional Directory for Extensions
Previous Message Alexander Korotkov 2024-04-11 17:46:02 Re: Table AM Interface Enhancements