Re: Compressed pluggable storage experiments

From: Natarajan R <nataraj3098(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compressed pluggable storage experiments
Date: 2022-08-18 06:32:32
Message-ID: CAPqxBt6to3CH-gqkKCLDmuq+8y_1uXgKZEGyLPrmDkrUWaUfaA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

This is a continuation of the thread quoted below.

>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within an extension. So unless
>> > inserts are made in bulk it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest. Either you should ignore this problem, or
>> you should fix it.

I am working on an extension similar to the one above (pg_cryogen, which
experiments with the pluggable storage APIs), but I don't have much
knowledge of PostgreSQL's logical replication. Please suggest some
approaches to supporting logical replication for a table with a custom
access method that writes generic WAL records.
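
For context, here is roughly what the insert path looks like today with
generic WAL (a minimal sketch; my_am_append_tuple and the page layout are
hypothetical, but the GenericXLog* calls are the real API from
access/generic_xlog.h). The record emitted carries only a physical page
delta, which is why logical decoding has nothing semantic to work with:

    #include "postgres.h"
    #include "access/generic_xlog.h"
    #include "storage/bufmgr.h"

    /* Hypothetical insert path for a custom AM. */
    static void
    my_am_append_tuple(Relation rel, BlockNumber blkno, char *data, Size len)
    {
        Buffer            buf;
        Page              page;
        GenericXLogState *state;

        buf = ReadBuffer(rel, blkno);
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

        /* Register the buffer and mutate the page copy it returns. */
        state = GenericXLogStart(rel);
        page = GenericXLogRegisterBuffer(state, buf, 0);

        /* ... copy 'data' onto 'page' here (AM-specific layout) ... */

        /*
         * This emits a physical page delta with no record of which tuple
         * was inserted, so logical decoding cannot reconstruct the change.
         */
        GenericXLogFinish(state);

        UnlockReleaseBuffer(buf);
    }

(See also the custom-rmgr sketch further down, after Andres's rmgr
suggestion: as of PG 15 a custom resource manager can supply an rm_decode
callback, which looks like the intended route to logical decoding for a
custom AM.)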

On Wed, 17 Aug 2022 at 19:04, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
> >Hi,
> >
> >On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> >> On 2019-Oct-10, Ildar Musin wrote:
> >>
> >> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> >> > "begin modify table" and "end modify table" and there is no shared
> >> > state between insert/update/delete calls.
> >>
> >> Hmm. I think adding a begin/end to modifytable is a reasonable thing to
> >> do (it'd be a no-op for heap and zheap I guess).
> >
> >I'm fairly strongly against that. Adding two additional "virtual"
> >function calls for something that's rarely going to be used, seems like
> >adding too much overhead to me.
> >
>
> That seems a bit strange to me. Sure - if there's an alternative way to
> achieve the desired behavior (clear way to finalize writes etc.), then
> cool, let's do that. But forcing people to use inconvenient workarounds
> seems like a bad thing to me - having a convenient and clear API is
> quite valuable, IMHO.
>
> Let's see if this actually has a measurable overhead first.
>
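
FWIW, the table AM API already has a finish_bulk_insert callback that
gives an AM one place to finalize buffered writes, though it is only
invoked on bulk paths such as COPY, not on plain INSERT - which is
presumably why a begin/end-modify pair was requested. A minimal sketch
(the my_am_* names and the buffering scheme are hypothetical):

    #include "postgres.h"
    #include "access/tableam.h"
    #include "nodes/pg_list.h"

    /* Hypothetical per-backend buffer of rows awaiting compression. */
    static List *my_am_pending_rows = NIL;

    /*
     * Called at the end of bulk operations (e.g. COPY), letting the AM
     * compress and write everything in one chunk. Plain INSERT never
     * reaches this callback.
     */
    static void
    my_am_finish_bulk_insert(Relation rel, int options)
    {
        /* ... compress my_am_pending_rows and write them out ... */
        my_am_pending_rows = NIL;
    }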
> >
> >> > 2. It looks like I cannot implement custom storage options. E.g. for
> >> > compressed storage it makes sense to implement different compression
> >> > methods (lz4, zstd etc.) and corresponding options (like compression
> >> > level). But as I can see, storage options (like fillfactor etc.) are
> >> > hardcoded and are not extensible. A possible solution is to use GUCs,
> >> > which would work but is not extremely convenient.
> >>
> >> Yeah, the reloptions module is undergoing some changes. I expect that
> >> there will be a way to extend reloptions from an extension, at the end
> >> of that set of patches.
> >
> >Cool.
> >
>
> Yep.
>
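
In the meantime, the GUC route Ildar mentions looks roughly like this (a
sketch; the my_am.* option names are hypothetical, the DefineCustom*
functions are the real API from utils/guc.h). The obvious downside is
that a GUC is per-session rather than per-table:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int my_am_compression_level = 1;
    static int my_am_codec = 0;

    /* Hypothetical codec choices for an enum GUC. */
    static const struct config_enum_entry my_am_codecs[] = {
        {"lz4", 0, false},
        {"zstd", 1, false},
        {NULL, 0, false}
    };

    void
    _PG_init(void)
    {
        DefineCustomIntVariable("my_am.compression_level",
                                "Compression level for newly written pages.",
                                NULL,
                                &my_am_compression_level,
                                1, 1, 19,
                                PGC_USERSET, 0,
                                NULL, NULL, NULL);

        DefineCustomEnumVariable("my_am.codec",
                                 "Compression codec for newly written pages.",
                                 NULL,
                                 &my_am_codec,
                                 0, my_am_codecs,
                                 PGC_USERSET, 0,
                                 NULL, NULL, NULL);
    }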
> >
> >> > 3. A somewhat surprising limitation: in order to use bitmap scans,
> >> > the maximum number of tuples per page must not exceed 291, due to
> >> > the MAX_TUPLES_PER_PAGE macro in tidbitmap.c, which is calculated
> >> > based on the 8kB page size. With a 1MB page this restriction feels
> >> > really limiting.
> >>
> >> I suppose this is a hardcoded limit that needs to be fixed by patching
> >> core as we make table AM more pervasive.
> >
> >That's not unproblematic - a dynamic limit would make a number of
> >computations more expensive, and we already spend plenty of CPU cycles
> >building the tid bitmap. And we'd waste plenty of memory just having all
> >that space for the worst case. ISTM that we "just" need to replace the
> >TID bitmap with some tree-like structure.
> >
>
> I think the zedstore has roughly the same problem, and Heikki mentioned
> some possible solutions to dealing with it in his pgconfeu talk (and it
> was discussed in the zedstore thread, I think).
>
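
For what it's worth, the workaround I have seen discussed for large-page
AMs (a sketch; the 290-rows-per-block split is an arbitrary choice here)
is to treat the TID as an encoded logical row number, keeping the offset
part under tidbitmap.c's MAX_TUPLES_PER_PAGE so bitmap scans keep working:

    #include "postgres.h"
    #include "storage/itemptr.h"

    /* Stay below MAX_TUPLES_PER_PAGE (291 for 8kB pages). */
    #define MY_AM_ROWS_PER_LOGICAL_BLOCK 290

    static inline void
    my_am_rowid_to_tid(uint64 rowid, ItemPointer tid)
    {
        BlockNumber  blkno = (BlockNumber) (rowid / MY_AM_ROWS_PER_LOGICAL_BLOCK);
        OffsetNumber off = (OffsetNumber) (rowid % MY_AM_ROWS_PER_LOGICAL_BLOCK) + 1;

        ItemPointerSet(tid, blkno, off);
    }

    static inline uint64
    my_am_tid_to_rowid(ItemPointer tid)
    {
        return (uint64) ItemPointerGetBlockNumber(tid) *
                   MY_AM_ROWS_PER_LOGICAL_BLOCK +
               (ItemPointerGetOffsetNumber(tid) - 1);
    }

The cost is that these logical block numbers no longer map 1:1 to the
AM's physical 1MB pages, so the scan callbacks have to translate.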
> >
> >> > 4. In order to use WAL-logging each page must start with a standard 24
> >> > byte PageHeaderData even if it is needless for storage itself. Not a
> >> > big deal though. Another (actually documented) WAL-related limitation
> >> > is that only generic WAL can be used within an extension. So unless
> >> > inserts are made in bulk it's going to require a lot of disk space to
> >> > accommodate logs and wide bandwidth for replication.
> >>
> >> Not sure what to suggest. Either you should ignore this problem, or
> >> you should fix it.
> >
> >I think if it becomes a problem you should ask for an rmgr ID to use for
> >your extension, which we encode and then allow to set the relevant
> >rmgr callbacks for that rmgr id at startup. But you should obviously
> >first develop the WAL logging etc., and make sure it's beneficial over
> >generic WAL logging for your case.
> >
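
As of PG 15 this is now possible: RegisterCustomRmgr lets a
shared_preload_libraries extension claim an rmgr ID, and RmgrData grew an
rm_decode callback, which also looks like the answer to my logical
replication question above. A minimal registration sketch (the my_am_*
callbacks are hypothetical stubs; RM_EXPERIMENTAL_ID is the real
testing-only ID, a released AM should request a reserved one):

    #include "postgres.h"
    #include "access/xlog_internal.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* Hypothetical stubs; a real AM replays/describes/decodes here. */
    static void my_am_redo(XLogReaderState *record) { }
    static void my_am_desc(StringInfo buf, XLogReaderState *record) { }
    static const char *my_am_identify(uint8 info) { return "MY_AM_OP"; }
    static void my_am_decode(struct LogicalDecodingContext *ctx,
                             struct XLogRecordBuffer *buf) { }

    static const RmgrData my_am_rmgr = {
        .rm_name = "my_am",
        .rm_redo = my_am_redo,
        .rm_desc = my_am_desc,
        .rm_identify = my_am_identify,
        .rm_decode = my_am_decode,  /* makes records logically decodable */
    };

    void
    _PG_init(void)
    {
        /* Must be loaded via shared_preload_libraries. */
        RegisterCustomRmgr(RM_EXPERIMENTAL_ID, &my_am_rmgr);
    }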
>
> AFAIK compressed/columnar engines generally implement two types of
> storage - write-optimized store (WOS) and read-optimized store (ROS),
> where the WOS is mostly just an uncompressed append-only buffer, and ROS
> is compressed etc. ISTM the WOS would benefit from a more elaborate WAL
> logging, but ROS should be mostly fine with the generic WAL logging.
>
> But yeah, we should test and measure how beneficial that actually is.
>
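
To make that concrete, here is what the ROS side of such a split might
look like (entirely hypothetical naming; the point is that one full-image
generic WAL record per compressed chunk is cheap, while the hot per-row
WOS append - not shown - is where a custom rmgr would pay off):

    #include "postgres.h"
    #include "access/generic_xlog.h"
    #include "storage/bufmgr.h"

    /* Hypothetical flush of a compressed chunk from WOS into ROS. */
    static void
    my_am_flush_wos(Relation rel, BlockNumber ros_blkno,
                    char *compressed, Size len)
    {
        Buffer            buf = ReadBuffer(rel, ros_blkno);
        GenericXLogState *state;
        Page              page;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

        /* One full-page image amortized over a whole chunk of rows. */
        state = GenericXLogStart(rel);
        page = GenericXLogRegisterBuffer(state, buf, GENERIC_XLOG_FULL_IMAGE);

        /* ... lay the 'compressed' bytes out on 'page' (AM-specific) ... */

        GenericXLogFinish(state);
        UnlockReleaseBuffer(buf);
    }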
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
