Re: Compressed pluggable storage experiments

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compressed pluggable storage experiments
Date: 2019-10-19 12:23:23
Message-ID: 20191019122323.syfhef6uilbfgkpg@development
Lists: pgsql-hackers

On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
>> On 2019-Oct-10, Ildar Musin wrote:
>>
>> > 1. Unlike FDW API, in pluggable storage API there are no routines like
>> > "begin modify table" and "end modify table" and there is no shared
>> > state between insert/update/delete calls.
>>
>> Hmm. I think adding a begin/end to modifytable is a reasonable thing to
>> do (it'd be a no-op for heap and zheap I guess).
>
>I'm fairly strongly against that. Adding two additional "virtual"
>function calls for something that's rarely going to be used, seems like
>adding too much overhead to me.
>

That seems a bit strange to me. Sure - if there's an alternative way to
achieve the desired behavior (a clear way to finalize writes etc.), then
cool, let's do that. But forcing people to use inconvenient workarounds
seems like a bad thing to me - having a convenient and clear API is
quite valuable, IMHO.

Let's see if this actually has a measurable overhead first.
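
To make the proposal concrete, here is a minimal sketch of optional
begin/end callbacks. Everything in it (DemoTableAm, demo_modify_table,
the compressed_* and heaplike_* functions) is hypothetical and is not
part of the actual table AM API. It assumes the callbacks may be left
NULL, in which case an AM that does not need them pays for a pointer
test rather than an indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a table AM callback struct. */
typedef struct DemoTableAm
{
    void *(*begin_modify)(void);                    /* optional, may be NULL */
    void  (*tuple_insert)(void *state, int value);  /* required */
    void  (*end_modify)(void *state);               /* optional, may be NULL */
} DemoTableAm;

/* State shared between insert calls of the hypothetical compressed AM. */
typedef struct DemoBatch
{
    int nbuffered;  /* rows accumulated since begin_modify */
    int nflushed;   /* rows written out by end_modify */
} DemoBatch;

static DemoBatch demo_batch;
static int heaplike_rows;

static void *compressed_begin(void)
{
    demo_batch.nbuffered = 0;
    return &demo_batch;
}

static void compressed_insert(void *state, int value)
{
    (void) value;                       /* the sketch only counts rows */
    ((DemoBatch *) state)->nbuffered++;
}

static void compressed_end(void *state)
{
    DemoBatch *batch = (DemoBatch *) state;

    /* A real AM would compress and write the buffered rows here. */
    batch->nflushed += batch->nbuffered;
    batch->nbuffered = 0;
}

static void heaplike_insert(void *state, int value)
{
    (void) state;                       /* no shared state needed */
    (void) value;
    heaplike_rows++;
}

static const DemoTableAm compressed_am = {
    compressed_begin, compressed_insert, compressed_end
};
static const DemoTableAm heaplike_am = {
    NULL, heaplike_insert, NULL
};

/* Executor-side driver: skips the optional callbacks when they are NULL. */
static void demo_modify_table(const DemoTableAm *am, const int *values, int n)
{
    void *state = am->begin_modify ? am->begin_modify() : NULL;

    assert(am->tuple_insert != NULL);
    for (int i = 0; i < n; i++)
        am->tuple_insert(state, values[i]);

    if (am->end_modify)
        am->end_modify(state);
}
```

Calling demo_modify_table(&compressed_am, values, n) buffers the rows
and flushes them once at the end, while the same call with heaplike_am
never touches the optional callbacks. Whether that extra branch is
measurable in the real executor is exactly what would need benchmarking.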

>
>> > 2. It looks like I cannot implement custom storage options. E.g. for
>> > compressed storage it makes sense to implement different compression
>> > methods (lz4, zstd etc.) and corresponding options (like compression
>> > level). But as far as I can see storage options (like fillfactor etc) are
>> > hardcoded and are not extensible. Possible solution is to use GUCs
>> > which would work but is not extremely convenient.
>>
>> Yeah, the reloptions module is undergoing some changes. I expect that
>> there will be a way to extend reloptions from an extension, at the end
>> of that set of patches.
>
>Cool.
>

Yep.

>
>> > 3. A bit surprising limitation that in order to use bitmap scan the
>> > maximum number of tuples per page must not exceed 291 due to
>> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
>> > 8kb page size. In case of 1mb page this restriction feels really
>> > limiting.
>>
>> I suppose this is a hardcoded limit that needs to be fixed by patching
>> core as we make table AM more pervasive.
>
>That's not unproblematic - a dynamic limit would make a number of
>computations more expensive, and we already spend plenty CPU cycles
>building the tid bitmap. And we'd waste plenty of memory just having all
>that space for the worst case. ISTM that we "just" need to replace the
>TID bitmap with some tree like structure.
>

I think the zedstore has roughly the same problem, and Heikki mentioned
some possible solutions to dealing with it in his pgconfeu talk (and it
was discussed in the zedstore thread, I think).
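
For reference, the arithmetic behind the 291 limit and Andres's memory
concern can be sketched as follows. The constants mirror the heap's
MaxHeapTuplesPerPage formula as I understand it (24-byte page header,
23-byte tuple header rounded up to 24, 4-byte line pointer), and the
sketch assumes 64-bit bitmap words. The demo_* names are made up, and
applying the heap formula to a 1MB page is only meant to show the
scale, not what a columnar AM would actually store per page.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_HEADER_SIZE   24  /* SizeOfPageHeaderData */
#define DEMO_TUPLE_HEADER_SIZE  24  /* MAXALIGN(SizeofHeapTupleHeader): 23 -> 24 */
#define DEMO_ITEM_ID_SIZE        4  /* sizeof(ItemIdData) */
#define DEMO_BITS_PER_WORD      64  /* assumption: 64-bit bitmapword */

/* Upper bound on heap tuples that fit on a page of the given size. */
static size_t demo_max_tuples_per_page(size_t page_size)
{
    assert(page_size > DEMO_PAGE_HEADER_SIZE);
    return (page_size - DEMO_PAGE_HEADER_SIZE) /
           (DEMO_TUPLE_HEADER_SIZE + DEMO_ITEM_ID_SIZE);
}

/* Bytes of per-page bitmap needed to hold one bit per possible tuple. */
static size_t demo_bitmap_bytes_per_page(size_t max_tuples)
{
    size_t words;

    assert(max_tuples > 0);
    words = (max_tuples - 1) / DEMO_BITS_PER_WORD + 1;
    return words * (DEMO_BITS_PER_WORD / 8);
}
```

With these assumptions an 8kB page gives (8192 - 24) / 28 = 291 tuples
and a 40-byte bitmap per page entry, while a 1MB page gives 37448
tuples and 4688 bytes per entry. Sizing every entry for that worst case
is the memory cost Andres refers to.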

>
>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within extension. So unless
>> > inserts are made in bulk it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest. Either you should ignore this problem, or
>> you should fix it.
>
>I think if it becomes a problem you should ask for an rmgr ID to use for
>your extension, which we encode and then allow to set the relevant
>rmgr callbacks for that rmgr id at startup. But you should obviously
>first develop the WAL logging etc, and make sure it's beneficial over
>generic wal logging for your case.
>

AFAIK compressed/columnar engines generally implement two types of
storage - write-optimized store (WOS) and read-optimized store (ROS),
where the WOS is mostly just an uncompressed append-only buffer, and ROS
is compressed etc. ISTM the WOS would benefit from a more elaborate WAL
logging, but ROS should be mostly fine with the generic WAL logging.

But yeah, we should test and measure how beneficial that actually is.
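
A toy illustration of the WOS/ROS split, purely as a sketch: the WOS
is a small uncompressed append-only array and the ROS is produced from
it in one pass. Run-length encoding stands in for a real compression
method, and all names (DemoStore, demo_wos_append, demo_wos_to_ros)
are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_WOS_CAPACITY 8

/* One run in the read-optimized store: a value and its repeat count. */
typedef struct DemoRun
{
    int value;
    int count;
} DemoRun;

typedef struct DemoStore
{
    int     wos[DEMO_WOS_CAPACITY];  /* uncompressed append-only buffer */
    size_t  wos_len;
    DemoRun ros[DEMO_WOS_CAPACITY];  /* run-length encoded chunk */
    size_t  ros_len;
} DemoStore;

/* Append one value to the WOS. Returns 0 on success, -1 if it is full. */
static int demo_wos_append(DemoStore *store, int value)
{
    if (store->wos_len == DEMO_WOS_CAPACITY)
        return -1;
    store->wos[store->wos_len++] = value;
    return 0;
}

/*
 * Rewrite the WOS contents as a run-length encoded ROS chunk, replacing
 * any previous chunk, and empty the WOS. The number of runs can never
 * exceed the number of buffered values, so ros[] cannot overflow.
 */
static void demo_wos_to_ros(DemoStore *store)
{
    store->ros_len = 0;
    for (size_t i = 0; i < store->wos_len; i++)
    {
        if (store->ros_len > 0 &&
            store->ros[store->ros_len - 1].value == store->wos[i])
        {
            store->ros[store->ros_len - 1].count++;
        }
        else
        {
            assert(store->ros_len < DEMO_WOS_CAPACITY);
            store->ros[store->ros_len].value = store->wos[i];
            store->ros[store->ros_len].count = 1;
            store->ros_len++;
        }
    }
    store->wos_len = 0;
}
```

The point of the split for WAL purposes: demo_wos_append changes a few
bytes per row, which is where a compact custom record could pay off,
whereas demo_wos_to_ros rewrites a whole chunk at once, which is closer
to what generic WAL handles reasonably well.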

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
