Re: Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: [HACKERS] Custom compression methods
Date: 2020-08-13 11:48:17
Message-ID: CAFiTN-uUpX3ck=K0mLEk-G_kUQY=SNOTeqdaNRR9FMdQrHKebw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 19, 2020 at 10:33 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Mar 7, 2019 at 2:51 AM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > Yes. I took a look at code of this patch. I think it's in pretty good shape. But high level review/discussion is required.
>
> I agree that the code of this patch is in pretty good shape, although
> there is a lot of rebasing needed at this point. Here is an attempt at
> some high level review and discussion:
>
> - As far as I can see, there is broad agreement that we shouldn't
> consider ourselves to be locked into 'pglz' forever. I believe
> numerous people have reported that there are other methods of doing
> compression that either compress better, or compress faster, or
> decompress faster, or all of the above. This isn't surprising and nor
> is it a knock on 'pglz'; Jan committed it in 1999, and it's not
> surprising that in 20 years some people have come up with better
> ideas. Not only that, but the quantity and quality of open source
> software that is available for this kind of thing and for many other
> kinds of things have improved dramatically in that time.
>
> - I can see three possible ways of breaking our dependence on 'pglz'
> for TOAST compression. Option #1 is to pick one new algorithm which we
> think is better than 'pglz' in all relevant ways and use it as the
> default for all new compressed datums. This would be dramatically
> simpler than what this patch does, because there would be no user
> interface. It would just be out with the old and in with the new.
> Option #2 is to create a short list of new algorithms that have
> different trade-offs; e.g. one that is very fast (like lz4) and one
> that has an extremely high compression ratio, and provide an interface
> for users to choose between them. This would be moderately simpler
> than what this patch does, because we would expose to the user
> anything about how a new compression method could be added, but it
> would still require a UI for the user to choose between the available
> (and hard-coded) options. It has the further advantage that every
> PostgreSQL cluster will offer the same options (or a subset of them,
> perhaps, depending on configure flags) and so you don't have to worry
> that, say, a pg_am row gets lost and suddenly all of your toasted data
> is inaccessible and uninterpretable. Option #3 is to do what this
> patch actually does, which is to allow for the addition of any number
> of compressors, including by extensions. It has the advantage that new
> compressors can be added with core's permission, so, for example, if
> it is unclear whether some excellent compressor is free of patent
> problems, we can elect not to ship support for it in core, while at
> the same time people who are willing to accept the associated legal
> risk can add that functionality to their own copy as an extension
> without having to patch core. The legal climate may even vary by
> jurisdiction, so what might be questionable in country A might be
> clearly just fine in country B. Aside from those issues, this approach
> allows people to experiment and innovate outside of core relatively
> quickly, instead of being bound by the somewhat cumbrous development
> process which has left this patch in limbo for the last few years. My
> view is that option #1 is likely to be impractical, because getting
> people to agree is hard, and better things are likely to come along
> later, and people like options. So I prefer either #2 or #3.
>
> - The next question is how a datum compressed with some non-default
> method should be represented on disk. The patch handles this first of
> all by making the observation that the compressed size can't be >=1GB,
> because the uncompressed size can't be >=1GB, and we wouldn't have
> stored it compressed if it expanded. Therefore, the upper two bits of
> the compressed size should always be zero on disk, and the patch
> steals one of them to indicate whether "custom" compression is in use.
> If it is, the 4-byte varlena header is followed not only by a 4-byte
> size (now with the new flag bit also included) but also by a 4-byte
> OID, indicating the compression AM in use. I don't think this is a
> terrible approach, but I don't think it's amazing, either. 4 bytes is
> quite a bit to use for this; if I guess correctly what will be a
> typical cluster configuration, you probably would really only need
> about 2 bits. For a datum that is both stored externally and
> compressed, the overhead is likely negligible, because the length is
> probably measured in kB or MB. But for a datum that is compressed but
> not stored externally, it seems pretty expensive; the datum is
> probably short, and having an extra 4 bytes of uncompressible data
> kinda sucks. One possibility would be to allow only one byte here:
> require each compression AM that is installed to advertise a one-byte
> value that will denote its compressed datums. If more than one AM
> tries to claim the same byte value, complain. Another possibility is
> to abandon this approach and go with #2 from the previous paragraph.
> Or maybe we add 1 or 2 "privileged" built-in compressors that get
> dedicated bit-patterns in the upper 2 bits of the size field, with the
> last bit pattern being reserved for future algorithms. (e.g. 0x00 =
> pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for
> details).
>
> - I don't really like the use of the phrase "custom compression". I
> think the terminology needs to be rethought so that we just talk about
> compression methods. Perhaps in certain contexts we need to specify
> that we mean extensible compression methods or user-provided
> compression methods or something like that, but I don't think the word
> "custom" is very well-suited here. The main point of this shouldn't be
> for every cluster in the universe to use a different approach to
> compression, or to compress columns within a database in 47 different
> ways, but rather to help us get out from under 'pglz'. Eventually we
> probably want to change the default, but as the patch phrases things
> now, that default would be a custom method, which is almost a
> contradiction in terms.
>
> - Yet another possible approach to the on-disk format is to leave
> varatt_external.va_extsize and varattrib_4b.rawsize untouched and
> instead add new compression methods by adding new vartag_external
> values. There's quite a lot of bit-space available there: we have a
> whole byte, and we're currently only using 4 values. We could easily
> add a half-dozen new possibilities there for new compression methods
> without sweating the bit-space consumption. The main thing I don't
> like about this is that it only seems like a useful way to provide for
> out-of-line compression. Perhaps it could be generalized to allow for
> inline compression as well, but it seems like it would take some
> hacking.
>
> - One thing I really don't like about the patch is that it consumes a
> bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask
> bits are at a premium, and there's been no real progress in the decade
> plus that I've been hanging around here in clawing back any bit-space.
> I think we really need to avoid burning our remaining bits for
> anything other than a really critical need, and I don't think I
> understand what the need is in this case. I might be missing
> something, but I'd really strongly suggest looking for a way to get
> rid of this. It also invents the concept of a TupleDesc flag, and the
> flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> need that, either.
>
> - It seems like this kind of approach has a sort of built-in
> circularity problem. It means that every place that might need to
> detoast a datum needs to be able to access the pg_am catalog. I wonder
> if that's actually true. For instance, consider logical decoding. I
> guess that can do catalog lookups in general, but can it do them from
> the places where detoasting is happening? Moreover, can it do them
> with the right snapshot? Suppose we rewrite a table to change the
> compression method, then drop the old compression method, then try to
> decode a transaction that modified that table before those operations
> were performed. As an even more extreme example, suppose we need to
> open pg_am, and to do that we have to build a relcache entry for it,
> and suppose the relevant pg_class entry had a relacl or reloptions
> field that happened to be custom-compressed. Or equally suppose that
> any of the various other tables we use when building a relcache entry
> had the same kind of problem, especially those that have TOAST tables.
> We could just disallow the use of non-default compressors in the
> system catalogs, but the benefits mentioned in
> http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to
> ignore.
>
> - I think it would be awfully appealing if we could find some way of
> dividing this great big patch into some somewhat smaller patches. For
> example:
>
> Patch #1. Add syntax allowing a compression method to be specified,
> but the only possible choice is pglz, and the PRESERVE stuff isn't
> supported, and changing the value associated with an existing column
> isn't supported, but we can add tab-completion support and stuff.
>
> Patch #2. Add a second built-in method, like gzip or lz4.
>
> Patch #3. Add support for changing the compression method associated
> with a column, forcing a table rewrite.
>
> Patch #4. Add support for PRESERVE, so that you can change the
> compression method associated with a column without forcing a table
> rewrite, by including the old method in the PRESERVE list, or with a
> rewrite, by not including it in the PRESERVE list.
>
> Patch #5. Add support for compression methods via the AM interface.
> Perhaps methods added in this manner are prohibited in system
> catalogs. (This could also go before #4 or even before #3, but with a
> noticeable hit to usability.)
>
> Patch #6 (new development). Add a contrib module using the facility
> added in #5, perhaps with a slightly off-beat compressor like bzip2
> that is more of a niche use case.
>
> I think that if the patch set were broken up this way, it would be a
> lot easier to review and get committed. I think you could commit each
> bit separately. I don't think you'd want to commit #1 unless you had a
> sense that #2 was pretty close to done, and similarly for #5 and #6,
> but that would still make things a lot easier than having one giant
> monolithic patch, at least IMHO.
>
> There might be more to say here, but that's what I have got for now. I
> hope it helps.

I have rebased the patch on the latest head and currently, broken into 3 parts.

v1-0001: As suggested by Robert, it provides the syntax support for
setting the compression method for a column while creating a table and
adding columns. However, we don't support changing the compression
method for the existing column. As part of this patch, there is only
one built-in compression method that can be set (pglz). In this, we
have one in-build am (pglz) and the compressed attributes will directly
store the oid of the AM. In this patch, I have removed the
pg_attr_compresion as we don't support changing the compression
for the existing column so we don't need to preserve the old
compressions.
v1-0002: Add another built-in compression method (zlib)
v1:0003: Remaining patch set (nothing is changed except rebase on the
current head, stabilizing check-world and 0001 and 0002 are pulled
out of this)

Next, I will be working on separating out the remaining patches as per
the suggestion by Robert.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Add-support-for-setting-the-compression-method.patch application/x-patch 224.9 KB
v1-0003-Custom-compression-methods.patch application/x-patch 178.3 KB
v1-0002-Add-support-for-another-built-in-compression-meth.patch application/x-patch 9.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-08-13 13:17:20 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Joe Conway 2020-08-13 10:54:41 Re: security_context_t marked as deprecated in libselinux 3.1