Re: [HACKERS] Custom compression methods

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2019-03-19 11:19:53
Message-ID: 87f7b3e6-8d48-654e-6ccc-571c99dce805@2ndquadrant.com
Lists: pgsql-hackers


On 3/19/19 10:59 AM, Chris Travers wrote:
>
>
> On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>
>
> On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
> > On Fri, 15 Mar 2019 14:07:14 +0400
> > David Steele <david(at)pgmasters(dot)net> wrote:
> >
> >> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele
> >>> <david(at)pgmasters(dot)net> wrote:
> >>>
> >>>     On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> >>>
> >>>      > there is another set of patches.
> >>>      > Only rebased to current master.
> >>>      >
> >>>      > Also I will change status on commitfest to 'Needs review'.
> >>>
> >>>     This patch has seen periodic rebases but no code review that I
> >>> can see since last January 2018.
> >>>
> >>>     As Andres noted in [1], I think that we need to decide if this
> >>> is a feature that we want rather than just continuing to push it
> >>> from CF to CF.
> >>>
> >>>
> >>> Yes.  I took a look at code of this patch.  I think it's in pretty
> >>> good shape.  But high level review/discussion is required.
> >>
> >> OK, but I think this patch can only be pushed one more time, maximum,
> >> before it should be rejected.
> >>
> >> Regards,
> >
> > Hi,
> > in my opinion this patch is usually skipped not because it is not
> > needed, but because of its size. It is not hard to maintain it until
> > committers have time for it or I get an actual response that
> > nobody is going to commit it.
> >
>
> That may be one of the reasons, yes. But there are other reasons, which
> I think may be playing a bigger role.
>
> There's one practical issue with how the patch is structured - the docs
> and tests are in separate patches towards the end of the patch series,
> which makes it impossible to commit the preceding parts. This needs to
> change. Otherwise the patch size kills the patch as a whole.
>
> But there's a more important cost/benefit issue, I think. When I look at
> patches as a committer, I naturally have to weigh how much time I spend
> on getting it in (and then dealing with fallout from bugs etc) vs. what
> I get in return (measured in benefits for community, users). This patch
> is pretty large and complex, so the "costs" are quite high, while the
> benefit of the patch itself is the ability to pick between pg_lz and
> zlib. Which is not great, and so people tend to pick other patches.
>
> Now, I understand there's a lot of potential benefits further down the
> line, like column-level compression (which I think is the main goal
> here). But that's not included in the patch, so the gains are somewhat
> far in the future.
>
>
> Not discussing whether any particular committer should pick this up but
> I want to discuss an important use case we have at Adjust for this sort
> of patch.
>
> The PostgreSQL compression strategy is something we find inadequate for
> at least one of our large deployments (a large debug log spanning
> 10PB+).  Our current solution is to set storage so that it does not
> compress and then run on ZFS to get compression speedups on spinning disks.
>
> But running PostgreSQL on ZFS has some annoying costs because we have
> copy-on-write on copy-on-write, and when you add file fragmentation... I
> would really like to be able to get away from having to do ZFS as an
> underlying filesystem.  While we have good write throughput, read
> throughput is not as good as I would like.
>
> An approach that would give us better row-level compression would allow
> us to ditch the COW-filesystem-under-PostgreSQL approach.
>
> So I think the benefits are actually quite high particularly for those
> dealing with volume/variety problems where things like JSONB might be a
> go-to solution.  Similarly I could totally see having systems which
> handle large amounts of specialized text having extensions for dealing
> with these.
>

Sure, I don't disagree - the proposed compression approach may be a big
win for some deployments further down the road, no doubt about it. But
as I said, it's unclear when we get there (or if the interesting stuff
will be in some sort of extension, which I don't oppose in principle).

>
> But hey, I think there are committers working for postgrespro, who might
> have the motivation to get this over the line. Of course, assuming that
> there are no serious objections to having this functionality or how it's
> implemented ... But I don't think that was the case.
>
>
> While I am not currently able to speak to questions of how it is
> implemented, I can say with very little doubt that we would almost
> certainly use this functionality if it were there and I could see plenty
> of other cases where this would be a very appropriate direction for some
> other projects as well.
>
Well, I guess the best thing you can do to move this patch forward is to
actually try it on your real-world use case, report your results,
and possibly review the patch.

IIRC there was an extension [1] leveraging this custom compression
interface for better jsonb compression, so perhaps that would work for
you (not sure if it's up to date with the current patch, though).

[1] https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
