Re: [HACKERS] Custom compression methods

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2019-03-19 09:59:38
Message-ID: CAN-RpxDFZUduYmOociaYHsnh3vw2E8wjsQ+Ht1xRvemp3H=6WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
> > On Fri, 15 Mar 2019 14:07:14 +0400
> > David Steele <david(at)pgmasters(dot)net> wrote:
> >
> >> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele <david(at)pgmasters(dot)net
> >>> <mailto:david(at)pgmasters(dot)net>> wrote:
> >>>
> >>> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> >>>
> >>> > there are another set of patches.
> >>> > Only rebased to current master.
> >>> >
> >>> > Also I will change status on commitfest to 'Needs review'.
> >>>
> >>> This patch has seen periodic rebases but no code review that I
> >>> can see since last January 2018.
> >>>
> >>> As Andres noted in [1], I think that we need to decide if this
> >>> is a feature that we want rather than just continuing to push it
> >>> from CF to CF.
> >>>
> >>>
> >>> Yes. I took a look at code of this patch. I think it's in pretty
> >>> good shape. But high level review/discussion is required.
> >>
> >> OK, but I think this patch can only be pushed one more time, maximum,
> >> before it should be rejected.
> >>
> >> Regards,
> >
> > Hi,
> > in my opinion this patch is usually skipped not because it is not
> > needed, but because of its size. It is not hard to maintain it until
> > commiters will have time for it or I will get actual response that
> > nobody is going to commit it.
> >
>
> That may be one of the reasons, yes. But there are other reasons, which
> I think may be playing a bigger role.
>
> There's one practical issue with how the patch is structured - the docs
> and tests are in separate patches towards the end of the patch series,
> which makes it impossible to commit the preceding parts. This needs to
> change. Otherwise the patch size kills the patch as a whole.
>
> But there's a more important cost/benefit issue, I think. When I look at
> patches as a committer, I naturally have to weight how much time I spend
> on getting it in (and then dealing with fallout from bugs etc) vs. what
> I get in return (measured in benefits for community, users). This patch
> is pretty large and complex, so the "costs" are quite high, while the
> benefits from the patch itself is the ability to pick between pg_lz and
> zlib. Which is not great, and so people tend to pick other patches.
>
> Now, I understand there's a lot of potential benefits further down the
> line, like column-level compression (which I think is the main goal
> here). But that's not included in the patch, so the gains are somewhat
> far in the future.
>

Not discussing whether any particular committer should pick this up but I
want to discuss an important use case we have at Adjust for this sort of
patch.

The PostgreSQL compression strategy is something we find inadequate for at
least one of our large deployments (a large debug log spanning 10PB+). Our
current solution is to set storage so that it does not compress and then
run on ZFS to get compression speedups on spinning disks.

But running PostgreSQL on ZFS has some annoying costs because we have
copy-on-write on copy-on-write, and when you add file fragmentation... I
would really like to be able to get away from having to do ZFS as an
underlying filesystem. While we have good write throughput, read
throughput is not as good as I would like.

An approach that would give us better row-level compression would allow us
to ditch the COW filesystem under PostgreSQL approach.

So I think the benefits are actually quite high particularly for those
dealing with volume/variety problems where things like JSONB might be a
go-to solution. Similarly I could totally see having systems which handle
large amounts of specialized text having extensions for dealing with these.

>
> But hey, I think there are committers working for postgrespro, who might
> have the motivation to get this over the line. Of course, assuming that
> there are no serious objections to having this functionality or how it's
> implemented ... But I don't think that was the case.
>

While I am not currently able to speak for questions of how it is
implemented, I can say with very little doubt that we would almost
certainly use this functionality if it were there and I could see plenty of
other cases where this would be a very appropriate direction for some other
projects as well.

> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-03-19 10:00:20 Re: [HACKERS] CLUSTER command progress monitor
Previous Message Amit Langote 2019-03-19 09:53:59 Re: partitioned tables referenced by FKs