
Re: RFC/WIP: adding new configuration options to TOAST

From:	Bill Moran <wmoran(at)potentialtech(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: RFC/WIP: adding new configuration options to TOAST
Date:	2015-11-04 12:48:05
Message-ID:	20151104074805.f5ad62aae853ca991a5a69d4@potentialtech.com
Lists:	pgsql-hackers

On Wed, 4 Nov 2015 13:07:09 +0800
Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 4 November 2015 at 10:58, Bill Moran <wmoran(at)potentialtech(dot)com> wrote:
> > On Tue, 3 Nov 2015 18:34:39 -0800
> > Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >
> >> On Tue, Nov 3, 2015 at 5:21 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >> > On 3 November 2015 at 23:04, Bill Moran <wmoran(at)potentialtech(dot)com> wrote:
> >> >>
> >> >> Looking for feedback to see if anyone sees any issues or has any
> >> >> suggestions on what I'm doing. The attached patch alters 3 things
> >> >> with regard to TOAST behavior:
> >> >
> >> > COMPRESSION_TEST_SIZE (2) seems useful.
> >> >
> >> > The other two mostly seem like options nobody's going to know are
> >> > there, or know how to sensibly set if they do notice them. What's the
> >> > driving reason behind those, the problem you're trying to solve? Why
> >> > make them configurable per-table (or at all)?
> >>
> >> I currently have a table with one column which has a median width of
> >> 500 bytes, a 90th percentile of 650 bytes, and makes up 75% of the
> >> table's size, and the column is rarely used, while the table itself is
> >> frequently seq scanned. I'd very much like to drive that column out
> >> of main and into toast. I think target_tuple_size would let me do
> >> that.
> >
> > That's exactly the use case. As it currently stands, any tuple smaller
> > than about 2K will never be toasted. So if you have 1900 bytes of
> > highly compressible text that is infrequently queried from the table
> > whilst other columns are frequently accessed, there's no way to force
> > it to be out of line from the main table, or be compressed.
>
> Ok, so that's the underlying issue to solve. Make smaller tuples
> TOASTable, especially when quite compressible.
>
> Shouldn't that be column-level, really?
>
> We have SET STORAGE at the moment. Would some sort of "FORCE" option
> to SET STORAGE EXTERNAL meet your needs? It'd potentially force small
> data out of line too, but that'd make sense for your rarely accessed
> use case.

I'm not discounting the potential value of column-level tunables. But
I don't feel that their potential value devalues table-level and
cluster-wide tunables.

As the code currently stands, TOASTing is completely skipped for
tuples smaller than 2k, which means that FORCE* storage types would
need to behave differently than other storage options. Not off the
table, of course, but it seemed unintuitive to me.
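
To make that gate concrete, here is a minimal sketch in Python rather than the backend's C. It models only the size check, not the real code path (which also considers tuples that already carry external values). The 2032-byte figure assumes a default build with 8 kB pages, and the 512-byte per-table target is a made-up value for illustration.

```python
# Sketch only: models the size gate, not PostgreSQL's actual C code.
# Assumption: the threshold is about 2032 bytes on a default 8 kB-page build.
DEFAULT_TOAST_THRESHOLD = 2032


def toast_considered(tuple_len: int,
                     target_tuple_size: int = DEFAULT_TOAST_THRESHOLD) -> bool:
    """Return True if a tuple of this length is handed to the TOAST code."""
    return tuple_len > target_tuple_size


# A 1900-byte tuple is under the default gate, so it is never compressed
# or moved out of line, however compressible its contents are.
print(toast_considered(1900))

# With a hypothetical per-table target of 512 bytes, the same tuple qualifies.
print(toast_considered(1900, target_tuple_size=512))
```

A column-level FORCE option would have to bypass this check entirely, which is the "behave differently" part.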

A lot of this was discussed previously in the threads linked here:
https://github.com/williammoran/postgres/blob/master/README

One important point is that it's not 100% clear that these tunables
are worthwhile (as you mention) but there's no way to be sure until
there is a prototype in place that can be used for testing ... which
is what this is all about (at this point, anyway). Getting to the
point where there are table-level tunables allows for convenient
testing (i.e. having two tables with different values and the same
data, and not having to restart/recreate them for each value to be
tested).

> > The two new configurables allow the DBA to make tradeoff decisions on
> > CPU usage vs. storage efficiency. Since the TOAST code attempts to
> > process the column that will provide the largest gain first, in your
> > described use case you could calculate the size of the other columns,
> > and set the target_tuple_size to just a bit larger than that, and
> > that large column should get moved into the toast table in most or
> > all cases (depending on how predictable the other sizes are)
>
> I'm just concerned that this is another knob that 0.001% of the user
> base will know about and use correctly, 1% will use incorrectly based
> on some cargo-culted nonsense they pick up somewhere, and the rest
> will have no clue exists. We have more than a few of those already.

From my perspective, I've seen people misunderstand and misuse the
simplest of things, so I don't hold to the idea that something is a
bad idea just because it's a little complicated. Certainly, it needs
to be as understandable as possible. This page will certainly need a
significant rewrite:
http://www.postgresql.org/docs/9.5/static/storage-toast.html

To me, the important factor is whether these tunables can make
Postgres more valuable to a skilled administrator. If they can, then
the fact that a cargo-cult exists to misuse them is simply
entertainment to me. It ensures that companies like 2ndquadrant will
always have work fixing other people's mistakes ;)

> The way you describe using a GUC here makes it sound like what you
> really want is just a column storage option, and that twiddling a GUC
> like this is a workaround to try to make one column more aggressively
> compressed and moved out of line without affecting the others.

Not really. The fact that I've only listed one case doesn't mean it's
the only one. Perhaps I need to enumerate all the potential benefits
somewhere ... but when I first researched and proposed this project,
I didn't see anywhere near the level of disagreement that I'm seeing
now. I'll try to enumerate all the potential benefits and get them
on a web page somewhere today.

> > Compression is a similarly hard-coded value in current versions.
> > I feel that allowing the DBA to control how much savings is required
> > before incurring the overhead of compression is worthwhile, especially
> > when considered on a per-table basis. For example, the compression
> > on an archive table could be very aggressive, whereas compression on
> > a frequently accessed table might only be justified if it saves a lot
> > of space. How much space compression saves is highly dependent on the
> > data being stored.
>
> Yes, I can see value in making attempts at compression more (or less)
> aggressive. In that regard it's a pity the pluggable compression
> support didn't go anywhere really, because the current algorithm is
> cpu-cheap at the cost of pretty poor compression ratios.

This work had to be done before pluggable compression could be implemented
anyway. What value is there to switching compression algorithms if they
never get used for tuples under 2k and you have no way to change that?
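
As a rough illustration of the "how much savings is required" tunable, the sketch below keeps a compressed value only if it saves at least a configurable percentage. It is an assumption-laden toy: zlib stands in for pglz, and the function name, the min_savings_pct parameter, and the 25% value are hypothetical, not names or defaults from the patch.

```python
import random
import zlib


def keep_compressed(raw: bytes, min_savings_pct: float) -> bool:
    """Compress with zlib (a stand-in for pglz) and report whether the
    result saves at least min_savings_pct percent of the original size."""
    if not raw:
        return False
    compressed = zlib.compress(raw)
    savings_pct = 100.0 * (len(raw) - len(compressed)) / len(raw)
    return savings_pct >= min_savings_pct


# 1900 bytes of repetitive text: compresses very well.
text = b"highly compressible text " * 76

# 1900 bytes of seeded pseudo-random data: essentially incompressible.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1900))

print(keep_compressed(text, min_savings_pct=25.0))
print(keep_compressed(noise, min_savings_pct=25.0))
```

An archive table could run with a low required savings, and a hot table with a high one, without touching the algorithm itself.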

On a related note, since I've been digging in this area of code I feel
comfortable commenting that pluggable compression isn't terribly difficult
to implement. The most significant work will probably be in actually
implementing the various algorithms, not in making them pluggable. I've
been considering making that my next project.

> >> (Per-column control would be even nicer, but I'd take what I can get)
> >
> > Oddly, I hadn't considered getting as granular as per-column, but
> > now that you've got me thinking about it, it seems like a logical
> > step to take.
>
> I think per-column is really where it makes sense, if it's to be done
> at all. At least based on the use cases given.

Per column is certainly possible, but as you mention, it's still not
clear that these changes are valuable at all. Additionally, the code
required to make per-column tunables will be mostly implemented in
the process of making per-table tunables, and I can always just rip
out the GUC definitions and clean up the code if that actually seems
like the best result.

Overall, I don't feel like you're actually disagreeing with
anything I'm doing ... you're just wishing I was already further
along.

--
Bill Moran

In response to

Re: RFC/WIP: adding new configuration options to TOAST at 2015-11-04 05:07:09 from Craig Ringer

Responses

Re: RFC/WIP: adding new configuration options to TOAST at 2015-11-04 14:39:22 from Craig Ringer
