Re: Remaining 'needs review' patchs in July commitfest

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remaining 'needs review' patchs in July commitfest
Date: 2015-07-29 14:49:06
Message-ID: alpine.DEB.2.10.1507291537320.27976@sto
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Heikki,

About two patches I submitted:

>> pgbench - allow backslash-continuations in custom scripts
>
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts,
> but we don't seem to be reaching a consensus on how it should work. I
> think we'll need to integrate the lexer, but it would be nice to still
> support multi-statements as well, with some syntax.

I was willing to do a small job of that one, but it drifted to solving the
meta problem of pgbench lexing for doing it "the right way". Kyotaro
HORIGUCHI has taken up the challenge and submitted a 3-part patch to
modify psql lexer, then reuse it "as is" in pgbench. I'm impressed:-)

I think that the approach is a little overkill for the simple targetted
feature, and I'm not sure that it allows for multi-statements, but I have
not checked. Anyway I may try to review it, although not in the short
term. I'm not too optimistic because if passed, it means that the psql
lexer would be used in 2 contexts, so changes in any of them would require
ensuring that it does not break anything in the other one, and I'm not
sure that it is such constraint is desirable. Duplicating the lexer is
not a realistic option either, I do not think that pgbench is worth it.

Anyway, in the mean time, the patch may be switched to "returned with
feedback" or "rejected".

>> checkpoint continuous flushing
>
> This does a big memory allocation at checkpoint, which Tom vehemently
> objects to. I don't much like it either, although I would be OK with a
> more moderately-sized allocation.

AFAICS Tom has not expressed any view on the patch in its current form
(there is no message from Tom on the thread).

ISTM that Tom complained about a year ago about OOM risks on a 2007
version of the sorting part by Takahiro ITAGAKI which was dynamically
allocating and freeing about 24 bytes per buffers on each checkpoint.

The current v5 version uses 4 bytes per buffer at the first run and reuse
the memory, it is allocated once and thus there is no risk of returning it
and failing to get it back, so no significant OOM risk. Maybe the
allocation should be moved earlier when starting the checkpointer process,
though.

A second objection a year ago from Tom was about proof of performance
gains. I've spent quite some time to collect a lot of data to measure
benefits under different loads, representing X00 hours of pgbench runs, as
can be seen in the thread. ISTM that the performance (tps & latency)
figures, for instance:

http://www.postgresql.org/message-id/raw/alpine(dot)DEB(dot)2(dot)10(dot)1506170803210(dot)9794(at)sto

are overwhelmingly in favor of the patch.

If the memory requirement of 4 bytes per buffer is still too much, it is
eventually possible to reduce it with a guc to specify the allowed amount
of memory and some shuffling in the code to do things by chunk.

My opinion is that 4 bytes per buffer is reasonable enough given the
measured benefits. Also, there is more benefits if the whole checkpoint is
sorted instead of just part of it.

I'm really willing to improve the write stall issues which freezes
postgresql when checkpointing on HDD, so if it has to be chunked because
this is a blocker I'll make the effort, but I do not think that it is
useful as such.

> It's not clear on what criteria this should be accepted or rejected.

Given the overall performance gains and reduction in latency, where a lot
of write stalls are avoided or at least greatly reduced, I would be sad
for pg to get it rejected. That does not mean that it cannot be improved.

> What workloads need to be tested?

If you tell me, and provide the matching dedicated host for testing, I can
run tests...

--
Fabien.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-07-29 14:50:53 Re: security labels on databases are bad for dump & restore
Previous Message Tom Lane 2015-07-29 14:48:36 Re: dblink: add polymorphic functions.