Re: brininsert optimization opportunity

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashwin Agrawal <ashwinstar(at)gmail(dot)com>
Subject: Re: brininsert optimization opportunity
Date: 2023-07-04 21:54:29
Message-ID: 11f78627-a934-1f8b-084e-21ed95f2564c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/4/23 21:25, Soumyadeep Chakraborty wrote:
> Thank you both for reviewing!
>
> On Tue, Jul 4, 2023 at 4:24AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
>> Hmm, yeah, I remember being bit bothered by this repeated
>> initialization. Your patch looks reasonable to me. I would set
>> bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
>> Also, please add comments atop these two new functions, to explain what
>> they are.
>
> Done. Set bistate->bs_desc = NULL; as well. Added comments.
>
>
> On Tue, Jul 4, 2023 at 4:59AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>> Yeah. I wonder how much of that runtime is the generate_series(),
>> though. What's the speedup if that part is subtracted. It's guaranteed
>> to be even more significant, but by how much?
>
> When trying COPY, I got tripped by the following:
>
> We get a buffer leak WARNING for the meta page and a revmap page.
>
> WARNING: buffer refcount leak: [094] (rel=base/156912/206068,
> blockNum=1, flags=0x83000000, refcount=1 1)
> WARNING: buffer refcount leak: [093] (rel=base/156912/206068,
> blockNum=0, flags=0x83000000, refcount=1 1)
>
> PrintBufferLeakWarning bufmgr.c:3240
> ResourceOwnerReleaseInternal resowner.c:554
> ResourceOwnerRelease resowner.c:494
> PortalDrop portalmem.c:563
> exec_simple_query postgres.c:1284
>
> We release the buffer during this resowner release and then we crash
> with:
>
> TRAP: failed Assert("bufnum <= NBuffers"), File:
> "../../../../src/include/storage/bufmgr.h", Line: 305, PID: 86833
> postgres: pivotal test4 [local] COPY(ExceptionalCondition+0xbb)[0x5572b55bcc79]
> postgres: pivotal test4 [local] COPY(+0x61ccfc)[0x5572b537dcfc]
> postgres: pivotal test4 [local] COPY(ReleaseBuffer+0x19)[0x5572b5384db2]
> postgres: pivotal test4 [local] COPY(brinRevmapTerminate+0x1e)[0x5572b4e3fd39]
> postgres: pivotal test4 [local] COPY(+0xcfc44)[0x5572b4e30c44]
> postgres: pivotal test4 [local] COPY(+0x89e7f2)[0x5572b55ff7f2]
> postgres: pivotal test4 [local] COPY(MemoryContextDelete+0xd7)[0x5572b55ff683]
> postgres: pivotal test4 [local] COPY(PortalDrop+0x374)[0x5572b5602dc7]
>
> Unfortunately, when we do COPY, the MemoryContext where makeIndexInfo
> gets called is PortalContext and that is what is set in ii_Context.
> Furthermore, we clean up the resource owner stuff before we can clean
> up the MemoryContexts in PortalDrop().
>
> The CurrentMemoryContext when initialize_brin_insertstate() is called
> depends. For CopyMultiInsertBufferFlush() -> ExecInsertIndexTuples()
> it is PortalContext, and for CopyFrom() -> ExecInsertIndexTuples() it is
> ExecutorState/ExprContext. We can't rely on it to register the callback
> neither.
> > What we can do is create a new MemoryContext for holding the
> BrinInsertState, and we tie the callback to that so that cleanup is not
> affected by all of these variables. See v2 patch attached. Passes make
> installcheck-world and make installcheck -C src/test/modules/brin.
>> However, we do still have 1 issue with the v2 patch:
> When we try to cancel (Ctrl-c) a running COPY command:
> ERROR: buffer 151 is not owned by resource owner TopTransaction
>

I'm not sure if memory context callbacks are the right way to rely on
for this purpose. The primary purpose of memory contexts is to track
memory, so using them for this seems a bit weird.

There are cases that do something similar, like expandendrecord.c where
we track refcounted tuple slot, but IMHO there's a big difference
between tracking one slot allocated right there, and unknown number of
buffers allocated much later.

The fact that even with the extra context is still doesn't handle query
cancellations is another argument against that approach (I wonder how
expandedrecord.c handles that, but I haven't checked).

>
> Maybe there is a better way of doing our cleanup? I'm not sure. Would
> love your input!
>
> The other alternative for all this is to introduce new AM callbacks for
> insert_begin and insert_end. That might be a tougher sell?
>

That's the approach I wanted to suggest, more or less - to do the
cleanup from ExecCloseIndices() before index_close(). I wonder if it's
even correct to do that later, once we release the locks etc.

I don't think ii_AmCache was intended for stuff like this - GIN and GiST
only use it to cache stuff that can be just pfree-d, but for buffers
that's no enough. It's not surprising we need to improve this.

FWIW while debugging this (breakpoint on MemoryContextDelete) I was
rather annoyed the COPY keeps dropping and recreating the two BRIN
contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
reuse those too, but I don't know how much it'd help.

> Now, to finally answer your question about the speedup without
> generate_series(). We do see an even higher speedup!
>
> seq 1 200000000 > /tmp/data.csv
> \timing
> DROP TABLE heap;
> CREATE TABLE heap(i int);
> CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
> COPY heap FROM '/tmp/data.csv';
>
> -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
> COPY 200000000
> Time: 205072.444 ms (03:25.072)
> Time: 215380.369 ms (03:35.380)
> Time: 203492.347 ms (03:23.492)
>
> -- 3 runs (branch v2)
>
> COPY 200000000
> Time: 135052.752 ms (02:15.053)
> Time: 135093.131 ms (02:15.093)
> Time: 138737.048 ms (02:18.737)
>

That's nice, but it still doesn't say how much of that is reading the
data. If you do just copy into a table without any indexes, how long
does it take?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-07-04 22:08:52 Re: Parallel CREATE INDEX for BRIN indexes
Previous Message Matthias van de Meent 2023-07-04 21:53:41 Re: Parallel CREATE INDEX for BRIN indexes