Re: brininsert optimization opportunity

From: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashwin Agrawal <ashwinstar(at)gmail(dot)com>
Subject: Re: brininsert optimization opportunity
Date: 2023-07-05 00:35:15
Message-ID: CAE-ML+-9K2h2R7zcvQGq8FarxvSVhgAGZkgH57iQCnWQoxOFww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 4, 2023 at 2:54 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

> I'm not sure if memory context callbacks are the right way to rely on
> for this purpose. The primary purpose of memory contexts is to track
> memory, so using them for this seems a bit weird.

Yeah, this just kept getting dirtier and dirtier.

> There are cases that do something similar, like expandendrecord.c where
> we track refcounted tuple slot, but IMHO there's a big difference
> between tracking one slot allocated right there, and unknown number of
> buffers allocated much later.

Yeah, the following code in ER_mc_callbackis is there I think to prevent double
freeing the tupdesc (since it might be freed in ResourceOwnerReleaseInternal())
(The part about: /* Ditto for tupdesc references */).

if (tupdesc->tdrefcount > 0)
{
if (--tupdesc->tdrefcount == 0)
FreeTupleDesc(tupdesc);
}
Plus the above code doesn't try anything with Resource owner stuff, whereas
releasing a buffer means:
ReleaseBuffer() -> UnpinBuffer() ->
ResourceOwnerForgetBuffer(CurrentResourceOwner, b);

> The fact that even with the extra context is still doesn't handle query
> cancellations is another argument against that approach (I wonder how
> expandedrecord.c handles that, but I haven't checked).
>
> >
> > Maybe there is a better way of doing our cleanup? I'm not sure. Would
> > love your input!
> >
> > The other alternative for all this is to introduce new AM callbacks for
> > insert_begin and insert_end. That might be a tougher sell?
> >
>
> That's the approach I wanted to suggest, more or less - to do the
> cleanup from ExecCloseIndices() before index_close(). I wonder if it's
> even correct to do that later, once we release the locks etc.

I'll try this out and introduce a couple of new index AM callbacks. I
think it's best to do it before releasing the locks - otherwise it
might be weird
to manipulate buffers of an index relation, without having some sort of lock on
it. I'll think about it some more.

> I don't think ii_AmCache was intended for stuff like this - GIN and GiST
> only use it to cache stuff that can be just pfree-d, but for buffers
> that's no enough. It's not surprising we need to improve this.

Hmmm, yes, although the docs state:
"If the index AM wishes to cache data across successive index insertions within
an SQL statement, it can allocate space in indexInfo->ii_Context and
store a pointer
to the data in indexInfo->ii_AmCache (which will be NULL initially)."
they don't mention anything about buffer usage. Well we will fix it!

PS: It should be possible to make GIN and GiST use the new index AM APIs
as well.

> FWIW while debugging this (breakpoint on MemoryContextDelete) I was
> rather annoyed the COPY keeps dropping and recreating the two BRIN
> contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
> reuse those too, but I don't know how much it'd help.
>

Interesting, I will investigate that.

> > Now, to finally answer your question about the speedup without
> > generate_series(). We do see an even higher speedup!
> >
> > seq 1 200000000 > /tmp/data.csv
> > \timing
> > DROP TABLE heap;
> > CREATE TABLE heap(i int);
> > CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
> > COPY heap FROM '/tmp/data.csv';
> >
> > -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
> > COPY 200000000
> > Time: 205072.444 ms (03:25.072)
> > Time: 215380.369 ms (03:35.380)
> > Time: 203492.347 ms (03:23.492)
> >
> > -- 3 runs (branch v2)
> >
> > COPY 200000000
> > Time: 135052.752 ms (02:15.053)
> > Time: 135093.131 ms (02:15.093)
> > Time: 138737.048 ms (02:18.737)
> >
>
> That's nice, but it still doesn't say how much of that is reading the
> data. If you do just copy into a table without any indexes, how long
> does it take?

So, I loaded the same heap table without any indexes and at the same
scale. I got:

postgres=# COPY heap FROM '/tmp/data.csv';
COPY 200000000
Time: 116161.545 ms (01:56.162)
Time: 114182.745 ms (01:54.183)
Time: 114975.368 ms (01:54.975)

perf diff also attached between the three: w/ no indexes (baseline),
master and v2.

Regards,
Soumyadeep (VMware)

Attachment Content-Type Size
perf_diff_3way.out application/octet-stream 22.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2023-07-05 01:14:59 Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
Previous Message Newhouse, Robin 2023-07-04 23:44:45 [PATCH] Add GitLab CI to PostgreSQL