Re: Yet another fast GiST build

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Yet another fast GiST build
Date: 2020-09-21 08:08:01
Message-ID: 8269ca22-d275-0afe-633c-2342c7ba11e6@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21/09/2020 02:06, Tom Lane wrote:
> Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
>> This also appears to break checksums.

Thanks, I'll go fix it.

> I was wondering about that, because the typical pattern for use of
> smgrextend for indexes seems to be
>
> RelationOpenSmgr(rel);
> PageSetChecksumInplace(page, lastblock);
> smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);
>
> and gist_indexsortbuild wasn't doing either of the first two things.
>
> gist_indexsortbuild_flush_ready_pages looks like it might be
> a few bricks shy of a load too. But my local CLOBBER_CACHE_ALWAYS
> run hasn't gotten to anything except the pretty-trivial index
> made in point.sql, so I don't have evidence about it.

I don't think a relcache invalidation can happen on the index we're
building. Other similar callers call RelationOpenSmgr(rel) before every
write though (e.g. _bt_blwritepage()), so perhaps it's better to copy
that pattern here too.

> Another interesting point is that all the other index AMs seem to WAL-log
> the new page before the smgrextend call, whereas this code is doing it
> in the other order. I strongly doubt that both patterns are equally
> correct. Could be that the other AMs are in the wrong though.

My thinking was that it's better to call smgrextend() first, so that if
you run out of disk space, you get the error before WAL-logging it. That
reduces the chance that WAL replay will run out of disk space. A lot of
things are different during WAL replay, so it's quite likely that WAL
replay runs out of disk space anyway if you're living on the edge, but
still.

I didn't notice that the other callers are doing it the other way round,
though. I think they need to, so that they can stamp the page with the
LSN of the WAL record. But GiST build is special in that regard, because
it stamps all pages with GistBuildLSN.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-09-21 08:45:59 Re: Yet another fast GiST build
Previous Message Peter Eisentraut 2020-09-21 07:16:47 Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers