Re: Loading table with indexed jsonb field is stalling

From: Will Hartung <willhartung(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Will Hartung <willhartung(at)gmail(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Ron <ronljohnsonjr(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Loading table with indexed jsonb field is stalling
Date: 2019-05-21 23:26:20
Message-ID: 999C358F-D7BD-4DED-9ED6-AC4E42157427@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On May 20, 2019, at 5:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Well, you're the only one who's seen this problem, and none of the
> rest of us have any idea how to reproduce it. So if you want something
> to get done in a timely fashion, it's up to you to show us a test case.

So, we had success.

We looked in to maintenance_work_mem.

By default, this value is 64MB.

Our test on AWS was against a small machine, 2 CPU, 16G.

My VM is however many cores it gets (not that it matters) and 4G of RAM.

My VM used the 64MB value for maintenance_work_mem. The AWS small VM used 247M.

We killed the rebuild on the small machine, it was pushing 67 hours.

I was running a load on my machine, and it was still making progress over night, but was at, like 17 hours. But it was moving, but had only loaded 2.2M rows in that time.

We grabbed one of the huge AWS instances. 64 CPU, 488G of ram. Just, silly.

But, that’s fine — I’m good with silly.

It’s mainteance_work_mem was ~8G.

And we loaded all of my files on that instance in about 5.5 hours, about 9M rows per hour.

So, obviously, maintenance_work_mem was the smoking gun. Since I don’t know the details of GIN indexing, its not clear to me how the maintenance_work_mem is utilized with GIN index builds, but, obviously 64M is “not enough”, nor is 247M. And 8G is certainly enough.

We’re cautious just setting these values “to 11” because of the fact that multiple connections can utilize them, so it seems to me that it’s important that they be “generous enough”, but not overly generous.

So this is good news, no bug, and, perhaps, left to its own devices, the DB would have eventually built this index. Whether it would have done so before universal heat death, is a different question.

Can anyone discuss how the maintenance_work_mem is utilized during GIN index creation? On our production systems, this value is set to 1G. And we don’t seem to have any problems for day to day work. This is a very busy table, and we have thousands of inserts/updates daily which seem to proceed well enough. I have not tried to rebuild this index on this system, so I can’t say if 1G is enough to rebuild this index efficiently or not. But its efficient enough for our transaction load.

Now, that said, should I ever be in that situation of having to recover this table like this, I’d have no problem cranking that value up high since it would be the only real connection on the system anyway.

But I’m still curious how the memory is utilized during index builds just to have a better understanding of the nuances of the system.

Thanks all for your help.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Justin Pryzby 2019-05-22 01:34:28 distinguish update from insert (on conflict)
Previous Message Peter Geoghegan 2019-05-21 22:23:00 Re: pg_upgrade can result in early wraparound on databases with high transaction load