Re: Wrong value in metapage of GIN INDEX.

From: keisuke kuroda <keisuke(dot)kuroda(dot)3862(at)gmail(dot)com>
To: "Moon, Insung" <tsukiwamoon(dot)pgsql(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Wrong value in metapage of GIN INDEX.
Date: 2019-08-30 02:48:47
Message-ID: CANDwggLfKsT28YhcoR6k3N-k1Aw887vT5-1cUXZ0XKPBvHw4Rw@mail.gmail.com
Lists: pgsql-hackers

Hi Moon-san.

Thank you for posting.

We are testing the GIN index on the JSONB type.
The default maintenance_work_mem (64MB) is fine in usual cases.
However, this problem occurs when indexing very large JSONB data.

best regards,
Keisuke Kuroda

On Thu, Aug 29, 2019 at 17:20, Moon, Insung <tsukiwamoon(dot)pgsql(at)gmail(dot)com> wrote:

> Dear Hackers.
>
> Kuroda-san and I are interested in the GIN index and have been testing
> various things.
> While testing, we found a small bug.
> In some cases, the value of nEntries in the metapage is set to the
> wrong value.
>
> The following steps reproduce the bug.
> =# SET maintenance_work_mem TO '1MB';
> =# CREATE TABLE foo(i jsonb);
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 10000) AS i;
>
> # Input the same value again.
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 10000) AS i;
> # Creates GIN Index.
> =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH
> (fastupdate=off);
>
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size | 0
> n_pending_pages | 0
> n_pending_tuples | 0
> n_total_pages | 74
> n_entry_pages | 69
> n_data_pages | 4
> n_entries | 20004 <--★
> version | 2
>
> In this example, the nEntries value should be 10001, because the GIN
> index stores duplicate values in one leaf (posting tree or posting
> list).
> But if we look at the nEntries value of the metapage using
> pageinspect, it is stored as 20004.
> So, let's run VACUUM.
>
>
> =# VACUUM foo;
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size | 0
> n_pending_pages | 0
> n_pending_tuples | 0
> n_total_pages | 74
> n_entry_pages | 69
> n_data_pages | 4
> n_entries | 10001 <--★
> version | 2
>
> After running VACUUM, nEntries changes to the correct value.
>
> The problem is in the ginEntryInsert function. During the table scan
> that builds the GIN index, the ginBuildCallback function stores each
> new heap value in the GinBuildState struct.
> Next, when the memory used by GinBuildState becomes equal to or
> larger than the maintenance_work_mem value, the accumulated values
> are inserted into the GIN index.
> This is done by the ginEntryInsert function.
> ginEntryInsert assumes at this point that a new entry is being added
> and increases the value of nEntries.
> However, ginEntryInsert currently increases nEntries first and only
> afterwards checks whether the same entry already exists in the
> GIN index.
> That causes the bug.
>
> The patch is very simple.
> It increases the value of nEntries only when a non-duplicate GIN
> index leaf entry is added.
>
> I found this bug and wrote the fix together with Kuroda-san.
>
> Best Regards.
> Moon.
>
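
As a rough illustration of the ordering problem described in the quoted
message, here is a toy Python model. It is not PostgreSQL code: the function
names, the dict-based "index", and the batch size of 5000 rows are all
assumptions made for the sketch. It only shows how counting before the
duplicate lookup inflates the counter when the accumulated entries are
flushed more than once.

```python
def build_index(rows, batch_size, count_before_lookup):
    """Toy model of an index build that flushes an accumulator in batches.

    rows: list of tuples of entries extracted from each row (hypothetical).
    batch_size: rows per flush; stands in for hitting maintenance_work_mem.
    count_before_lookup: True models the buggy order, False the fixed order.
    Returns (counter value, number of distinct entries actually stored).
    """
    index = {}     # entry -> set of row ids (stands in for posting list/tree)
    n_entries = 0  # stands in for the nEntries counter

    def entry_insert(entry, row_ids):
        nonlocal n_entries
        if count_before_lookup:
            n_entries += 1          # counted before checking for a duplicate
        if entry in index:
            index[entry] |= row_ids  # existing entry: only add row ids
        else:
            if not count_before_lookup:
                n_entries += 1      # counted only for a genuinely new entry
            index[entry] = set(row_ids)

    for start in range(0, len(rows), batch_size):
        accumulator = {}
        batch = rows[start:start + batch_size]
        for row_id, entries in enumerate(batch, start):
            for entry in entries:
                accumulator.setdefault(entry, set()).add(row_id)
        # Flush: one insert call per distinct entry in this batch.
        for entry, row_ids in accumulator.items():
            entry_insert(entry, row_ids)

    return n_entries, len(index)


# Same shape as the test data: one key plus values 1..10000, inserted twice.
rows = [(("key", "foobar001"), ("value", i)) for i in range(1, 10001)] * 2

buggy = build_index(rows, batch_size=5000, count_before_lookup=True)
fixed = build_index(rows, batch_size=5000, count_before_lookup=False)
print(buggy, fixed)
```

With the assumed four flushes, the buggy order counts the key once per flush
and every value twice, while the fixed order counts each distinct entry once.
The real number of flushes depends on maintenance_work_mem and the data size,
so the inflated value in an actual build can differ.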
