Re: SSI bug?

From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-14 05:09:23
Message-ID: 20110214050924.2E5E119CE7A@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi,

all of the following answers are with the patch you provided in
other mail applied.

> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> i have seen this actually happen. i've confirmed the creation of
>> the loop with the attached patch. it's easily reproducable with
>> my application. i can provide the full source code of my
>> application if you want. (but it isn't easy to run unless you are
>> familiar with the recent version of NetBSD)
>> i haven't found a smaller reproducible test case yet.
>
> I've never used NetBSD, so maybe a few details will help point me in
> the right direction faster than the source code.
>
> Has your application ever triggered any of the assertions in the
> code? (In particular, it would be interesting if it ever hit the
> one right above where you patched.)

the assertion right above is sometimes triggered. sometimes not.

>
> How long was the loop?

see below.

> Did you notice whether the loop involved multiple tuples within a
> single page?

if i understand correctly, yes.

the following is a snippet of my debug code (dump targets when
triggerCheckTargetForConflictsIn loops >1000 times) and its output.
the same locktag_field3 value means the same page, right?

+ for (t = target, i = 0; t != NULL; i++) {
+ elog(WARNING, "[%u] target %p tag %" PRIx32 ":%" PRIx32 ":%" PRIx32
+ ":%" PRIx16 ":%" PRIx16 " prior %p next %p", i, t,
+ t->tag.locktag_field1,
+ t->tag.locktag_field2,
+ t->tag.locktag_field3,
+ t->tag.locktag_field4,
+ t->tag.locktag_field5,
+ t->priorVersionOfRow,
+ t->nextVersionOfRow);
+ t = t->priorVersionOfRow;
+ if (t == target) {
+ elog(WARNING, "found a loop");
+ break;
+ }
+ }

WARNING: [0] target 0xbb51f238 tag 4000:4017:53b:6c:0 prior 0xbb51f350 next 0xbb51f350
WARNING: [1] target 0xbb51f350 tag 4000:4017:53b:69:0 prior 0xbb51f238 next 0xbb51f238
WARNING: found a loop

another sample:

WARNING: [0] target 0xbb51f530 tag 4000:4017:565:ae:0 prior 0xbb51f1e8 next 0xbb51f300
WARNING: [1] target 0xbb51f1e8 tag 4000:4017:565:ad:0 prior 0xbb51f580 next 0xbb51f530
WARNING: [2] target 0xbb51f580 tag 4000:4017:565:ac:0 prior 0xbb51f300 next 0xbb51f1e8
WARNING: [3] target 0xbb51f300 tag 4000:4017:565:ab:0 prior 0xbb51f530 next 0xbb51f580
WARNING: found a loop

the table seems mostly hot-updated, if it matters.

hoge=# select * from pg_stat_user_tables where relid=16407;
-[ RECORD 1 ]-----+--------------------
relid | 16407
schemaname | pgfs
relname | file
seq_scan | 0
seq_tup_read | 0
idx_scan | 53681
idx_tup_fetch | 52253
n_tup_ins | 569
n_tup_upd | 12054
n_tup_del | 476
n_tup_hot_upd | 12041
n_live_tup | 93
n_dead_tup | 559
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 4922528128875102208
autoanalyze_count | 7598807461784802080

(values in the last two columns seems bogus.
i don't know if it's related or not.)

> Did this coincide with an autovacuum of the table?

no.
(assuming that autovacuum=off in postgresql.conf is enough to exclude
the possibility.)

>
> These last two are of interest because it seems likely that such a
> cycle might be related to this new code not properly allowing for
> some aspect of tuple cleanup.
>
> Thanks for finding this and reporting it, and thanks in advance for
> any further detail you can provide.

thanks for looking.

YAMAMOTO Takashi

>
> -Kevin
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-02-14 05:25:57 Re: Sync Rep for 2011CF1
Previous Message Fujii Masao 2011-02-14 05:08:59 Re: Sync Rep for 2011CF1