Re: WIP: store additional info in GIN index

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: store additional info in GIN index
Date: 2012-12-04 21:56:19
Message-ID: 50BE7183.2080807@fuzzy.cz
Lists: pgsql-hackers

On 4.12.2012 20:12, Alexander Korotkov wrote:
> Hi!
>
> On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
> I've tried to apply the patch with the current HEAD, but I'm getting
> segfaults whenever VACUUM runs (either called directly or from autovac
> workers).
>
> The patch applied cleanly against 9b3ac49e and needed a minor fix when
> applied on HEAD (because of an assert added to ginRedoCreatePTree), but
> that shouldn't be a problem.
>
>
> Thanks for testing! Patch is rebased with HEAD. The bug you reported was
> fixed.

Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
The whole backtrace is here: http://pastebin.com/YEPuWeuV

The messages written to the PostgreSQL log vary quite a bit. Usually they
look like this:

2012-12-04 22:31:08 CET 31839 LOG: database system was not properly shut down; automatic recovery in progress
2012-12-04 22:31:08 CET 31839 LOG: redo starts at 0/68A76E48
2012-12-04 22:31:08 CET 31839 LOG: unexpected pageaddr 0/1BE64000 in log segment 000000010000000000000069, offset 15089664
2012-12-04 22:31:08 CET 31839 LOG: redo done at 0/69E63638

but I've also seen this:

2012-12-04 22:20:29 CET 31709 LOG: database system was not properly shut down; automatic recovery in progress
2012-12-04 22:20:29 CET 31709 LOG: redo starts at 0/AEAFAF8
2012-12-04 22:20:29 CET 31709 LOG: record with zero length at 0/C7D5698
2012-12-04 22:20:29 CET 31709 LOG: redo done at 0/C7D55E

I wasn't able to prepare a simple testcase to reproduce this, so I've
attached two files from my "fun project" where I noticed it. It's a
simple DB + a bit of Python for indexing mbox archives inside Pg.

- create.sql - a database structure with a bunch of GIN indexes on
tsvector columns on "messages" table

- load.py - script for parsing mbox archives / loading them into the
"messages" table (warning: it's a bit messy)

Usage:

1) create the DB structure
$ createdb archives
$ psql archives < create.sql

2) fetch some archives (I consistently get a SIGSEGV after the first three)
$ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
$ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
$ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz

3) gunzip and load them using the python script
$ gunzip pgsql-hackers.*.gz
$ ./load.py --db archives pgsql-hackers.*

4) et voila - a SIGSEGV :-(

I suspect this might be related to the fact that the load.py script uses
savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
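
The savepoint-per-message pattern looks roughly like the sketch below. This
is not the code from load.py. It uses Python's standard sqlite3 module
instead of a PostgreSQL driver so that it runs without a server, and the
table layout and the load_messages name are made up for illustration. The
SAVEPOINT / ROLLBACK TO SAVEPOINT / RELEASE SAVEPOINT statements are the
same ones PostgreSQL accepts.

```python
import sqlite3


def load_messages(conn, messages):
    """Insert (message_id, body) pairs in one transaction, skipping duplicates.

    Hypothetical helper: each insert is wrapped in its own savepoint so that
    a unique violation undoes only that insert, not the whole transaction.
    """
    cur = conn.cursor()
    inserted = skipped = 0
    cur.execute("BEGIN")
    for message_id, body in messages:
        cur.execute("SAVEPOINT msg")
        try:
            cur.execute(
                "INSERT INTO messages (message_id, body) VALUES (?, ?)",
                (message_id, body),
            )
        except sqlite3.IntegrityError:
            # Duplicate message: roll back to the savepoint and carry on.
            cur.execute("ROLLBACK TO SAVEPOINT msg")
            skipped += 1
        else:
            inserted += 1
        # The savepoint still exists after ROLLBACK TO, so release it either way.
        cur.execute("RELEASE SAVEPOINT msg")
    cur.execute("COMMIT")
    return inserted, skipped


# isolation_level=None disables the module's implicit transaction handling,
# so the explicit BEGIN/COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE messages (message_id TEXT UNIQUE, body TEXT)")

result = load_messages(
    conn,
    [("a@example", "first"), ("b@example", "second"), ("a@example", "duplicate")],
)
print(result)
```

With many duplicates in the input this produces a long run of subtransaction
aborts inside a single transaction, which is the part I suspect matters here.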

Tomas

Attachment Content-Type Size
load.py text/x-python 12.3 KB
create.sql text/plain 8.8 KB
trace.log text/plain 4.5 KB
