Re: [HACKERS] Index creation takes for ever

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, ohp(at)pyrenet(dot)fr, pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index creation takes for ever
Date: 2003-09-07 15:43:42
Message-ID: 200309071543.h87Fhg305982@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


I assume this completes this TODO:

* Order duplicate index entries by tid for faster heap lookups

and you will submit it for 7.5? If you want to post it now, I can get
it into the 7.5 queue so we don't forget it.

---------------------------------------------------------------------------

Manfred Koizar wrote:
> On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> wrote:
> >ohp(at)pyrenet(dot)fr writes:
> >> it took 69 minutes to finish, 75% of this time was devoted to create 2
> >> indexes on varchar(2) with value being 'O', 'N' or null;
> >
> >I still say it's either strcoll or qsort's fault.
>
> If qsort is to blame, then maybe this patch could help. It sorts
> equal key values on item pointer. And if it doesn't help index
> creation speed, at least the resulting index has better correlation.
>
> Test script:
> CREATE TABLE t (i int NOT NULL, t text NOT NULL);
> INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> ANALYZE t;
> CREATE INDEX t_i ON t(i);
> SET enable_seqscan = 0;
> SELECT ctid FROM t WHERE i=100 LIMIT 10;
>
> Result without patch:
> ctid
> ----------
> (153,14)
> (306,23)
> (305,80)
> (152,91)
> (76,68)
> (38,34)
> (153,34)
> (305,50)
> (9,62)
> (305,40)
> (10 rows)
>
> Result with patch:
> ctid
> --------
> (0,5)
> (0,10)
> (0,15)
> (0,20)
> (0,25)
> (0,30)
> (0,35)
> (0,40)
> (0,45)
> (0,50)
> (10 rows)
>
> For testing purposes I have made a second patch that provides a
> boolean GUC variable sort_index. It is available here:
> http://www.pivot.at/pg/23.test-IdxTupleSort.diff
>
> Servus
> Manfred

> diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c
> --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200
> +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200
> @@ -2071,6 +2071,33 @@
> (errcode(ERRCODE_UNIQUE_VIOLATION),
> errmsg("could not create unique index"),
> errdetail("Table contains duplicated values.")));
> + else
> + {
> + /*
> + * If key values are equal, we sort on ItemPointer. This might help
> + * for some bad qsort implementation having performance problems
> + * with many equal items. OTOH I wouldn't trust such a weak qsort
> + * to handle pre-sorted sequences very well ...
> + *
> + * Anyway, this code doesn't hurt much, and it helps produce indices
> + * with better index correlation which is a good thing per se.
> + */
> + ItemPointer tid1 = &tuple1->t_tid;
> + ItemPointer tid2 = &tuple2->t_tid;
> + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1);
> + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2);
> +
> + if (blk1 != blk2)
> + return (blk1 < blk2) ? -1 : 1;
> + else
> + {
> + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1);
> + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2);
> +
> + if (pos1 != pos2)
> + return (pos1 < pos2) ? -1 : 1;
> + }
> + }
>
> return 0;
> }

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2003-09-07 15:46:20 Re: pg_id and pg_encoding
Previous Message Andrew Dunstan 2003-09-07 13:21:58 pg_id and pg_encoding

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2003-09-07 15:57:36 Re: [HACKERS] Index creation takes for ever
Previous Message Bruce Momjian 2003-09-07 15:40:33 Re: [PATCHES] MinGW patch