From:
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:
Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc:
Xiao Meng <mx(dot)cogito(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org,
"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject:
Re: hash index improving v3
Date:
2008-09-04 23:11:28
Message-ID:
7627.1220569888@sss.pgh.pa.us
Lists:
pgsql-hackers pgsql-patches
Here is an updated patch incorporating Zdenek's review, my own
observation that we should make the index tupledesc tell the truth
(that is, describe the hash codes actually stored rather than the
indexed column's type), and some other fixes/improvements such as
making backwards scans work as expected.
The main thing lacking before this could be committed, from a code
standpoint, is a cleaner solution to the problem of adjusting the
index tupledesc (see the ugly hack in catalog/index.c). However,
that complaint is irrelevant for functionality or performance testing,
so I'm throwing this back out there in hopes someone will do some testing.
I thought a little bit about how to extend this to store both hashcode
and original index key, and realized that the desire to have a truthful
index tupledesc makes that a *whole* lot harder. The planner, and
really even the pg_index catalog representation, assume that the visible
columns of an index are one-for-one with the index keys. We can slide
through with the attached patch because this is still true ---
effectively we're just using a "storage type" different from the indexed
column's type for hash indexes, as already works for GiST and GIN.
But having two visible columns would bollix up quite a lot of stuff.
So I think if we actually want to do that, we'd need to revert to the
concept of cheating on the tupledesc. Aside from the various uglinesses
that I was able to remove from the original patch by not cheating,
I'm still quite concerned that we'd find something else wrong with
that approach further down the road.
So my thinking right now is that we should just test this patch as-is.
If it doesn't show really horrid performance when there are lots of
hash key collisions, we should forget the store-both-things idea and
just go with this.
regards, tom lane