Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua Tolley" <eggyknap(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Bryce Cutt" <pandasuit(at)gmail(dot)com>, "Michael Henderson" <mikecubed(at)gmail(dot)com>
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-11-03 02:34:20
Message-ID: 6EEA43D22289484890D119821101B1DF2C16C1@exchange20.mercury.ad.ubc.ca
Lists: pgsql-hackers

> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> What alternatives are there for people who do not run Windows?
>
> regards, tom lane

The TPC-H generator is a standard code base provided at
http://www.tpc.org/tpch/. We have been able to compile this code on
Linux.

However, we were unable to get the Microsoft modifications to this code
to compile on Linux (although they are supposed to be portable), so we
ran the Windows version under Wine on our Debian test machine.

I have also posted the text files for the TPC-H 1G 1Z data set at:

http://people.ok.ubc.ca/rlawrenc/tpch1g1z.zip

Note that you need to trim the extra characters at the end of the lines
for PostgreSQL to read them properly.
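
For anyone who wants to script the trimming, here is a rough sketch in
Python. It assumes the extra characters are a single trailing '|'
delimiter (as the standard TPC-H generator emits) plus possibly a
carriage return; that is an assumption, so check a few lines of the
actual files first. The function names are made up for illustration.

```python
import io


def trim_line(line: str) -> str:
    """Drop the line ending and at most one trailing '|'.

    Assumption: the only extra characters are a trailing field
    delimiter and an optional carriage return.
    """
    body = line.rstrip("\r\n")
    if body.endswith("|"):
        body = body[:-1]
    return body


def trim_stream(src, dst) -> int:
    """Copy src to dst with each line trimmed; return the line count."""
    count = 0
    for line in src:
        dst.write(trim_line(line) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Small in-memory demonstration with invented rows.
    sample = io.StringIO("1|foo|bar|\r\n2|baz|qux|\n")
    cleaned = io.StringIO()
    trim_stream(sample, cleaned)
    print(cleaned.getvalue(), end="")
```

To run it on real files, pass two file objects opened with open() in
place of the in-memory streams. Only one trailing delimiter is removed
per line, so a legitimately empty last column is preserved.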

Since the data takes a while to generate and load, we can also provide a
compressed copy of a PostgreSQL data directory with the data already
loaded.

--
Ramon Lawrence
