Skip site navigation (1) Skip section navigation (2)

Values list-of-targetlists patch for comments (was Re: [HACKERS] 8.2 features?)

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com>, "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bernd Helmle <mailings(at)oopsware(dot)de>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Susanne Ebrecht <susanne(dot)ebrecht(at)credativ(dot)de>
Subject: Values list-of-targetlists patch for comments (was Re: [HACKERS] 8.2 features?)
Date: 2006-07-24 03:57:14
Message-ID: 44C4451A.4010906@joeconway.com (view raw or flat)
Thread:
Lists: pgsql-docspgsql-hackerspgsql-patches
Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
> 
>>I'm liking this too. But when you say "jointree node", are you saying to 
>>model the new node type after NestLoop/MergeJoin/HashJoin nodes? These 
>>are referred to as "join nodes" in ExecInitNode. Or as you mentioned a 
>>couple of times, should this look more like an Append node?
> 
> 
> No, I guess I confused you by talking about the executor representation
> at the same time.  This is really unrelated to the executor.  The join
> tree I'm thinking of here is the data structure that dangles off
> Query.jointree --- it's a representation of the query's FROM clause,
> and (at present) can contain RangeTblRef, FromExpr, and JoinExpr nodes.
> See the last hundred or so lines of primnodes.h for some details.
> The jointree is used by the planner to compute the plan node tree that
> the executor will run, but it's not the same thing.
> 
> There are basically two ways you could go about this:
> 1. Make a new jointree leaf node type to represent a VALUES construct,
>    and dangle the list of lists of expressions off that.
> 2. Make a new RangeTblEntry type to represent a VALUES construct, and
>    just put a RangeTblRef to it into the jointree.  The expressions
>    dangle off the RangeTblEntry.
> 
> Offhand I'm not certain which of these would be cleanest.  The second
> way has some similarities to the way we handle set operation trees
> (UNION et al), so it might be worth looking at that stuff.  However,
> being a RangeTblEntry has a lot of baggage (eg, various routines expect
> to find an RTE alias, column names, column types, etc) and maybe we
> don't need all that for VALUES.

Since the feature freeze is only about a week off, I wanted to post this 
patch even though it is not yet ready to be applied.

Executive summary:
==================
1. The patch is now large and invasive based on adding new node
    types and associated infrastructure. I modelled the nodes largely
    on RangeFunction and FunctionScan.
2. Performance is close enough to mysql to not be a big issue (I think,
    more data below) as long as the machine does not get into a memory
    swapping regime. Memory usage is now better, but not as good as
    mysql.
3. I specifically coded with the intent of preserving current insert
    statement behavior and code paths for current functionality. So there
    *should* be no performance degradation or subtle semantics changes
    for "INSERT DEFAULT VALUES", "INSERT ... VALUES (with one target
    list)", "INSERT ... SELECT ...". Even Tom's recently discovered
    "insert into foo values (tenk1.*)" still works ;-)

Performance:
============
On my development machine (dual core amd64, 2GB RAM) I get the following 
results using the php script posted earlier:

Postgres:
---------
$loopcount = 100000;
multi-INSERT-at-once Elapsed time is 1 second

$loopcount = 300000;
multi-INSERT-at-once Elapsed time is 5 seconds

$loopcount = 500000;
multi-INSERT-at-once Elapsed time is 9 seconds

$loopcount = 800000;
multi-INSERT-at-once Elapsed time is 14 seconds

$loopcount = 900000;
multi-INSERT-at-once Elapsed time is 17 seconds

$loopcount = 1000000;
multi-INSERT-at-once Elapsed time is 42 seconds

$loopcount = 2000000;
killed after 5 minutes due to swapping

MySQL:
------
$loopcount = 100000;
multi-INSERT-at-once Elapsed time is 2 seconds

$loopcount = 300000;
INSERT failed:Got a packet bigger than 'max_allowed_packet' bytes
changed max_allowed_packet=64M
multi-INSERT-at-once Elapsed time is 5 seconds

$loopcount = 500000;
multi-INSERT-at-once Elapsed time is 8 seconds

$loopcount = 800000;
multi-INSERT-at-once Elapsed time is 13 seconds

$loopcount = 900000;
multi-INSERT-at-once Elapsed time is 15 seconds

$loopcount = 1000000;
multi-INSERT-at-once Elapsed time is 17 seconds

$loopcount = 2000000;
multi-INSERT-at-once Elapsed time is 36 seconds

$loopcount = 3000000;
multi-INSERT-at-once Elapsed time is 54 seconds

$loopcount = 4000000;
multi-INSERT-at-once Elapsed time is 134 seconds

<table value constructor>:
==========================
Included in this patch is support for <table value constructor> in the 
FROM clause, e.g.:

regression=# select * from {values (1,array[1,2]),(2,array[3,4])};
  ?column? | array
----------+-------
         1 | {1,2}
         2 | {3,4}
(2 rows)

The strange syntax is a temporary hack to eliminate shift/reduce 
conflicts. I'm not entirely sure we want to try to support this (or 
something like it) for 8.2, but much of what is needed is now readily 
available. More on known issues next.

Known Issues:
=============

General:
--------
1. Several comments in the patch are marked "FIXME". These are areas
    where I was uncertain what was the "right thing to do". Any advice
    on these specific spots would be very much appreciated.
2. I broke the rules regression test -- still need to look at what I
    did to mess that up. Somewhere in the reconstruction of "VALUES ..."
    according to the diff.

VALUES multi-targetlist INSERTS:
--------------------------------
3. Not yet quite sure how to get DEFAULT to work for "INSERT ...
    multi-values". As noted above, works fine if there is only
    one targetlist.

<table value constructor>:
--------------------------
4. I'm getting shift/reduce conflicts that are not easily eliminated.
    Making VALUES fully reserved only made it 1 shift/reduce conflict.
5. Column aliases are still not working correctly. Haven't really looked
    closely at this yet.
6. Data types are being deduced currently based on the first row,
    and not currently getting checked on subsequent rows. So it is
    easy to induce a crash:

regression=# select * from {values (1,array[1,2]),(2,3)};
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

7. In general, <table value constructor> in the FROM clause needs
    more discussion -- among other things, how should we determine and
    enforce column types? I think this could be a very useful feature,
    but I'm not comfortable I understand it yet.

=================
As usual, review, advise, comments, flames, etc. requested

Joe

Attachment: multi-insert-r6a.diff.gz
Description: application/x-gzip (16.1 KB)

In response to

Responses

pgsql-docs by date

Next:From: Joe ConwayDate: 2006-07-24 06:22:14
Subject: Re: Values list-of-targetlists patch for comments (was Re:
Previous:From: Jim C. NasbyDate: 2006-07-21 14:08:32
Subject: Re: [PATCHES] 8.2 features?

pgsql-hackers by date

Next:From: Tom LaneDate: 2006-07-24 04:06:30
Subject: Re: Adding a pgbench run to buildfarm
Previous:From: Bort, PaulDate: 2006-07-24 03:52:14
Subject: Adding a pgbench run to buildfarm

pgsql-patches by date

Next:From: Joe ConwayDate: 2006-07-24 06:22:14
Subject: Re: Values list-of-targetlists patch for comments (was Re:
Previous:From: Robert LorDate: 2006-07-24 01:00:31
Subject: Re: [PATCHES] Generic Monitoring Framework with DTrace patch

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group