Skip site navigation (1) Skip section navigation (2)

Re: Postgres restore help

From: Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
To: Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc: Piyush Lenka <lenka(dot)piyush(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org
Subject: Re: Postgres restore help
Date: 2012-02-28 22:25:50
Message-ID: CAD8_UcbQ0hOx8+Gab4NQK_oYPw-kfp_5740+JkgMdni5pD4CZQ@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-novice
Let's have the tables like this:
CREATE TABLE "tblTest"
(
  "RowId" serial NOT NULL,
  "Value1" text,
  "Value2" text,
  "Value3" text,
  "Value4" text,
  "Value5" text,
  "Value6" text,
  CONSTRAINT "tblTest_pkey" PRIMARY KEY ("RowId" )
)
WITH (
  OIDS=FALSE
);

and
CREATE TABLE "tblTest2"
(
  "RowId" serial NOT NULL,
  "Value1" text,
  "Value2" text,
  "Value3" text,
  "Value4" text,
  "Value5" text,
  "Value6" text,
  CONSTRAINT "tblTest2_pkey" PRIMARY KEY ("RowId" )
)
WITH (
  OIDS=FALSE
);

I mean the same strcture, but with different number of rows and
tblTest - 999000
tblTest2 - 1000

let's try first
EXPLAIN
SELECT a.* FROM "tblTest" a
LEFT JOIN "tblTest2" b
ON (a."RowId" = b."RowId")
WHERE b."RowId" IS NULL;


'Hash Anti Join  (cost=70.50..69551.31 rows=998058 width=339)'
'  Hash Cond: (a."RowId" = b."RowId")'
'  ->  Seq Scan on "tblTest" a  (cost=0.00..56876.58 rows=999058 width=339)'
'  ->  Hash  (cost=58.00..58.00 rows=1000 width=4)'
'        ->  Seq Scan on "tblTest2" b  (cost=0.00..58.00 rows=1000 width=4)'


and:
EXPLAIN
SELECT a.* FROM "tblTest" a
WHERE NOT EXISTS (SELECT "RowId" FROM "tblTest2" b WHERE b."RowId" =
a."RowId")

'Hash Anti Join  (cost=70.50..69551.31 rows=998058 width=339)'
'  Hash Cond: (a."RowId" = b."RowId")'
'  ->  Seq Scan on "tblTest" a  (cost=0.00..56876.58 rows=999058 width=339)'
'  ->  Hash  (cost=58.00..58.00 rows=1000 width=4)'
'        ->  Seq Scan on "tblTest2" b  (cost=0.00..58.00 rows=1000 width=4)'

for me looks the same :) but I personally prefer first.

There is also "...WHERE "RowId" NOT IN..." solution, but in that case it is
a little faster when You are looking for rows from tblTest

EXPLAIN
SELECT * FROM "tblTest"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest2")

'Seq Scan on "tblTest"  (cost=60.50..59434.72 rows=499529 width=339)'
'  Filter: (NOT (hashed SubPlan 1))'
'  SubPlan 1'
'    ->  Seq Scan on "tblTest2"  (cost=0.00..58.00 rows=1000 width=4)'

but it is much, much slower in opposite way
EXPLAIN
SELECT * FROM "tblTest2"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest")

'Seq Scan on "tblTest2"  (cost=0.00..*34136318.00* rows=500 width=344)'
'  Filter: (NOT (SubPlan 1))'
'  SubPlan 1'
'    ->  Materialize  (cost=0.00..65774.87 rows=999058 width=4)'
'          ->  Seq Scan on "tblTest"  (cost=0.00..56876.58 rows=999058
width=4)'



Regards,
Bartek


2012/2/28 Alessandro Gagliardi <alessandro(at)path(dot)com>

> Argh! I can't believe I forgot the "LEFT". That's embarrassing. Anyway, I
> was wondering if a WHERE NOT EXISTS clause would be better. I'm still new
> to those, so didn't want to try to offer that as a solution, but I gather
> it can be more efficient than a JOIN in some cases.
>
>
> On Tue, Feb 28, 2012 at 12:13 PM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>
>> This is almost perfect :)
>> but won't work in that case in the way You expect, because of join type.
>> This should be LEFT JOIN to select all from temp_table (left) and matching
>> from main_table (right), then WHERE clausule will  filter rows not existing
>> in main_table. Without USING, which is shorthand (
>> http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-FROM) it
>> should look like this:
>>
>> INSERT INTO main_table
>> SELECT temp_table.*
>> FROM temp_table
>>       *LEFT JOIN* main_table ON (temp_table.pk = main_table.pk) /*
>> instead of "ON (...)" you can use Alessandro's shorthand: "USING (pk)"*/
>> WHERE main_table.pk IS NULL;
>>
>>
>> Regards,
>> Bartek
>>
>> 2012/2/28 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>
>>> I would recommend restoring the table to a temporary table and then
>>> using something like:
>>> INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN
>>> main_table USING (pk) WHERE main_table.pk IS NULL
>>> Someone else here might have a more efficient way of phrasing that
>>> insert statement (I'm still fairly new to the concept of anti-joins). But I
>>> think something like this should work for you.
>>>
>>> -Alessandro
>>>
>>> On Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka(dot)piyush(at)gmail(dot)com>wrote:
>>>
>>>> Hi,
>>>> I m trying to restore only data from a backup file.but i hav some
>>>> primary key that already exists in the table.
>>>> *I want to restore non duplicate data only.*
>>>> *Please Help.*
>>>> *
>>>> *
>>>> Thanks And Regards
>>>> Piyush
>>>>
>>>
>>>
>>
>

In response to

pgsql-novice by date

Next:From: tukuna patroDate: 2012-02-29 10:33:11
Subject: a role without login priviledge
Previous:From: Tom LaneDate: 2012-02-28 21:49:51
Subject: Re: Postgres restore help

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group