Re: Improve BULK insertion

From: Rodrigo Carvalhaes <grupos(at)carvalhaes(dot)net>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve BULK insertion
Date: 2004-12-05 19:52:03
Message-ID: 41B366E3.7040109@carvalhaes.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi!

1. I am doing the inserts using pg_restore. The dump was created using
pg_dump and the standard format (copy statements)
2. See below the table schema. There are only 7 indexes.
3. My transaction log configuration are : checkpoint_segments = 3 and
checkpoint_timeout = 300 and my transaction logs are on the same disk .

I know that I can increase the performance separating the transaction
logs and making a RAID 5 array BUT I am really curious about WHY this
performance is so poor and HOW can I try to improve on this actual
machine because actualy this inserts are taking around 90 minutes!!!

Cheers!

Rodrigo

dadosadv=# \d si2010
Table "public.si2010"
Column | Type | Modifiers
------------+------------------+---------------------------------------------------------------------
i2_filial | character(2) | not null default ' '::bpchar
i2_num | character(10) | not null default ' '::bpchar
i2_linha | character(2) | not null default ' '::bpchar
i2_data | character(8) | not null default ' '::bpchar
i2_dc | character(1) | not null default ' '::bpchar
i2_debito | character(20) | not null default '
'::bpchar
i2_dcd | character(1) | not null default ' '::bpchar
i2_credito | character(20) | not null default '
'::bpchar
i2_dcc | character(1) | not null default ' '::bpchar
i2_moedas | character(5) | not null default ' '::bpchar
i2_valor | double precision | not null default 0.0
i2_hp | character(3) | not null default ' '::bpchar
i2_hist | character(40) | not null default
' '::bpchar
i2_ccd | character(9) | not null default ' '::bpchar
i2_ccc | character(9) | not null default ' '::bpchar
i2_ativdeb | character(6) | not null default ' '::bpchar
i2_ativcrd | character(6) | not null default ' '::bpchar
i2_vlmoed2 | double precision | not null default 0.0
i2_vlmoed3 | double precision | not null default 0.0
i2_vlmoed4 | double precision | not null default 0.0
i2_vlmoed5 | double precision | not null default 0.0
i2_dtvenc | character(8) | not null default ' '::bpchar
i2_criter | character(4) | not null default ' '::bpchar
i2_rotina | character(8) | not null default ' '::bpchar
i2_periodo | character(6) | not null default ' '::bpchar
i2_listado | character(1) | not null default ' '::bpchar
i2_origem | character(40) | not null default
' '::bpchar
i2_permat | character(4) | not null default ' '::bpchar
i2_filorig | character(2) | not null default ' '::bpchar
i2_intercp | character(1) | not null default ' '::bpchar
i2_identcp | character(12) | not null default ' '::bpchar
i2_lote | character(4) | not null default ' '::bpchar
i2_doc | character(6) | not null default ' '::bpchar
i2_emporig | character(2) | not null default ' '::bpchar
i2_lp | character(3) | not null default ' '::bpchar
i2_itemd | character(9) | not null default ' '::bpchar
i2_itemc | character(9) | not null default ' '::bpchar
i2_prelan | character(1) | not null default ' '::bpchar
i2_tipo | character(2) | not null default ' '::bpchar
i2_dcc | character(1) | not null default ' '::bpchar
i2_moedas | character(5) | not null default ' '::bpchar
i2_valor | double precision | not null default 0.0
i2_hp | character(3) | not null default ' '::bpchar
i2_hist | character(40) | not null default
' '::bpchar
i2_ccd | character(9) | not null default ' '::bpchar
i2_ccc | character(9) | not null default ' '::bpchar
i2_ativdeb | character(6) | not null default ' '::bpchar
i2_ativcrd | character(6) | not null default ' '::bpchar
i2_vlmoed2 | double precision | not null default 0.0
i2_vlmoed3 | double precision | not null default 0.0
i2_vlmoed4 | double precision | not null default 0.0
i2_vlmoed5 | double precision | not null default 0.0
i2_dtvenc | character(8) | not null default ' '::bpchar
i2_criter | character(4) | not null default ' '::bpchar
i2_rotina | character(8) | not null default ' '::bpchar
i2_periodo | character(6) | not null default ' '::bpchar
i2_listado | character(1) | not null default ' '::bpchar
i2_origem | character(40) | not null default
' '::bpchar
i2_permat | character(4) | not null default ' '::bpchar
i2_filorig | character(2) | not null default ' '::bpchar
i2_intercp | character(1) | not null default ' '::bpchar
i2_identcp | character(12) | not null default ' '::bpchar
i2_lote | character(4) | not null default ' '::bpchar
i2_doc | character(6) | not null default ' '::bpchar
i2_emporig | character(2) | not null default ' '::bpchar
i2_lp | character(3) | not null default ' '::bpchar
i2_itemd | character(9) | not null default ' '::bpchar
i2_itemc | character(9) | not null default ' '::bpchar
i2_prelan | character(1) | not null default ' '::bpchar
i2_tipo | character(2) | not null default ' '::bpchar
d_e_l_e_t_ | character(1) | not null default ' '::bpchar
r_e_c_n_o_ | double precision | not null default 0.0
Indexes:
"si2010_pkey" primary key, btree (r_e_c_n_o_)
"si20101" btree (i2_filial, i2_num, i2_linha, i2_periodo,
r_e_c_n_o_, d_e_l_e_t_)
"si20102" btree (i2_filial, i2_periodo, i2_num, i2_linha,
r_e_c_n_o_, d_e_l_e_t_)
"si20103" btree (i2_filial, i2_data, i2_num, i2_linha, r_e_c_n_o_,
d_e_l_e_t_)
"si20104" btree (i2_filial, i2_debito, i2_data, i2_num, i2_linha,
r_e_c_n_o_, d_e_l_e_t_)
"si20105" btree (i2_filial, i2_credito, i2_data, i2_num, i2_linha,
r_e_c_n_o_, d_e_l_e_t_)
"si20106" btree (i2_filial, i2_doc, i2_periodo, r_e_c_n_o_, d_e_l_e_t_)
"si20107" btree (i2_filial, i2_origem, r_e_c_n_o_, d_e_l_e_t_)

Christopher Browne wrote:

>In the last exciting episode, grupos(at)carvalhaes(dot)net (Grupos) wrote:
>
>
>>Hi !
>>
>>I need to insert 500.000 records on a table frequently. It´s a bulk
>>insertion from my applicatoin.
>>I am with a very poor performance. PostgreSQL insert very fast until
>>the tuple 200.000 and after it the insertion starts to be really slow.
>>I am seeing on the log and there is a lot of transaction logs,
>>something like :
>>
>>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000012"
>>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000013"
>>2004-12-04 11:08:59 LOG: recycled transaction log file "0000000600000011"
>>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000015"
>>2004-12-04 11:14:04 LOG: recycled transaction log file "0000000600000014"
>>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000016"
>>2004-12-04 11:19:08 LOG: recycled transaction log file "0000000600000017"
>>2004-12-04 11:24:10 LOG: recycled transaction log file "0000000600000018"
>>
>>
>
>It is entirely normal for there to be a lot of transaction log file
>recycling when bulk inserts are taking place; that goes through a lot
>of transaction logs.
>
>
>
>>How can I configure PostgreSQL to have a better performance on this
>>bulk insertions ? I already increased the memory values.
>>
>>
>
>Memory is, as likely as not, NOT the issue.
>
>Two questions:
>
> 1. How are you doing the inserts? Via INSERT statements? Or
> via COPY statements? What sort of transaction grouping
> is involved?
>
> COPY is way faster than INSERT, and grouping plenty of updates
> into a single transaction is generally a "win."
>
> 2. What is the schema like? Does the table have a foreign key
> constraint? Does it have a bunch of indices?
>
> If there should eventually be lots of indices, it tends to be
> faster to create the table with none/minimal indices, and add
> indexes afterwards, as long as your "load" process can be trusted
> to not break "unique" constraints...
>
> If there is some secondary table with a foreign key constraint,
> and _that_ table is growing, it is possible that a sequential
> scan is being used to search the secondary table where, if you
> did an ANALYZE on that table, an index scan would be preferred
> once it grew to larger size...
>
>There isn't a particular reason for PostgreSQL to "hit a wall" upon
>seeing 200K records; I and coworkers routinely load database dumps
>that have millions of (sometimes pretty fat) records, and they don't
>"choke." That's true whether talking about loading things onto my
>(somewhat wimpy) desktop PC, or a SMP Xeon system with a small RAID
>array, or higher end stuff involving high end SMP and EMC disk arrays.
>The latter obviously being orders of magnitude faster than desktop
>equipment :-).
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory S. Williamson 2004-12-05 22:48:28 Re: Improve BULK insertion
Previous Message Rodrigo Carvalhaes 2004-12-05 19:43:36 Re: pg_restore taking 4 hours!