Re: [HACKERS] Slow count(*) again...

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:08:59
Message-ID: AANLkTimko0CviRbwkvB=XXci4h7FU8ue7iKPnhM645jX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

2011/2/3 <david(at)lang(dot)hm>

>
> If the table is not large enough to fit in ram, then it will compete for
> I/O, and the user will have to wait.
>
> what I'm proposing is that as the records are created, the process doing
> the creation makes copies of the records (either all of them, or some of
> them if not all are needed for the analysis, possibly via shareing memory
> with the analysis process), this would be synchronous with the load, not
> asynchronous.
>
> this would take zero I/O bandwidth, it would take up some ram, memory
> bandwidth, and cpu time, but a load of a large table like this is I/O
> contrained.
>
> it would not make sense for this to be the default, but as an option it
> should save a significant amount of time.
>
> I am making the assumption that an Analyze run only has to go over the data
> once (a seqential scan of the table if it's >> ram for example) and gathers
> stats as it goes.
>
> with the current code, this is a completely separate process that knows
> nothing about the load, so if you kick it off when you start the load, it
> makes a pass over the table (competing for I/O), finishes, you continue to
> update the table, so it makes another pass, etc. As you say, this is a bad
> thing to do. I am saying to have an option that ties the two togeather,
> essentially making the data feed into the Analyze run be a fork of the data
> comeing out of the insert run going to disk. So the Analyze run doesn't do
> any I/O and isn't going to complete until the insert is complete. At which
> time it will have seen one copy of the entire table.
>
> Actually that are two different problems. The one is to make analyze more
automatic to make select right after insert more clever by providing
statistics to it.
Another is to make it take less IO resources.
I dont like for it to be embedded into insert (unless the threshold can be
determined before inserts starts). Simply because it is more CPU/memory that
will slow down each insert. And if you will add knob, that is disabled by
default, this will be no more good than manual analyze.

--
Best regards,
Vitalii Tymchyshyn

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Віталій Тимчишин 2011-02-04 07:24:20 Re: [HACKERS] Slow count(*) again...
Previous Message Samuel Gendler 2011-02-04 06:36:20 Re: [HACKERS] Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2011-02-04 07:24:20 Re: [HACKERS] Slow count(*) again...
Previous Message Samuel Gendler 2011-02-04 06:36:20 Re: [HACKERS] Slow count(*) again...