Re: Alter/update large tables - VERRRY annoying behaviour!

From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 22:59:49
Message-ID: Pine.LNX.4.21.0204152335270.20382-100000@ponder.fairway2k.co.uk
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches


On Mon, 15 Apr 2002, Dmitry Tkach wrote:

> Neil Conway wrote:
>
>> On Mon, 15 Apr 2002 13:07:20 -0400
>> "Dmitry Tkach" <dmitry(at)openratings(dot)com> wrote:
>
>>> This took me awfully long, but worked (I guess).
>>> I say 'I guess', because I wasn't able so far to verify that - when I
>>> tried to do
>>> select * from a limit 1;
>>> It just hangs on me ... at least, it looks like it does.
>
>> This didn't hang, it just requires a sequential scan of the whole table.
>
> I know it does (as I said below). The point is that it SHOULD NOT, and,
> especially, I can't imagine anyone not familiar with postgres
> internals expecting that it would - all it needs to do is to grab the
> first row and return immediately.
> That's what it would do, if you just create a new table and populate it
> with data.
>
>> As you observe below, it will also need to scan through dead tuples,
>
> Not 'also' - JUST the dead ones! That's what's especially annoying about
> it - as soon as it finds the first tuple that's not dead, it returns.
>
>> but that is just a product of MVCC and there's no real way around
>> it.
>
> My whole point is that I don't believe it (that there is no way around)
> :-)
> For one thing, I have never seen ANY database engine (Oracle, Informix,
> DB2) that would take more than a second to get the first row from a
> table, regardless of what has been done to that table before.
> That (and my common sense too) tells me that there MUST be a 'way around
> it'.
> I can see that it's not currently implemented in postgres, but I do
> believe (and that's the whole point of me posting that message in the
> first place) that it is a huge usability issue and really needs to be
> fixed.

Ok, so I realise that addressing this issue of using a limit of 1 doesn't
address anything associated with using this table for something useful;
however, I find it an interesting point.

All that has been asked for is that a single row be returned. This doesn't have
to be the first valid row at all. Indeed, isn't it one of the axioms of SQL
that data is returned in some arbitrary order unless specifically forced into
an order? Therefore shouldn't doing SELECT * FROM x LIMIT 1 use an index, any
index, to just fetch a single row?
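
To put it concretely, here's the sort of thing I mean (an untested sketch;
I'm assuming x has an index on an id column):

    -- as things stand the planner picks a sequential scan here, which
    -- means stepping over every dead row version until a live one turns up
    EXPLAIN SELECT * FROM x LIMIT 1;

    -- whereas asking for an ordering lets it walk the index instead,
    -- which ought to come back almost immediately
    EXPLAIN SELECT * FROM x ORDER BY id LIMIT 1;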

Or, is this an effect of a row needing to be visited in order to determine its
'visibility', as proposed as a reason why a sequential scan was being performed
for one of my queries I posted about recently? I am still a little confused
about that though. If there is an index and deletes have been committed,
shouldn't the index have been updated and forced to forget about the deleted
rows, unless there is a transaction open that can still access those deleted
items?
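
If it's the latter, presumably a plain VACUUM is what actually clears the
dead entries out of both the heap and the index - something along these
lines (again untested, and the names are made up):

    -- the deletes are committed, but if I've understood correctly their
    -- index entries hang around until a vacuum removes them
    DELETE FROM x WHERE id < 100000;
    VACUUM VERBOSE x;  -- should report how many dead tuples it reclaimed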

As for the rest of the argument, is it constructive? I for one thought the
original poster had done more than a normal user, myself included, would have
done before posting.

>[stuff deleted]
>
>>> First of all, a question for you - is ANY update to a table equivalent
>>> (in this respect) to a delete+insert?
>
>> Yes, AFAIK -- MVCC requires this.
>
> What's MVCC?

Funny, I was about to ask that question. Something about variable size of
fields in the physical storage?
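
Whatever the acronym stands for, the update-as-delete+insert behaviour is
easy enough to see via the hidden ctid column, which gives a row's physical
position - a quick, untested sketch:

    CREATE TABLE t (id int, val text);
    INSERT INTO t VALUES (1, 'a');
    SELECT ctid, * FROM t;    -- something like (0,1)
    UPDATE t SET val = 'b' WHERE id = 1;
    SELECT ctid, * FROM t;    -- now something like (0,2): a new physical
                              -- row was written, and the old version is
                              -- left behind as a dead tuple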

>
>>> - Vacuum isn't the speediest thing in the world either (it's been
>>> running for an hour now, and still has not finished).
>
>> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.
>
> Yes, it is 7.2

Pleased I haven't tried what I'm doing in a version older than 7.2 in that
case, and my table's only 1 million rows.

> [more deletions]

However, a useful thread, since it explains a 'feature' I ran into earlier
today after managing about 4/5 of an update of every row of my table. I had
been thinking the problem was large storage usage by the txtidx type; I
realise now it is the updated = deleted+inserted storage requirement. I'm
thinking of changing what I'm doing to a way I considered earlier [without
thinking about this thread] so I can stay within the limits of my resources.
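
For what it's worth, that variation looks roughly like this (a sketch only;
t, id, body and txtidx_of() are stand-ins for my real table, key, source
column and whatever expression builds the txtidx value) - do the update in
slices, with a VACUUM between each so one slice's dead row versions are
reclaimed before the next slice adds more:

    UPDATE t SET tidx = txtidx_of(body) WHERE id BETWEEN 1 AND 100000;
    VACUUM t;
    UPDATE t SET tidx = txtidx_of(body) WHERE id BETWEEN 100001 AND 200000;
    VACUUM t;
    -- ...and so on, picking the batch size to fit the disk space I can
    -- spare for dead row versions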

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
