Re: proposal: be smarter about i/o patterns in index scan

From: "Glen Parker" <glenebob(at)nwlink(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: be smarter about i/o patterns in index scan
Date: 2004-05-19 21:06:44
Message-ID: AJEKKAIECKNMBCEKADJPKEACCFAA.glenebob@nwlink.com
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Tom Lane

> > Or:
> > 2. Sort AND COPY TID's into physical order.
> > 3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
>
> No, it sounds like handwaving. What's your "match results" step, and

You got me :-)

> does it take less than O(N^2) time? How do you get to *return* the

Less than O(N^2)??? Geez I think so!

> tuples in index order, without having read them all before you return
> the first one? (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)

(You'll have to pardon me for thinking as I type... and for being too
long-winded :-)

It takes O(1) time per tuple (O(N) overall).

If you have a hard maximum of TIDs you'll grab from the index (which sounds
right), you could store the TID list in a vector. The IO-order-sorted list
could be a singly-linked list or a vector, but either way, each entry would
have to point back to an offset in the index-ordered list.
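
To make that concrete, here's a rough sketch of what I'm picturing (just an
illustration; the type and function names are made up, not anything in the
tree):

#include <stdint.h>

#define MAX_TIDS 2048                  /* batch size, see below */

typedef struct { uint32_t block; uint16_t offset; } TID;   /* stand-in for a heap pointer */

typedef struct
{
    TID tid;                           /* copy of the TID, sorted into physical heap order */
    int idx_pos;                       /* back-offset into the index-ordered vector */
} IOEntry;

static TID     index_order[MAX_TIDS];  /* TIDs in the order the index returned them */
static IOEntry io_order[MAX_TIDS];     /* same TIDs, re-sorted for sequential heap reads */
static void   *results[MAX_TIDS];      /* fetched tuples, slotted back into index order */

/* The "match results" step: once the tuple for io_order[i] has been read
 * from the heap, drop it into its index-order slot. */
static void
match_result(int i, void *fetched_tuple)
{
    results[io_order[i].idx_pos] = fetched_tuple;   /* O(1) per tuple */
}

You'd walk io_order front to back so the heap reads stay sequential, and hand
tuples back out of results[] in index order as the leading slots fill in.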

The index-ordered list would be (sizeof(TID) * max_TID_count) bytes, and the
IO-ordered list would be ((sizeof(TID) + sizeof(int)) * max_TID_count) bytes.
If a TID is 8 bytes and we're going to fetch 2048 index entries per
iteration, that gets us (*counting on fingers*) 16K for the index-ordered
list + 24K for the IO-ordered list.

The index-ordered list can also be a singly-linked list, in which case the
mapping would be by pointer. If both lists are singly-linked lists, then
the size overhead (for a fully realized 2048-entry scan) would be:
((sizeof(TID) + sizeof(void*)) * max_TID_count) +
((sizeof(TID) + 2 * sizeof(void*)) * max_TID_count) = (*more counting on
fingers*) ... 24K + 32K = 56K.
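
Just to show my work on those numbers (assuming 8-byte TIDs and 4-byte ints
and pointers, i.e. a 32-bit box -- assumptions, not measurements):

#include <stdio.h>

int
main(void)
{
    const int n = 2048, tid = 8, i4 = 4, ptr = 4;

    printf("vector, index-ordered:      %d KB\n", tid * n / 1024);            /* 16 KB */
    printf("vector, IO-ordered:         %d KB\n", (tid + i4) * n / 1024);     /* 24 KB */
    printf("linked list, index-ordered: %d KB\n", (tid + ptr) * n / 1024);    /* 24 KB */
    printf("linked list, IO-ordered:    %d KB\n", (tid + 2 * ptr) * n / 1024);/* 32 KB */
    return 0;
}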

Hmm, the vector style would be crazy expensive for small queries. The
linked-list approach sounds pretty good though. I assume no one thinks
doing this would be completely free :-) And, if you did the IO-ordered list
as a vector, you'd save the linked-list overhead, and you would know exactly
how big to make it, so it wouldn't have to hurt you on small queries.

2048 entries seems pretty darn generous actually.

> How do you get to *return* the
> tuples in index order, without having read them all before you return

Hmm, the worst-case scenario would be to scan the heap in exactly the reverse
of index order. With a 2048-entry batch size, you'd have a fair amount of
overhead on a LIMIT 1 :-)

Unless the index scanner could be given a maximum tuple count, rather than
just 'knowing' it from a GUC value. Would this dirty up the plan tree
beyond all recognition?
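
Something along these lines is what I have in mind -- purely hypothetical, of
course; neither the GUC nor the limit hint exists today:

/* hypothetical sketch: cap the TID batch at the LIMIT, when the plan knows it */
static int
choose_batch_size(int guc_index_batch_size,   /* e.g. 2048, from a made-up GUC */
                  int limit_hint)             /* rows the plan needs, or -1 if unknown */
{
    if (limit_hint > 0 && limit_hint < guc_index_batch_size)
        return limit_hint;          /* LIMIT 1 => fetch one TID, no 2048-entry batch */
    return guc_index_batch_size;
}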

> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?

Now you really got me. Hand waving: the same thing you do when you run out
of memory anywhere else :-) By configuring the server to fetch index
entries in 1-entry batches, you'd be right back to the overhead (or very
nearly so) of the current implementation, so it would only hurt people who
chose to be hurt by it :-)

-Glen
