As Postgres scans an index and finds a matching key, it can choose not to read the matching row from the table right away. Instead, it can remember which page that matching row is on, finish scanning the index, and then read each data page with matching rows only once. As a result, Postgres expects to make fewer page reads.
Let us set up an example and see how it works. We shall create a very narrow table, with only 12 bytes per row.
create table narrow_table as
with numbers as (select generate_series as n from generate_series(0, 1048575))
select n as seq_number,
       trunc(random() * 1048575) as rand_number
from numbers;
alter table narrow_table
add constraint pk_narrow_table primary key(seq_number);
create index narrow_table_rand on narrow_table(rand_number);
cluster narrow_table using pk_narrow_table;
-- vacuum full must be run as a separate command
vacuum full analyze narrow_table;
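As a quick sanity check (not part of the original setup), a plain explain shows the planner's estimated row width, which should come out as 12 bytes here: 4 for the integer seq_number plus 8 for the double precision rand_number.
-- optional: the width=12 value in the top line of the plan confirms the row width
explain select seq_number, rand_number from narrow_table;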
This table is physically ordered by the seq_number column, as a result of the cluster command. However, there is very little correlation between rand_number values and the physical order of rows:
SELECT attname, correlation
FROM pg_stats
WHERE tablename LIKE '%narrow%';
"seq_number"1
"rand_number"-0.00202257
The following query uses Bitmap Index Scan, Recheck Cond, and Bitmap Heap Scan:
explain analyze select seq_number, rand_number
from narrow_table
where rand_number between 1 and 1000;
"Bitmap Heap Scan on narrow_table (cost=23.02..2667.43 rows=1034 width=12) (actual time=0.805..1.848 rows=968 loops=1)"
" Recheck Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
" Heap Blocks: exact=903"
" -> Bitmap Index Scan on narrow_table_rand (cost=0.00..22.77 rows=1034 width=0) (actual time=0.511..0.511 rows=968 loops=1)"
" Index Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
"Planning time: 0.276 ms"
"Execution time: 1.947 ms"
First, during the Bitmap Index Scan step, the index was scanned and 968 rows matched the range condition. These rows were located on 903 pages, which were read during the Bitmap Heap Scan step. All the rows on those pages were checked against the range condition again, which shows up as Recheck Cond in the execution plan.
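If you would like to see the page reads directly rather than infer them from the plan, explain analyze accepts the buffers option, which adds a Buffers line with shared hit and read counts; the exact numbers will vary between runs and systems.
-- optional: report buffer usage together with the plan
explain (analyze, buffers) select seq_number, rand_number
from narrow_table
where rand_number between 1 and 1000;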
The relative complexity of this plan allows Postgres to satisfy the query by reading only 903 pages. This makes sense if random page reads are expensive. Let us make the query planner think that random page reads are no more expensive than sequential ones:
SET random_page_cost = 1;
explain analyze select seq_number, rand_number
from narrow_table
where rand_number between 1 and 1000;
"Index Scan using narrow_table_rand on narrow_table (cost=0.42..971.98 rows=1034 width=12) (actual time=0.071..1.355 rows=968 loops=1)"
" Index Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
"Planning time: 0.273 ms"
"Execution time: 1.465 ms"
This time the plan is much simpler: as Postgres scans the index and finds a matching key, the corresponding row is read right away via a random page read. Since random page reads are cheap, choosing a more complex plan to save less than 10% of the reads no longer makes sense.
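To compare the two strategies under identical cost settings, you can temporarily turn off plain index scans with the standard planner switch enable_indexscan, so that the planner falls back to whatever it now considers cheapest, most likely the bitmap plan; the estimated costs will of course differ from machine to machine.
-- optional: bring the bitmap plan back while random_page_cost is still 1
SET enable_indexscan = off;
explain analyze select seq_number, rand_number
from narrow_table
where rand_number between 1 and 1000;
RESET enable_indexscan;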
Also note that the query planner takes into account the correlation between column values and the physical location of rows. In all previous queries the correlation was very low, so every time a matching index key was found, the corresponding row was most likely on a different page.
Let us see what happens when the correlation is high. Our table has been explicitly clustered on its first column, so the correlation between seq_number and the physical location of rows is 1, which is the highest possible value. This means that as we scan the index and find a matching key, the next matching row is very likely to be on the same page as the previous one.
As a result, even if we set the cost of random page reads very high, such as 10 instead of the default 4, there are not going to be many page reads: the next row is very likely to be found right on the page we have just read for the previous match. So the execution plan is going to be a simple index scan:
SET random_page_cost = 10;
explain analyze select seq_number, rand_number from narrow_table
where seq_number between 1 and 1000;
"Index Scan using pk_narrow_table on narrow_table (cost=0.42..64.20 rows=939 width=12) (actual time=0.062..0.516 rows=1000 loops=1)"
" Index Cond: ((seq_number >= 1) AND (seq_number <= 1000))"
"Planning time: 0.229 ms"
"Execution time: 0.641 ms"
As we have seen, if random page reads are expensive and the correlation between column values and the physical location of rows is not very high, the query planner may use a Bitmap Index Scan, Recheck Cond, and Bitmap Heap Scan in order to minimize the number of random page reads.