Saturday, January 24, 2015

Explicitly Stating Your Intent May Get You Better Execution Plan

Suppose that we need to query a PostgreSql database, and get the number of developers who currently have tickets assigned to them. The following query clearly describes the intent, and gets a good execution plan (the scripts that set up test data are provided below):
explain analyze
select count(*) from developers as d
where exists(select * from tickets as t 

  where t.developer_id=d.developer_id);
"Aggregate  (cost=141506.53..141506.54 rows=1 width=0) (actual time=1542.890..1542.891 rows=1 loops=1)"

"  ->  Merge Semi Join  (cost=1.59..139010.67 rows=998343 width=0) (actual time=0.076..1461.177 rows=1048576 loops=1)"

"        Merge Cond: (d.developer_id = t.developer_id)"

"        ->  Index Only Scan using pk_developers on developers d  (cost=0.42..36032.07 rows=1048576 width=4) (actual time=0.045..289.787 rows=1048576 loops=1)"

"              Heap Fetches: 1048576"

"        ->  Index Only Scan using tickets_developer_id on tickets t  (cost=0.43..74148.15 rows=2097151 width=4) (actual time=0.024..598.723 rows=2097151 loops=1)"

"              Heap Fetches: 2097252"

"Planning time: 0.927 ms"

"Execution time: 1542.987 ms"

Clearly this execution plan, a merge join, is very efficient. 

There are many more contrived, more complicated ways to accomplish the same thing, such as the following query:
explain analyze
select count(distinct d.developer_id) 
from developers as d join tickets as t 

  on t.developer_id=d.developer_id;
"Aggregate  (cost=139472.49..139472.50 rows=1 width=4) (actual time=4273.890..4273.891 rows=1 loops=1)"

"  ->  Hash Join  (cost=36476.96..134229.61 rows=2097151 width=4) (actual time=404.983..2438.207 rows=2097151 loops=1)"

"        Hash Cond: (t.developer_id = d.developer_id)"

"        ->  Seq Scan on tickets t  (cost=0.00..40572.51 rows=2097151 width=4) (actual time=0.008..362.940 rows=2097151 loops=1)"

"        ->  Hash  (cost=19273.76..19273.76 rows=1048576 width=4) (actual time=404.567..404.567 rows=1048576 loops=1)"

"              Buckets: 16384  Batches: 16  Memory Usage: 2321kB"

"              ->  Seq Scan on developers d  (cost=0.00..19273.76 rows=1048576 width=4) (actual time=0.006..179.046 rows=1048576 loops=1)"

"Planning time: 0.870 ms"

"Execution time: 4273.981 ms"
Clearly this query is logically equivalent to the previous one. Also this query runs as an expensive hash join, using up considerable memory, and taking almost three times longer to complete.

As we have seen, it usually makes sense to express our intent clearly. Contrived queries not only are more difficult to understand and maintain, they may also perform worse.

The following script creates and populates the tables used in this post.

create table developers as

with numbers as(
select generate_series as n from generate_series(0,1048575))
select n as developer_id, 
cast(n as text) as name,
cast('Some developer data ........' as text) as some_data 
from numbers;
alter table developers
add constraint pk_developers primary key(developer_id);
drop table tickets;
create table tickets as

with numbers as(
select generate_series as n from generate_series(0,1048575*2))
select n/2 as developer_id,
n as ticket_id, 
cast('Normal' as text) as ticket_priority,
cast('Some ticket data ........' as text) as some_data 
from numbers;
alter table tickets
add constraint pk_ticket primary key(ticket_id);
create index tickets_developer_id on tickets(developer_id);
update tickets set ticket_priority='High'
where ticket_id between 1000 and 1100;
analyze developers;
analyze tickets;






Wednesday, January 21, 2015

Learning Bitmap Index Scan, Recheck Cond, and Bitmap Heap Scan

As Postgres scans an index and finds a matching key, it can choose not to read the matching row from the table right away. Instead, it can remember on which page was that matching row, finish scanning the index, and read each data page with matching rows only once. As a result, Postgres expects to make less page reads.

Let us set up an example and see how it works. We shall create a very narrow table, with only 12 bytes per row.
create table narrow_table as
with numbers as(select generate_series as n from generate_series(0,1048575))
select n as seq_number, 
trunc(random()*1048575) as rand_number
from numbers;

alter table narrow_table
add constraint pk_narrow_table primary key(seq_number);
create index narrow_table_rand on narrow_table(rand_number);
cluster narrow_table using pk_narrow_table;
-- vacuum full must be run as a separate command
vacuum full analyze narrow_table;
This table is physically ordered by seq_number column, as a result of cluster command. However, there is very little correlation between rand_number values and physical order of rows:

SELECT attname, correlation
FROM pg_stats
WHERE tablename LIKE '%narrow%';
"seq_number"1
"rand_number"-0.00202257
The following query uses Bitmap Index Scan, Recheck Cond, and Bitmap Heap Scan:
explain analyze select seq_number, rand_number 
from narrow_table
where rand_number between 1 and 1000;
"Bitmap Heap Scan on narrow_table  (cost=23.02..2667.43 rows=1034 width=12) (actual time=0.805..1.848 rows=968 loops=1)"
"  Recheck Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
"  Heap Blocks: exact=903"
"  ->  Bitmap Index Scan on narrow_table_rand  (cost=0.00..22.77 rows=1034 width=0) (actual time=0.511..0.511 rows=968 loops=1)"
"        Index Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
"Planning time: 0.276 ms"
"Execution time: 1.947 ms"
First, during Bitmap Index Scan step, the index was scanned, and 968 rows matched the range condition. These rows were on 903 pages, which were read during the Bitmap Heap Scan step. All the rows on those pages were matched against the range condition, which was mentioned as Recheck Cond in the execution plan.

The relative complexity of this plan allows Postgres to satisfy the query by reading only 903 pages. This makes sense if random page reads are expensive. Let us have the query planner think that random page reads are no more expensive than sequential ones:
SET random_page_cost = 1;
explain analyze select seq_number, rand_number 
from narrow_table
where rand_number between 1 and 1000;

"Index Scan using narrow_table_rand on narrow_table  (cost=0.42..971.98 rows=1034 width=12) (actual time=0.071..1.355 rows=968 loops=1)"
"  Index Cond: ((rand_number >= 1::double precision) AND (rand_number <= 1000::double precision))"
"Planning time: 0.273 ms"
"Execution time: 1.465 ms"
This time the plan is much simpler -  as Postgres scans the index and finds a matching key, the corresponding row is read right away, via a random page read. Since random page reads are cheap, going for a more complex plan to save less than 10% of reads does not make sense any more.

Also note that the query planner takes into account the correlation between columns values and physical location of rows. In all previous queries the correlation was very low, so every time a matching index key was found, most likely it was on a different page.

Let us see what happens when the correlation is high. Our table has been explicitly clustered on its first column, so the correlation between seq_number and physical location of rows is 1, which is the highest possible value. This means that as we scan the index and find matching key, every next matching row is very likely to be on the same page.

As a result, even if we set the price of random page reads to be very high, such as 10 instead of the default 4, there are not going to be many page reads - the next row is very likely to be found right on the page we have just read for the previous match. So the execution plan is going to be a simple index scan:
SET random_page_cost = 10;
explain analyze select seq_number, rand_number from narrow_table
where seq_number between 1 and 1000;
"Index Scan using pk_narrow_table on narrow_table  (cost=0.42..64.20 rows=939 width=12) (actual time=0.062..0.516 rows=1000 loops=1)"
"  Index Cond: ((seq_number >= 1) AND (seq_number <= 1000))"
"Planning time: 0.229 ms"
"Execution time: 0.641 ms"
As we have seen, if random page reads are expensive, and if the correlation between column values and physical location of rows is not very high, the query planner may use Bitmap Index Scan, Recheck Cond, and Bitmap Heap Scan in order to minimize the number of random page reads.