Data Warehouse Query Tuning

Tuesday, November 3, 2015

Some tips on a performance issue I have faced multiple times: the tuning of a data warehouse query. Behold!

Below you can see an overview of a star schema (or a dimensional schema, if you must), which is commonly used in business intelligence reporting tools.

Star Schema
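For reference, here is a minimal sketch of the tables behind this picture; the column lists and data types are my own assumptions, inferred from the query below.

-- Dimension tables: descriptive attributes
CREATE TABLE time (
    time_id      NUMBER        PRIMARY KEY,
    year         NUMBER        NOT NULL,
    quarter      VARCHAR2(10)  NOT NULL
);

CREATE TABLE product (
    product_id   NUMBER        PRIMARY KEY,
    prod_group   VARCHAR2(50)  NOT NULL,
    category     VARCHAR2(50)  NOT NULL
);

CREATE TABLE geography (
    geography_id NUMBER        PRIMARY KEY,
    district     VARCHAR2(50)  NOT NULL
);

-- Fact table: foreign keys to the dimensions plus the measures
CREATE TABLE sales (
    time_id      NUMBER NOT NULL REFERENCES time (time_id),
    product_id   NUMBER NOT NULL REFERENCES product (product_id),
    geography_id NUMBER NOT NULL REFERENCES geography (geography_id),
    quantity     NUMBER,
    revenue      NUMBER,
    gross_profit NUMBER
);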

A common data warehouse query, which is usually clicked together in a reporting tool, can look like this:

 

SELECT      tim.quarter, prod.category, geo.district,
            SUM(sal.revenue)
FROM        sales sal
JOIN        geography geo  ON sal.geography_id = geo.geography_id
JOIN        time tim       ON sal.time_id      = tim.time_id
JOIN        product prod   ON sal.product_id   = prod.product_id
WHERE       tim.year IN (2008, 2009, 2010)
AND         prod.prod_group IN ('JUMBO PACK')
GROUP BY    tim.quarter, prod.category, geo.district
ORDER BY    tim.quarter, prod.category, geo.district

 

In this example, the dimension tables (geography, product and time) store the attributes that describe the sales facts, while the sales fact table stores the measures: quantity, revenue and gross profit.

The complaint I hear most often when working with end users of BI tools is that ‘it’ is slow. Most of the time, ‘it’ refers to the database query they clicked together in the tool, and in most cases adding filters more or less solves the problem. But not always…

 

4 possible solutions for when ‘it’ becomes too slow:

1. Nested Loops with B*Tree Indexes: these are the most commonly used indexes. The query is executed according to the plan the optimizer comes up with. In the example above, it first retrieves the fact rows that match the prod_group filter (driving from the product dimension through the index), after which it joins with the other dimensions. The last step aggregates and sorts the data. This approach accesses the fact table very often, row by row. The query takes about 3.4 hours to execute.
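If you want to reproduce this plan, the only prerequisite is a set of B*Tree indexes on the foreign key columns of the fact table; the index names below are my own.

CREATE INDEX sales_time_idx      ON sales (time_id);
CREATE INDEX sales_product_idx   ON sales (product_id);
CREATE INDEX sales_geography_idx ON sales (geography_id);

-- If the optimizer picks another plan, standard hints such as
-- /*+ USE_NL(sal) INDEX(sal sales_product_idx) */ can be used to force it
-- back to nested loops with index access on the fact table.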

2. Star Transformation with Bitmap Indexes: the basic idea of this transformation is to avoid a full table scan of the fact table. The query is rewritten so that all the filtering happens first, before the fact table is accessed. Performance is better than with the B*Tree indexes, but it can still be poor when a lot of data is needed from the fact table. The query ran in 3.9 minutes.
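A sketch of the setup this requires; the index names are my own, and the parameter is the standard Oracle one listed in the recap further down.

-- Allow the optimizer to consider the rewrite
ALTER SESSION SET star_transformation_enabled = TRUE;

-- One bitmap index per foreign key column on the fact table
CREATE BITMAP INDEX sales_time_bix      ON sales (time_id);
CREATE BITMAP INDEX sales_product_bix   ON sales (product_id);
CREATE BITMAP INDEX sales_geography_bix ON sales (geography_id);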

3. Full Scans with Intelligent Filtering: the fact table is read with full scans, and a Bloom filter, pushed down through the intelligent filtering on Exadata, narrows those scans to exactly the data that is needed. The query takes about 5 seconds.
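There is little to build for this one; it is mostly a matter of running on the right platform. A hedged sketch of how I would verify that the offloading and Bloom filters actually kick in (the parameter is a standard Oracle one and is normally TRUE by default):

-- Make sure smart scan offloading to the storage cells is enabled
ALTER SESSION SET cell_offload_processing = TRUE;

-- Run the report query, then look at the actual plan: it should show full
-- (storage-offloaded) scans plus JOIN FILTER CREATE / JOIN FILTER USE steps
-- with Bloom filters (:BF0000) instead of index access on the fact table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());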

4. In-Memory Aggregation: in short, in-memory aggregation is a cunning mechanism that combines aspects of the star transformation (but without the bitmap indexes), Bloom filters and ‘group by’ placement to minimize the cost of aggregation over high-volume joins. The result is roughly as fast as the previous method, with the difference that the aggregation is performed as part of the fact table access. The query executes in under 5 seconds.
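A hedged sketch of what is needed to get there; the 16G figure is only an example, sizing is obviously system-dependent.

-- Database In-Memory needs a column store; this is an instance parameter
-- and requires a restart
ALTER SYSTEM SET inmemory_size = 16G SCOPE = SPFILE;

-- Mark the fact and dimension tables for population into the column store
ALTER TABLE sales     INMEMORY;
ALTER TABLE time      INMEMORY;
ALTER TABLE product   INMEMORY;
ALTER TABLE geography INMEMORY;

-- The /*+ VECTOR_TRANSFORM */ hint can be added to the report query to check
-- that the plan switches to KEY VECTOR CREATE / VECTOR GROUP BY operations.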

 

A small recap:

 

Technique: B*Tree Indexes with NL Joins
Primary fact table access method: B*Tree index access, Nested Loops joins
Requirements: Indexes on fact table
Pros: Decent performance if number of rows is very small and all data accessed is satisfied from memory
Cons: Algorithmically weak; can’t get fact table rows fast enough

Technique: Star transformation
Primary fact table access method: Rowid from bitmap index, Bitmap merge, Star transformation
Requirements: star_transformation_enabled, query_rewrite_integrity, PK/FK constraints, NOT NULL constraints, Bitmap indexes on fact table
Pros: Excellent performance if number of rows is small and all data accessed is satisfied from memory
Cons: Poor performance if number of rows from fact table is high and requires random I/O

Technique: Full Scans with Intelligent Filtering
Primary fact table access method: Full scans, Swap join optimization & right-deep tree, Bloom Filters and pipelined hash joins
Requirements: Exadata or DBIM, cell_offload_processing, PK/FK constraints, NOT NULL constraints
Pros: Can handle high and low cardinality queries to achieve consistent response times
Cons: Infrastructure cost, scalability as concurrency increases

Technique: In-Memory Aggregation
Primary fact table access method: Full scans, Vector Transformation
Requirements: DBIM, PK/FK constraints, NOT NULL constraints
Pros: Excellent performance for scan, filter and aggregation
 

From this we notice that in-memory aggregation performs best, even though it sounded to me like the most expensive solution :)

Long story short, some basic things:

  1. the same performance issues that existed 25 years ago still exist today
  2. use bind variables (a small sketch follows below)
  3. Moore’s law: the progress in computer processing power is exponential
  4. don’t program away the failure: errors are expensive in computing time, and when you simply catch the exceptions whenever they occur, you’ll never notice how many times an error occurs, or even that it occurs at all!
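On point 2, a tiny sketch of the difference; the value and the SQL*Plus syntax are just an example.

-- With a literal, every distinct value means a new hard parse and a new cursor
SELECT SUM(revenue) FROM sales WHERE product_id = 4711;

-- With a bind variable, the statement text stays the same and the cursor is shared
VARIABLE prod_id NUMBER
EXEC :prod_id := 4711
SELECT SUM(revenue) FROM sales WHERE product_id = :prod_id;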


/PVE