Some tips on a performance issue I have faced multiple times: the tuning of a data warehouse query. Behold!
Below you can see an overview of a star schema (or a dimensional schema if you must) which is commonly used in business intelligence reporting tools.
A common data warehouse query, which is usually clicked together in a reporting tool, can look like this:
SELECT tim.quarter, prod.category, geo.district,
SUM(sal.revenue)
FROM sales sal
JOIN geography geo ON sal.geography_id = geo.sal.geography_id
JOIN time tim ON sal.time_id = tim.time_id
JOIN product pro ON sal.product_id = pro.product_id
WHERE tim.year IN (2008, 2009, 2010)
AND pro.prod_group IN ('JUMBO PACK')
GROUP BY tim.quarter, prod.category, geo.district
ORDER BY tim.quarter, prod.category, geo.district
In this example, the dimension tables (geography, product and time) store attributes to describe the sales facts, while the sales fact table stores the measures; i.e., quantity, revenue and gross profit.
The complaint I most often experience when working with end users in BI tools is that ‘it’ works slow. Most of the time, the ‘it’ refers to the database query they clicked together while using the tool and in most of the cases the addition of filters somewhat solves the problem. But not always…
4 possible solutions for when ‘it’ becomes too slow:
1. Nested Loops with B*Tree Indexes: these are the most commonly used indexes. The query will be executed according to the explain plan that the optimizer comes up with. In the case of the example above, it will first filter the fact records that match the prod_group filter after which it will continue to join with the other dimensions. The last step will aggregate and sort the data. This approach accesses the fact table very often. The query takes about 3.4 hours to execute.
2. Star Transformation with Bit Mapped Indexes: the basic idea of this transformation is to steer clear of using a full table scan access method on the fact table. The query will be rewritten so that all the filtering can happen first, before the fact table is accessed. The performance is better than the B*Tree indexes, but it can still perform poorly when a lot of data is required from the fact table. The query ran in 3.9 minutes.
3. Full Scans with Intelligent Filtering: a bloom filter is used to filter out the data that is needed by using the intelligent filtering on Exadata. The query takes about 5 seconds.
4. In-Memory Aggregation: in short; in-memory aggregation it is a cunning mechanism that combines the aspects of the star-transformation, but without the bitmap indexes, Bloom filters and the ‘group by’ placement to minimize the cost of aggregation over high-volume joins. The result is that everything is as fast as the previous method, with the difference that the aggregation is performed as a part of the fact table access. The query now executes under 5 seconds.
A small recap:
Technique |
Primary Fact Table Access Method |
Requirements |
Pros |
Cons |
B*Tree Indexes with NL Joins |
•B*Tree index access •Nested Loops joins |
•Indexes on fact table |
Decent performance if number of rows is very small and all data accessed is satisfied from memory |
Algorithmically weak; can’t get fact table rows fast enough |
Star transformation |
•Rowid from bitmap index •Bitmap merge •Star transformation |
•star_transformation_enabled •query_rewrite_integrity •PK/FK constraints •NOT NULL constraints •Bitmap indexes on fact table |
Excellent performance if number of rows is small and all data accessed is satisfied from memory |
Poor performance if number of rows from fact table is high and requires random I/O |
Full Scans with Intelligent Filtering |
•Full scans •Swap join optimization & right-deep tree •Bloom Filters and pipelined hash joins |
•Exadata or DBIM •cell_offload_processing •PK/FK constraints •NOT NULL constraints |
Can handle high and low cardinality queries to achieve consistent response times |
Infrastructure cost, scalability as concurrency increases |
In-Memory Aggregation |
•Full scans •Vector Transformation • |
•DBIM •PK/FK constraints •NOT NULL constraints |
Excellent performance for both scan, filter, and aggregation |
From this we notice that the in-memory aggregation performs the best, which sounded to me as the most expensive solution :)
Long story short, some basic things:
/PVE