A few notes on The Real World Performance Tour, organised by OBUG with Andrew Holdsworth, Tom Kyte and Graham Wood.
Think different about performance
Threads vs Arrays
Array (batch) processing is much faster than many concurrent threads doing row-by-row work.
Multiple competing threads suffer from contention: buffer busy waits and TX index contention.
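To illustrate the array point, here is a minimal PL/SQL sketch (the ORDERS table and its columns are hypothetical): one session inserting a batch with FORALL instead of many threads each doing single-row inserts.

    -- Hypothetical example: batch (array) insert instead of row-by-row
    -- inserts from many competing threads. Assumes a table ORDERS(id, payload).
    DECLARE
      TYPE t_ids      IS TABLE OF orders.id%TYPE;
      TYPE t_payloads IS TABLE OF orders.payload%TYPE;
      l_ids      t_ids      := t_ids();
      l_payloads t_payloads := t_payloads();
    BEGIN
      -- Build a batch of 1,000 rows in memory.
      FOR i IN 1 .. 1000 LOOP
        l_ids.EXTEND;      l_ids(l_ids.LAST)           := i;
        l_payloads.EXTEND; l_payloads(l_payloads.LAST) := 'row ' || i;
      END LOOP;

      -- One execution of the INSERT with an array bind instead of
      -- 1,000 separate single-row executions from separate threads.
      FORALL i IN 1 .. l_ids.COUNT
        INSERT INTO orders (id, payload) VALUES (l_ids(i), l_payloads(i));

      COMMIT;
    END;
    /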
Is Load Balancing Slowing Your System?
Initial setup
Devices ship files.
Files are read and processed by multiple application servers.
Each application server uses multiple threads that connect to the database through a connection pool, which is distributed by the SCAN listener over two instances.
How we usually think when this setup is performing badly
It’s too slow
It’s a problem with the database (“Look at all those waits”)
Need to be able to process an order of magnitude more data
“Obviously need to move to Hadoop”
How to solve this?
Only a small amount of data is being processed.
Both instances essentially idle with most processes waiting in RAC and concurrency waits.
→ Remove all of those RAC waits by running against a single database instance.
Throughput up by a factor of 10
RAC waits gone
But high concurrency waits
◦ Buffer busy
◦ TX index contention
→ Reduce contention waits by processing a file entirely within a single application server.
More throughput
Log file sync predominant event
CPU usage close to core count
→ Reintroduce RAC to add more CPU resource
Implement separate service for each instance
Connect application server to one instance
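For context, a rough sketch of what "a separate service per instance" can look like on a Clusterware-managed RAC database; the database, instance and service names are hypothetical and the exact srvctl options vary by Oracle version. Each application server's connection pool then uses only one of the two services instead of the SCAN's load-balanced default.

    # Hypothetical names: database RACDB, instances RACDB1 and RACDB2.
    # One service per instance, so each application server lands on
    # exactly one instance and a file is processed entirely in one place.
    srvctl add service -d RACDB -s FEED_SVC1 -r RACDB1
    srvctl add service -d RACDB -s FEED_SVC2 -r RACDB2
    srvctl start service -d RACDB -s FEED_SVC1
    srvctl start service -d RACDB -s FEED_SVC2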
Analyzing SQL
Setup
Table has 1.2B rows and is 64 GB
With default statistics the query takes 44 seconds, well beyond the target time of 5 seconds
Initial Optimization Steps
Add more predicate values
The query runs faster just by changing the list of predicate values
Plan changed from a broadcast to a hash distribution due to the higher but inaccurate cardinality estimate
Getting the correct plan despite a wrong cardinality estimate is fragile and can lead to inconsistent plans and performance
Change Degree of Parallelism
Just changing the DoP from 32 to 128 improves performance and meets the target: 4x more resources yields a 25x performance improvement
Plan has changed from a broadcast distribution to a hash distribution due to the DoP change
DoP is a resource management technique, not a query tuning tool
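For illustration, a statement-level parallel hint is one way such a DoP change could be expressed (table name and predicate values are hypothetical); per the note above, this only grants the query more resources and does not fix its plan.

    -- Hypothetical table and predicate values. A statement-level DoP of
    -- 128 allocates more PX servers; it does not address the underlying
    -- cardinality problem.
    SELECT /*+ parallel(128) */ COUNT(*)
    FROM   vehicles
    WHERE  country = 'BE' AND make = 'ACME' AND model = 'ROADSTER';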
Indexes
Indexes on columns: owner_id, country, make, model and [country, make, model]
Not understanding the big/little data challenge
Indexes are not efficient for operations on a large number of rows
Full table scan is faster with predictable performance
→ Adding the indexes makes the query take longer: 160 seconds!
→ Index lookups on millions of rows are slow
To index or not?
Indexing is an OLTP technique for operations on a small number of rows
A table scan may consume more resources but it will be predictable, no matter how many rows are returned
Indexes impact DML operations
Index driven query retrieving 1,000,000 rows
◦ Assume the index is cached and the data is not.
▪ 1,000,000 random I/Os @ 5 ms per I/O
▪ This would require 5,000 seconds (over an hour) to execute
◦ How much data could you scan in 5,000 seconds with a fully sized I/O system able to scan 25 GB/sec?
▪ Over 100 TB!
Histograms
Re-gathered stats to automatically create histograms
Frequency histograms on country, make and model columns
No change in plan: query still exceeds target
Lots of wait time on temp IO
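For reference, the re-gather is roughly this (table name hypothetical); with SIZE AUTO, DBMS_STATS decides from recorded column usage and data skew which columns get histograms.

    -- Hypothetical table name. SIZE AUTO lets DBMS_STATS decide, based on
    -- recorded column usage and skew, where histograms are needed.
    BEGIN
      dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'VEHICLES',
        method_opt => 'FOR ALL COLUMNS SIZE AUTO');
    END;
    /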
Flash Temp
Most of the wait time was spent performing IO on temp, so move temp to flash disks
Improved performance but still does not meet target
Not a good use of flash
→ Incorrect use of tools/products
Manual memory parameters
Set sort_area_size and hash_area_size to 2G
Eliminated temp usage but still did not meet target
Memory is allocated per parallel server process, so total usage can quickly exceed the available resources
→ Moving to a solution before understanding the problem
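Roughly what that experiment looks like, as a sketch rather than a recommendation; manual work areas only apply when WORKAREA_SIZE_POLICY is MANUAL, and each parallel server process gets its own allocation.

    -- Sketch only: per the conclusion above, this is not the right fix.
    -- Each PX server process can allocate this much, so at DoP 128 the
    -- potential allocation is in the hundreds of gigabytes.
    ALTER SESSION SET workarea_size_policy = MANUAL;
    ALTER SESSION SET sort_area_size = 2147483647;   -- ~2 GB
    ALTER SESSION SET hash_area_size = 2147483647;   -- ~2 GB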
Cardinality Hint
SQL Monitor showed poor cardinality estimates
Cardinality hint gives optimizer the correct number of rows for the table scan
Plan changed from a broadcast to hash distribution
Query time now meets target
Now temp is not an issue
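A hedged sketch of such a hint; the table, alias, column values and row count are made up here, and CARDINALITY is an undocumented hint, which is one reason the column-group fix later is preferable.

    -- Hypothetical names and row count. The hint tells the optimizer the
    -- scan of v returns roughly 10 million rows, so it picks a hash
    -- distribution instead of broadcasting that row source to all PX servers.
    SELECT /*+ cardinality(v 10000000) */ o.name, COUNT(*)
    FROM   vehicles v
           JOIN owners o ON o.owner_id = v.owner_id
    WHERE  v.country = 'BE' AND v.make = 'ACME' AND v.model = 'ROADSTER'
    GROUP  BY o.name;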
Disable broadcast distribution
Googling reveals a hidden parameter to disable broadcast distribution
Plan and run times are similar to cardinality hint, meeting target
→ Moving to a solution before understanding the problem
Histogram on column group
Re-gathered stats after running the query with the column groups
Frequency Histogram on the column group
Accurate cardinality estimates
Optimizer now uses a hash distribution
Auto column groups
dbms_stats.report_col_usage shows column groups identified during Seed Column Usage
dbms_stats.create_extended_stats creates the identified column groups
Automatically identifies usage of Country, Make and Model columns together and creates column group
Regather stats
Automatically creates Histogram on the column group
Query meets target
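Put together, the automatic column-group flow looks roughly like this (owner and table names hypothetical, SQL*Plus-style):

    -- 1. Record which columns are used together for the next 300 seconds.
    EXEC dbms_stats.seed_col_usage(NULL, NULL, 300);

    -- ... run the problem query during that window ...

    -- 2. Report the captured usage and create the identified column groups.
    SELECT dbms_stats.report_col_usage(user, 'VEHICLES') FROM dual;
    SELECT dbms_stats.create_extended_stats(user, 'VEHICLES') FROM dual;

    -- 3. Re-gather so the new (COUNTRY, MAKE, MODEL) column group gets
    --    statistics, including a frequency histogram where warranted.
    BEGIN
      dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'VEHICLES',
        method_opt => 'FOR ALL COLUMNS SIZE AUTO');
    END;
    /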
What did we learn?
Bad performance
Potential indicators of Session Leaking
Frequent application server resets
init.ora parameters processes and sessions set very high
Configuration of large and dynamic connection pools
Large number of idle connections connected to the database
Free memory on database server continually reduced
Presence of idle connection kill scripts or middleware configured to kill idle sessions
Without warning, the database appears to hang and the application servers time out simultaneously
The DBA sees that all connections are waiting on a single lock held by a process that has not been active for a while.
Each time the problem occurs, the DBA responds by running a script that kills the sessions of long-time lock holders, allowing the system to resume.
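A sketch of the kind of query behind such a script, using standard V$SESSION columns; the sid/serial# in the kill command are placeholders.

    -- Find the sessions everyone is waiting on: blockers that hold a lock
    -- but have been idle for a long time (the leaked sessions).
    SELECT s.sid, s.serial#, s.username, s.status,
           s.last_call_et AS idle_seconds,
           COUNT(w.sid)   AS blocked_sessions
    FROM   v$session s
           JOIN v$session w ON w.blocking_session = s.sid
    GROUP  BY s.sid, s.serial#, s.username, s.status, s.last_call_et
    ORDER  BY blocked_sessions DESC;

    -- The emergency response described above (sid/serial# are placeholders):
    -- ALTER SYSTEM KILL SESSION '123,45678';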
The presenters' real-time demos can easily be found on YouTube.
If you search for “Real World Performance” and “Holdsworth” you should find the latest demos.
Hope this helps!
Resources:
http://www.obug.be
GDD/JUVO