A few notes on The Real World Performance Tour

Tuesday, May 20, 2014

A few notes on The Real World Performance Tour, organised by OBUG with Andrew Holdsworth, Tom Kyte and Graham Wood.

Think different about performance

Threads vs Arrays
Array processing (sending many rows per database call) is much faster than spreading row-by-row work over threads.
Multiple threads suffer from contention: buffer busy waits and TX index contention.
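
To make the array idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the database, which is my assumption and not what the presenters used (their demos ran against Oracle). The table name `readings` and the helper functions are made up for illustration. It contrasts one call per row with a single call that binds the whole batch.

```python
import sqlite3

def make_db():
    # In-memory database used purely as a stand-in for a real server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id INTEGER, value REAL)")
    return conn

def insert_row_by_row(conn, rows):
    # One execute call per row; against a client/server database
    # each call would be a separate round trip.
    for row in rows:
        conn.execute("INSERT INTO readings VALUES (?, ?)", row)
    conn.commit()

def insert_as_array(conn, rows):
    # A single call binds the whole batch of rows (the array interface).
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

rows = [(i % 100, float(i)) for i in range(10_000)]

slow = make_db()
insert_row_by_row(slow, rows)

fast = make_db()
insert_as_array(fast, rows)

print(count_rows(slow), count_rows(fast))
```

Both variants load the same rows. With an in-process database the timing difference is modest; the point of the array call is that a networked database sees one request instead of thousands, with no extra threads competing for the same blocks.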

Is Load Balancing Slowing Your System?
Initial setup
    Devices ship files.
    Files read and processed by multiple application servers.
    Each application server runs multiple threads that connect to the database through a connection pool; a SCAN listener distributes the connections over two instances.


How we usually think when this setup is performing badly
    It’s too slow
    It’s a problem with the database (“Look at all those waits”)
    Need to be able to process an order of magnitude more data
    “Obviously need to move to Hadoop”

How to solve this?
    Only small amount of data being processed.
    Both instances essentially idle with most processes waiting in RAC and concurrency waits.

→     Remove all of those RAC waits by running against a single database instance.



    Throughput up by factor of 10x
    RAC waits gone
    But high concurrency waits
◦    Buffer busy waits
◦    TX index contention

→     Reduce contention waits by processing a file entirely within a single application server.



    More throughput
    Log file sync predominant event
    CPU usage close to core count

→     Reintroduce RAC to add more CPU resource
    Implement separate service for each instance
    Connect application server to one instance
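
The two affinity steps (one file per application server, one instance per application server) could be sketched like this. The server names, service names and the CRC-based routing are all hypothetical; the notes do not say how the presenters assigned files, so treat this as one possible approach only.

```python
import zlib

# Hypothetical application servers and per-instance services.
APP_SERVERS = ["app1", "app2", "app3", "app4"]
SERVICES = ["svc_inst1", "svc_inst2"]

def server_for_file(file_name):
    # A stable hash sends every record of a file to the same
    # application server, so one file is never split across servers.
    index = zlib.crc32(file_name.encode("utf-8")) % len(APP_SERVERS)
    return APP_SERVERS[index]

def service_for_server(server_name):
    # Each application server connects to exactly one instance
    # through that instance's own service.
    index = APP_SERVERS.index(server_name) % len(SERVICES)
    return SERVICES[index]

for name in ["device_0001.dat", "device_0002.dat", "device_0003.dat"]:
    server = server_for_file(name)
    print(name, server, service_for_server(server))
```

Because the mapping is deterministic, rows from one file always arrive through the same server and the same instance, which is the property that removes the cross-instance traffic.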



Analyzing SQL
Setup



    Table has 1.2B rows and is 64 GB
    With default statistics the query takes 44 seconds, far over its target time of 5 seconds

 

Initial Optimization Steps


Add more predicate values
    Query runs faster just by changing the list of values in the predicate
    Plan changed from a broadcast to a hash distribution due to the higher but inaccurate cardinality estimate
    Getting the correct plan with a wrong cardinality estimate can lead to inconsistent plans and performance

Change Degree of Parallelism
    Just changing DoP from 32 to 128 improves performance and meets the target; 4X more resources yields a 25X performance improvement
    Plan has changed from a broadcast distribution to a hash distribution due to DoP change
    DoP is a resource management technique, not a query tuning tool

Indexes
Indexes on columns: owner_id, country, make, model and [country, make, model]
    Not understanding the big/little data challenge
    Indexes are not efficient for operations on a large number of rows
    Full table scan is faster with predictable performance

→ Add indexes and the query takes longer: 160 seconds!
→ Index lookups on millions of rows are slow

To index or not?
    Indexing is an OLTP technique for operations on a small number of rows
    A table scan may consume more resources but it will be predictable, no matter how many rows are returned
    Indexes impact DML operations
    Index driven query retrieving 1,000,000 rows
◦    Assume the index is cached and the data is not.
▪    1,000,000 random IOPS @ 5ms per I/O
▪    This would require 5000 Seconds (or over 1 hour) to execute
◦    How much data could you scan in 5000 seconds with a fully sized I/O system able to scan 25 GB/sec?
▪    Over 100 TB!
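
The arithmetic behind these numbers, written out as a small sketch. The figures come from the notes above; the only assumption of mine is counting 1 TB as 1000 GB.

```python
ROWS = 1_000_000        # rows fetched through the index
IO_LATENCY_MS = 5       # one random I/O per row, 5 ms each
SCAN_RATE_GB_S = 25     # scan rate of a fully sized I/O system

# Time spent on random I/O for the index-driven query.
index_seconds = ROWS * IO_LATENCY_MS / 1000
index_minutes = index_seconds / 60

# Data a full scan could read in that same amount of time.
scanned_gb = index_seconds * SCAN_RATE_GB_S
scanned_tb = scanned_gb / 1000  # decimal terabytes

print(f"index-driven query: {index_seconds:.0f} s ({index_minutes:.0f} min)")
print(f"scan in the same time: {scanned_tb:.0f} TB")
```

That works out to 5000 seconds (about 83 minutes) of random I/O, in which time the scan could have read 125 TB, hence "over 100 TB".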

Histograms
    Re-gathered stats to automatically create histograms
    Frequency histograms on country, make and model columns
    No change in plan: query still exceeds target
    Lots of wait time on temp IO

Flash Temp
    Most of the wait time was spent performing IO on temp, so move temp to flash disks
    Improved performance but still does not meet target
    Not a good use of flash
→ Incorrect use of tools/products

Manual memory parameters
    Set sort_area_size and hash_area_size to 2G
    Eliminated temp usage but still did not meet target
    Memory is allocated per parallel server process, which can quickly exceed resources
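
A rough upper bound shows how quickly this adds up. The sketch assumes that a parallel query can use two sets of parallel servers (so up to 2 x DoP processes) and that each process fills one 2G work area; both are simplifying assumptions of mine, and the function name is made up.

```python
WORK_AREA_GB = 2  # sort_area_size / hash_area_size from the notes

def worst_case_gb(dop, server_sets=2, work_areas_per_process=1):
    # Upper bound: every parallel server process fills its work area.
    processes = dop * server_sets
    return processes * work_areas_per_process * WORK_AREA_GB

for dop in (32, 128):
    print(f"DoP {dop}: up to {worst_case_gb(dop)} GB for a single query")
```

Under these assumptions one query at DoP 32 could ask for up to 128 GB, and at DoP 128 up to 512 GB, before any other session is counted.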
→ Moving to a solution before understanding the problem

Cardinality Hint
    SQL Monitor showed poor cardinality estimates
    Cardinality hint gives optimizer the correct number of rows for the table scan
    Plan changed from a broadcast to hash distribution
    Query time now meets target
    Now temp is not an issue

Disable broadcast distribution
    Googling reveals a hidden parameter to disable broadcast distribution
    Plan and run times are similar to cardinality hint, meeting target
→ Moving to a solution before understanding the problem

Histogram on column group
    Re-gathered stats after running the query with the column groups
    Frequency Histogram on the column group
    Accurate cardinality estimates
    Optimizer now uses a hash distribution
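
Why the column group matters can be shown with a toy data set. The sketch below is not the optimizer's actual algorithm: it only contrasts an estimate that multiplies per-column selectivities (treating the columns as independent) with a frequency count on the combined columns. The catalog of cars is invented for illustration.

```python
from collections import Counter

# Invented, strongly correlated data: the model implies the make,
# and the make implies the country.
catalog = [
    ("DE", "VW", "Golf"), ("DE", "VW", "Polo"), ("DE", "BMW", "320"),
    ("FR", "Renault", "Clio"), ("FR", "Peugeot", "208"),
    ("IT", "Fiat", "500"),
    ("JP", "Toyota", "Yaris"), ("JP", "Toyota", "Corolla"),
]
rows = catalog * 1000  # 8000 rows

def selectivity(column_index, value):
    # Fraction of rows matching a single-column predicate.
    matches = sum(1 for r in rows if r[column_index] == value)
    return matches / len(rows)

predicate = ("DE", "VW", "Golf")

# Per-column statistics only: multiply the selectivities.
independent_estimate = len(rows)
for i, value in enumerate(predicate):
    independent_estimate *= selectivity(i, value)

# Column group: a frequency count on the combined columns.
column_group_estimate = Counter(rows)[predicate]

actual = sum(1 for r in rows if r == predicate)

print(independent_estimate, column_group_estimate, actual)
```

With this data the independent estimate comes out more than ten times too low, while the combined count matches the real row count. An underestimate of that kind is what can push a plan toward a broadcast distribution.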

Auto column groups
    dbms_stats.report_col_usage shows column groups identified during Seed Column Usage
    dbms_stats.create_extended_stats creates column groups identified
    Automatically identifies usage of Country, Make and Model columns together and creates column group
    Regather stats
    Automatically creates Histogram on the column group
    Query meets target

What did we learn?



Bad performance
Potential indicators of Session Leaking

    Frequent application server resets
    init.ora parameters processes and sessions set very high
    Configuration of large and dynamic connection pools
    Large number of idle connections connected to the database
    Free memory on database server continually reduced
    Presence of idle connection kill scripts or middleware configured to kill idle sessions
    Without warning, the database appears to hang and the application servers time out simultaneously
    The DBA sees that all connections are waiting on a single lock held by a process that has not been active for a while.
    Each time the problem occurs, the DBA responds by running a script to kill sessions held by long time lock holders and allowing the system to restart.

The presenters' live demos are easy to find on YouTube.
If you search on “Real World Performance” and “Holdsworth” you should find the latest demos.

Hope this helps!

Resources:
http://www.obug.be

GDD/JUVO