Friday, October 21, 2016

Short notes on Statistics


QUERY EXECUTION
  • Recompile would be the 1st step to influence to optimize to recompile the code again.
  • Compilation is the most important stage
  • Cost based optimizer == need to evaluate a query reasonably quickly
  • When writing a query, there are many ways to write it but do not focus to that first. Focus the result.
  • WHERE msal * 12 > literal -> Bad
  • WHERE msal > literal / 12 -> Good
  • All about statistics / estimation -> Query performance
  • Most of the cases the issue is not because of out of date stats or indexes, it could be parameter sniffing
  • EXEC <proc name> <parameters> -> uses the existing plan
  • EXEC <proc name> <parameters> WITH RECOMPILE -> generates a new plan
  • Using above two, you can identify the parameter sensitiveness (sniffing)
  • Update stats invalidate the plan
  • Estimates vs Actual -> If it is different, it may not be the stat problem always. It may be parameter sniffing.
STATISTICS
  • Selectivity
    • Optimizer loves highly selective predicates
    • Low no.of rows -> high selectivity
    • High no.of rows -> low selectivity
  • Statistics -> Summarized info about the data distribution of table columns
  • DBCC AUTOPILOT -> Undocumented
  • Hypothetical Indexes -> just the index structure.
    • What if analysis using AUTOPILOT with >= SQL Server 2008
  • sp_helpstats ‘<table name>’, ‘all’
  • Histogram -> 200 steps + 1 row for the null if the column allows null
    • SQL Server 7.0 had 300 rows
    • When the histogram is large, it increases the compilation time because histogram is not like an index
    • EQ_ROWS – Equal rows for the index key
    • Rows * All density = Avg no.of rows returns for that column or the combination of cols
    • If the table is huge, the data distribution present in histogram is not quite accurate
    • Step compression -> When building the histogram, if it finds the value approximately similar to each adjacent steps, then the algorithm will compress those steps and create 1 step
    • RANGE_ROWS -> values between two HI_KEYs and excluding the HI_KEYs at both ends
    • AVG_RANGE_ROWS = RANGE_ROWS / DISTINCT_RANGE_ROWS
  • Exec sp_autostats ‘<table name>’
  • Sys.stats -> shows all the stats including index stats
  • Sys.dm_db_stats_properties -> gives lot of details that you can use to update stats more effectively programmatically.
  • Histogram -> direct hit
  • DBCC SHOWSTATISTICS  WITH HISTOGRAM
  • Stats will update on Index Rebuild but not on Index re-org.
  • Entire stats structure stored in db as a BLOB (Header, Density Vector and Histogram)
  • Partitioned table uses table level stats which has 200 steps by each by each partition
  • Online partition level index rebuild in SQL Server 2014, but when MS introduced partitioning? It was in SQL Server 2005. So MS took 9 years to get online partition level index rebuild
  • Tuple cardinality -> used to estimate distinct cardinality
  • Optimization | Compilation | Execution
    • Local variable values are not known at optimization time
  • Parameters and literals can be sniffed -> uses histogram
  • Variables cannot be sniffed -> density vector
  • UDTATE STATS with any % will not be parallelized but FULL SCAN can be parallelized.
  • OPTION (QUERYTRACEON 3602, 9204, RECOMPILE)
  • Dynamic auto updating threshold
    • For large tables to have 20% change , you need to wait long time to update stats.
  • SQL Server does not understand the correlation between columns
  • Calculation direction in query plan is from right to left.
CONCERNS AROUND THE HISTOGRAM
  • Problem is always with monster tables
  • Is 200 steps enough for such tables?
  • Even the table is partitioned, still the histogram is @ table level
  • Sp_recompile for a table is an expensive operation. It needs Sch-M lock on the table
  • Filtered stats – consider creating filtered stats, even daily basis to tackle and solve the estimates problems due to skewed data. So that you will get better execution plans
  • QUERYTRACEON (2353) -> Additional info.

How to interpret Disk Latency

I was analyzing IO stats in one of our SQL Servers and noticed that IO latency (Read / Write) are very high. As a rule of thumb, we know tha...