Sample Selection

The sample selection logic picks the most appropriate sample for a query. In the current version, it follows these relatively simple rules:

  • If the query is not an aggregation query (that is, it does not use COUNT, AVG, or SUM), no sample is used and the query is executed on the base table. Otherwise:

  • If the query QCS (the columns involved in Where/GroupBy/Having) exactly matches the QCS of a sample, that sample is selected.

  • If an exact match is not available and the QCS of a sample is a superset of the query QCS, that sample is used.

  • If no sample with a superset QCS is available, a sample whose QCS is a subset of the query QCS is used.

  • When multiple stratified samples have a QCS that is a subset of the query QCS, the sample with the most matching columns is used. If several such samples are available, the largest one is selected.
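
The rules above can be sketched in a few lines of Python. This is only an illustration of the order of the checks, not the actual implementation: the `Sample` class, the `select_sample` function, and the `size` field are hypothetical names, column names are assumed to be already normalized, and the tie-breaks for several exact or superset matches (first one wins) and the fallback when nothing matches (base table) are assumptions the rules do not spell out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Aggregates that make a query eligible for sampling, per the rules above.
AGGREGATES = {"COUNT", "AVG", "SUM"}


@dataclass(frozen=True)
class Sample:
    """Hypothetical description of a stratified sample."""
    name: str
    qcs: FrozenSet[str]  # columns the sample is stratified on
    size: int            # sample size (assumed to be a row count)


def select_sample(query_qcs: Iterable[str],
                  aggregates: Iterable[str],
                  samples: List[Sample]) -> Optional[Sample]:
    """Return the chosen sample, or None to run on the base table."""
    # Rule 1: non-aggregation queries never use a sample.
    if not any(a.upper() in AGGREGATES for a in aggregates):
        return None

    q = frozenset(query_qcs)

    # Rule 2: exact QCS match (assumption: the first match wins).
    for s in samples:
        if s.qcs == q:
            return s

    # Rule 3: sample QCS is a proper superset of the query QCS
    # (assumption: the first match wins).
    for s in samples:
        if s.qcs > q:
            return s

    # Rules 4 and 5: sample QCS is a proper subset of the query QCS.
    # Prefer the most matching columns, then the largest sample.
    subsets = [s for s in samples if s.qcs < q]
    if subsets:
        return max(subsets, key=lambda s: (len(s.qcs), s.size))

    # Assumption: with no usable sample, fall back to the base table.
    return None
```

For example, given samples stratified on {a, b}, {a, b, c}, and two samples on {a}, a SUM query whose QCS is {a, d} has no exact or superset match, so the larger of the two {a} samples is chosen.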