Monday, January 13, 2014

DBQL analysis III - Monitor "collect statistics"

Sponsored by PRISE Ltd.
www.prisetools.com

Analyze "collect statistics" in DBQL

Please note that the solutions in this article work on the DBQL logs, which cover only the activity of those users for whom logging is switched on. The "Object" and "SQL" DBQL logging options must be switched on to use the scripts provided.
This article is applicable up to V13.10 without modification; statistics handling changed in V14.
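
Both options can be enabled with BEGIN QUERY LOGGING; a minimal sketch (the ALL scope is an illustrative choice, adjust it to your site's logging standards):

-- Log accessed objects and full SQL text for all users
BEGIN QUERY LOGGING WITH OBJECTS, SQL ON ALL;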

About Statistics

"Statistics" is a descriptive object in the Teradata database that are used by the optimizer for transforming SQLs to effective execution plans.
Statistics reflect the key data demographic information of one or more table column(s).
These objects should be created and maintained, the RDBMS will not do it by itself.
Statistics internally contain value histogram, which needs the table data (or sample) to be analyzed, which is an expensive task.

Summarized: appropriate statistics are required to get good and effective execution plans for SQL statements, but collecting or refreshing statistics consumes resources.
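
To check what statistics a table currently has and when they were refreshed, HELP STATISTICS can be used (the database and table names here are hypothetical):

-- Lists collection date/time, number of unique values and the columns involved
HELP STATISTICS sandbox.customer;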

"Statistics" footprint in DBQL

When a "statistics" is created or refreshed it is executed by an SQL command: collect statistics....
This command will create a log entry into the DBQL if the logging is switched on.
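
For example, a full and a sampled single-column collection look like this (object names are hypothetical; pre-V14 syntax):

-- Full-scan statistics on one column
COLLECT STATISTICS ON sandbox.customer COLUMN (customer_id);
-- Sampled statistics: cheaper to collect, less accurate
COLLECT STATISTICS USING SAMPLE ON sandbox.customer COLUMN (customer_id);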

One can track when each statistic was collected, and how much CPU and I/O the collection consumed.
These statements are very easy to identify in the central table:

select * from dbc.DBQLogTbl where StatementType='Collect statistics'
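
To spot the most expensive collections at a glance, the same filter can be extended with the resource columns (a simple variant of the query above):

select StartTime, UserName, AMPCpuTime, TotalIOCount, QueryText
from dbc.DBQLogTbl
where StatementType='Collect statistics'
order by AMPCpuTime desc;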

Analyzing DBQL data 

Prepare data

You may need to modify the script:
  • Date (interval)
  • Online/archived tables: use the appropriate commented section
  • QueryBand: the "JOB" variable is used; modify it according to your ETL settings (see the sketch below)
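For the QB_info column to be meaningful, the ETL sessions must tag themselves with a JOB query band; a minimal sketch (the job name is made up):

-- Tag the session so DBQL records the job name in its QueryBand column
SET QUERY_BAND = 'JOB=daily_customer_load;' FOR SESSION;

The preparation script: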
create volatile table DBQLStat_tmp1
as
(
sel a.procId,a.QueryId,a.StartTime,a.AMPCpuTime,a.TotalIOCount

/* 'S': sampled collection, 'F': full-scan collection */
,case when a.querytext like '% sample %' then 'S' else 'F' end Full_Sample
,UserName,(FirstRespTime - StartTime) DAY(4) TO SECOND(4) AS RUNINTERVAL      
,(EXTRACT(DAY FROM RUNINTERVAL) * 86400 + EXTRACT(HOUR FROM RUNINTERVAL)  * 3600 + EXTRACT(MINUTE FROM RUNINTERVAL)  * 60 + EXTRACT(SECOND FROM RUNINTERVAL) ) (decimal(10,1)) Duration
,b.ObjectDatabaseName DatabaseName,b.ObjectTableName TableName,c.ObjectColumnName ColumnName
/* Flag whether the statistic includes the PARTITION pseudo-column */
,case when d.SQLTextInfo like any ('%"PARTITION"%', '%,PARTITION %', '%,PARTITION,%', '% PARTITION,%', '% PARTITION %', '%(PARTITION,%', '%(PARTITION %', '%,PARTITION)%', '% PARTITION)%', '%(PARTITION)%') then 'Y' else 'N' end inclPartition
,CAST((case when index(queryband,'JOB=') >0 then  substr(queryband,index(queryband,'JOB=') ) else '' end) AS VARCHAR(500)) tmp_Q
,case when queryband = '' then 'N/A'
         when tmp_q = '' then '-Other'
else CAST( (substr(tmp_Q,characters('JOB=')+1, nullifzero(index(tmp_Q,';'))-characters('JOB=')-1)) AS VARCHAR(500)) end QB_info
/* Sequence number of the column within the statistic, used by the recursion below */
,sum(1) over (partition by a.procid,a.Queryid order by c.ObjectColumnName, a.QueryID rows unbounded preceding) Rnk
from
/* For archived tables
     dbql_arch.DBQLogTbl_hst       a
join dbql_arch.DBQLObjTbl_hst      b on b.ObjectType='Tab' and a.procid=b.procid and a.QueryID=b.QueryID and a.logDate=b.logDate
left join dbql_arch.DBQLObjTbl_hst c on c.ObjectType='Col' and a.procid=c.procid and a.QueryID=c.QueryID and a.logDate=c.logDate
join dbql_arch.DBQLSQLTbl_hst      d on d.SQLRowNo=1       and a.procid=d.procid and a.QueryID=d.QueryID and a.logDate=d.logDate
where a.logDate=1140113
*/
/*end*/
/* For online tables */
     dbc.DBQLogTbl       a
join dbc.DBQLObjTbl      b on b.ObjectType='Tab' and a.procid=b.procid and a.QueryID=b.QueryID
left join dbc.DBQLObjTbl c on c.ObjectType='Col' and a.procid=c.procid and a.QueryID=c.QueryID
join dbc.DBQLSQLTbl      d on d.SQLRowNo=1       and a.procid=d.procid and a.QueryID=d.QueryID
where cast(cast(a.starttime as char(10)) as date) = '2014-01-13' (date)
/*end*/
and a.StatementType='Collect statistics'
) with data
primary index (procId,QueryId)
on commit preserve rows
;

create volatile table DBQLStat
as
(
WITH RECURSIVE rec_tbl
(
 procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,inclPartition,ColumnName,Rnk,SColumns
)
AS
(
select
 procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,inclPartition,ColumnName,Rnk,cast(case when ColumnName is null and inclPartition='Y' then '' else '('||ColumnName end as varchar(10000)) SColumns
from DBQLStat_tmp1 where Rnk=1 /* Seed: the first column of each statistic */
UNION ALL
/* Recursive step: append the next column of the statistic to the list */
select
  a.procId,a.QueryId,a.StartTime,a.AMPCpuTime,a.TotalIOCount,a.Duration,a.Full_Sample,a.UserName,a.DatabaseName,a.TableName,a.QB_info,a.inclPartition,a.ColumnName,a.Rnk,b.SColumns ||','||a.ColumnName
from DBQLStat_tmp1     a
join rec_tbl b on a.procId=b.ProcId and a.QueryId=b.QueryID and a.Rnk=b.Rnk+1
)
select   procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,Rnk NumOfColumns
        ,case when SColumns = '' then '(PARTITION)' else SColumns || case when inclPartition='Y' then ',PARTITION)' else ')' end end StatColumns
from rec_tbl qualify sum(1) over (partition by procid,queryid order by Rnk desc, QueryID rows unbounded preceding) = 1
) with data
primary index (procid,queryid)
on commit preserve rows
;
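
Once DBQLStat is populated, a quick sanity check lists the most expensive collections (a simple usage example):

select top 10 DatabaseName, TableName, StatColumns, AMPCpuTime, Duration
from DBQLStat
order by AMPCpuTime desc;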

Reports

  • How many statistics have been collected, and how much resource did they consume?

select
  UserName /*Or: DatabaseName*//*Or: Full_sample*/
, count(*) Nbr
, sum(AMPCpuTIme) CPU
, sum(TotalIOCount) IO
from DBQLStat
group by 1
order by 1
;
  • Which statistics have been collected multiple times?
    (If several days are in the prepared data, the collection frequency can be determined; remove the "qualify" clause, see the sketch after this report.)
select a.*,
sum(1) over (partition by databasename,tablename,statcolumns)  Repl
from DBQLStat a

/* Comment out the next line for the frequency report */
qualify sum(1) over (partition by databasename,tablename,statcolumns) > 1

/*end*/
order by repl desc, databasename,tablename,statcolumns
;
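
If the prepared data spans several days, a possible frequency report counts the collections and the distinct days per statistic (a sketch, assuming multiple days were loaded into DBQLStat):

select DatabaseName, TableName, StatColumns
, count(*) Collections
, count(distinct cast(StartTime as date)) CollectionDays
from DBQLStat
group by 1,2,3
order by Collections desc;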


Sponsored by PRISE Ltd.
www.prisetools.com

3 comments:

  1. Hi,
    Which is efficient: collecting stats on all the columns of a table, or only on the indexed columns?

    Replies
    1. Hi Agilan,

      Stats on 'all columns' is definitely not efficient.
      I recommend a different approach: collect statistics on the "important" columns. Which are they?
      - Primary index
      - PARTITION and the partitioning column (if PPI)
      - Secondary index fields (composite, if the index is composite)
      - Significant columns in the workload:
        - Join column(s)
        - Column(s) in "where" conditions
        - Columns in "group by" (composite)

      Use 'SAMPLE' where advisable (read the Teradata recommendations); a minimal sketch follows below.
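
      For illustration, on a hypothetical PPI table this could look like (object names are made up):

      -- Primary index and partitioning information
      COLLECT STATISTICS ON sandbox.sales COLUMN (sales_id);
      COLLECT STATISTICS ON sandbox.sales COLUMN (PARTITION);
      COLLECT STATISTICS ON sandbox.sales COLUMN (sales_date);
      -- Heavily joined and filtered column, sampled to save resources
      COLLECT STATISTICS USING SAMPLE ON sandbox.sales COLUMN (customer_id);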
