Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com
Analyze "collect statistics" in DBQL
Please note that the solutions found in the article works on the DBQL logs, which covers only that users' activity, for whom the logging is switched on. "Object" and "SQL" option in DBQL is required "on" to use the scripts provided.
This article is applicable up to V13.10 w/o modifications, statistics handling changed from V14.
This article is applicable up to V13.10 w/o modifications, statistics handling changed from V14.
About Statistics
"Statistics" is a descriptive object in the Teradata database that are used by the optimizer for transforming SQLs to effective execution plans.Statistics reflect the key data demographic information of one or more table column(s).
These objects should be created and maintained, the RDBMS will not do it by itself.
Statistics internally contain value histogram, which needs the table data (or sample) to be analyzed, which is an expensive task.
Summarized: appropriate statistics are required for getting good and effective executon plans for SQLs, but statistics consume resources to be collected or refreshed.
"Statistics" footprint in DBQL
When a "statistics" is created or refreshed it is executed by an SQL command: collect statistics....This command will create a log entry into the DBQL if the logging is switched on.
One can track when, which "statistics" was collected, consuming how much CPU and I/O.
Those statements are very easy to identify in the central table:
select * from dbc.DBQLogTbl where StatementType='Collect statistics'
Analyzing DBQL data
Prepare data
You may need to modify the script:- Date (interval)
- Online/archived: use commented section
- QueryBand: "JOB" variable is used, modify according to your ETL settings
create volatile table DBQLStat_tmp1
as
(
sel a.procId,a.QueryId,a.StartTime,a.AMPCpuTime,a.TotalIOCount
,case when a.querytext like '% sample %' then 'S' else 'F' end Full_Sample
,UserName,(FirstRespTime - StartTime) DAY(4) TO SECOND(4) AS RUNINTERVAL
,(EXTRACT(DAY FROM RUNINTERVAL) * 86400 + EXTRACT(HOUR FROM RUNINTERVAL) * 3600 + EXTRACT(MINUTE FROM RUNINTERVAL) * 60 + EXTRACT(SECOND FROM RUNINTERVAL) ) (decimal(10,1)) Duration
,b.ObjectDatabaseName DatabaseName,b.ObjectTableName TableName,c.ObjectColumnName ColumnName
,case when d.SQLTextInfo like any ('%"PARTITION"%', '%,PARTITION %', '%,PARTITION,%', '% PARTITION,%', '% PARTITION %', '%(PARTITION,%', '%(PARTITION %', '%,PARTITION)%', '% PARTITION)%', '%(PARTITION)%') then 'Y' else 'N' end inclPartition
,CAST((case when index(queryband,'JOB=') >0 then substr(queryband,index(queryband,'JOB=') ) else '' end) AS VARCHAR(500)) tmp_Q
,case when queryband = '' then 'N/A'
when tmp_q = '' then '-Other'
else CAST( (substr(tmp_Q,characters('JOB=')+1, nullifzero(index(tmp_Q,';'))-characters('JOB=')-1)) AS VARCHAR(500)) end QB_info
,sum(1) over (partition by a.procid,a.Queryid order by c.ObjectColumnName, a.QueryID rows unbounded preceding) Rnk
from
/* For achived tables
dbql_arch.DBQLogTbl_hst a
join dbql_arch.DBQLObjTbl_hst b on b.ObjectType='Tab' and a.procid=b.procid and a.QueryID=b.QueryID and a.logDate=b.logDate
left join dbql_arch.DBQLObjTbl_hst c on c.ObjectType='Col' and a.procid=c.procid and a.QueryID=c.QueryID and a.logDate=c.logDate
join dbql_arch.DBQLSQLTbl_hst d on d.SQLRowNo=1 and a.procid=d.procid and a.QueryID=d.QueryID and a.logDate=d.logDate
where a.logDate=1140113
*/
/*end*/
/* For online tables */
dbc.DBQLogTbl a
join dbc.DBQLObjTbl b on b.ObjectType='Tab' and a.procid=b.procid and a.QueryID=b.QueryID
left join dbc.DBQLObjTbl c on c.ObjectType='Col' and a.procid=c.procid and a.QueryID=c.QueryID
join dbc.DBQLSQLTbl d on d.SQLRowNo=1 and a.procid=d.procid and a.QueryID=d.QueryID
where cast(cast(a.starttime as char(10)) as date) = '2014-01-13' (date)
/*end*/
and a.StatementType='Collect statistics'
) with data
primary index (procId,QueryId)
on commit preserve rows
;
create volatile table DBQLStat
as
(
WITH RECURSIVE rec_tbl
(
procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,inclPartition,ColumnName,Rnk,SColumns
)
AS
(
select
procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,inclPartition,ColumnName,Rnk,cast(case when ColumnName is null and inclPartition='Y' then '' else '('||ColumnName end as varchar(10000)) SColumns
from DBQLStat_tmp1 where Rnk=1
UNION ALL
select
a.procId,a.QueryId,a.StartTime,a.AMPCpuTime,a.TotalIOCount,a.Duration,a.Full_Sample,a.UserName,a.DatabaseName,a.TableName,a.QB_info,a.inclPartition,a.ColumnName,a.Rnk,b.SColumns ||','||a.ColumnName
from DBQLStat_tmp1 a
join rec_tbl b on a.procId=b.ProcId and a.QueryId=b.QueryID and a.Rnk=b.Rnk+1
)
select procId,QueryId,StartTime,AMPCpuTime,TotalIOCount,Duration,Full_Sample,UserName,DatabaseName,TableName,QB_info,Rnk NumOfColumns
,case when SColumns = '' then '(PARTITION)' else SColumns || case when inclPartition='Y' then ',PARTITION)' else ')' end end StatColumns
from rec_tbl qualify sum(1) over (partition by procid,queryid order by Rnk desc, QueryID rows unbounded preceding) = 1
) with data
primary index (procid,queryid)
on commit preserve rows
;
Reports
- How many statistics has been collected for how much resources?
select
UserName /*Or: DatabaseName*//*Or: Full_sample*/
, count(*) Nbr
, sum(AMPCpuTIme) CPU
, sum(TotalIOCount) IO
from DBQLStat
group by 1
order by 1
;
- Which statistics has been collected multiple times?
(If more days are in preapred data, frequency can be determined, erase "qualify")
sum(1) over (partition by databasename,tablename,statcolumns) Repl
from DBQLStat a
/* Comment for frequency report*/
qualify sum(1) over (partition by databasename,tablename,statcolumns) > 1
/*end*/
order by repl desc, databasename,tablename,statcolumns
;
Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com