Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com
How to optimize slow OUTER JOINs
Case description
We have a (LEFT or RIGHT) OUTER JOIN, and it runs a long time while causing skewed CPU / Spool usage. In practice most of the time during the query execution only 1 AMP will work, while others have nothing to do, causing poor parallel efficiency.
How to identify
The query typically runs long time, contains a "MERGE JOIN" step in the Explain description, and that step consumes skewed CPU consumption (MaxAMPCPUTime * Number-of-AMPS >> AMPCPUTime) and lasts long.In the DBQL you should find skewed, high CPU usage queries (
dbc.DBQLogtbl.MaxAMPCPUTime * (hashamp()+1) / nullifzero(dbc.DBQLogtbl.AmpCPUTime) > 1.2 and dbc.DBQLogtbl.AMPCPUTime > 1000 , depends on system size) which also has "left outer join" expression in the execution plan text (dbc.DBQLExplaintbl.ExplainText like '%left outer joined using a merge join%')
This is only an approximation since the skewness causing step may be a different one.
PRISE Tuning Assistant supplies easy-to-use GUI based search function.
Explanation
Let's assume that we outer join Table1 and Table2 on a condition that causes no product join (merge join instead), eg.:
select Table1.x,Table2.y
from Table1
LEFT JOIN Table2 on Table1.x = Table2.x
...
If Table2 is not a "small" table, Teradata optimizer will choose to "equi-distribute" (place matchable records on the same AMP) the two tables on the join field(s), in our case: Table1.x and Table2.x respectively.
If Table1.x contains significant percentage of NULLs, then the distribution will be skewed, since all "x is NULL" records will get to the same AMP.
We know that the NULL value never results in a join match, so those records are useless to examine, but they have to appear in the resultset, since it is an OUTER JOIN.
Solution
Let's handle the Table1 into two separate subsets: NULL(x) and NotNULL(x), and modify the select this way:
select Table1.x,Table2.y
from Table1
LEFT INNER JOIN Table2 on Table1.x = Table2.x
where Table1.x is not null -- This will eliminate skewed spool
UNION ALL
select Table1.x,NULL
from Table1
where Table1.x IS NULL;
Practical example:
Some of our transactions are contributed by an operator, in this case OpID is filled, else null. We would like to query the number of transactions by operators including the non-contributed ones. Most of the transactions are non contributed ones (OpID is null).
select
a.Transaction_id,
b.OperatorName as ContribOpName
from TransactionT a
LEFT JOIN OperatorT b on a.OpID = b.OpID
Optimized form:
select
a.Transaction_id, b.OperatorName as ContribOpName
from TransactionT a
LEFT JOIN OperatorT b on a.OpID = b.OpID
where a.OpID is not null
UNION ALL
select
a.Transaction_id, NULL as ContribOpName
from TransactionT
where OpID is null;
The execution will not cause a skewed CPU / Spool, because the those records of Table1 that caused peak ( x is NULL ) are excluded from processing of the join.
The second part will supply the "x is NULL" records to the result set without join processing.
The tradeoff is two full scans and a UNION ALL operation, which are comparably much less cost than a strongly skewed redistribution and a JOIN processing.
What's next
Next post will discuss unexpectedly slow INSERTs (hash collision).
Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com
Thank you so much for your posting. Please keep on posting on different bottlenecks and solution.Thanks
ReplyDeleteThanks, this helped me big time :-)
ReplyDelete