Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com
How to store date and time info effectively
Introduction
Data Warehouse databases usually contain significant amount of date/time information. Physical modeling technique can seriously influence their storage space and usability.Aspects
Date/time information can be stored in different ways/data types, each of them will have its own specialities.Basic options:
- Joint storage: Timestamp
- Timestamp(n) , when n means the fractional digits of seconds
- Separate storage: Date & Time
- Date column + Time column
Storage space
The data types require the following space (if uncompressed)
Type | Space |
Date | 4 bytes |
Integer time (integer format '99:99:99') | 4 bytes |
Time(n) | 6 bytes, independent of n*, where n:[0..6] |
Time(n) with time zone | 8 bytes, independent of n*, where n:[0..6] |
Timestamp(n) | 10 bytes, independent of n*, where n:[0..6] |
Timestamp(n) with time zone | 12 bytes, independent of n*, where n:[0..6] |
Usage complexity
Teradata is not the most ergonomic for handling date-time data. Operations with these data types are typically tricky and sometimes hides traps (try add_months('2014-01-31',1) ). Conversion of a date and a timestamp is different, decisions must be made by considering storage and usage aspects.
- Conversions
- Date: implicit conversions work, easy and comfortable
- Integer time: works fine, but insert-select will loose the formatting, only the integer value will remain
- Time(n): implicit conversion to string is not working. This fails: select cast('2014-01-31' as date) || ' ' ||cast('12:00:00' as time(0))
- Timestamp(n): brrr. Different precisions will not convert automatically either. I don't like it.
- Filtering: comparing date/datetime values with < / <= /between operators
- Joint storage (timestamps)
Straightforward, just use the values - if they are equivalent data types - Separate storage
You have to convert to a "joint" format, either a string or a timestamp before - Arithmetic
- Date: ok, adding a constant, subtracting dates work fine
- Integer time: do not use arithmetic, results are bad!
- Time(n): interval types accepted. Not really comfortable, eg max 99 second long interval is accepted (V13.10)
- Timestamp(n): same as Time(n)
Recommendations
Choosing data type
I recommend to choose data types depending on the table type and usage purposes.I differentiate "transaction" and "all other" table types, because transaction tables are usually allocate most of the PERM space, while others are many in number, but allocate "negligible" space.
- Transaction
- Separate storage
- Integer time
- All others
- Joint type (timestamp)
Saving space - store "delta"
The biggest tables in the data warehouses are the "transaction tables" (call/purchase/transfer/etc. transactions depending on industry), and most of them contain several date fields, most of them w/strong correlation. I explain what I mean. Let's assume a call record (telco), that will have the following date(&time) columns:
- Channel_seizure
- Call_start
- Call_end
- Bill_cycle_start
- Bill_cycle_end
The date component of the first three columns are the same in 99% of the records, and the last ones differ from the first ones with max. of 30 days.
My recommendation is the following:
- Choose a "primary date"
Must be not null, and typically used as partitioning key also, since it is the most often date filtering condition.In our case this will be the Call_start - Choose separate date-time storing
Eg. Date and Integer time , as this combination requires the least space - Store the non-primary dates as delta, multi value comressed
Compute it in the load process, like this:
Call_end_delta := Call_end-Call_start - Compress the "delta" columns
They will reflect low deviation, highly compressible, use PRISE Compress Wizard - Convert to absolute dates back in the view layer
Call_start + Call_end_delta as "Call_end"
Example:
CREATE TABLE T2000_CALL_TRX
(
...
Call_start_date Date NOT NULL
Call_end_date_delta Integer COMPRESS (0)
...
) PRIMARY INDEX (...,Call_start_date)
PARTITION BY RANGE_N ( Call_start_date BETWEEN date '2010-01-01' AND date '2020-12-31' EACH interval '1' day, NO RANGE, UNKNOWN);
;
CREATE VIEW V2000_CALL_TRX
as
SELECT
...
, Call_end_date_delta +Call_start_date as "Call_end_date"
...
FROM
T2000_CALL_TRX
;
CREATE TABLE T2000_CALL_TRX
(
...
Call_start_date Date NOT NULL
Call_end_date_delta Integer COMPRESS (0)
...
) PRIMARY INDEX (...,Call_start_date)
PARTITION BY RANGE_N ( Call_start_date BETWEEN date '2010-01-01' AND date '2020-12-31' EACH interval '1' day, NO RANGE, UNKNOWN);
;
CREATE VIEW V2000_CALL_TRX
as
SELECT
...
, Call_end_date_delta +Call_start_date as "Call_end_date"
...
FROM
T2000_CALL_TRX
;
Sponsored by PRISE Ltd.
www.prisetools.com
www.prisetools.com