Tuesday, March 18, 2014

Storing date&time columns

Sponsored by PRISE Ltd.
www.prisetools.com

How to store date and time info effectively

Introduction

Data Warehouse databases usually contain significant amount of date/time information. Physical modeling technique can seriously influence their storage space and usability.

Aspects

Date/time information can be stored in different ways/data types, each of them will have its own specialities.
Basic options:
  • Joint storage: Timestamp
    • Timestamp(n) , when n means the fractional digits of seconds
  • Separate storage: Date & Time
    • Date column + Time column

Storage space

The data types require the following space (if uncompressed)

Type Space
Date 4 bytes
Integer time (integer format '99:99:99') 4 bytes
Time(n) 6 bytes, independent of n*, where n:[0..6]
Time(n) with time zone 8 bytes, independent of n*, where n:[0..6]
Timestamp(n) 10 bytes, independent of n*, where n:[0..6]
Timestamp(n) with time zone 12 bytes, independent of n*, where n:[0..6]
* n means the precision digits of second

Usage complexity

Teradata is not the most ergonomic for handling date-time data. Operations with these data types are typically tricky and sometimes hides traps (try add_months('2014-01-31',1) ). Conversion of a date and a timestamp is different, decisions must be made by considering storage and usage aspects.
  • Conversions
    • Date: implicit conversions work, easy and comfortable
    • Integer time: works fine, but insert-select will loose the formatting, only the integer value will remain
    • Time(n): implicit conversion to string is not working. This fails: select cast('2014-01-31' as date) || ' ' ||cast('12:00:00' as time(0))
    • Timestamp(n): brrr. Different precisions will not convert automatically either. I don't like it.
  • Filtering: comparing date/datetime values with < / <= /between operators
    • Joint storage (timestamps)
      Straightforward, just use the values - if they are equivalent data types
    • Separate storage
      You have to convert to a "joint" format, either a string or a timestamp before
  • Arithmetic
    • Date: ok, adding a constant, subtracting dates work fine
    • Integer time: do not use arithmetic, results are bad!
    • Time(n): interval types accepted. Not really comfortable, eg max 99 second long interval is accepted (V13.10)
    • Timestamp(n): same as Time(n)
    Regarding arithmetic I suggest building your own UDF library, that will ease your life.

Recommendations

Choosing data type

I recommend to choose data types depending on the table type and usage purposes.
I differentiate "transaction" and "all other" table types, because transaction tables are usually allocate most of the PERM space, while others are many in number, but allocate "negligible" space.
  • Transaction
    • Separate storage
    • Integer time
  • All others
    • Joint type (timestamp)

Saving space - store "delta"

The biggest tables in the data warehouses are the "transaction tables" (call/purchase/transfer/etc. transactions depending on industry), and most of them contain several date fields, most of them w/strong correlation. I explain what I mean. Let's assume a call record (telco), that will have the following date(&time) columns:
  • Channel_seizure
  • Call_start
  • Call_end
  • Bill_cycle_start
  • Bill_cycle_end
The date component of the first three columns are the same in 99% of the records, and the last ones differ from the first ones with max. of 30 days.

My recommendation is the following:
  • Choose a "primary date"
    Must be not null, and typically used as partitioning key also, since it is the most often date filtering condition.In our case this will be the Call_start
  • Choose separate date-time storing
    Eg. Date and Integer time , as this combination requires the least space
  • Store the non-primary dates as delta, multi value comressed
    Compute it in the load process, like this:
    Call_end_delta := Call_end-Call_start
  • Compress the "delta" columns
    They will reflect low deviation, highly compressible, use PRISE Compress Wizard
  • Convert to absolute dates back in the view layer
    Call_start + Call_end_delta as "Call_end"
Example:

CREATE TABLE T2000_CALL_TRX
(
...
Call_start_date Date NOT NULL
Call_end_date_delta Integer COMPRESS (0)
...
) PRIMARY INDEX (...,Call_start_date)
PARTITION BY RANGE_N ( Call_start_date BETWEEN date '2010-01-01' AND date '2020-12-31' EACH interval '1' day, NO RANGE, UNKNOWN);
;

CREATE VIEW V2000_CALL_TRX
as
SELECT
...
, Call_end_date_delta +Call_start_date as "Call_end_date"
...
FROM
T2000_CALL_TRX

;

Sponsored by PRISE Ltd.
www.prisetools.com

No comments:

Post a Comment