Given the following star schema tables.
- fact, two dimensions, two measures.
# geog_abb time_date amount value
#1: AL 2013-03-26 55.57 9113.3898
#2: CO 2011-06-28 19.25 9846.6468
#3: MI 2012-05-15 94.87 4762.5398
#4: SC 2013-01-22 29.84 649.7681
#5: ND 2014-12-03 37.05 6419.0224
- geography dimension, single hierarchy, 3 levels in hierarchy.
# geog_abb geog_name geog_division_name geog_region_name
#1: AK Alaska Pacific West
#2: AL Alabama East South Central South
#3: AR Arkansas West South Central South
#4: AZ Arizona Mountain West
#5: CA California Pacific West
- time dimension, two hierarchies, 4 levels in each.
# time_date time_weekday time_week time_month time_month_name time_quarter time_quarter_name time_year
#1: 2010-01-01 Friday 1 1 January 1 Q1 2010
#2: 2010-01-02 Saturday 1 1 January 1 Q1 2010
#3: 2010-01-03 Sunday 1 1 January 1 Q1 2010
#4: 2010-01-04 Monday 1 1 January 1 Q1 2010
#5: 2010-01-05 Tuesday 1 1 January 1 Q1 2010
Examples is stripped of surrogate keys to improve readability. In results there are levels in hierarchy without other attributes, just don't bother that, they are still levels in hierarchy.
In star schema expressed as:
GEOGRAPHY (all fields)
/
/
FACT
TIME (all fields)
In snowflake schema expressed as:
geog_region_name
/
geog_division_name
/
geog_abb (+ geog_name)
/
/
FACT
time_date
|
hierarchies: |
weekly / monthly
/
/
time_weekday time_month (+ time_month_name)
| |
| |
time_week time_quarter (+ time_quarter_name)
| |
| |
time_year time_year
How would you call the following schema
Does it have any specific name? Starflake? :)
|>-- geog_region_name
|
|>-- geog_division_name
|
|>-- geog_abb (+ geog_name)
|
|
geography base
/
/
FACT
time base
|
|
|>-- time_date
|
|>-- time_weekday
|
|>-- time_week
|
|>-- time_month (+ time_month_name)
|
|>-- time_quarter (+ time_quarter_name)
|
|>-- time_year
It basically has a dimension base table storing identities of every level of every hierarchy within a dimension. No need for recursive walk through snowflake's levels, potentially less joins. Data still well normalized, only keys are denormalized into base table. All levels from all hierarchies tied to lowest grain key of a dimension in dimension base.
Additionally having a dimension base table allows to handle time variant attributes/temporal queries just in that table, at the granularity of a hierarchy level.
Here is the tabular representation.
Still on natural keys!
# geog_abb time_date amount value
# 1: AK 2010-01-01 154.43 12395.472
# 2: AK 2010-01-02 88.89 6257.639
# 3: AK 2010-01-03 81.74 7193.075
# 4: AK 2010-01-04 165.87 11150.619
# 5: AK 2010-01-05 8.75 6953.055
# time_date time_year time_quarter time_month time_week time_weekday
# 1: 2010-01-01 2010 1 1 1 Friday
# 2: 2010-01-02 2010 1 1 1 Saturday
# 3: 2010-01-03 2010 1 1 1 Sunday
# 4: 2010-01-04 2010 1 1 1 Monday
# 5: 2010-01-05 2010 1 1 1 Tuesday
- time dimension normalization to hierarchy levels
# time_year
# 1: 2010
# 2: 2011
# 3: 2012
# 4: 2013
# 5: 2014
# time_quarter time_quarter_name
# 1: 1 Q1
# 2: 2 Q2
# 3: 3 Q3
# 4: 4 Q4
# time_month time_month_name
# 1: 1 January
# 2: 2 February
# 3: 3 March
# 4: 4 April
# 5: 5 May
# time_week
# 1: 1
# 2: 2
# 3: 3
# 4: 4
# 5: 5
# time_weekday
# 1: Friday
# 2: Monday
# 3: Saturday
# 4: Sunday
# 5: Thursday
# time_date time_week time_weekday time_year
# 1: 2010-01-01 1 Friday 2010
# 2: 2010-01-02 1 Saturday 2010
# 3: 2010-01-03 1 Sunday 2010
# 4: 2010-01-04 1 Monday 2010
# 5: 2010-01-05 1 Tuesday 2010
# geog_abb geog_region_name geog_division_name
# 1: AK West Pacific
# 2: AL South East South Central
# 3: AR South West South Central
# 4: AZ West Mountain
# 5: CA West Pacific
- geography dimension normalization to hierarchy levels
# geog_region_name
# 1: North Central
# 2: Northeast
# 3: South
# 4: West
# geog_division_name
# 1: East North Central
# 2: East South Central
# 3: Middle Atlantic
# 4: Mountain
# 5: New England
# geog_abb geog_name geog_division_name geog_region_name
# 1: AK Alaska Pacific West
# 2: AL Alabama East South Central South
# 3: AR Arkansas West South Central South
# 4: AZ Arizona Mountain West
# 5: CA California Pacific West
Dimension base could store also primary key's attributes, this would de-duplicate dimension's lowest level but will be less consistent (time_date
levels from both hierarchies would fit into time dimension base tables).
What drawbacks such schema would have? I don't much bother about speed of joins and aggregates, and a query tool adaptivity.
Does it have any name? It is being use? If not why?
See Question&Answers more detail:
os