Tuesday, February 13, 2007

Data Warehousing Concepts

I'm reading a book on Data Warehousing. This is in preparation for a project that I'm involved in. The project is to build a global reporting solution for IT using SQL Server 2005 Reporting Services. Here are some key words I want to remember about data warehousing:

This termMeans this
AttributeInformation about a specific dimension member
Data warehouseA relational database designed to store management information
DimensionA list of labels that can be used to cross-tabulate values from other dimensions
Fact tableThe relational database table that contains values for one or more measures at the lowest level of detail for one or more dimensions
Foreign key columnA column in a database table that contains many values for each value in the primary key column of another database table
JoinThe processes of linking the primary key of one table to the foreign key of another table
MeasureA summarizable numerical value used to monitor business activity
MemberA single item within a dimension
Member propertyAn attribute of a member that is not meaningful when grouping values for a report, but contains valuable information about a different attribute
Primary key columnA column in a database dimension table that contains values that uniquely identify each row
Snowflake designA database arrangement in which attributes of a dimension are stored in a separate (normalized) table
Star designA database arrangement in which multiple attributes of a dimension are redundantly stored in a single (denormalized) dimension table

Design considerations:

  • Use integers for dimension members to reduce size of the fact table and to gaurantee uniqueness of each member key in a dimension table.
  • Use surrogate keys so that the fact table consumes less space. The ability to create multiple instances of the same product–or the same customer–is an extremely important benefit of surrogate keys, and it is particularly important in a data warehouse where you are maintaining historical information for comparison. Surrogate keys are a critical part of most data warehouse design. The foreign key in the fact table and the primary key in the dimension table are then completely under the control of the data warehouse.
  • Values in a data warehouse are not changing as dynamically as they would in a production database, so storing the values redundantly is less important than is retrieving the values as quickly as possible for a report.

Understanding OLAP and Analysis Services:

This termMeans this
AggregationSummarized values of a measure
CacheServer-based storage locations both in memory (automatic) or on disk (designed) that enhance query performance
Calculated memberA mechanism for aggregating measures using formulas more complex than those stored in a cube
CubeA collection of one or more related measure groups and their associated dimensions
Cube metadataInstructions for creating and querying OLAP structures such as cubes and dimensions
HierarchyLevels of aggregation within a single dimension
Measure groupThe conceptual container of detail values from a single fact table, along with all possible aggregations for one or more dimension hierarchies
Online analytical processing (OLAP)A database system optimized to support decision-making processes
Online transaction processing (OLTP)A database system used to manage transactions such as order processing
Unified Dimensional Model (UDM)The measure groups and dimensions that define your organization's BI data; essentially synonymous with a cube

Benefit of an OLAP cube over a relational database:

  1. Consistently fast response - prestoring calculated values.
  2. Metadata-based queries
  3. Spreadsheet-style formulas

Design considerations:

  • Use Database model OLAP instead of Spreadsheet model OLAP. Advantages and disadvantages, the biggest benefit of OLAP stored using the database model is the ability to avoid data explosion. Because you need relatively few aggregate tables to provide fast results, you can have much larger cubes with many more dimensions and attributes than by using a spreadsheet model. Perhaps the biggest disadvantage of OLAP stored by using a database model is that there is no inherent way to physically store values that are calculated using nonassociative operators.