I'm reading a book on Data Warehousing. This is in preparation for a project that I'm involved in. The project is to build a global reporting solution for IT using SQL Server 2005 Reporting Services. Here are some key words I want to remember about data warehousing:
This term | Means this |
---|
Attribute | Information about a specific dimension member |
Data warehouse | A relational database designed to store management information |
Dimension | A list of labels that can be used to cross-tabulate values from other dimensions |
Fact table | The relational database table that contains values for one or more measures at the lowest level of detail for one or more dimensions |
Foreign key column | A column in a database table that contains many values for each value in the primary key column of another database table |
Join | The processes of linking the primary key of one table to the foreign key of another table |
Measure | A summarizable numerical value used to monitor business activity |
Member | A single item within a dimension |
Member property | An attribute of a member that is not meaningful when grouping values for a report, but contains valuable information about a different attribute |
Primary key column | A column in a database dimension table that contains values that uniquely identify each row |
Snowflake design | A database arrangement in which attributes of a dimension are stored in a separate (normalized) table |
Star design | A database arrangement in which multiple attributes of a dimension are redundantly stored in a single (denormalized) dimension table |
Design considerations:
- Use integers for dimension members to reduce size of the fact table and to gaurantee uniqueness of each member key in a dimension table.
- Use surrogate keys so that the fact table consumes less space. The ability to create multiple instances of the same product–or the same customer–is an extremely important benefit of surrogate keys, and it is particularly important in a data warehouse where you are maintaining historical information for comparison. Surrogate keys are a critical part of most data warehouse design. The foreign key in the fact table and the primary key in the dimension table are then completely under the control of the data warehouse.
- Values in a data warehouse are not changing as dynamically as they would in a production database, so storing the values redundantly is less important than is retrieving the values as quickly as possible for a report.
Understanding OLAP and Analysis Services:
This term | Means this |
---|
Aggregation | Summarized values of a measure |
Cache | Server-based storage locations both in memory (automatic) or on disk (designed) that enhance query performance |
Calculated member | A mechanism for aggregating measures using formulas more complex than those stored in a cube |
Cube | A collection of one or more related measure groups and their associated dimensions |
Cube metadata | Instructions for creating and querying OLAP structures such as cubes and dimensions |
Hierarchy | Levels of aggregation within a single dimension |
Measure group | The conceptual container of detail values from a single fact table, along with all possible aggregations for one or more dimension hierarchies |
Online analytical processing (OLAP) | A database system optimized to support decision-making processes |
Online transaction processing (OLTP) | A database system used to manage transactions such as order processing |
Unified Dimensional Model (UDM) | The measure groups and dimensions that define your organization's BI data; essentially synonymous with a cube |
Benefit of an OLAP cube over a relational database:
- Consistently fast response - prestoring calculated values.
- Metadata-based queries
- Spreadsheet-style formulas
Design considerations:
- Use Database model OLAP instead of Spreadsheet model OLAP. Advantages and disadvantages, the biggest benefit of OLAP stored using the database model is the ability to avoid data explosion. Because you need relatively few aggregate tables to provide fast results, you can have much larger cubes with many more dimensions and attributes than by using a spreadsheet model. Perhaps the biggest disadvantage of OLAP stored by using a database model is that there is no inherent way to physically store values that are calculated using nonassociative operators.