Data Warehouse Guide

What is snowflake schema?


A snowflake schema is a star schema structure that has been normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables. Arranging the tables of a multidimensional database in this way makes the entity-relationship diagram resemble a snowflake in shape.
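
As a minimal sketch of what "breaking a dimension hierarchy into simpler tables" can look like, the following Python script uses the standard sqlite3 module to build one product dimension twice: once as a single denormalized star-style table, and once snowflaked into outrigger tables. All table names, column names, and rows (dim_product_star, dim_product, dim_category, dim_department) are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; every table and row below is a made-up illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: one denormalized dimension table holds the whole hierarchy.
CREATE TABLE dim_product_star (
    product_key     INTEGER PRIMARY KEY,
    product_name    TEXT,
    category_name   TEXT,
    department_name TEXT
);
INSERT INTO dim_product_star VALUES
    (1, 'Laptop', 'Computers', 'Electronics'),
    (2, 'Tablet', 'Computers', 'Electronics'),
    (3, 'Novel',  'Fiction',   'Books');

-- Snowflake form: each hierarchy level moves to its own outrigger table.
CREATE TABLE dim_department (
    department_key  INTEGER PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE dim_category (
    category_key   INTEGER PRIMARY KEY,
    category_name  TEXT,
    department_key INTEGER REFERENCES dim_department(department_key)
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
INSERT INTO dim_department VALUES (1, 'Electronics'), (2, 'Books');
INSERT INTO dim_category VALUES (10, 'Computers', 1), (20, 'Fiction', 2);
INSERT INTO dim_product VALUES (1, 'Laptop', 10), (2, 'Tablet', 10), (3, 'Novel', 20);
""")

# The star dimension is read directly from one table.
star_rows = conn.execute(
    "SELECT product_name, category_name, department_name "
    "FROM dim_product_star ORDER BY product_key"
).fetchall()

# The snowflaked dimension needs one join per outrigger table.
snowflake_rows = conn.execute(
    "SELECT p.product_name, c.category_name, d.department_name "
    "FROM dim_product p "
    "JOIN dim_category c ON c.category_key = p.category_key "
    "JOIN dim_department d ON d.department_key = c.department_key "
    "ORDER BY p.product_key"
).fetchall()

# Both forms carry the same information.
print(star_rows == snowflake_rows)
```

Both queries return the same rows. The snowflaked form stores each category and department name once, at the cost of two extra joins when reading the dimension.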

Where is snowflake schema used?

The star and snowflake schemas are most commonly found in dimensional data warehouses and data marts, where speed of data retrieval is more important than the efficiency of data manipulation. As such, the tables in these schemas are not heavily normalized and are frequently designed at a level of normalization short of third normal form. An example of a snowflake schema is shown in the diagram below.
[Figure: Snowflake schema illustration]
The decision whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed. A star schema should be favored with query tools that largely expose users to the underlying table structures, and in environments where most queries are simpler in nature. Snowflake schemas are often better with more sophisticated query tools that isolate users from the raw table structures, and in environments that have numerous queries with complex criteria.
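
One way a tool or a database designer might isolate users from the raw snowflaked tables is a view that flattens the outrigger tables back into a star-like dimension. The sketch below, again using Python's standard sqlite3 module, assumes hypothetical tables fact_sales, dim_product, and dim_category and a hypothetical view v_product; it is an illustration of the idea, not a description of any particular query tool.

```python
import sqlite3

# In-memory database; all names and figures are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (10, 'Computers'), (20, 'Fiction');
INSERT INTO dim_product VALUES (1, 'Laptop', 10), (2, 'Tablet', 10), (3, 'Novel', 20);
INSERT INTO fact_sales VALUES (1, 900.0), (2, 300.0), (3, 15.0), (1, 950.0);

-- The view flattens the snowflaked dimension so it looks like a star dimension.
CREATE VIEW v_product AS
SELECT p.product_key, p.product_name, c.category_name
FROM dim_product p
JOIN dim_category c ON c.category_key = p.category_key;
""")

# Querying the raw snowflake: the user must know about both dimension tables.
direct = conn.execute(
    "SELECT c.category_name, SUM(f.amount) "
    "FROM fact_sales f "
    "JOIN dim_product p ON p.product_key = f.product_key "
    "JOIN dim_category c ON c.category_key = p.category_key "
    "GROUP BY c.category_name ORDER BY c.category_name"
).fetchall()

# Querying through the view: the user sees a single product dimension.
via_view = conn.execute(
    "SELECT v.category_name, SUM(f.amount) "
    "FROM fact_sales f "
    "JOIN v_product v ON v.product_key = f.product_key "
    "GROUP BY v.category_name ORDER BY v.category_name"
).fetchall()

print(direct)
print(via_view)
```

The two result sets are identical; only the number of tables the query author has to know about differs.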

Benefits of snowflake schema design

  • Some OLAP multidimensional database modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas.
  • If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
  • A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will therefore be easier to implement.
  • A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
  • Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a frequent requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if provision for these forms of query wasn't anticipated when the data warehouse was first designed.
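
The customer-database case in the last point can be sketched as follows. This is a hypothetical example using Python's standard sqlite3 module: the tables (dim_region, dim_customer, dim_category, dim_product, fact_sales), the list_price column, and the sample rows are all assumptions made for illustration. The query looks for a common factor (here, region) among customers who bought products meeting two criteria at once (category and price).

```python
import sqlite3

# In-memory database; schema and data are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (
    region_key  INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    region_key    INTEGER REFERENCES dim_region(region_key)
);
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    list_price   REAL,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key)
);
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_customer VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
INSERT INTO dim_category VALUES (10, 'Computers'), (20, 'Fiction');
INSERT INTO dim_product VALUES
    (1, 'Laptop', 900.0, 10),
    (2, 'Cable',   10.0, 10),
    (3, 'Novel',   15.0, 20);
INSERT INTO fact_sales VALUES (1, 1), (2, 1), (2, 3), (3, 2), (3, 3);
""")

# Count distinct customers per region among those who bought a product
# that is both in the 'Computers' category and priced above 500.
rows = conn.execute(
    "SELECT r.region_name, COUNT(DISTINCT cu.customer_key) "
    "FROM fact_sales f "
    "JOIN dim_product p   ON p.product_key = f.product_key "
    "JOIN dim_category c  ON c.category_key = p.category_key "
    "JOIN dim_customer cu ON cu.customer_key = f.customer_key "
    "JOIN dim_region r    ON r.region_key = cu.region_key "
    "WHERE c.category_name = 'Computers' AND p.list_price > 500 "
    "GROUP BY r.region_name ORDER BY r.region_name"
).fetchall()

print(rows)
```

In this sketch the outrigger tables dim_category and dim_region make each hierarchy level an explicit join target, which is the kind of structure a simple query tool can navigate step by step. Whether a given tool actually requires this layout depends on the tool.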
 