What are SCDs ?
Slowly Changing Dimensions (SCDs) are an approach in data warehousing used to manage and track changes to dimensions over time. It plays an important role in data modeling especially in the context of a data warehouse where maintaining historical accuracy of data over time is essential. The term “slow” in SCDs refer to rate of change and the method of handling these changes. For example, changes to a customer’s address or marital status happen infrequently, making these “slowly” changing dimensions. This is in contrast to “fast-changing” scenarios where data elements like stock prices or inventory levels might update several times a day or even minute by minute.
Types of SCDs
In the context of data modeling, there are 3 types of slowly changing dimensions. Choosing the right type of SCD depends on business requirements. Given below is brief overview of the most frequently used types of SCDs in data modeling
Type 1 SCD (No History)
Overwrites old data with new data. It’s used when the history of changes isn’t necessary. For example, correcting a misspelled last name of a customer.
Scenario:
A customer changes their email address.
Before the Change:
Customer ID | Name | |
---|---|---|
001 | Manan Younas | myemail@oldmail.com |
After the Change:
The old email is overwritten with the new one in the same record.
Customer ID | Name | |
---|---|---|
001 | Manan Younas | myemail@newmail.com |
Explanation:
In this table Manan’s email address is updated directly in the database, replacing the old email with the new one. No historical record of old mail is maintained.
Type 2 SCD (Full History)
Adds new records for changes, keeping historical versions. It’s crucial for auditing purposes and when analyzing the historical context of data, like tracking price changes over time.
Scenario:
A customer changes their subscription plan.
Before the Change:
Originally, the customer is subscribed to the “Basic” plan.
Customer ID | Name | Subscription Plan | Start Date | End Date | Current |
---|---|---|---|---|---|
001 | Manan Younas | Basic | 2023-01-01 | NULL | Yes |
After the Change:
The customer upgrades to the “Premium” plan on 2023-06-01.
- Update the existing record to set the end date and change the “Current” flag to “No.”
- Add a new record with the new subscription plan, starting on the date of the change.
Customer ID | Name | Subscription Plan | Start Date | End Date | Current |
---|---|---|---|---|---|
001 | John Doe | Basic | 2023-01-01 | 2023-06-01 | No |
001 | John Doe | Premium | 2023-06-01 | NULL | Yes |
Explanation:
Before the Change: The table shows John Doe with a “Basic” plan, starting from January 1, 2022. The “End Date” is NULL, indicating that this record is currently active.
After the Change: Two changes are made to manage the subscription upgrade:
- The original “Basic” plan record is updated with an “End Date” of January 1, 2023, and the “Current” status is set to “No,” marking it as historical.
- A new record for the “Premium” plan is added with a “Start Date” of January 1, 2023. This record is marked as “Current” with a NULL “End Date,” indicating it is the active record.
This method of handling SCDs is beneficial for businesses that need to track changes over time for compliance, reporting, or analytical purposes, providing a clear and traceable history of changes.
Type 3 SCD (Limited History)
Type 3 SCDs add new columns to store both the current and at least one previous value of the attribute, which is useful for tracking limited history without the need for multiple records.It is less commonly used but useful for tracking a limited history where only the most recent change is relevant.
Scenario:
A customer moves from one city to another.
Before the Move:
Initially, only current information is tracked.
Customer ID | Name | Current City |
---|---|---|
001 | Manan Younas | Sydney |
After the Move:
A new column is added to keep the previous city alongside the current city.
Customer ID | Name | Current City | Previous City |
---|---|---|---|
001 | Manan Younas | Melbourne | Sydney |
Explanation:
In this table, when Manan moves from Sydney to Melbourne, the “Current City” column is updated with the new city, and “Previous City” is added to record his last known location. This allows for tracking the most recent change without creating a new record.
These examples illustrate the methods by which Type 1 , Type2 and Type 3 SCDs manage data changes. Type 1 SCDs are simpler and focus on the most current data, discarding historical changes. Type 3 SCDs, meanwhile, provide a way to view a snapshot of both current and previous data states without maintaining full historical records as Type 2 SCDs do.
Architectural considerations for Managing SCDs
The management of Slowly Changing Dimensions (SCDs) in data warehousing requires careful architectural planning to ensure data accuracy, completeness, and relevance. Lets discuss the implementation considerations for each type of SCD and the architectural setups required to support these patterns effectively.
Type 1 Implementation consideration
Scenarios Where Most Effective
Type 1 SCDs are most effective in scenarios where historical data is not needed for analysis and only the current state is relevant. Common use cases include:
- Correcting data errors in attributes, such as fixing misspelled names or incorrect product attributes.
- Situations where business does not require tracking of historical changes, such as current status updates or latest measurements.
Architectural Setup
Database Design: A simple design where each record holds the latest state of the data. Historical data tracking mechanisms are not needed.
Data Update Mechanism: The implementation requires a straightforward update mechanism where old values are directly overwritten by new ones without the need for additional fields or complex queries.
Performance Considerations: Since this pattern only involves updating existing records, it typically has minimal impact on performance and does not significantly increase storage requirements.
Type 2 Implementation consideration
Scenarios Where Most Effective
Type 2 SCDs are crucial when the full history of changes must be preserved for compliance, reporting, or analytical purposes. They are widely used in:
- Customer information management, where it is necessary to track address history, status changes, or subscription details.
- Product information tracking, where changes over time can provide insights into lifecycle management and evolution.
Architectural Setup
Database Design: Requires a more complex setup with additional fields for managing historical data, such as start date, end date, and a current flag.
Data Management Strategy: Insertion of new records for each change, while updating or closing previous records to indicate they are no longer current. This setup can be managed through triggers or application logic.
Versioning and Timestamping: Implementation of version control and timestamping to ensure each change is accurately recorded with its validity period.
Performance and Storage Considerations: Type 2 can significantly increase the volume of data stored, which may impact performance. Indexing on key fields and partitioning historical data can help optimize query performance.
Type 3 Implementation Pattern
Scenarios Where Most Effective:
Type 3 SCDs are used when tracking a limited history is sufficient. This can be applicable in cases like:
- Tracking a previous address alongside the current one for a short-term promotional campaign.
- Monitoring recent changes in terms and conditions for services, where only the most recent previous state is relevant.
Architectural Setup:
Database Design: Includes additional columns to store both the current and previous values of the tracked attributes. This setup is simpler than Type 2 but more complex than Type 1.
Data Update Mechanism: Updates involve changing multiple fields within a single record—both updating the previous value fields and writing new current values.
Performance Considerations: This method increases the size of each record but does not increase the total number of records as Type 2 does. Performance impacts are generally moderate but can involve more complex queries than Type 1.
Conclusion
Each type of SCD requires a different architectural approach based on the business requirements for historical data. While Type 1 is simpler and less resource-intensive, Types 2 and 3 provide data tracking capabilities at the cost of additional complexity and potentially higher resource requirements. Properly choosing and implementing these patterns will depend on the specific needs of the business, the criticality of historical data, and the performance impacts on the data warehouse system.