Data Vault modeling is a database modeling method for delivering data analytics services to an enterprise, supporting its business intelligence, data warehousing, analytics and data science requirements.
Traditional Data Warehouse (DW) Design Approaches:
Depending on the requirements of the project, there are two approaches to choose from when designing the DW solution:
- Top down (Inmon)
- Bottom up (Kimball)

Both approaches start by extracting data from a variety of source systems and transforming it in a staging area for the next steps.
Top down: Inmon defines a data warehouse as a centralized repository for the entire enterprise. The information is first gathered and integrated in a normalized data model (the complete data warehouse). After that, dimensional data marts are created, each containing the data required for a specific business process (use case).
Bottom up: In Ralph Kimball's dimensional design approach, data marts are created first to provide reporting and analytical capabilities for a specific business process as quickly as possible. These data marts can then be integrated through conformed dimensions to create a comprehensive data warehouse. Kimball uses dimensional models such as star or snowflake schemas to organize the data in both the data marts and the enterprise data warehouse, while Inmon uses ER modelling to create a normalized data model for the enterprise data warehouse and dimensional modelling only for the data marts.
Challenges with traditional approaches:
- Not agile when business systems, rules or structures change
- Significant construction, maintenance and reengineering effort
- Difficulty loading data in real time because of complex dependencies
- Unable to handle big data (structured/unstructured, large volume)
Data vault to the rescue:
As defined by its creator, Dan Linstedt, Data Vault is a hybrid approach that combines the best of 3NF and star schema. Its design is flexible, scalable, consistent and adaptable to the needs of the enterprise, and it is intended to address the challenges posed by the traditional approaches.
Contrary to popular belief, Data Vault is not just a modeling technique; it is an entire methodology for data warehouse projects.
Data vault model architecture:
A Data Vault model is composed of hubs, links and satellites.

Hubs:
A hub contains a distinct list of business keys (which have a low tendency to change), along with metadata about when each key was first loaded and from where.
Links:
A link represents a relationship between hubs.
Satellites:
- Satellites contain data about their parent hub or link, plus metadata about when the data was loaded, where it came from, and a business effectivity date.
- They contain the data that tends to change over time.
- They are point in time, so we can ask and answer the question, "what did we know when?"
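To make the three building blocks more concrete, here is a minimal Python sketch of what a row in each table might carry, together with a lookup that answers "what did we know when?". The class names, field names and the as_of helper are illustrative assumptions, not part of any Data Vault standard or tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HubRow:
    hub_key: str        # key identifying the business key inside the vault
    business_key: str   # e.g. a customer number from the source system
    load_date: date     # when the key was first loaded
    record_source: str  # where it was loaded from


@dataclass
class LinkRow:
    link_key: str
    hub_keys: tuple     # keys of the hubs this relationship connects
    load_date: date
    record_source: str


@dataclass
class SatelliteRow:
    parent_key: str       # key of the parent hub or link
    load_date: date       # when this version of the data was loaded
    record_source: str
    effective_date: date  # business effectivity date
    attributes: dict = field(default_factory=dict)  # descriptive data that changes over time


def as_of(rows, parent_key, when):
    """Return the latest satellite row loaded for parent_key on or before
    `when`, or None if nothing had been loaded yet."""
    known = [r for r in rows if r.parent_key == parent_key and r.load_date <= when]
    return max(known, key=lambda r: r.load_date, default=None)


# Hypothetical history for one customer hub key.
history = [
    SatelliteRow("hk1", date(2024, 1, 1), "crm", date(2024, 1, 1), {"city": "Paris"}),
    SatelliteRow("hk1", date(2024, 3, 1), "crm", date(2024, 2, 15), {"city": "Oslo"}),
]
```

Because satellite rows are only ever added, calling as_of(history, "hk1", date(2024, 2, 1)) returns the row loaded in January, even though a later row carries an earlier business effectivity date.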
Modelling steps in data vault:
- Setting naming conventions: Common set of rules or guidelines to apply to the naming of tables and columns.
- Model hub tables: This requires an understanding of the business keys and how they are used across the business. Once the business keys and their hash keys are established, we create the hub tables.
- Model link tables: Link tables are created to establish relationships between the business keys (through their hash keys) in different hub tables.
- Model satellites: Satellites provide context for the hubs and links. We first group the satellites by rate of change, type of information or source system, and then establish the descriptive attributes around the hubs and links.
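The hub and link steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the use of MD5, the trim-and-uppercase normalization, the "||" delimiter and the function names hash_key, load_hub and load_link are all choices made for this example, and real projects set their own conventions.

```python
import hashlib


def hash_key(*business_keys):
    """Derive a deterministic hash key from one or more business keys.
    Normalization (trim, uppercase) and the '||' delimiter are assumptions."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub, business_keys, record_source, load_date):
    """Insert only business keys that are not in the hub yet, so the hub
    keeps a distinct list and the metadata of the first load."""
    for bk in business_keys:
        hk = hash_key(bk)
        if hk not in hub:
            hub[hk] = {
                "business_key": bk,
                "load_date": load_date,
                "record_source": record_source,
            }
    return hub


def load_link(link, key_pairs, record_source, load_date):
    """Record each relationship between two business keys once."""
    for left, right in key_pairs:
        lk = hash_key(left, right)
        if lk not in link:
            link[lk] = {
                "hub_keys": (hash_key(left), hash_key(right)),
                "load_date": load_date,
                "record_source": record_source,
            }
    return link


# Hypothetical loads from two source systems.
hub_customer = {}
load_hub(hub_customer, ["C-001", "C-002"], "crm", "2024-01-01")
load_hub(hub_customer, ["c-001 ", "C-003"], "billing", "2024-02-01")

link_customer_order = {}
load_link(link_customer_order, [("C-001", "O-9"), ("C-001", "O-9")], "crm", "2024-01-01")
```

After both loads the hub holds three customers, and the entry for C-001 still carries the load date and source of its first appearance, because the second load recognizes it as an existing key.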
Optional steps in modelling:
There are a few optional steps, depending on your business and system performance requirements:
- Add standalone objects, such as calendar or description tables, as reference tables
- Add performance tables, such as point-in-time and bridge tables, to optimize the model
- Design information marts
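As an illustration of a performance table, the sketch below builds a simple point-in-time (PIT) table in Python. For each hub key and snapshot date it records which satellite load was current, so later queries can look rows up directly instead of searching each satellite's history. The function name build_pit, the input layout and the sample data are assumptions made for this example.

```python
from bisect import bisect_right


def build_pit(hub_keys, satellites, snapshot_dates):
    """Build PIT rows.

    satellites maps a satellite name to {hub_key: sorted list of load dates}.
    Dates are ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    pit = []
    for hk in hub_keys:
        for snapshot in snapshot_dates:
            row = {"hub_key": hk, "snapshot_date": snapshot}
            for name, loads_by_key in satellites.items():
                loads = loads_by_key.get(hk, [])
                # Index of the first load strictly after the snapshot date.
                i = bisect_right(loads, snapshot)
                # The load just before it was current; None if nothing was loaded yet.
                row[name] = loads[i - 1] if i else None
            pit.append(row)
    return pit


# Hypothetical load histories for two satellites of one hub key.
satellites = {
    "sat_details": {"hk1": ["2024-01-01", "2024-03-01"]},
    "sat_contact": {"hk1": ["2024-02-10"]},
}
pit = build_pit(["hk1"], satellites, ["2024-01-31", "2024-03-31"])
```

Each PIT row can then be joined to the satellites with plain equality on the hub key and load date, which is typically cheaper than a "latest row before date X" search at query time.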
Drawbacks of Data Vault:
While Data Vault has many advantages, it also has some drawbacks that must be considered.
- In Data Vault, source tables are split up by distributing relationships, business keys and attributes into separate tables, so the number of tables is high compared with a normalized structure such as a 3NF schema.
- Data Vault requires many JOINs to derive data marts; bridge tables can help.
- Like 3NF, Data Vault is impractical for direct querying; query from a derived data mart instead.
- Because the data is spread across many tables, each carrying its own keys and load metadata, more storage is required; inexpensive storage helps.
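The following sketch shows why querying a derived data mart is more convenient than querying the vault directly. It uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows are hypothetical, and a real vault would involve many more hubs, links and satellites, and therefore many more joins.

```python
import sqlite3


def build_mart_rows():
    """Create a tiny vault (one hub, one satellite), derive a mart view
    from it, and return the rows an analyst would query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hub_customer (
            customer_hk   TEXT PRIMARY KEY,
            customer_id   TEXT NOT NULL,   -- business key
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_customer_details (
            customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL,
            name          TEXT,
            city          TEXT,
            PRIMARY KEY (customer_hk, load_date)
        );
        -- The mart hides the join and the "latest version" logic.
        CREATE VIEW dim_customer AS
        SELECT h.customer_id, s.name, s.city
        FROM hub_customer AS h
        JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
        WHERE s.load_date = (
            SELECT MAX(s2.load_date)
            FROM sat_customer_details AS s2
            WHERE s2.customer_hk = h.customer_hk
        );
        INSERT INTO hub_customer VALUES
            ('hk1', 'C-001', '2024-01-01', 'crm'),
            ('hk2', 'C-002', '2024-01-01', 'crm');
        INSERT INTO sat_customer_details VALUES
            ('hk1', '2024-01-01', 'crm', 'Ada',   'London'),
            ('hk1', '2024-03-01', 'crm', 'Ada',   'Paris'),
            ('hk2', '2024-01-01', 'crm', 'Grace', 'Oslo');
    """)
    rows = conn.execute(
        "SELECT customer_id, name, city FROM dim_customer ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


mart_rows = build_mart_rows()
```

Consumers of dim_customer see one current row per customer and never touch the hash keys or load metadata, while the full history remains available in the satellite.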
No single solution fits every data warehouse requirement. Data Vault modeling is a robust and significant data architecture, but it provides real value to an organization only when it is used for the right use case.