Should Businesses Deploy Enterprise-Wide Data Warehouse Architecture or Should They Go with Data Marts?

Data warehousing began in the 1980s as part of decision support systems (DSS). Fundamentally different to the already existing operational and transactional processing systems, DSSs utilized data warehouses to store and integrate information from these disparate systems that served only the day to day needs of their target business units and allowed for a historical view of all organizational data [1][2]. In 1990, Bill Inmon was bestowed the designation of the “Father of Data Warehousing” after he published his influential book Building the Data Warehouse which describes the top-down approach, or enterprise-wide data warehouse (EDW) methodology, to architecting a data warehouse. A few years later in 1996 Ralph Kimbell described a competing bottom-up approach, or data mart (DM) methodology, in his work The Data Warehouse Toolkit [3]. This report examines the two methodologies, the differences between them, and why till this day there is still no industry standard, or agreement, on which approach is “better”.

Inmon defined a data warehouse as data that is integrated, subject-oriented, time-variant, and nonvolatile to be used by management during decision making processes [4]. While there are several prominent variations in data warehouse architecture (e.g. centralized, hub and spoke, federated, independent DMs, and bus architecture) they may all be designated either an EDW or DM approach. The EDW methodology looks at the data warehouse architecture from the top-down, that an organization’s data warehouse should be planned, designed, and executed centrally. Individual business units should follow the centrally designed architecture when building their databases and no database should be developed independently. In comparison, Kimball’s DM methodology looks at the data warehouse architecture from the bottom-up, that each business unit should build their data warehouse inline with their own department requirements and then integrate to form one large data warehouse as needed [2]. In this case, cohesion of the data warehouse enterprise-wide is accomplished by using a data bus standard (i.e. all DMs use the same standardized dimensions to allow for seamless integration) [3].

Comparing the two methodologies there are major differences between them. First, the DM architecture is easier to design and build when compared to EDW. This is simply due to the reduction in required scale and enterprise-wide collaboration needed for the design. Second, the DM approach is more agile when compared to EDW, in which more meticulous planning is required and is therefore more difficult to alter. Third, the DM approach is more difficult to maintain as an enterprise-wide resource when compared to the EDW architecture which places emphasis on this aspect from the onset. Fourth, the DM approach can lead to redundant data (i.e. when the same data is in more than one DM) which can negatively impact data warehouse resources and efficiency. This issue occurs to a much lesser extent with EDWs due to design and regulation. Fifth, it can be extremely difficult to integrate independent DMs from a DM architecture with disparate data structures. This is not an issue for EDW as the architecture is centralized [2]. Sixth, the DM approach has lower start-up costs, with each DM after the first costing roughly the same. This in comparison to the EDW methodology which has a much higher initial cost but lower subsequent project costs. Seventh, and following on from the previous point, the DM approach requires a small team of generalist to manage each DM whereas a large specialized team, or teams, is needed to manage an entire EDW [3].

So, which methodology is “better”? The answer is, it depends on the organization and who you ask. In 2005, a study conducted by Watson & Ariyachandra found that of the 454 companies surveyed, 39% deployed a hub and spoke architecture (EDW), 26% deployed the bus architecture (DM), 17% were centralized (EDW), 12% were independent DMs, and 4% were federated (no one enterprise-wide data warehouse). The researchers found that there was no significant difference in initial development time and cost when comparing independent DMs, bus architecture, and centralized architecture. In addition, there were almost no significant differences between the hub and spoke, centralized, and bus architectures on success metrics [5]. Another study of 400 companies by the same researchers found that there are specific factors that affect the likelihood of an organization choosing a DM or EDW approach. These factors include resource constraints, perception of IT staff, and strategic view of the data warehouse project [6].

Singh & Malhotra (2011) found that independent DM architecture is more likely to be deployed when resources are limited, and the bus architecture only when there is a real need for data share. In opposition, centralized or hub and spoke architectures were used when the data warehouse project was considered a fundamental component of the organization’s strategy and the resources were available. Interestingly, it was noted that even when this was the case the bus architecture (DM) was typically chosen over an EDW methodology especially when speed was a factor [7]. In addition, a study by Agosta (2005) concluded that the type of data warehouse methodology deployed by an organization depends on what type of company it is. For example, companies that are very centralized regarding governance or geography would benefit from pursing an EDW architecture to increase efficiency. Whereas the opposite is true for highly decentralized companies and they should deploy a DM methodology [8].

Finally, Kimball & Ross (2002) highlight that most organizations don’t have the time, or financial resources, to build a complete EDW and that the bus architecture should be deployed to build an enterprise-wide data warehouse in a decentralized fashion [9]. Availability of resources is a major hurdle to building out an EDW. To deploy an EDW successfully an organization must engage with costly technology vendors, infrastructure vendors, and/or information modeling companies to help guide them in design and implementation. Sen & Sinha (2005) highlight that the data warehouse methodology and the vendor, or company, that an organization works with should depend on how well an organization understands their business issues [10]. If an organization chooses the wrong approach it may cost them even more in the long run.

In conclusion, the findings from the discussed research helps explain why there is still no industry standard for developing a data warehouse and why both EDW and DM methodologies have survived. There is no simple answer to which approach is “better” and organizations should be thoughtful about which methodology they choose, looking at resources, technical expertise of staff, and strategic importance. ____________________________________________________________

References

1. Ponniah, P. (2010). Data Warehousing: Fundamentals for IT Professionals (2nd Edition). Wiley.

2. Alsqour, M., Matouk, K., & Owoc, M. (2012). A survey of data warehouse architectures: preliminary results. Conference: Computer Science and Information Systems (FedCSIS).

3. Breslin, M. (2004). Data Warehousing Battle of Giants: Comparing the Basics of the Kimball and Inmon models. Business Intelligence Journal, 9(4), 6–20.

4. Inmon, W. H. (2006). DW 2.0 Architecture for the Next Generation of Data Warehousing. DM Review, 16(4), 8–25.

5. Watson, H. J. & Ariyachandra, T. (2005). Data Warehouse Architectures: Factors in the Selection Decision and the Success of the Architectures. Terry College of Business, University of Georgia.

6. Ariyachandra, T. & Watson, H. J. (2010). Key Organizational Factors in data warehouse architecture selection. Decision Support Systems, 49, 200–212

7. Singh, S. & Malhotra, S. (2011). Data warehouse and its methods. Journal of Global Research in Computer Science, 2(5), 113–115.

8. Agosta, L. (2005). Hub-and-Spoke Architecture Favored. DM Review, 15(3), 14–63.

9. Kimball, R. & Ross, M. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (2nd Edition). Wiley.

10. Sen, A. & Sinha A. P. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48(3), 79–84.

--

--

--

Data Analytics & Blockchain!

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

What is Continuous Integration? 11 Key Practices and Principles

How I moved a library from Python 2 to 3

What to Choose? Flutter vs. React Native

Vulnhub- DC 4

Chatbot Integrations — Adding an Integration in Teneo

This is how we achieved true CI-CD in our project in 2015

The perfect Project that will get you a job as a Software Developer

Learning an Assembly Language: How the Processor Works

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Tom Doll

Tom Doll

Data Analytics & Blockchain!

More from Medium

Quick-Start with dbt and Postgresql

Using Macros in dbt to Save Time | Crimson Macaw

Datameer vs SQL: What transforms data faster?

Building data pipelines with the next generation of process — ELT