When I bring up the concept of a data warehouse to my clients I often get the same few questions.
- What is a data warehouse?
- Why do I want or need a data warehouse?
- What’s involved in creating a data warehouse?
In this post I will attempt to answer each of these questions.
What is a data warehouse?
First off, do not be overwhelmed by the term “Data Warehouse”. Data Warehouse is just a fancy name for a database with a special purpose. Like other databases, data warehouses are usually setup in Relational Database Management Systems (RDBMS) such as MySQL, MS SQL Server, Oracle or other comparable systems. And like other databases, data warehouses are basically made up of tables, queries and other objects with properties regarding their structure, access rules, etc.
There are two key differences between data warehouses and operational databases, however. First, data warehouses are typically implemented to store a more comprehensive set of your organization’s data, both in terms of operational scope and history. They do this to provide you with a single repository of all operational and historical data. This is the special purpose of a data warehouse.
The second key difference is data warehouse designs are optimized to support reporting and data analysis requirements. Data needs to be retrieved from the system quickly. Operational databases on the other hand are usually optimized to support transaction processing requirements. Data needs to be added and updated quickly.
Why do I want or need a data warehouse?
There are a number of reasons why an organization would implement a data warehouse.
The first reason is to consolidate organizational data into one database to support comprehensive reporting and data analysis. Many organizations use more than one operational system to maintain their data. There are accounting systems, point-of-sale systems, CRM systems, time-clocks, etc, each of which store information in their own way. Within the data stores of these systems is a wealth of information. Unfortunately, many organizations find if very difficult and often impractical to create reports that span these sources in any meaningful way. The data is simply too segregated. By consolidating this information into a data warehouse reporting and data analysis becomes a much easier task. Decision makers and others can be given access to tools that let them see the big picture and drill down on the detail.
The second reason is to maintain a more comprehensive history of information. Some operational systems, in order to maintain adequate performance, need to archive or purge their data after a while. I am currently working with a client who’s point of sale system purges all transactions after 3 months. Because they create 25,000 invoices per month, this makes sense. Allowing that data to accumulate could have adverse effects on performance and cause lag time at the point-of-sale. That would be a customer service no-no, so the point-of-sale system is kept lean and mean. Of course, purging the older data means historical information is lost. By implementing a data warehouse, we are able to preserve and query this historical data indefinitely without degrading the performance of operational systems.
A third reason is to act as a master for data synchronization tools. Data often overlaps in operational systems. Employee information, for example, can be stored in point-of-sale systems, time-clock systems and more. A data warehouse can be a useful tool that helps keep this information in sync. Good syncing tools saves your staff time and help reduce the chance for keying errors. In a similar way, a data warehouse can be used to support other system integration functions like moving transactions from one system to another.
A data warehouse, when implemented properly, is a tool that supports decision making, improves the integrity of your data, and maximizes your staff’s productivity.
What’s involved in creating a data warehouse?
Creating a data warehouse is a process that does require good technical skills and experience in data architecture and software engineering. Nevertheless, the overall process can be understood by most.
The first step in creating a data warehouse is to implement a platform that will support it. This means you will need a server machine running an RDBMS package. Many factors go into selecting the appropriate machine. These include such things as the number of users, the frequency of data migrations, the complexity of reports and data analysis as well as any other applications the server must support. It is important to use some of the proven methods for identifying the hardware requirements of a proposed application server.
As for the RDBMS, I have had great success with MySQL and because it’s open source, the only expense associated with it is the time it takes to set up and administer it. Of course, data warehouses can be implemented with any number of database products. If you already run a specific RDBMS package it will probably be cost effective to use it for your data warehouse as well. It is important to evaluate any proposed package against your specific requirements.
The second step is to design the data model. This involves identifying all of the information that the warehouse must maintain. This is usually done by examining what information is available in your operational systems and identifying specific reports that will be required. From there we create a model - essentially a list of tables - that will hold this data and support the reporting requirements. Once the model is developed it must be implemented on the server. This is done either by writing scripts or using tools provided by the RDBMS vendor.
The third step is to create the data migration tools that will copy data from your operational systems into the data warehouse. In some cases the tools will be scripts or programs that access data using standard interfaces such as ODBC or COM, or a proprietary software development kit (SDK) provided by the vendor.
Other scripts will import data from files in either proprietary formats or common formats like CSV. This will be the case when operational systems don’t offer interfaces and only support export functions or we are receiving files from vendors, customers, field offices or other places.
Many RDBMS tools, including MySQL, have wizards available for creating migration scripts that support common data source formats. This can help speed up the process of creating these tools.
In addition, any number of software developments tools can be used to create the necessary programs.
The fourth step is to setup a migration schedule. To be most useful to decision makers the data warehouse must be updated on a regular basis. Operational data changes constantly and the data warehouse needs to keep pace.
In cases where operational data can be accessed directly, without user intervention, we simply setup the migration tools to run at predetermined times using a task scheduler.
For the cases where data is imported from files, we need to create tools that allow users to initiate the migration by using a File | Open style dialog or by simply uploading, copying or otherwise placing the file into a specified location.
Deciding on the specific migration frequency is simply a balancing act between the need for real time reporting and the load each data migration puts on the data server and other operational systems.
Last but certainly not least, we need to create reports and other data analysis tools that support decision making and other organizational objectives. With a properly designed and implemented data warehouse report writing becomes a breeze. Because the warehouse is built using a SQL-based RDBMS, reports and analysis tools can be developed easily using any number of available tools.
So there you have data warehouses in a nutshell. They help you make more informed decisions, preserve a comprehensive history of your organizations data and improve information management processes. And with the assistance of an experienced data and software consultant building one is relatively straight forward process.
Contact me if you would like to learn more about how to build your data warehouse.
Bookmark and Share This Post