Introduction
A data warehouse (DW) appliance is an integrated set of servers, storage, operating system (OS), database, and software that is specifically preconfigured and tuned for the rigors of data warehousing. Data warehouse appliances offer an attractive price/performance value proposition: they are frequently a fraction of the cost of traditional data warehouse solutions, and they have reduced maintenance requirements.
DW appliance vendors use massively parallel processing (MPP) to provide high query performance and scalability. MPP architectures consist of independent processors or servers executing in parallel. DW appliances distribute data onto dedicated disk storage units that are connected to each server. This data distribution allows multiple servers to resolve queries by scanning data in parallel, yielding high performance for table scans. In addition to high performance, MPP has a proven record of scalability.
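The distribute-then-scan pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's implementation: the node count, the hash-based placement rule, the `order_id` distribution key, and the sample `sales` rows are all assumptions made for the example, and Python threads stand in for independent servers.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4  # hypothetical number of servers in the appliance


def node_for(key, num_nodes=NUM_NODES):
    """Pick the node that stores a row by hashing its distribution key."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Spread rows across per-node storage slices."""
    slices = [[] for _ in range(num_nodes)]
    for row in rows:
        slices[node_for(row[key_field], num_nodes)].append(row)
    return slices


def scan_slice(rows, predicate):
    """Full scan of one node's local slice, returning a partial sum."""
    return sum(row["amount"] for row in rows if predicate(row))


def parallel_sum(slices, predicate):
    """Each worker scans only its own slice; partial results are combined."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: scan_slice(s, predicate), slices))
    return sum(partials)


# Hypothetical sample data: 100 orders, alternating between two regions.
sales = [
    {"order_id": i, "region": "west" if i % 2 else "east", "amount": i * 10}
    for i in range(1, 101)
]
slices = distribute(sales, "order_id")
west_total = parallel_sum(slices, lambda r: r["region"] == "west")
print(west_total)
```

Because every row lives on exactly one slice, the combined partial sums equal the result of a single serial scan; adding nodes shortens each slice rather than changing the query logic.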
Dramatically reduced total cost of ownership (TCO), coupled with high-performance data access, gives companies with large data warehouses compelling reasons to seriously consider DW appliances. However, these companies must also consider the factors associated with planning, migrating to, operating, and maintaining an appliance in their particular environment.
Industry Trends
DW appliances started to emerge about five years ago. The first entrants into the DW appliance market distinguished themselves by providing data warehouse performance through high-speed table scans at a low cost. While high-speed scanning is a significant part of data warehouse performance, DW appliances must also provide a comprehensive environment capable of adapting to changing workloads.
To achieve high-speed scans, these DW appliances used proprietary servers, storage, OS, and databases to implement their MPP architectures. This approach limited the ability of first-generation DW appliances to keep up with hardware innovation and to deliver the scalability, reliability, and flexibility needed for mixed workloads. Vendors of proprietary architectures have limited ability to partner with major server and storage vendors. They have to create their own proprietary technology for features such as fault tolerance and high availability, which the standard servers and storage units used in second-generation DW appliances already provide.
Second-generation DW appliances represent the move to mainstream vendor participation. HP Neoview has recently been launched. DATAllegro and Greenplum have partnerships with EMC and Sun, respectively, which allow these DW appliances to take advantage of the best-of-breed hardware their partners offer so they can focus research and development on software rather than hardware innovation. Second-generation DW appliances offer low-cost models capable of solving performance problems, and they provide scalability that traditional solutions and first-generation DW appliances cannot address. These vendor-partnered DW appliances have huge national sales teams and better funding than stand-alone DW appliance vendors, and will likely have faster adoption.
The standards and commodity-based technology used in second-generation DW appliances offer lower costs and accelerated advancement over proprietary solutions. Standards-based and vendor-partnered solutions generally offer higher availability and more quality assurance programs than proprietary solutions. As a result, second-generation DW appliances reduce risk for companies concerned about vendor lock-in and proprietary hardware.
TCO for a Traditional Data Warehouse System
With a traditional data warehouse system, an organization is completely responsible for the individual hardware, OS, software, and storage; there is also significant overhead for full-time employees (FTEs) to maintain the system. Table 1 is a model for a 60 TB system.
Sample Costs for a Traditional DW System
| Item | Rate | Annual cost | Three-year cost |
| Server 12 CPU - maintenance | | $120,000 | $360,000 |
| (Storage) Tier 1 - 40,000 GB | $2.10/GB/mo | $1,008,000 | $3,024,000 |
| System admin - 1 FTE | $100,000/yr | $100,000 | $300,000 |
| Total costs | | $1,688,400 | $5,065,200 |
Table 1: TCO model for a 60 TB data warehouse.
A few things to note about this model:
- It does not include a backup/recovery system strategy.
- The system administrator is listed as a full-time employee over the three-year analysis because he/she must maintain the balance between hardware, storage, operating system, and RDBMS.
- The database administrator is a full-time employee over the three-year analysis since he/she must constantly tune the database for performance: indexing and re-indexing, restructuring database objects, considering storage, memory, defragmentation, and so on.
Using traditional technology, this DW system has to be built and maintained in-house, and database tuning is an ongoing task. Consequently, this model costs, conservatively, upwards of $1.7 million per year or over $5 million over three years.
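The arithmetic behind two of the Table 1 figures can be checked with a short Python sketch. It uses only numbers published in the table (the storage volume, the monthly rate, and the annual total); the variable names are illustrative, and the rate is held in whole cents simply to avoid floating-point rounding.

```python
# Figures taken from Table 1; names are illustrative only.
STORAGE_GB = 40_000
STORAGE_RATE_CENTS = 210      # $2.10 per GB per month, expressed in cents
MONTHS_PER_YEAR = 12
YEARS = 3
ANNUAL_TOTAL = 1_688_400      # "Total costs" per year, as published

# Tier 1 storage: volume x monthly rate x 12 months.
storage_annual = STORAGE_GB * STORAGE_RATE_CENTS * MONTHS_PER_YEAR // 100
storage_three_year = storage_annual * YEARS

# The traditional model has flat annual costs, so three years is 3x one year.
three_year_total = ANNUAL_TOTAL * YEARS

print(f"Storage per year:     ${storage_annual:,}")
print(f"Storage, three years: ${storage_three_year:,}")
print(f"Total, three years:   ${three_year_total:,}")
```

Storage alone accounts for roughly 60 percent of the annual total, which is why the per-gigabyte rate dominates this model.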
With numbers like these, it is no wonder that most organizations are reluctant to expand their data warehouse initiatives: the system simply costs too much to grow, does not scale, and does not give end users the performance they need to complete analytics in a timely manner.
TCO for a DW Appliance
A data warehouse appliance has been specially designed from the ground up to serve one purpose: high-speed data warehouse analysis. Therefore, all hardware, OS, database, and storage are carefully tuned and balanced using commodity-based technology. Table 2 is a model for a second-generation 60 TB DW appliance system.
Sample Costs for a DW Appliance Solution
| Item | Rate | Year 1 | Year 2 | Year 3 | Three-year total |
| 60 TB appliance + backup system + landing zone + maintenance | | $1,800,000 | $272,000 | $272,000 | $2,344,000 |
| Conversion costs | $30/hr | $9,000 | 0 | 0 | $9,000 |
| DBA support - 1/2 FTE | $170,000/yr | $85,000 | $85,000 | $85,000 | $255,000 |
Table 2: TCO model for a 60 TB data warehouse appliance.
A few important notes on this model:
- This model includes backup/restore technology. It also includes a landing zone that dramatically facilitates data loads by providing a convenient staging area that can perform a series of data manipulations (data validation, record counts, etc.) without hampering appliance activities.
- Costs associated with implementation, loading data, and conversions from the existing business intelligence (BI) platform have been taken into consideration.
- System administration requirements have been reduced to 50 percent of the traditional model through the use of:
– Fully automated failover
– Non-intrusive backups/restores
– Fully automated query prioritization
– SNMP notification
– OS/DBMS/software upgrades
- Database administration costs were reduced 50 percent from the traditional model through:
– Reduced query tuning
– Reduced DB design/tuning
– Elimination of indices
– Elimination of aggregates
– Elimination of temporary tables or views
– Reduced effort to expand system
Consolidating the hardware, OS, RDBMS, storage, and backup system into a component-based system dramatically reduces staffing requirements for both system administration and DBA support. This much simpler model costs only about $2.8 million to own over a three-year period.
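The landing zone mentioned in the notes on this model stages incoming data so that checks such as data validation and record counts run before anything is loaded into the appliance. A minimal sketch of such a check follows, assuming the staged extract is a CSV file; the function name, the column names, and the expected record count are hypothetical and do not describe any particular product.

```python
import csv
import io


def validate_staged_file(handle, required_fields, expected_count):
    """Check a staged extract before it is loaded.

    Returns (good_rows, errors); a load would proceed only if errors is empty.
    """
    reader = csv.DictReader(handle)
    good_rows, errors = [], []
    row_count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        row_count += 1
        missing = [f for f in required_fields if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
        else:
            good_rows.append(row)
    if row_count != expected_count:
        errors.append(f"record count {row_count} != expected {expected_count}")
    return good_rows, errors


# Hypothetical staged extract with one incomplete record.
staged = io.StringIO("order_id,amount\n1,10.00\n2,\n3,7.50\n")
rows, problems = validate_staged_file(
    staged, ["order_id", "amount"], expected_count=3
)
print(len(rows), problems)
```

Running checks like these in the staging area keeps rejected files away from the appliance, so validation work does not compete with queries.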
Comparing ROI between Traditional Technology and a DW Appliance
Using the summary numbers from the two scenarios just described, we can readily see the cost advantage of a DW appliance over using traditional technology. While the start-up costs for a DW appliance are higher in the first year than in subsequent years, the overall cost savings are still over 44 percent of the costs of a traditional data warehouse system. See Table 3.
Return on Investment (Three Years)
| Cost | Traditional DW | DW appliance | Savings | Savings (%) |
| First-year costs | $1,688,400 | $2,004,000 | ($315,600) | -18.69% |
| Ongoing annual costs | $1,688,400 | $407,000 | $1,281,400 | 75.89% |
Table 3: The DW appliance saves 44 percent in costs.
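The percentages in Table 3, and the 44 percent headline figure, follow directly from the first-year and ongoing costs. The Python sketch below reproduces them, assuming the three-year appliance cost is the first-year cost plus two years of ongoing costs; the function and variable names are illustrative.

```python
# Annual figures taken from Table 3.
TRADITIONAL_ANNUAL = 1_688_400
APPLIANCE_FIRST_YEAR = 2_004_000
APPLIANCE_ONGOING = 407_000
YEARS = 3


def savings(traditional, appliance):
    """Return (amount saved, percent saved relative to the traditional cost)."""
    saved = traditional - appliance
    return saved, round(100 * saved / traditional, 2)


first_year = savings(TRADITIONAL_ANNUAL, APPLIANCE_FIRST_YEAR)
ongoing = savings(TRADITIONAL_ANNUAL, APPLIANCE_ONGOING)

# Three-year totals: flat annual cost versus year 1 plus two ongoing years.
traditional_total = TRADITIONAL_ANNUAL * YEARS
appliance_total = APPLIANCE_FIRST_YEAR + APPLIANCE_ONGOING * (YEARS - 1)
three_year = savings(traditional_total, appliance_total)

print("First year: ", first_year)
print("Ongoing:    ", ongoing)
print("Three years:", three_year)
```

Under that assumption the appliance costs $2,818,000 over three years against $5,065,200 for the traditional system, a saving of $2,247,200, or a little over 44 percent.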
Clearly, DW appliances have a tremendous TCO and ROI advantage over traditional technologies. Beyond the obvious cost savings, there are several hidden savings associated with a DW appliance. These include:
- Assurance that the appliance will not become unbalanced or obsolete, since the standard maintenance contract mitigates this risk; this benefit is called “future-proofing” and is an important consideration.
- A single vendor provides full support for the entire technology.
Mitigating Risks
With any technology, associated risks can be reduced drastically if expectations and implementations are planned carefully from the beginning. There are several factors that go into the planning of a DW appliance implementation, but here are some of the more prevalent ones.
Project Scope. As with all projects, it is imperative that the first few implementations are realistic in size and scope. Do not try to “boil the ocean” by implementing every data mart/data warehouse in your organization at the same time. Pick a highly visible project and scope it to be completed in no more than 30 days. In this way, users and executives alike will begin to see the benefits of their investment right away.
Security. This seems like an obvious consideration, but it is alarming how many organizations do not include security in their implementation plan from the very beginning; it is usually an afterthought. With ever-increasing legislative requirements to secure data, it is paramount that you consider the capabilities of the DW appliance to support compliance. Security is not limited to just user/group/role models for row-based and column-based security. You should also consider data encryption for data at rest as part of your security scheme.
ETL/BI Tool Integration. The DW appliance does not stand alone; it communicates with the outside world. Consequently, your evaluation of DW appliances should include a full assessment of how well your BI tool and ETL tools integrate with it. For the more advanced BI vendors, it isn’t sufficient to say “the appliance is accessed via ODBC” and leave it at that. It will still be necessary to understand the level of effort associated with adjusting the metadata mappings to load to the appliance and/or the metadata definitions to access the appliance with the BI tool.
Backup/Recovery. Regular backup of your DW environment is absolutely essential. However, the backups should be non-intrusive, meaning they should not interfere with the normal operation of the appliance and they should run extremely fast. The same should be true for recoveries.
Conversion Costs. Consider carefully the gyrations you will have to go through to get your current DW environment migrated to the DW appliance. The vendor should be prepared to help you with this process and include it in the implementation fee. Conversion costs include database migration as well as ETL tool and BI tool integration.
Maintenance. Maintenance refers to the overall effort required to perform such tasks as keeping the appliance tuned, the data tables defragmented, the indices (if any) refreshed, and so on. Beware of vendors who claim that tuning is not necessary for their technology. There is seldom a “one size fits all” scenario in the technological world, especially as pertains to data warehousing. It is extremely important to consider the downtime, if any, required to perform these tasks. Note that many of these time-consuming tasks will not reveal themselves in a standard proof of concept (POC) unless you ask specifically to see them.
Real-Time Data Loading (Mixed Workload). As data warehouses mature, more organizations want to load data in a more real-time fashion to have it readily available for queries. However, this scenario introduces interesting problems if you have queries “in flight” while loads are running. Second-generation DW appliances have addressed this issue head-on. This is another scenario that may not be readily demonstrated in a standard POC unless you ask.
System Expansion and Growth. Although your current DW computing and storage capacity may be known, your needs are likely to expand over time. It is imperative that any system expansion can be accomplished with minimal or no downtime. Beware of “forklift” upgrades to your appliance; look for upgrades in place, where additional components can be added quickly and tables do not have to be reloaded.
Implementation and Training. From a technical perspective, implementation includes all aspects of the project, from data transport to BI/ETL tool integration to database tuning. When planning your DW appliance implementation, do not use this opportunity to clean up too many past mistakes. Instead, your data warehouse on the appliance should have the same look and feel as your existing system, except that it is much easier to maintain and significantly faster to run.
Users of the appliance, primarily DBAs, should have an intimate understanding of best practices specific to the technology. For example, if your technology takes advantage of multi-level partitioning of data for query performance, the DBA should understand the best techniques for using it. Likewise, indices, if indicated, should be used judiciously.
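To make the multi-level partitioning example concrete, the Python sketch below shows why it helps query performance: a query that filters on the partitioning keys scans only the matching partitions and skips the rest. The two levels chosen here (month, then region), the sample rows, and the function names are assumptions for illustration, not a description of a specific appliance.

```python
from collections import defaultdict


def build_partitions(rows):
    """Group rows by (month, region): a two-level partitioning scheme."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row["month"], row["region"])].append(row)
    return partitions


def query_total(partitions, month=None, region=None):
    """Sum amounts, scanning only partitions whose keys match the filters.

    Returns (total, number of partitions actually scanned).
    """
    scanned = 0
    total = 0
    for (p_month, p_region), rows in partitions.items():
        if month is not None and p_month != month:
            continue  # pruned at the first level
        if region is not None and p_region != region:
            continue  # pruned at the second level
        scanned += 1
        total += sum(r["amount"] for r in rows)
    return total, scanned


# Hypothetical data: 3 months x 2 regions x 5 rows of 100 each.
rows = [
    {"month": m, "region": r, "amount": 100}
    for m in ("Jan", "Feb", "Mar")
    for r in ("east", "west")
    for _ in range(5)
]
partitions = build_partitions(rows)
print(query_total(partitions, month="Feb", region="west"))
print(query_total(partitions, month="Feb"))
print(query_total(partitions))
```

A query that names both keys touches one partition out of six, while an unfiltered query still scans all of them; choosing partitioning keys that match the common filters is the kind of best practice a DBA needs to know.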
Coexistence with Existing Infrastructure. Although the DW appliance will bring tremendous performance and cost savings to an organization, the appliance must coexist with the existing infrastructure. Second-generation DW appliances support this goal, since each server of the MPP system is a stand-alone system. Therefore, integration to the outside world is easily performed using standard Linux shell script or common utilities between the DW appliance and external systems.
Conclusion
There is no doubt that DW appliances will take a pivotal role for organizations struggling with very large databases. Their price/performance makes them a must-have in the data warehouse world.
However, as with all relatively new technologies, be sure to set the expectations of users and executives properly: do not attempt too much in the first project, and minimize the number of changes to the existing system.
If planned properly, your DW appliance will open new doors to your business analytics and reporting end users, allowing your organization to make hard business decisions faster and cheaper than ever before.
Jesse Fountain is VP of product marketing for DATAllegro and has more than 20 years of experience in the business intelligence arena.
jfountain@datallegro.com