Solving retail data warehouse challenges with DATAllegro
In a fragmented consumer market, retailers rely heavily on intelligence gained from customer analysis. With data volumes expanding into the tens of terabytes, the data warehouses from which these analytics are drawn are stretched to the limit in both capacity and performance. Comprehensive customer analysis presents both business and technical challenges. Second-generation data warehouse appliances such as DATAllegro v3 offer unique capabilities to meet these challenges.
Business Challenges
The consumer market faces increasing complexity and fragmentation over the next several years. Consumer demographics are changing more frequently. Population ages are shifting globally, with aging populations in many first world countries. Changes in urbanization and population diversity in both the U.S. and Europe are altering spending patterns. Technology advances alter consumer spending habits and the channels through which consumers spend. Researchers are using the term “Competitive Darwinism” to describe the agility needed to adapt to these fast-paced market conditions. Analytics provides much of the insight needed to develop unique marketing strategies. A comprehensive approach to analytics requires several tactics:
- Dig deeper into the shopping basket – Retailers need deeper insight into what drives customers’ spending behavior. The data warehouse must be able to support analysis of data at the atomic level. This fine-grained intelligence enables retailers to fine-tune customer-centric strategy and execution.
- Super-sized Sandbox – Accurate models of customer behavior often include “what if” analysis. Such analysis requires support for a broad mix of queries, fast aggregation and the ability to reorder large volumes of historical data quickly. These needs are often suited to an inexpensive, stand-alone data warehouse or data mart capable of high capacity. The challenge in creating this environment is loading, storing and serving up the large quantities of data required to provide meaningful and accurate results. High performance data loading is required. The sandbox should also provide a fast time to value and low administration overhead.
- Early Warning Indicators – To respond to new challenges with speed and agility, retailers need early warning indicators that provide intelligence on item sales at both the high and low ends of the curve. Identifying top sellers early allows managers to mobilize supplies to support sales, while discounting low-turnover items. Early warning indicators need a data warehouse that supports high performance analytics and high availability.
- Secure Analytics – Protecting sensitive customer information remains a high priority for IT management in order to reduce the threat of exposure and liability.
Technical Challenges
The broadening realm of analytics shapes the workload on the retail data warehouse. The technical challenges a data warehouse must overcome in supporting complex analytics are as follows:
- A broad range of ad-hoc queries and enterprise reporting requires a data warehouse capable of handling mixed workloads.
- As global requirements increase, higher availability is required to support users in different time zones.
- Solutions must support concurrent loads and queries with a minimal impact on performance.
- Load and update frequencies are increasing in response to the demand for near real time data and reporting.
- Batch processing windows are shrinking as use of the data warehouse is becoming global.
- High-speed backup facilities are required to complete large volume backups outside of peak usage.
- Large data warehouses are not only complex to manage, but also expensive, with high costs for capacity expansion and hard limits imposed by hardware. Data warehouses that are I/O bound will only be able to increase performance at the same rate that disk technology advances, about fifteen percent per year. Statistics indicate that database capacity requirements are increasing at over fifty percent per year. This capability gap narrows the field of data warehouse solutions that can keep pace with database growth and adapt to change.
- Security and compliance standards need continual maintenance and review to reduce exposure of sensitive customer information.
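The growth figures cited in the list above (about fifteen percent per year for disk-bound performance against fifty percent per year for capacity) compound quickly. The following is a minimal sketch of that arithmetic, assuming both rates stay constant and compound annually; the function name and the five-year horizon are illustrative choices, not figures from DATAllegro.

```python
def capability_gap(years, perf_growth=0.15, data_growth=0.50):
    """Ratio of data growth to I/O-bound performance growth after `years`.

    Assumes both rates are constant and compound annually (an
    illustrative simplification of the figures quoted in the text).
    """
    performance = (1 + perf_growth) ** years
    data_volume = (1 + data_growth) ** years
    return data_volume / performance


if __name__ == "__main__":
    for year in range(1, 6):
        gap = capability_gap(year)
        print(f"year {year}: data has outgrown I/O-bound performance by {gap:.2f}x")
```

Under these assumptions the ratio grows by a factor of 1.50 / 1.15 each year, so an I/O-bound system falls further behind every year rather than by a fixed amount.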
The Solution
Second-generation data warehouse appliances such as DATAllegro v3 offer unique benefits for retail analytics, with the flexibility to provide solutions for tactical projects such as aggregation or data marts, super-sized sandboxes and enterprise data warehousing. First-generation data warehouse appliances offered a narrow band of solutions because of their proprietary and inflexible architecture. Second-generation data warehouse appliances base their architecture on standard, commodity platforms that offer a broader ability to solve complex problems far beyond the fairly simple capabilities of their predecessors.
Scalability and Flexibility to Support Growth
DATAllegro v3 offers a flexible architecture at a low price point that is able to scale to meet retailers’ data growth needs.
The Multi-Rack Appliance (MRA) allows multiple data storage racks to combine with a control rack, creating appliances that can scale from fifteen terabytes to a petabyte. Data racks are available in 15TB and 25TB increments. Capacity expansion needs can be satisfied on demand within an appliance for $15,000/TB. A landing zone is available for high-speed loading within the high-speed InfiniBand interconnect. A flexible backup architecture improves the support for very large data warehouses. The DATAllegro v3 price/performance ratio provides the scalability necessary to house entire sets of historical data to support both extensive analytics and regulatory requirements.
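To make the expansion figures concrete, the sketch below estimates the cost of covering a given capacity requirement with 15TB and 25TB data racks at $15,000/TB. It is a hypothetical planning helper, not a DATAllegro tool: it assumes rack sizes can be mixed freely, that the per-terabyte price applies to the full capacity of each rack, and that the smallest sufficient capacity is preferred.

```python
import math
from itertools import product

RACK_SIZES_TB = (15, 25)      # data rack increments quoted in the text
PRICE_PER_TB_USD = 15_000     # expansion price quoted in the text


def plan_expansion(required_tb):
    """Pick the rack mix with the least capacity that still covers required_tb.

    Ties on capacity are broken by using fewer racks. Hypothetical helper:
    mixing rack sizes and whole-rack pricing are assumptions.
    """
    small_tb, large_tb = RACK_SIZES_TB
    max_small = math.ceil(required_tb / small_tb)
    max_large = math.ceil(required_tb / large_tb)
    best = None
    for small, large in product(range(max_small + 1), range(max_large + 1)):
        capacity = small * small_tb + large * large_tb
        if capacity < required_tb:
            continue
        candidate = (capacity, small + large, small, large)
        if best is None or candidate < best:
            best = candidate
    capacity, _, small, large = best
    return {
        "racks_15tb": small,
        "racks_25tb": large,
        "capacity_tb": capacity,
        "cost_usd": capacity * PRICE_PER_TB_USD,
    }


if __name__ == "__main__":
    print(plan_expansion(40))
```

The brute-force search is adequate here because only two rack sizes are involved; a real sizing exercise would also account for the control rack, the landing zone and backup capacity, none of which are priced in the text.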
Cost-Effective Performance
The DATAllegro v3 data warehouse appliance provides a reliable, high performance solution for retail analytics on large data volumes. Massively parallel processing with DATAllegro software, coupled with standard servers from Dell, storage units from EMC and the open source Ingres database, offers high performance results that are ten to one hundred times faster than traditional solutions.
DATAllegro v3 utilizes Direct Data Streaming (DDS)™ to stream data off disk. Intel Quad Core Xeon processors process streamed data in parallel, ensuring an architecture that is not bound by I/O and is able to advance as processor technology advances. The Ultra Shared Nothing (USN)™ architecture ensures even data distribution across the appliance so that each node shares in the overall processing and operates independently. DATAllegro supports “what if” analytics through USN and DDS so that aggregations and complex queries are performed using massively parallel processing and multi-level partitioning. High-speed reordering of data and table creation allows different scenarios to be tested quickly, a process that would take hours on traditional data warehouse solutions.
According to Donald Feinberg, vice president and distinguished analyst, Gartner, “During the next three years, mixed workload performance will become the single most important performance issue in data warehousing. Customers will benefit as data warehousing providers, including data warehouse appliance vendors, develop and strengthen this capability.” Second-generation data warehouse appliances provide unique abilities to support concurrent loading with ad-hoc queries and enterprise reporting. DATAllegro v3 fine-tunes workload management and provides the ability to optimize queries easily. The overall architecture provides a data warehouse that is capable of table scans from 0.5TB/minute to 10.56TB/minute in the MRA range.
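The shared-nothing idea described above can be illustrated in a few lines. This is a generic sketch of hash distribution with per-node aggregation, not DATAllegro's implementation: the row layout, the CRC32 hash, the node count and all function names are assumptions chosen for the example.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Map a distribution key to a node with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Assign each row to exactly one node; nodes share no data."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row["customer_id"], node_count)].append(row)
    return nodes


def local_aggregate(rows):
    """Work a single node performs independently on its own rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["amount"]
    return dict(totals)


def merge(partials):
    """Combine the per-node partial results into the final answer."""
    result = defaultdict(float)
    for partial in partials:
        for category, amount in partial.items():
            result[category] += amount
    return dict(result)


# Hypothetical sales rows, used only to exercise the sketch.
rows = [
    {"customer_id": "c1", "category": "grocery", "amount": 10.0},
    {"customer_id": "c2", "category": "grocery", "amount": 20.0},
    {"customer_id": "c3", "category": "apparel", "amount": 40.0},
    {"customer_id": "c1", "category": "apparel", "amount": 5.0},
]
nodes = distribute(rows, node_count=4)
totals = merge(local_aggregate(node_rows) for node_rows in nodes)
print(totals)
```

Because every row lives on exactly one node, each local aggregation can run in parallel without coordination, and only the small partial results need to be combined at the end.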
Enterprise Level Reliability
Major technology partners reduce the risk of managing valuable data warehouse assets. Commodity devices are combined within the appliance to ensure there is no single point of failure. EMC storage nodes connect to both primary and warm standby compute nodes provided by Dell through a dual high-speed Fibre Channel network. EMC storage nodes provide 100% reliability through the implementation of RAID 1 and hot swappable disks. The result is an appliance that offers enterprise class availability and a high MTBF with no single point of failure.
Reduced Administration
Database administration is much simpler than with traditional solutions, thanks to automated space management, reduced tuning, and utilities for high-speed loading and backup. Second-generation appliances such as DATAllegro v3 can be tuned and optimized for concurrency, workload and throughput requirements. High performance for near real time loads alongside both short and long running queries ensures support for complex mixed workloads. Simple administration ensures rapid deployment of sandboxes and a fast time to value.
Security
DATAllegro is the first data warehouse appliance to offer encryption. Large retail data warehouses contain large amounts of historical and sensitive information, often accessible by users in many locations. Encryption reduces exposure to threats and liability without affecting the performance of the data warehouse.
Superior Business Intelligence
DATAllegro performance enables business intelligence with, or without, aggregation. Retailers are able to build a common view of the customer across the entire history set while enabling fast response to customer behavior changes across channels. As a result, markdowns can be minimized, margins and sales improved, and enterprise performance metrics gathered. Whether the solution is an analytics sandbox or an enterprise data warehouse, DATAllegro provides a unique competitive advantage for retail analytics through a data warehouse appliance designed for change that adapts as the competitive market changes.