Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers

· · ·
· IBM Redbooks
4.3
3条评价
电子书
82
符合条件
评分和评价未经验证  了解详情

关于此电子书

Data warehouses were developed for many good reasons, such as providing quick query and reporting for business operations, and business performance. However, over the years, due to the explosion of applications and data volume, many existing data warehouses have become difficult to manage. Extract, Transform, and Load (ETL) processes are taking longer, missing their allocated batch windows. In addition, data types that are required for business analysis have expanded from structured data to unstructured data.

The Apache open source Hadoop platform provides a great alternative for solving these problems.

IBM® has committed to open source since the early years of open Linux. IBM and Hortonworks together are committed to Apache open source software more than any other company.

IBM Power SystemsTM servers are built with open technologies and are designed for mission-critical data applications. Power Systems servers use technology from the OpenPOWER Foundation, an open technology infrastructure that uses the IBM POWER® architecture to help meet the evolving needs of big data applications. The combination of Power Systems with Hortonworks Data Platform (HDP) provides users with a highly efficient platform that provides leadership performance for big data workloads such as Hadoop and Spark.

This IBM RedpaperTM publication provides details about Enterprise Data Warehouse (EDW) optimization with Hadoop on Power Systems. Many people know Power Systems from the IBM AIX® platform, but might not be familiar with IBM PowerLinuxTM, so part of this paper provides a Power Systems overview. A quick introduction to Hadoop is provided for those not familiar with the topic. Details of HDP on Power Reference architecture are included that will help both software architects and infrastructure architects understand the design.

In the optimization chapter, we describe various topics: traditional EDW offload, sizing guidelines, performance tuning, IBM Elastic StorageTM Server (ESS) for data-intensive workload, IBM Big SQL as the common structured query language (SQL) engine for Hadoop platform, and tools that are available on Power Systems that are related to EDW optimization. We also dedicate some pages to the analytics components (IBM Data Science Experience (IBM DSX) and IBM SpectrumTM Conductor for Spark workload) for the Hadoop infrastructure.

评分和评价

4.3
3条评价

为此电子书评分

欢迎向我们提供反馈意见。

如何阅读

智能手机和平板电脑
只要安装 AndroidiPad/iPhone 版的 Google Play 图书应用,不仅应用内容会自动与您的账号同步,还能让您随时随地在线或离线阅览图书。
笔记本电脑和台式机
您可以使用计算机的网络浏览器聆听您在 Google Play 购买的有声读物。
电子阅读器和其他设备
如果要在 Kobo 电子阅读器等电子墨水屏设备上阅读,您需要下载一个文件,并将其传输到相应设备上。若要将文件传输到受支持的电子阅读器上,请按帮助中心内的详细说明操作。