Course Overview
This four-day course, Analyzing with Data Warehouse, teaches you to apply traditional data analytics and business intelligence skills to big data. It presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.
Who Should Attend
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.
Course Objectives
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the big data ecosystem, learning how to:
- Use Apache Hive and Apache Impala to access data through queries
- Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
- Write and execute queries that use functions, aggregate functions, and subqueries
- Use joins and unions to combine datasets
- Create, modify, and delete tables, views, and databases
- Load data into tables and store query results
- Select file formats and develop partitioning schemes for better performance
- Use analytic and windowing functions to gain insight into their data
- Store and query complex or nested data structures
- Process and analyze semi-structured and unstructured data
- Optimize and extend the capabilities of Hive and Impala
- Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
- Take advantage of CDP Public Cloud Data Warehouse
Course Content
- Foundations for Big Data Analytics
- Introduction to Apache Hive and Impala
- Querying with Apache Hive and Impala
- Common Operators and Built-In Functions
- Data Management
- Data Storage and Performance
- Working with Multiple Datasets
- Analytic Functions and Windowing
- Complex Data
- Analyzing Text
- Apache Hive Optimization
- Apache Impala Optimization
- Extending Hive and Impala
- Choosing the Best Tool for the Job
- CDP Public Cloud Data Warehouse
- Appendix: Apache Kudu