Detailed Course Outline
Module 1: Introduction to Hadoop and HBase
- What Is Big Data?
- Introducing Hadoop
- Hadoop Components
- What Is HBase?
- Why Use HBase?
- Strengths of HBase
- HBase in Production
- Weaknesses of HBase
Module 2: HBase Tables
- HBase Concepts
- HBase Table Fundamentals
- Thinking About Table Design
Module 3: The HBase Shell
- Creating Tables with the HBase Shell
- Working with Tables
- Working with Table Data
Module 4: HBase Architecture Fundamentals
- HBase Regions
- HBase Cluster Architecture
- HBase and HDFS Data Locality
Module 5: HBase Schema Design
- General Design Considerations
- Application-Centric Design
- Designing HBase Row Keys
- Other HBase Table Features
Module 6: Basic Data Access with the HBase API
- Options to Access HBase Data
- Creating and Deleting HBase Tables
- Retrieving Data with Get
- Retrieving Data with Scan
- Inserting and Updating Data
- Deleting Data
Module 7: More Advanced HBase API Features
- Filtering Scans
- Best Practices
- HBase Coprocessors
Module 8: HBase on the Cluster
- How HBase Uses HDFS
- Compactions and Splits
Module 9: HBase Reads and Writes
- How HBase Writes Data
- How HBase Reads Data
- Block Caches for Reading
Module 10: HBase Performance Tuning
- Column Family Considerations
- Schema Design Considerations
- Configuring for Caching
- Dealing with Time Series and Sequential Data
- Pre-Splitting Regions
Module 11: HBase Administration and Cluster Management
- HBase Daemons
- ZooKeeper Considerations
- HBase High Availability
- Using the HBase Balancer
- Fixing Tables with hbck
- HBase Security
Module 12: HBase Replication and Backup
- HBase Replication
- HBase Backup
- MapReduce and HBase Clusters
Module 13: Using Hive and Impala with HBase
- Using Hive and Impala with HBase
Appendix A: Accessing Data with Python and Thrift
- Thrift Usage
- Working with Tables
- Getting and Putting Data
- Scanning Data
- Deleting Data
- Counters
- Filters