In this course, you will learn concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS. We will demonstrate how to collect, store, and prepare data for the data warehouse by using other AWS services, such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis Data Firehose, and Amazon Simple Storage Service (Amazon S3). We will also explore how to use business intelligence (BI) tools to analyze your data.
Module 1: Introduction to Data Warehousing
- Relational databases
- Data warehousing concepts
- The intersection of data warehousing and big data
- Overview of data management in AWS
- Hands-on lab: Introduction to Amazon Redshift
Module 2: Introduction to Amazon Redshift
- Conceptual overview
- Real-world use cases
- Interactive demo: Touring the Amazon Redshift console
- Hands-on lab: Launching an Amazon Redshift cluster
- RA3 Nodes and AQUA architecture
- Amazon Redshift ML
Module 3: Launching clusters
- Building the cluster
- Connecting to the cluster
- Controlling access
- Database security
- Load data
- Practice lab: Load and query data in an Amazon Redshift cluster
- Optional lab: Launching an Amazon Redshift cluster
Module 4: Designing the database schema
- Schemas and data types
- Columnar compression
- Data distribution styles
- Data sorting methods
- Hands-on lab: Optimizing database schemas
Module 5: Identifying data sources
- Data sources overview
- Amazon S3
- Amazon DynamoDB
- Amazon EMR
- Amazon Kinesis Data Firehose
- AWS Lambda Database Loader for Amazon Redshift
- Redshift Data API
- SUPER data type
- Interactive demo: Connecting your Amazon Redshift cluster using a Jupyter notebook with Data API
- Interactive demo: Analyzing semi-structured data using the SUPER data type
- Hands-on lab: Loading real-time data into an Amazon Redshift database
Module 6: Loading data
- Preparing data
- Data Warehousing on AWS
- Loading data using COPY
- Maintaining tables
- Concurrent write operations
- Troubleshooting load issues
- Hands-on lab: Loading data with the COPY command
Module 7: Writing queries and tuning for performance
- Amazon Redshift SQL
- User-Defined Functions (UDFs)
- Factors that affect query performance
- The EXPLAIN command and query plans
- Workload Management (WLM)
- Interactive demo: Applying mixed workload management on Amazon Redshift
- Hands-on lab: Configuring workload management
Module 8: Amazon Redshift Spectrum
- Amazon Redshift Spectrum
- Configuring data for Amazon Redshift Spectrum
- Amazon Redshift Spectrum queries
- Data transformation
- Data sharing
- Practice lab: Data analytics using Amazon Redshift Spectrum
- Practice lab: Data transformation and querying in Amazon Redshift
- Hands-on lab: Using Amazon Redshift Spectrum
Module 9: Maintaining clusters
- Audit logging
- Performance monitoring
- Events and notifications
- Hands-on lab: Auditing and monitoring clusters
- Resizing clusters
- Backing up and restoring clusters
- Resource tagging, limits, and constraints
- Hands-on lab: Backing up, restoring, and resizing clusters
- Optional: Analyzing and visualizing data
Module 10: Analyzing and visualizing data
- Power of visualizations
- Building dashboards
- Amazon QuickSight editions and features
This course is intended for the following job roles:
We recommend that attendees of this course have the following prerequisites:
- Familiarity with relational databases and database design concepts