Big Data on AWS

7 Labs · 85 Credits · 7h 15m

Use Case (Experienced)

This quest teaches you how to use AWS services to perform big data analytics in the cloud.

Working with Amazon DynamoDB

[IMPORTANT: This lab requires you to use or create a Twitter account and application and use its credentials in the lab in order to pull data into Amazon DynamoDB. Please review the Lab Guide for Twitter account instructions before starting this lab.] This lab introduces Amazon DynamoDB and walks you through basic operations such as creating, updating, querying, and deleting tables. It also shows you how to change the provisioned throughput of the tables and see how that change is reflected in the application. Note: This lab may take 10-11 minutes to set up and start. You can begin working on the first two exercises while the lab is building.

Advanced · 10 Credits · 45 Minutes

Launching Amazon EC2 Spot Instances with Auto Scaling and Amazon CloudWatch

This lab demonstrates how to launch Amazon EC2 Spot Instances to accelerate completion of tasks performed by existing On-Demand instances. It covers how to set up Auto Scaling to launch Spot Instances when the Spot price is low, incorporate CloudWatch metrics to monitor your instances, and scale down and terminate your Spot Instances when your job is complete.

Advanced · 10 Credits · 45 Minutes

Working with Amazon Redshift

This lab demonstrates how to use Amazon Redshift to create a cluster, load data, run queries, and monitor performance. Note: Students will download a free SQL client as part of this lab.

Advanced · 10 Credits · 45 Minutes

Launching GeoServer on AWS

This lab provides an introduction to running a geospatial data server on AWS infrastructure, using GeoServer. GeoServer is a Java-based, open source software server that allows users to view and edit geospatial data. It uses open standards from the Open Geospatial Consortium (OGC) to facilitate flexible sharing of geospatial information. The lab leads you through the steps to launch and configure an Ubuntu Linux virtual machine in the Amazon cloud. You will install GeoServer on this instance and load a dataset onto the server. Prerequisites: To successfully complete this lab, you should be familiar with basic Linux server administration and comfortable using the Linux command-line tools. Some familiarity with database fundamentals and geospatial tools would be an advantage.

Advanced · 10 Credits · 50 Minutes

Exploring Google Ngrams with Amazon EMR

This lab demonstrates how to launch an Amazon Elastic MapReduce (EMR) cluster for big data processing and use Hive with SQL-style queries to analyze data. You will create a Hadoop cluster using Amazon EMR, which will allow you to run interactive Hive queries against data stored in Amazon S3. You will use Hive to normalize the data into a more useful form, and then run queries to analyze it.

Expert · 15 Credits · 1 Hour

Building Real-Time Dashboards with Amazon Kinesis Dynamic Aggregators

This lab demonstrates how to create a variety of analytic dashboards that are continuously updated via the Amazon Kinesis Aggregators framework. You will learn how to create Amazon Kinesis Streams, how to create real-time aggregated datasets with Amazon Kinesis Aggregators, and how to interact with this data using Amazon CloudWatch and custom dashboarding tools.

Expert · 15 Credits · 55 Minutes

Advanced Amazon Redshift: Analytics and Amazon Machine Learning

In this lab, you will build a smart solution using Amazon Redshift and Amazon Machine Learning that predicts delays for flights originating at Chicago’s O’Hare International Airport. You will learn how to analyze large amounts of data using Redshift. Then you will practice using Machine Learning to create a model that predicts flight delays. Prerequisites: To successfully complete this lab, you should be familiar with Redshift concepts from the introductory lab. Some knowledge of SQL and Python programming is required, although full solution code is provided. You should be comfortable using RDP to connect to a Windows server and using SQL client software. At a minimum, you should have taken the “Introduction to Amazon Redshift” and “Introduction to Machine Learning” labs. Note: This lab must currently run in us-east-1 for the Machine Learning service. Be sure to check in the AWS console that you are running in us-east-1 (N. Virginia), and change to us-east-1 if necessary.

Expert · 15 Credits · 1 Hour 45 Minutes