Distributed Computing and Big Data
Instructor: Venkatesh Vinayakarao
Term: Jan - Apr 2025
TAs: Aniket Tiwari, Rohit Roy



Welcome to the Distributed Computing and Big Data course! The massive increase in the availability of data has made its storage, management, and analysis extremely challenging. Various tools, technologies, and frameworks have emerged to address this challenge. Apache Hadoop is one such framework: it helps us handle big data by making distributed computing easier, abstracting away concerns such as reliability, distributed file management, and distributed processing. In this course, we shall start by understanding the characteristics of big data and the fundamental concepts of cloud computing. We will then explore the Hadoop ecosystem, specifically HDFS, Map-Reduce, Pig, and NoSQL databases, with the objective of understanding how big data can be handled effectively. We will also briefly discuss web application development, with a special focus on RESTful services. This is an introductory course that focuses on the breadth of the big data landscape.

Key Learning Objectives

At the end of this course, you should be able to:
  • Understand the fundamentals of distributed storage using Hadoop HDFS as an example.
  • Understand distributed processing fundamentals using the Map-Reduce framework and Pig scripts.
  • Understand NoSQL database concepts using MongoDB and/or HBase.
  • Understand web services, with a focus on RESTful services.

Lecture Schedule

Each entry lists the lecture number and topic, followed by its readings and slides/material.

Part 1: Introduction
1. Introduction to Big Data
   Readings: The Complete Beginner's Guide To Big Data Everyone Can Understand; Basics About Cloud Computing
   Slides/Material: Lecture 1; Lecture 2; Lecture 3
2. Hands-On Tutorial: A Tour of Big Data Stack with Cloudera VM
   Readings: CDH Overview
   Slides/Material: Tutorial 1
3. Distributed File Systems
   Readings: Files and Directories; File System; The Hadoop Distributed File System
   Slides/Material: Lecture 4; Lecture 4.1 (DC Model); DFS (Lecture 4 Updated)
4. Hands-On Tutorial: HDFS
   Readings: Exploring the File System; HDFS Tutorial (old)
   Slides/Material: Tutorial 2
5. Distributed Processing with Map-Reduce and Pig
   Readings: Overview of Map-Reduce; Pig-Latin
   Slides/Material: Lecture 5
6. Hands-On Tutorial: Map-Reduce
   Slides/Material: Tutorial 3
7. Introduction to OOAD and UML
   Slides/Material: Lecture 6; Lecture 7; Tut4-JavaCode
8. Big Data Design Patterns
   Readings: Chapter 1 from Thinking in Patterns; Map-Reduce Design Patterns
   Slides/Material: Lecture 8
9. Apache Pig
   Slides/Material: Lecture 9
10. Hands-On Tutorial: Map-Reduce and Pig
   Readings: MapReduce Tutorial; Pig Tutorial
   Slides/Material: Tutorial 5: MR [zip], [video], Pig [zip]
11. NoSQL DB
   Readings: Chapter 4 from BDA Book; Columnar Storage; NoSQL Explained
   Slides/Material: Lecture 10
12. Hands-On Tutorial: MongoDB, HBase
   Readings: MongoDB CRUD Operations
   Slides/Material: Tut6-MongoDB; Exercise; Tut6-HBase
13. Graph DB with Neo4j
   Slides/Material: Lecture 11; Neo4j Commands
14. Web Application Development and Service Oriented Architecture
   Readings: Web Application Development; Sections 1, 2 and 3 of Web Services
   Slides/Material: Lecture 12; Notes; Video
15. Hands-On Tutorial: Apache Tomcat, JSON, RESTful Services
   Readings: Building Web Applications with Tomcat; RESTful Services; Hive and Solr (not in syllabus)
   Slides/Material: Tut7-WS; Tut7-Code; Video
16. Big Data: System Design, Products and Practices
   Slides/Material: Lecture 13


Evaluation
Instrument: Weight
  • Mid Exam: 25%
  • Final Exam: 35%
  • Assignments (4 × 10%): 40%

Pre-requisites
None.

Resources

Text
There is no prescribed text for this course.

References (Optional Readings)


If you are not having fun, you are not the best student you can be!