Information Retrieval
Instructor: Venkatesh Vinayakarao
Term: Aug - Dec 2018
Teaching Assistant: Srinivas P Y K L

Welcome to the Information Retrieval (IR) course! It is difficult to imagine living without search engines, and the availability of big data has made a systematic study of retrieval techniques necessary. The principles and practices of information retrieval have long been a focus of researchers and practitioners alike. This course is not just about search engines: it is about dealing with big data and retrieving information, which opens up interesting applications for information technology. The course introduces students to key parts of IR such as indexing techniques, challenges in query processing, and well-known retrieval models.

Key Learning Objectives

At the end of this course, you should be able to:
  • Understand and apply text retrieval techniques to big data.
  • Understand and apply text indexing techniques.
  • Analyze and evaluate existing retrieval systems.

Lecture Resources

Lecture # | Topic | Readings | Slides/Material
1 | Boolean Retrieval | Chapter 1 from CPS | Lecture 1; Assignment 1
2 | Content Processing and Evaluation | Chapter 2 from CPS | Lecture 2
3 | Handling Dictionaries | Chapter 3 from CPS | Lecture 3
4 | Index Construction and Index Compression | Chapters 4 and 5 from CPS | Lecture 4
5 | Musings from the Real World | None. | Lecture 5
6 | Implementing a Search Engine with Lucene | None. | Tutorial 1 - Lucene; LuceneDemo: Import into Eclipse as an archive file.
7 | Stemming - Porter/Snowball | Snowball Stemmer; Porter Algorithm | Tutorial 2 - Manning's Slides on Stemming Import into Eclipse as an archive file.; Lecture 6
8 | Vector Space Model | Chapter 6 of CPS | Lecture 7 - Bonus Task and Mock Mid-Term; Mock Mid-Term Question Paper; Assignment 2
9 | Term Weighting; TF-IDF and Variants of TF-IDF | Chapters 6 and 7 of CPS | Lecture 8; Lecture 9
10 | Revisiting Indexes: Forward, Inverted, Positional, Permuterm and k-gram Indexes; Revisiting Query Processing: Query processing order for Boolean queries | | Lecture 10; Surprise Test 1; Lecture 11; Surprise Test 2
Mid Term
11 | Zones and Fields, Term Weights; Revisiting Dictionary Compression | Chapter 6 of CPS | Lecture 12; Lecture 13; Lecture 14
12 | IR Evaluation | Chapter 8 of CPS | Lecture 17
13 | Relevance Feedback | Chapter 9 of CPS | Lecture 15
14 | Pseudo-Relevance Feedback | Chapter 9 of CPS | Lecture 16; Assignment 3
15 | Probabilistic Retrieval | Chapter 11 of CPS | Lecture 18; Lecture 19
16 | Language Models for IR; Revisiting Cloud Computing and Distributed Indexing: Hadoop, Map-Reduce, Pig | Chapter 12 of CPS | Lecture 20
17 | Advanced Topics in IR (Web Basics, Web Crawling, Link Analysis) | Chapters 19, 20 and 21 of CPS | Lecture 21
Final Exam

Instrument | Max Marks
Final Exam | 35%
Assignment 1 | 2%
Assignments 2, 3 (4% each) | 8%

Students must bring strong programming skills, preferably in Java.

The project component is mandatory. Students may form groups of up to three. The objective is to learn by building a search engine yourself. The project will be evaluated in three parts:
  • A two-page technical report covering the approach.
  • A live demo of the project.
  • A 15-minute presentation.

  • [CPS] An Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze.
  • [BDT] Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, Trevor Strohman.

If you are not having fun, you are not the best student you can be!