- Instructor: Venkatesh Vinayakarao
- Term: Aug - Dec 2018
- Teaching Assistant: Srinivas P Y K L
Welcome to the Information Retrieval (IR) course! It is difficult to imagine life without search engines. The availability of big data has made a systematic study of retrieval techniques necessary, and the principles and practices of information retrieval have been a focus of researchers and practitioners alike. This course is not just about search engines. It is about handling big data and retrieving information from it, which opens up interesting applications in information technology. The course introduces students to key parts of IR, such as indexing techniques, challenges in query processing, and well-known retrieval models.
Key Learning Objectives
At the end of this course, you should be able to:
- Understand and apply text retrieval techniques to big data.
- Understand and apply text indexing techniques.
- Analyze and evaluate existing retrieval systems.
| # | Topic | Reading | Slides and Material |
|---|---|---|---|
| 1 | Boolean Retrieval | Chapter 1 of CPS | Lecture 1 |
| 2 | Content Processing and Evaluation | Chapter 2 of CPS | Lecture 2 |
| 3 | Handling Dictionaries | Chapter 3 of CPS | Lecture 3 |
| 4 | Index Construction and Index Compression | Chapters 4, 5 of CPS | Lecture 4 |
| 5 | Musings from the Real World | None | Lecture 5 |
| 6 | Implementing a Search Engine with Lucene | None | Tutorial 1 - Lucene; LuceneDemo (import into Eclipse as an archive file) |
| 7 | Stemming - Porter/Snowball | Snowball Stemmer | Tutorial 2 - Manning's Slides on Stemming; StemmingDemo.zip (import into Eclipse as an archive file) |
| 8 | Vector Space Model | Chapter 6 of CPS | Lecture 7 - Bonus Task and Mock Mid-Term; Mock Mid-Term Question Paper |
| 9 | TF-IDF and Variants of TF-IDF | Chapters 6, 7 of CPS | Lecture 8 |
| 10 | Revisiting Indexes: Forward, Inverted, Positional, Permuterm and k-gram Indexes; Revisiting Query Processing: query processing order for Boolean queries | | Surprise Test 1; Surprise Test 2 |
| 11 | Zones and Fields, Term Weights; Revisiting Dictionary Compression | Chapter 6 of CPS | Lecture 12 |
| 12 | IR Evaluation | Chapter 8 of CPS | Lecture 17 |
| 13 | Relevance Feedback | Chapter 9 of CPS | Lecture 15 |
| 14 | Pseudo-Relevance Feedback | Chapter 9 of CPS | Lecture 16 |
| 15 | Probabilistic Retrieval | Chapter 11 of CPS | Lecture 18 |
| 16 | Language Models for IR; Revisiting Cloud Computing and Distributed Indexing: Hadoop, MapReduce, Pig | Chapter 12 of CPS | Lecture 20 |
| 17 | Advanced Topics in IR (Web Basics, Web Crawling, Link Analysis) | Chapters 19, 20, 21 of CPS | Lecture 21 |
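Rows 1 and 10 cover Boolean retrieval and the order in which the terms of a Boolean query are processed. A minimal Java sketch of both ideas follows, as an illustration rather than course reference code: an in-memory inverted index maps each term to a sorted postings list of document IDs, and an AND query intersects the postings lists starting with the shortest one (the rarest term), the ordering heuristic discussed in Chapter 1 of CPS. The class name `BooleanIndex`, the lowercase whitespace tokenizer, and the toy documents are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical in-memory inverted index supporting Boolean AND queries.
// Assumptions: lowercase, whitespace-separated tokens; a document's ID is its list position.
public class BooleanIndex {
    // term -> postings list: document IDs in increasing order
    private final Map<String, List<Integer>> postings = new HashMap<>();

    public BooleanIndex(List<String> docs) {
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Documents are indexed in ID order, so a repeated term only needs a check of the last entry.
                if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                    list.add(docId);
                }
            }
        }
    }

    // Linear-time merge of two sorted postings lists.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) {
                result.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    // Conjunctive query: intersect postings in increasing order of document frequency
    // (shortest list first), so intermediate results stay small.
    public List<Integer> and(String... terms) {
        List<List<Integer>> lists = new ArrayList<>();
        for (String t : terms) {
            List<Integer> p = postings.get(t.toLowerCase());
            if (p == null) return new ArrayList<>(); // a term missing from the index empties the AND
            lists.add(p);
        }
        if (lists.isEmpty()) return new ArrayList<>();
        lists.sort((p, q) -> Integer.compare(p.size(), q.size()));
        List<Integer> result = new ArrayList<>(lists.get(0)); // copy, so callers cannot modify the index
        for (int k = 1; k < lists.size() && !result.isEmpty(); k++) {
            result = intersect(result, lists.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        BooleanIndex index = new BooleanIndex(List.of(
                "new home sales top forecasts",
                "home sales rise in july",
                "increase in home sales in july",
                "july new home sales rise"));
        System.out.println(index.and("home", "sales", "july")); // prints [1, 2, 3]
        System.out.println(index.and("new", "rise"));           // prints [3]
    }
}
```

For the query `home sales july`, the postings list for `july` (three documents) is intersected first, before the four-document lists for `home` and `sales`; with real collections the gap between rare and common terms is far larger, which is what makes the ordering worthwhile.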
| Component | Weight |
|---|---|
| Assignments 2, 3 (4% each) | 8% |
Students must have strong programming skills, preferably in Java.
The project component is mandatory. Students may form groups of up to three. The objective is to learn by building a search engine yourself. The project will be evaluated in three parts:
- A two-page technical report describing the approach.
- A live demo of the project.
- A 15-minute presentation.
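The project asks you to build a search engine, and rows 8 and 9 of the schedule introduce ranking with tf-idf weights. As a possible starting point, not a required design, here is a hypothetical Java sketch that ranks documents by one simple tf-idf variant from Chapter 6 of CPS, summing tf × log10(N/df) over the query terms. The class `TfIdfRanker`, its methods, and the lowercase whitespace tokenizer are illustrative assumptions; the sketch omits length normalization (cosine similarity) and the other tf-idf variants.

```java
import java.util.*;

// Hypothetical sketch of tf-idf ranked retrieval using one simple variant from Chapter 6 of CPS:
//   score(q, d) = sum over query terms t of tf(t, d) * log10(N / df(t))
// Assumptions: lowercase, whitespace-separated tokens; a document's ID is its list position.
public class TfIdfRanker {
    private final List<Map<String, Integer>> termFreqs = new ArrayList<>(); // per document: term -> tf
    private final Map<String, Integer> docFreqs = new HashMap<>();          // term -> df

    public TfIdfRanker(List<String> docs) {
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) tf.merge(term, 1, Integer::sum);
            }
            termFreqs.add(tf);
            // Each distinct term in this document adds 1 to its document frequency.
            for (String term : tf.keySet()) docFreqs.merge(term, 1, Integer::sum);
        }
    }

    // idf(t) = log10(N / df(t)); 0 for terms that never occur.
    double idf(String term) {
        int df = docFreqs.getOrDefault(term, 0);
        return df == 0 ? 0.0 : Math.log10((double) termFreqs.size() / df);
    }

    double score(String query, int docId) {
        double s = 0.0;
        for (String term : query.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            s += termFreqs.get(docId).getOrDefault(term, 0) * idf(term);
        }
        return s;
    }

    // Returns IDs of documents with a positive score, highest score first (ties keep ID order).
    public List<Integer> search(String query) {
        double[] scores = new double[termFreqs.size()];
        List<Integer> hits = new ArrayList<>();
        for (int d = 0; d < scores.length; d++) {
            scores[d] = score(query, d);
            if (scores[d] > 0) hits.add(d);
        }
        hits.sort((a, b) -> Double.compare(scores[b], scores[a])); // List.sort is stable
        return hits;
    }

    public static void main(String[] args) {
        TfIdfRanker ranker = new TfIdfRanker(List.of(
                "cheap flights to paris",
                "paris travel guide paris hotels",
                "cheap hotels"));
        System.out.println(ranker.search("paris travel")); // prints [1, 0]
    }
}
```

With the three toy documents in `main`, the query `paris travel` ranks document 1 (two occurrences of `paris` plus the rarer term `travel`) above document 0; document 2 contains neither query term and is not returned.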
- [CPS] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval.
- [BDT] Bruce Croft, Donald Metzler, Trevor Strohman. Search Engines: Information Retrieval in Practice.
If you are not having fun, you are not the best student you can be!