Welcome to the Information Retrieval (IR) course! It is difficult to imagine living without search engines. The availability of big data has necessitated a systematic study of retrieval techniques, and the principles and practices of information retrieval have been a focus of researchers and practitioners alike. This course is not just about search engines. It is about dealing with big data and retrieving information, which opens up interesting applications for information technology. The course will introduce students to key parts of IR such as indexing techniques, challenges in query processing, and well-known retrieval models.
Key Learning Objectives
At the end of this course, you should be able to:
- Understand and apply text retrieval techniques to big data.
- Understand and apply text indexing techniques.
- Analyze and evaluate existing retrieval systems.
Lecture 1 - Boolean Retrieval: Deadline Passed.
Lecture 2 - Index Construction and Evaluation: To be announced.
Part 1: Content Processing and Indexing
Boolean Retrieval, Content Processing - Tokenization, Lemmatization, Stop Words, Stemming, Normalization, Indexing - Index Construction, Index Compression - Zipf's Law, Heaps' Law, Posting Lists.
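To give a flavour of Part 1, here is a minimal sketch of Boolean retrieval over an inverted index. It is illustrative only and not the course's reference implementation: the function names (`tokenize`, `build_index`, `intersect`, `boolean_and`) and the toy documents are assumptions, and tokenization is reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict


def tokenize(text):
    # Assumed, simplistic normalization: lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Map each term to its posting list: a sorted list of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted posting lists, keeping IDs that occur in both."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, term1, term2):
    """Answer the query `term1 AND term2`; unknown terms have empty postings."""
    return intersect(index.get(term1.lower(), []), index.get(term2.lower(), []))


# Toy collection (hypothetical documents).
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_index(docs)
print(boolean_and(index, "home", "july"))  # [2, 3, 4]
```

Keeping posting lists sorted is what allows the linear-time merge in `intersect`; a real system would add the stemming, stop-word handling, and compression covered in this part.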
Part 2: Relevance and Retrieval Models
Term Weighting - TF-IDF, Vector Space Model, TF-IDF Variants.
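As a preview of Part 2, the sketch below computes one common TF-IDF variant (log-scaled term frequency times `log10(N / df)`) and compares documents with cosine similarity. The weighting choice, the function names (`tf_idf_vectors`, `cosine`), and the toy documents are assumptions made for illustration; the course covers several other variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs maps doc_id -> list of tokens.

    Returns (vectors, idf), where vectors maps doc_id -> {term: weight}.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (1 + math.log10(count)) * idf[term]
            for term, count in tf.items()
        }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection (hypothetical documents, already tokenized).
docs = {
    "d1": ["car", "insurance", "car"],
    "d2": ["car", "repair"],
    "d3": ["best", "insurance"],
}
vectors, idf = tf_idf_vectors(docs)
print(cosine(vectors["d1"], vectors["d2"]))
```

Rare terms such as `repair` receive a higher IDF than terms that appear in most documents, which is the intuition behind the weighting.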
Part 3: Evaluating and Improving Retrieval Systems
Evaluation, Relevance Feedback, Query Expansion.
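For Part 3, a small sketch of set-based evaluation may help: precision, recall, and the F1 measure for a single query. The function names and the example document IDs are assumptions; ranked evaluation measures are not shown here.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical result: the system returned documents 1-4, of which 2 and 4 are relevant;
# document 5 is relevant but was missed.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r, f1_score(p, r))
```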
Note that, depending on the number of student registrations, the number of assignments and the score distribution for assignments and the project might change.
Assignments 2, 3 (4% each): 8%
Students must bring strong programming skills, preferably in Java.
The project component is mandatory. Students may form groups of up to three. The objective is to learn by building a search engine yourself. The project will be evaluated in three parts:
- A two-page technical report covering the approach.
- A live demo of the project.
- A 15-minute presentation.
- An Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze.
- Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, Trevor Strohman.