SI760-001, LING792-004, EECS597-001
http://tangra.si.umich.edu/~radev/760
Fall 2002
Fridays, 2-5 PM (two 75-minute lectures)
409 West Hall
A survey of techniques used in language studies and information processing. Students will learn how to explore and analyze textual data in the context of Web-based information retrieval systems. At the conclusion of the course, students will be able to work as information designers and analysts.
Each class is one 75-minute lecture.
1. The study of language. Linguistic Fundamentals.
2. Mathematical and Probabilistic Fundamentals. Descriptive
Statistics. Measures of central tendency. The z score. Hypothesis
testing.
3. Information theory. Entropy, joint entropy, conditional
entropy. Relative entropy and mutual information. Chain rules. The
entropy of English.
4. Working with corpora. N-grams.
5. Language models. Noisy channel models. Hidden Markov Models.
6. Cluster analysis. Clustering of terms according to semantic
similarity. Distributional clustering.
7. Collocations. Syntactic criteria for collocability.
8. Literary detective work. The statistical analysis of writing
style. Decipherment and translation.
9. Information Retrieval.
10. Text summarization. Single-document summarization. Multi-document
summarization. Maximal Marginal Relevance. Cross-document
structure theory. Trainable methods.
11. Information Extraction. Message understanding.
12. Question Answering. Semantic representation. Predictive annotation.
13. Word sense disambiguation and lexical acquisition. Supervised
disambiguation. Unsupervised disambiguation. Attachment
ambiguity. Computational lexicography.
14. Other topics. Text alignment. Word alignment. Statistical machine
translation. Statistical text generation. Discourse
segmentation. Text categorization.
Required books:
Reference readings:
A small number of articles will be assigned to complement the major readings. These articles will come primarily from the ACL, AAAI, and SIGIR proceedings and from the following journals: Computational Linguistics, Information Retrieval, and Artificial Intelligence.