SI/EECS 767
Advanced Natural Language Processing and Information Retrieval
Winter 2006
Wednesdays 9 AM-12 PM in SI North (North Campus)
Instructor: Dragomir R. Radev
The course will be based on weekly reading assignments. There will be
no exams. Each week, one of the students in the class will present
background papers related to the weekly topic; the whole class will
then discuss the 2-3 assigned papers (marked with asterisks) for the
week.
Tentative course syllabus
(not in order)
- Document models
-
(*) Jay M. Ponte and W. Bruce Croft. A language modelling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275-281, 1998. ACM Press.
[69]
-
Abraham Bookstein and Don Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5):312-318, September/October 1974.
[11]
-
Xiaoyong Liu and W. Bruce Croft. Passage retrieval based on language models. In Proceedings of the 11th International Conference on Information and Knowledge Management, pages 375-382, 2002. ACM Press.
[48]
-
(*) Kenneth W. Church. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p^2. In Proceedings of the 18th International Conference on Computational Linguistics, pages 180-186, 2000. Association for Computational Linguistics.
[17]
-
Kenneth W. Church and William Gale. Poisson mixtures. Natural Language Engineering, 1(2):163-190, 1995.
[18]
-
(*) Chengxiang Zhai and John D. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, 22(2):179-214, April 2004.
[95]
-
Stephen E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In W. Bruce Croft and C. J. Van Rijsbergen, editors, Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 232-241, 1994. Springer-Verlag.
[76]
-
John D. Lafferty and Chengxiang Zhai. Document language models, query models, and risk minimization for information retrieval. In Proceedings of the 24th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 111-119, 2001. ACM Press.
[44]
- Maximum entropy methods
-
(*) Adam Berger, Stephen A. Della Pietra, and Vincent Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39-71, March 1996.
[5]
-
Adwait Ratnaparkhi. Learning to parse natural language with maximum entropy models. Machine Learning, 34(1-3):151-175, 1999.
[74]
-
Adam Berger. A brief Maxent tutorial. [4]
-
(*) Adwait Ratnaparkhi. A simple introduction to maximum entropy models for natural language processing. Technical Report, Institute for Research in Cognitive Science, University of Pennsylvania, 1997. Technical Report # IRCS-97-08.
[72]
- Graph min-cuts
-
(*) Avrim Blum and Shuchi Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the 18th International Conference on Machine Learning, pages 19-26, 2001. Morgan Kaufmann.
[9]
-
(*) Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, pages 271-278, 2004. MIT Press.
[67]
-
Gary William Flake, Kostas Tsioutsiouliklis, and Robert E. Tarjan. Clustering methods based on minimum cut trees. Technical Report 2002-06, NEC Research Institute, Princeton University, New Jersey, 2002.
[26]
-
(*) Hongyuan Zha, Xiaofeng He, Chris Ding, Horst Simon, and Ming Gu. Bipartite graph partitioning and data clustering. In Proceedings of the 10th International Conference on Information and Knowledge Management, pages 25-32, 2001. ACM Press.
[94]
- Semi-supervised and transductive learning
-
(*) Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92-100, 1998. Morgan Kaufmann.
[10]
-
(*) Thorsten Joachims. Transductive learning via spectral graph partitioning. In Tom Fawcett and Nina Mishra, editors, Proceedings of the 20th International Conference on Machine Learning, pages 290-297, 2003. AAAI Press.
[36]
-
David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Meeting of the Association for Computational Linguistics, pages 189-196, 1995. Morgan Kaufmann.
[92]
-
David Yarowsky. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Meeting of the Association for Computational Linguistics, pages 88-95, 1994. Morgan Kaufmann.
[91]
- Statistical machine translation
-
(*) Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79-85, June 1990.
[13]
-
(*) Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311, June 1993.
[14]
-
Franz Josef Och. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Meeting of the Association for Computational Linguistics, pages 160-167, 2003. MIT Press.
[63]
-
Franz Josef Och. GIZA++: Training of statistical translation models, 2000.
[61]
-
Franz Josef Och and Hermann Ney. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Meeting of the Association for Computational Linguistics, pages 295-302, 2002. MIT Press.
[65]
-
(*) Franz Josef Och and Hermann Ney. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417-449, December 2004.
[66]
-
Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John D. Lafferty, Dan Melamed, Franz Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. Statistical machine translation: Final report. Technical report, The Center for Language and Speech Processing, Johns Hopkins University, Baltimore, Maryland, USA, September 1999.
[1]
-
Kevin Knight. A statistical machine translation tutorial workbook. Online Tutorial, August 1999.
[40]
-
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. Fast decoding and optimal decoding for machine translation. In Proceedings of the 39th Meeting of the Association for Computational Linguistics, pages 228-235, 2001. MIT Press.
[27]
- Syntax for MT
-
Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. A smorgasbord of features for statistical machine translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, pages 161-168, 2004. MIT Press.
[64]
-
(*) Kenji Yamada and Kevin Knight. A syntax-based statistical translation model. In Proceedings of the 39th Meeting of the Association for Computational Linguistics, pages 523-530, 2001. MIT Press.
[89]
-
(*) Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, 2003. MIT Press.
[41]
-
Kenji Yamada and Kevin Knight. A decoder for syntax-based statistical machine translation. In Proceedings of the 40th Meeting of the Association for Computational Linguistics, pages 303-310, 2002. MIT Press.
[90]
-
Eugene Charniak, Kevin Knight, and Kenji Yamada. Syntax-based language models for statistical machine translation. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT '03), April 23-25, 2003.
[16]
- Text classification
-
(*) Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Claire Nedellec and Celine Rouveirol, editors, Proceedings of the 10th European Conference on Machine Learning, volume 1398 of Lecture Notes in Computer Science, pages 137-142. Springer-Verlag, 1998.
[34]
-
(*) Thorsten Joachims. A statistical learning model of text classification for support vector machines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, volume 24, pages 128-136, 2001. ACM Press.
[35]
-
(*) Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, March 2002.
[77]
-
(*) Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In Mehran Sahami, editor, Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pages 41-48, 1998. AAAI Press.
[51]
-
(*) Tong Zhang and Frank J. Oles. A probability analysis on the value of unlabeled data for classification problems. In Pat Langley, editor, Proceedings of the 17th International Conference on Machine Learning, pages 1191-1198, 2000. Morgan Kaufmann.
[96]
- Spectral methods
-
(*) Sepandar D. Kamvar, Dan Klein, and Christopher D. Manning. Spectral learning. In Georg Gottlob and Toby Walsh, editors, Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 561-566, 2003. Morgan Kaufmann.
[37]
-
(*) Inderjit S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 269-274, 2001. ACM Press.
[22]
-
(*) Hongyuan Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 113-120, 2002. ACM Press.
[93]
- Web modeling
-
(*) Filippo Menczer. Growing and navigating the small world web by local content. Proceedings of the National Academy of Sciences of the United States of America, 99(22):14014-14019, October 29, 2002.
[54]
-
(*) Alex Fabrikant, Elias Koutsoupias, and Christos H. Papadimitriou. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Peter Widmayer, Francisco Triguero Ruiz, Rafael Morales Bueno, Matthew Hennessy, Stephen Eidenbenz, and Ricardo Conejo, editors, Proceedings of the 29th International Colloquium on Automata, Languages and Programming, volume 2380 of Lecture Notes in Computer Science, pages 110-122, 2002. Springer.
[25]
-
Filippo Menczer. Evolution of document networks. Proceedings of the National Academy of Sciences of the United States of America, 101(1):5261-5265, April 6, 2004.
[55]
- Random walks
-
(*) Kristina Toutanova, Christopher D. Manning, and Andrew Y. Ng. Learning random walk models for inducing word dependency distributions. In Carla E. Brodley, editor, Proceedings of the 21st International Conference on Machine Learning, 2004. ACM Press.
[84]
-
(*) Gunes Erkan and Dragomir Radev. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22:457-479, 2004.
[24]
- Information extraction
-
(*) Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. An algorithm that learns what's in a name. Machine Learning, 34(1-3):211-231, 1999.
[8]
-
Kristie Seymore, Andrew McCallum, and Roni Rosenfeld. Learning hidden Markov model structure for information extraction. In Mary Elaine Califf, editor, Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, 1999. AAAI Press.
[78]
- Conditional random fields
-
(*) Stephen Della Pietra, Vincent J. Della Pietra, and John D. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April 1997.
[68]
-
(*) John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Carla E. Brodley and Andrea Pohoreckyj Danyluk, editors, Proceedings of the 18th International Conference on Machine Learning, pages 282-289, 2001. Morgan Kaufmann.
[45]
- Summarization/paraphrasing/textual entailment
- Web as corpus
-
(*) Frank Keller, Maria Lapata, and Olga Ourioupina. Using the web to overcome data sparseness. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002.
[38]
-
(*) Mirella Lapata and Frank Keller. The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of NLP tasks. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, pages 121-128, 2004.
[46]
- Graph-based learning
-
Risi Imre Kondor and John D. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Claude Sammut and Achim G. Hoffmann, editors, Proceedings of the 19th International Conference on Machine Learning, pages 315-322, 2002. Morgan Kaufmann.
[42]
-
(*) Xiaojin Zhu. Semi-Supervised Learning with Graphs. Ph.D. Thesis, May, 2005.
[99]
-
Martin Szummer and Tommi S. Jaakkola. Kernel expansions with unlabeled examples. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Proceedings of the 14th Conference on Advances in Neural Information Processing Systems, NIPS13, pages 626-632, 2000. MIT Press.
[82]
-
Martin Szummer and Tommi S. Jaakkola. Partially labeled classification with Markov random walks. In Thomas G. Dietterich, Suzanna Becker, and Zoubin Ghahramani, editors, Proceedings of the 15th Conference on Advances in Neural Information Processing Systems, NIPS14, pages 945-952, 2001. MIT Press.
[83]
- Web ranking methods
-
Andrew Y. Ng, Alice X. Zheng, and Michael I. Jordan. Link analysis, eigenvectors and stability. In Bernhard Nebel, editor, Proceedings of the 17th International Joint Conference on Artificial Intelligence, pages 903-910, 2001. Morgan Kaufmann.
[57]
-
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International Conference on the World Wide Web, pages 107-117, 1998. Elsevier Science B. V.
[12]
-
Arvind Arasu, Jasmine Novak, Andrew Tomkins, and John Tomlin. PageRank computation and the structure of the web: Experiments and algorithms. In Proceedings of the 11th International Conference on the World Wide Web, 2002. ACM Press.
[2]
-
(*) Monica Bianchini, Marco Gori, and Franco Scarselli. Inside PageRank. ACM Transactions on Internet Technology, 5(1):92-128, 2005. ACM Press.
[6]
-
(*) David Cohn and Huan Chang. Learning to probabilistically identify authoritative documents. In Pat Langley, editor, Proceedings of the 17th International Conference on Machine Learning, pages 167-174, 2000. Morgan Kaufmann.
[19]
-
Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the Web. In Proceedings of the 10th International Conference on the World Wide Web, pages 613-622, 2001. ACM Press.
[23]
-
Alberto O. Mendelzon and Davood Rafiei. What do the neighbours think? Computing web page reputations. IEEE Data Engineering Bulletin, 23(3):9-16, 2000.
[56]
- Latent semantic indexing
-
Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, September 1990.
[21]
-
(*) Thomas Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 50-57, 1999. ACM Press.
[31]
-
(*) Thomas Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1-2):177-196, 2001.
[32]
-
Thomas Hofmann, Jan Puzicha, and Michael I. Jordan. Learning from dyadic data. In Sara A. Solla, Todd K. Leen, and Klaus-Robert Muller, editors, Proceedings of the 13th Conference on Advances in Neural Information Processing Systems, NIPS12, pages 466-472, 1999. MIT Press.
[33]
-
(*) David Cohn and Thomas Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Proceedings of the 14th Conference on Advances in Neural Information Processing Systems, NIPS13, pages 430-436, 2000. MIT Press.
[20]