Fifth Annual Symposium on Document Analysis and Information Retrieval
April 15-17, 1996
Alexis Park Resort
Las Vegas, Nevada

Sponsored by the Information Science Research Institute and the Howard R. Hughes College of Engineering, University of Nevada, Las Vegas

Symposium Chair
Henry S. Baird, AT&T Bell Laboratories

Invited Speakers
Hans-Peter Frei, Union Bank of Switzerland
Michael Lesk, Bellcore
Juergen Schuermann, Daimler Benz Research Center

Debate Teams
Henry S. Baird, AT&T Bell Laboratories
Robert Haralick, University of Washington
Daniel Lopresti, Panasonic Technologies, Inc.
George Nagy, Rensselaer Polytechnic Institute

Document Analysis Committee
Andreas Dengel, Chair, German Research Center for Artificial Intelligence (DFKI)
Norbert Bartneck, Daimler Benz Research Center
Hiromichi Fujisawa, Hitachi Central Research Lab
Jonathan Hull, Ricoh California Research Center
Junichi Kanai, University of Nevada, Las Vegas
Larry Spitz, Daimler Benz Research Center
Suzanne Taylor, Loral Research Laboratory
Karl Tombre, INRIA Lorraine

Information Retrieval Committee
Jan Pedersen, Chair, Xerox Palo Alto Research Center
Susan Dumais, Bellcore
Stephen Gallant, Belmont
Donna Harman, National Institute of Standards & Technology
Marti Hearst, Xerox Palo Alto Research Center
David Lewis, AT&T Bell Laboratories
Peter Schauble, Swiss Federal Institute of Technology
Kazem Taghva, University of Nevada, Las Vegas
Yiming Yang, Mayo Clinic/Foundation

Symposium Manager
Debbie Wallace
University of Nevada, Las Vegas
Information Science Research Institute
4505 Maryland Parkway, Box 454021
Las Vegas, NV 89154-4021
Tel: (702)895-3338  Fax: (702)895-1183
sdair@isri.unlv.edu

CONFERENCE SCHEDULE

Sunday, April 14, 1996

7:00pm - 10:00pm  Alexis Park Resort
Reception and Registration

Monday, April 15, 1996

7:00am - 11:00am  Alexis Park Resort
Registration

8:15am - 8:30am  Alexis Park Resort
Welcome
  Henry S. Baird, Symposium Chair, AT&T Bell Laboratories
  William R. Wells, Dean, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas
  Kazem Taghva, Associate Director, Information Science Research Institute, University of Nevada, Las Vegas

8:30am - 9:15am  Alexis Park Resort
Invited Speaker
Substituting Images for Books: Library Economics, Technology, and Politics
  Michael Lesk, Bellcore

9:15am - 10:15am  Alexis Park Resort
Session 1
Maximum Spanning Trees for Text Segmentation
  Antonio P. Dias; Harvard University
In-house Mail Distribution by Automatic Address and Content Interpretation
  Thomas Bruckner, Peter Suda, Hans Ulrich Block, Gerd Maderlechner; Siemens AG, Corporate Research and Development

10:15am - 10:30am  Alexis Park Resort
Refreshment Break

10:30am - 12:00pm  Alexis Park Resort
Session 2
USeg: A Retargetable Word Segmentation Procedure for Information Retrieval
  Jay M. Ponte, W. Bruce Croft; University of Massachusetts
Text Categorization: A Symbolic Approach
  Isabelle Moulinier, *Gailius Raskinis, Jean-Gabriel Ganascia; University of Paris, *Vytautas Magnus University
Support Tools for Visual Information Management
  Gokhan Kutlu, Bruce A. Draper, Eliot B. Moss, Edward M. Riseman; University of Massachusetts

12:00pm - 1:15pm  Alexis Park Resort
Lunch

1:15pm - 2:00pm  Alexis Park Resort
Invited Speaker
Text Recognition - From Pixels to Meaning
  Juergen Schuermann, Daimler Benz Research Center

2:00pm - 3:30pm  Alexis Park Resort
Session 3
Edit Distance of Regular Languages
  Horst Bunke; University of Bern
Language Identification: Examining the Issues
  Penelope Sibun, *Jeffrey C. Reynar; Northwestern University, *University of Pennsylvania
Fast Decision Tree Ensembles for Optical Character Recognition
  Harris Drucker; AT&T Bell Laboratories

3:30pm - 3:45pm  Alexis Park Resort
Refreshment Break

3:45pm - 5:15pm  Alexis Park Resort
Session 4
Length Normalization in Degraded Text Collections
  Amit Singhal, Gerard Salton, Chris Buckley; Cornell University
Extraction of Thematically Relevant Text from Images
  Francine R. Chen, Dan S. Bloomberg; Xerox Palo Alto Research Center
Measuring the Effects of Data Corruption on Information Retrieval
  Elke Mittendorf, Peter Schauble; Swiss Federal Institute of Technology (ETH)

6:00pm - 10:00pm  Boyd Dining Room, Frank and Estella Beam Hall, William F. Harrah College of Hotel Administration, UNLV
Happy Hour and Dinner

Tuesday, April 16, 1996

7:30am - 11:00am  Alexis Park Resort
Registration

8:00am - 8:45am  Alexis Park Resort
Invited Speaker
Information Retrieval - From Academic Research to Practical Applications
  Hans-Peter Frei, Union Bank of Switzerland

8:45am - 10:15am  Alexis Park Resort
Session 5
Keyword-Based Browsing and Analysis of Large Document Sets
  Ido Dagan, Ronen Feldman, *Haym Hirsh; Bar-Ilan University, *Rutgers University
Tailoring a Retrieval System for Naive Users
  Adrienne J. Kleiboemer, Manette B. Lazear, *Jan O. Pedersen; MITRE Corporation, *Xerox Palo Alto Research Center
Improving Full-Text Precision on Short Queries using Simple Constraints
  Marti A. Hearst; Xerox Palo Alto Research Center

10:15am - 10:30am  Alexis Park Resort
Refreshment Break

10:30am - 12:00pm  Alexis Park Resort
Session 6
Degraded Character Image Restoration
  John D. Hobby, Henry S. Baird; AT&T Bell Laboratories
Automatically-Generated High-Reliability Features for Dichotomies of Printed Characters
  George Nagy, Xiaoyin Wang; Rensselaer Polytechnic Institute
Retrieval Strategies for Noisy Text
  Daniel Lopresti, Jiangying Zhou; Panasonic Technologies, Inc.

12:00pm - 1:15pm  Alexis Park Resort
Lunch

1:15pm - 2:00pm  Alexis Park Resort
Team Debate
"Defect Models are Important to Advance the State-of-the-Art of Optical Character Recognition"
Affirmative Team:
  Henry S. Baird, AT&T Bell Laboratories
  Robert Haralick, University of Washington
Negative Team:
  Daniel Lopresti, Panasonic Technologies, Inc.
  George Nagy, Rensselaer Polytechnic Institute
Moderator:
  Tom Nartker, Information Science Research Institute

2:00pm - 3:30pm  Alexis Park Resort
Session 7
A General-Purpose Japanese Optical Character Recognition System
  Sargur N. Srihari, Geetha Srikantan, Tao Hong, Brian Grom; State University of New York at Buffalo, Center of Excellence for Document Analysis and Recognition
OCR and Voting Shell Fulfilling Specific Text Analysis Requirements
  Thorsten Jager; German Research Center for Artificial Intelligence (DFKI)
Histograms to Evaluate OCR Accuracy and OCR Coupling
  Philippe Lefevre; EDF-Direction des Etudes et Recherches

3:30pm - 3:45pm  Alexis Park Resort
Refreshment Break

3:45pm - 5:15pm  Alexis Park Resort
Session 8
Logotype Detection in Compressed Images using Alignment Signatures
  A. Lawrence Spitz; Daimler Benz Research and Technology Center
Reliable Recognition of Handwritten Marks in Checkboxes
  B. Latanzio, A. Garzotto; Swiss Life Information Systems Research
Generalized Form Registration Using Structure-Based Techniques
  Michael D. Garris, Patrick J. Grother; National Institute of Standards and Technology

5:15pm  Alexis Park Resort
Symposium Adjourns

Wednesday, April 17, 1996

8:20am - 8:30am  Alexis Park Resort
ISRI Welcome
  Thomas A. Nartker, Director, Information Science Research Institute, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas

8:30am - 9:45am  Alexis Park Resort
The Fifth Annual Test of OCR Accuracy
  Steve Rice, Information Science Research Institute

9:45am - 10:00am  Alexis Park Resort
Refreshment Break

10:00am - 12:00pm  Alexis Park Resort
ISRI Research Reviews
  ISRI Staff

Invited Speakers

Hans-Peter Frei is the head of UBILAB, the Information Technology Research and Innovation Laboratory of the Union Bank of Switzerland (UBS). Dr. Frei holds a diploma in mathematics and a Ph.D. in computer science from the University of Zurich.
Before joining UBS, he was a professor of computer science and chairman of the Department of Computer Science at ETH, the Swiss Federal Institute of Technology in Zurich, Switzerland. Before that, he headed a management support unit at a large Swiss insurance company. Dr. Frei has held research positions at various institutions, including HumRRO, IBM Research, Xerox PARC, the University of Melbourne, and ICSI at UC Berkeley. His research interests focus on interactive systems, in particular information and document processing.

Michael Lesk received the Ph.D. degree in Chemical Physics in 1969. He joined the computer science research group at Bell Laboratories, where he worked until 1984. Since 1984 he has managed the computer science research group at Bellcore. Dr. Lesk is best known for his work on electronic libraries, including the CORE project for chemical information, and for writing several Unix system utilities, including those for table printing (tbl), lexical analyzers (lex), and inter-system mail (uucp). His other technical interests include document production and retrieval software, computer networks, computer languages, and human-computer interfaces. Dr. Lesk has chaired the Association for Computing Machinery's special interest groups on Language Analysis and on Information Retrieval. During 1987 he was a Senior Visiting Fellow of the British Library, and he is currently Visiting Professor of Computer Science at University College London.

Juergen Schuermann received the Dipl.-Ing. degree in Communications Engineering in 1960 and the Dr.-Ing. degree in 1968, both from the Technical University of Berlin, Germany. In 1963 Dr. Schuermann joined the Telefunken Research Laboratories in Ulm, Germany, which later became part of Daimler-Benz Research. Since 1974 he has taught Pattern Recognition at the Technical University of Darmstadt, where he has been an Honorary Professor since 1981.
He currently heads the Pattern Understanding Group of the Information Technology Department at Daimler-Benz Research, which covers text, speech, and image understanding. Together with his research group and the associated development departments, he has been closely involved in developing document understanding systems, especially for the postal business (AEG-ElectroCom), as well as speech understanding systems, vision-based driver assistance systems, and imaging radar systems for traffic applications. Dr. Schuermann is the general chair of the forthcoming International Conference on Document Analysis and Recognition (ICDAR '97), to be held in August 1997 in Ulm, Germany.

Debate Teams

Henry S. Baird is a Member of Technical Staff at the Computing Science Research Center, AT&T Bell Laboratories, Murray Hill, New Jersey. His research focuses on the design and analysis of algorithms for machine vision, with emphasis on the interpretation of images of printed documents. Dr. Baird is an Area Editor for the journal Computer Vision and Image Understanding. From 1989 to 1991 he was an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He was principal organizer of the 1990 IAPR Workshop on Syntactic and Structural Pattern Recognition. His Princeton University Ph.D. thesis on algorithms for image matching won a 1984 ACM Distinguished Dissertation Award and was published by the MIT Press. In 1976, his Master's thesis gave the first complete description of the sweep-line algorithm, a fundamental technique in computational geometry. Dr. Baird is a senior member of the IEEE, a member of ACM, and active in the IAPR.

Bob Haralick is the Boeing Clairmont Egtvedt Professor in Electrical Engineering at the University of Washington.
His recent work is in shape analysis and extraction using the techniques of mathematical morphology, robust pose estimation, techniques for making geometric inferences from perspective projection information, propagation of random perturbations through image analysis algorithms, and document image analysis. Dr. Haralick served on the faculty of the Electrical Engineering Department at the University of Kansas from 1975 to 1978. In 1979 he joined the EE Department at Virginia Polytechnic Institute, where he was Professor and Director of the Spatial Data Analysis Laboratory. From 1984 to 1986 he served as Vice President of Research at Machine Vision International in Ann Arbor, MI. Professor Haralick is a Fellow of the IEEE for his contributions in computer vision and image processing, and a Fellow of the IAPR for his contributions in image processing, computer vision, and mathematical morphology. He has served on the Editorial Board of IEEE PAMI and is a past associate editor of IEEE Systems, Man, and Cybernetics and IEEE Image Processing. He currently serves on the Editorial Board of Real Time Imaging and is an associate editor of the Journal of Electronic Imaging. Dr. Haralick received a B.A. in Mathematics from the University of Kansas in 1964, a B.S. degree in Electrical Engineering in 1966, and an M.S. degree in Electrical Engineering in 1967. He completed his Ph.D. at the University of Kansas in 1969.

Daniel Lopresti received the A.B. degree in Mathematics from Dartmouth College in 1982 and the Ph.D. degree in Computer Science from Princeton University in 1987. From 1986 until 1991 he was on the faculty of the Computer Science Department at Brown University. In 1991 he joined the newly formed Matsushita Information Technology Laboratory as a Senior Scientist and leader of the Carbon Project. His research interests include document analysis, information retrieval, parallel VLSI architectures, and computational aspects of molecular biology.
George Nagy received the B.Eng. and M.Eng. degrees from McGill University, and the Ph.D. in Electrical Engineering from Cornell University in 1962. For the next ten years Dr. Nagy conducted research on various aspects of pattern recognition and OCR at the IBM T.J. Watson Research Center in Yorktown Heights. From 1972 to 1985 he was Professor of Computer Science at the University of Nebraska-Lincoln, where he worked on remote sensing applications, geographic information systems, computational geometry, and human-computer interfaces. Since 1985 he has been Professor of Computer Engineering at Rensselaer Polytechnic Institute. Dr. Nagy has held visiting appointments at the Stanford Research Institute, Cornell, the University of Montreal, the National Scientific Research Institute of Quebec, the University of Genoa and the Italian National Research Council in Naples and Genoa, AT&T Bell Laboratories, IBM Almaden, McGill University, and the Information Science Research Institute at UNLV. In addition to document image analysis and character recognition, his interests include solid modeling, finite-precision spatial computation, and computer vision.

Registration

Pre-Registration: before March 15, 1996
On-site Registration:
  Sunday, April 14, 7:00pm to 10:00pm
  Monday, April 15, 7:00am to 11:00am
  Tuesday, April 16, 7:30am to 11:00am
Location: Alexis Park Resort
Cost:
  $425.00 before March 15, 1996
  $500.00 after March 15, 1996

Dinner, Monday, April 15, 1996

The College of Hotel Administration at the University of Nevada, Las Vegas is one of the finest programs of its type in the nation and has an international reputation as well. We are delighted to have students from the College's Food and Beverage Management Department prepare and serve an outstanding dinner for symposium guests on Monday evening from 6:00pm to 10:00pm. The dinner will be held in the Boyd Dining Room in Frank and Estella Beam Hall. The cost is $20 per person.
For reservations, please fill out the corresponding section of the attached symposium registration form.

Hotel Accommodations

The Alexis Park Resort, located near the center of the Las Vegas Strip, is the host hotel for the 1996 Symposium. If you choose to stay at the Alexis Park Resort, please make hotel reservations no later than March 14 to ensure room availability. A reservation form is included in this advance program for your convenience. Because of the convention season in Las Vegas, ROOMS WILL FILL UP QUICKLY AT ALL HOTELS. Please make hotel reservations as soon as possible.

Should you choose to stay at a hotel other than the host hotel, the Las Vegas Convention and Visitors Authority can provide hotel information and make room reservations throughout the city of Las Vegas. For more information, please call the Las Vegas Convention and Visitors Authority at 1-800-332-5333.

Fifth Annual Symposium on Document Analysis and Information Retrieval
INFORMATION SCIENCE RESEARCH INSTITUTE
University of Nevada, Las Vegas
April 15-17, 1996

Conference Registration Form

Name: ________________________________________________________________________
Title: _______________________________________________________________________
Company: _____________________________________________________________________
Address: _____________________________________________________________________
City: ________________________________________________________________________
State/Country: ______________________________________ Zip: ___________________
Telephone: ___________________________ Fax: __________________________________
E-mail Address: ______________________________________________________________

Registration Fees
Conference Registration: $425.00 pre-registration (before 3/15/96); $500.00 regular (after 3/15/96)   Amount: $____________
  (Includes lunch Monday, 4/15/96, and lunch Tuesday, 4/16/96)
Monday Dinner (per person): $20.00   Amount: $____________
Conference Proceedings (extra copies): $50.00   Amount: $____________
  (One copy of the proceedings is included in the registration fee)
1995 CD-ROM: $100.00   Amount: $____________
  (1995 Conference Proceedings and Annual Report)
1992, 1993 and 1994 CD-ROM: $100.00   Amount: $____________
  (1992, 1993 and 1994 Conference Proceedings and 1993 and 1994 Annual Reports)
TOTAL AMOUNT DUE: $____________

Enclosed is my payment, payable by (check one):
Check/Money Order _____  Mastercard _____  VISA _____  Discover _____

Make checks/money orders payable to: UNLV Board of Regents. All checks must be in U.S. dollars and drawn on a U.S. bank.

For payment by credit card, please fill out the following information:
Credit Card Number: _______________________________  Expiration Date: __________
Name as it appears on card (please print): ____________________________________
I authorize ISRI/UNLV to debit my account for the TOTAL AMOUNT DUE.
Signature: ___________________________________

Mail the completed conference registration form and payment to:
Symposium Manager
Information Science Research Institute
University of Nevada, Las Vegas
4505 Maryland Parkway
Box 454021
Las Vegas, NV 89154-4021
Telephone: (702)895-4571
Fax: (702)895-1183
Email: sdair@isri.unlv.edu

Alexis Park Resort Hotel Registration Form
P.O. Box 95698
Las Vegas, NV 89193-5698

Rooms reserved under the name: SDAIR '96
Mail your reservation directly to the Alexis Park Resort or call Room Reservations: (800)582-2228  Fax: (702)796-4334
Reservations received after March 14, 1996 will be accepted on a space-available basis only.
Please reserve accommodations for:
Name: ________________________________________________________________________
Home Address: ________________________________________________________________
City: _____________________ State/Country: __________________ Zip: ___________
Company Name: ________________________________________________________________
Business Address: ____________________________________________________________
City: _____________________ State/Country: __________________ Zip: ___________
Business Phone: ______________________________________________________________

SINGLE OCCUPANCY - $100.00 (+8% tax)
DOUBLE OCCUPANCY - $100.00 (+8% tax)
TRIPLE OCCUPANCY - $115.00 (+8% tax)
QUAD OCCUPANCY - $130.00 (+8% tax)

Will Arrive: _____________________________, 1996  Time: ____________________
Will Depart: _____________________________, 1996  Time: ____________________

Enclosed is my deposit, payable by (check one):
Check _____  Mastercard _____  JCB _____  Visa _____  American Express _____  Carte Blanche _____  Discover _____  Diners Club _____
Credit Card Number: __________________________________________________________
Expiration Date: _____________________________________________________________
Print name as it appears on card: ____________________________________________