Conference Information

                Hardcopy Document Processing Workshop
          Conference on Information and Knowledge Management
                              CIKM 2004
                          November 12, 2004
                        Hyatt Arlington Hotel
                           Washington D.C.
Workshop Topic

A stated purpose of CIKM 2004 is to identify challenging problems
facing the development of future knowledge and information systems. A
major (and growing) challenge for industry and government is the need
to access and process the content of hardcopy documents. The focus of
the workshop is to present current research and development addressing
this challenge. Topics include, but are not limited to:

Information retrieval in the image domain
Information extraction in the image domain
Optical correlation for processing of text
Information retrieval of noisy OCR documents
Innovative OCR techniques
Robust OCR for degraded images
OCR post-processing techniques
Automatic categorization of noisy OCR documents
Clustering of noisy OCR documents
Entity extraction in noisy OCR documents
Processing of text (overlaid and in-scene text) in video images
Visualization of noisy OCR document collections
Processing of fax documents
Machine translation of noisy OCR
Automatic summarization of noisy OCR
Forms recognition of noisy text
Duplicate Detection

Relevance of Topics

During the 1990s, CEOs realized that their intellectual assets
represented the majority of corporate wealth. The discipline of
knowledge management was born to leverage these
resources. Intellectual assets are primarily contained in hardcopy
document collections. Many of these collections were scanned, indexed,
OCRed and placed upon corporate intranets to be employed to gain
competitive advantage.

Businesses, particularly highly regulated industries such as
pharmaceutical, environmental, and transportation, generate hardcopy
records that must be retrievable to demonstrate compliance. Accurate
document retrieval requires sufficient indexing.  Unfortunately,
sufficient indexing requires a priori knowledge of future, unknown
requirements.

Many government applications need to retrieve and process hardcopy
documents on an on-going basis. Law enforcement and national defense
organizations have a critical need to process hardcopy document
content. In many cases, documents must be exploited in near-real time
for their content to be actionable. Further, documents of interest to
the Government tend to be very noisy and often contain multiple
handwritten annotations or other marks.

Currently, the only viable solution is to be able to retrieve and
process the content of OCRed documents. In virtually all of these
examples, the cost, in either time or capital, of correcting OCR is
prohibitive, and therefore either OCR accuracy must be improved, the
ability to process noisy OCR must be improved, or new, innovative
techniques must be developed to process text in the image domain. The
ability to process hardcopy documents is a challenge of international
importance and an appropriate workshop topic for this CIKM Conference.

Target Audience

The focus of the Hardcopy Document Processing Workshop is to bring
together the text processing and information retrieval research
communities along with the users who face the challenge of processing
information from hardcopy documents. The purpose will be to gain a
better understanding of the current state of the art and the needs of
the user community by exchanging ideas. The target audience will be a
mixture of academia and researchers from the user communities.

Organizing Committee

Kirk Lubbes - Workshop Chair, SAIC Division Chief Scientist
Dr. David Doermann - Co-Director, Laboratory for Language and Media
    Processing, University of Maryland
Ramana Rao - Founder, CTO and Senior VP of InXight Software
Dr. Kazem Taghva - Associate Director, Information Science Research
    Institute, University of Nevada, Las Vegas
Dr. Wei Gen Yee - Assistant Professor, Computer Science, Illinois
    Institute of Technology
Dr. Shih-Fu Chang - Professor, Electrical Engineering, Columbia
    University
Dr. Nancy Chinchor - Program Manager, Advanced Technology Programs, CIA
Prem Natarajan - Distinguished Member of Technical Staff, Speech and
    Language Department, BBN Technologies
Dr. Heather McCallum-Bayliss - Senior Computational Linguist, Lockheed
    Martin
Dr. Kristen Summers - Senior Scientist, CACI

Agenda

- Introduce Workshop agenda
- Keynote speeches on the challenge
- Papers of ten to twenty minutes duration followed by topical discussions
- End with a panel discussion

Submission Requirements and Evaluation Criteria

Three to four page abstracts will be submitted for review by the
organizing committee.  Presentations should be planned to run ten
to twenty minutes. Papers should address research agendas,
on-going research, novel solutions, or case studies related to
application of state-of-the-art technologies outlined
above. Presentations are to be advanced technology oriented, not
marketing or advertising oriented. Abstracts are to be submitted
preferably in .pdf format (MS Word format is acceptable if .pdf is not
available) by e-mail to klubbes@recordsengineering.com.

Final Submissions

Final submissions of approved papers/presentations are due by 8
September. They may be in the form of technical papers, preferably in .pdf
format (MS Word format is acceptable if .pdf is not available), or
presentations, in MS PowerPoint format. Papers and presentations are
to be e-mailed to klubbes@recordsengineering.com.

Schedule

- Approval of Hardcopy Document Processing Workshop by 1 June by CIKM
   Workshop Chair
- Submission of 3-4 page abstracts by 15 July to Organizing Committee
- Approvals from Organizing Committee by 1 August
- Submission of camera-ready copy of papers by 8 September