"DjVu (pronounced "déjà vu") is a digital document format with advanced compression technology and high performance value. DjVu allows for the distribution on the Internet and on DVD of very high resolution images of scanned documents, digital documents, and photographs. DjVu viewers are available for the web browser, the desktop, and PDA devices."
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the U.S. Census Bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
"This tool is powered by John MacFarlane's amazing Pandoc. If you want to convert files that are too large to upload or while disconnected from the internet, learning to use Pandoc is a worthwhile investment."
"There are several advanced table formatting techniques to improve the display or editing of wikitables in Wikipedia. Most of the tips involve use of standard text-editors. While some special software packages exist, to allow customized editing, they are typically not available when travelling to other computers for wiki-editing."