JISCMail - JISC-REPOSITORIES Archives

The following development from Google could have a big impact on institutional repositories. PDFs produced from scanned documents and/or by low-end software are often just images: humans can read them, but they cannot be searched by keyword or indexed by search engines. I am sure that most, if not all, repositories hold such PDFs. This Google initiative will lift the cloak of invisibility from them.

Peter Millington

SHERPA, University of Nottingham

* Google sheds light on 'Dark Web' by searching scanned documents
http://cwflyris.computerworld.com/t/3821061/247711/148332/2/

Using optical character recognition (OCR) technology, Google's search
engine now can convert scanned PDF documents into text that can be
searched and indexed, the company said. Thus, government reports,
academic papers and other scanned documents can now show up in search
results. Search engines generally treat scanned PDF documents as images
of text rather than as searchable text.
