JISCMail - JISC-REPOSITORIES Archives

The following development from Google could have a big impact on
institutional repositories. PDFs made from scanned documents or produced
by low-end software are often just page images: humans can read them,
but they cannot be searched by keyword or indexed by search engines. I
am sure that most, if not all, repositories hold such PDFs. This Google
initiative will lift the cloak of invisibility from them.
 
Peter Millington
SHERPA, University of Nottingham
 
* Google sheds light on 'Dark Web' by searching scanned documents
http://cwflyris.computerworld.com/t/3821061/247711/148332/2/

Using optical character recognition (OCR) technology, Google's search
engine now can convert scanned PDF documents into text that can be
searched and indexed, the company said. Thus, government reports,
academic papers and other scanned documents can now show up in search
results. Search engines generally interpret PDF documents as images of
text rather than text.

