Tuesday, November 9, 2010

Second look at Google Books

Ryan James, who works at UHM's Hamilton Library, presented the findings from his Google Books research project.  The purpose of his study was to measure the rate of error in two categories: legibility and metadata.

Legibility

Legibility is essentially whether the words on a page are readable.
James categorized errors as major (the page or the words on it are completely unreadable) or minor (the page or words are difficult to read but can be deciphered).

This part of the study looked at the first 50 pages of 50 books (2,500 pages).  James found that less than 1 percent of the pages (fewer than 25) contained major or minor errors.  Most of these errors were caused by fingers obscuring the pictures and by gutter problems.

Metadata

To test the accuracy of the metadata, James checked four fields (title, author, publisher, and publication date) in 400 books.  He found that 36.75 percent of the total areas checked were in error.  He also found that very few of the books had more than one error in their metadata.  As an example of a metadata error, James showed a test he ran on Edgar Allan Poe.  He searched Google Books for books by Poe with a publication date of 1809 or earlier and got two results, one dated 1800 and one dated 1669.  Since Poe was born in 1809, these dates are obviously wrong.

During the question-and-answer period, a librarian in the audience said she works at the University of Michigan Libraries, a library system that is cooperating with the Google Books project.  In her experience, she said, Google is conscientious about correcting errors when they are brought to its attention.  She also pointed out that the project is still in progress and that Google is constantly updating and rescanning unsatisfactory images.

James was also asked what rate of error is, in his view, acceptable.  He responded that it depends on the intended use.  In his opinion, the ideal would be an error rate below 0.5 percent.

The upshot of James' research seems to be that Google Books does contain a significant number of errors, with no quality-control mechanism that James knows of.

-- Stacy Judy
