
Tuesday, April 24, 2018

Google Books



Google Books as a General Research Collection 

The current study attempts to measure the extent to which “full view” volumes contained in Google Books constitute a viable generic research collection for works in the public domain, using as a reference collection the catalog of a major nineteenth-century research library and using as control collections—against which the reference catalog also would be searched—the online catalogs of two other major research libraries: one that was actively collecting during the same period and one that began actively collecting at a later date. A random sample of 398 entries was drawn from the Catalogue of the Library of the Boston Athenæum, 1807–1871, and searched against Google Books and the online catalogs of the two control collections to determine whether Google Books constituted such a viable general research collection. 

The Library Project—and the discretion given the libraries in determining which volumes would be digitized—raises an interesting question: To what extent is Google creating a research collection? Coyle has suggested that the manner in which collections are being selected for inclusion in the Library Project—many being taken en bloc from low-use remote storage facilities—makes it difficult to characterize Google Books as a “collection” in the accepted sense, though for better or worse “it will become a de facto collection because people will begin using it for research.”

On the negative side of the ledger, two significant caveats must be recalled. The digitized images of individual pages are not always reliable—poor scanning can occasionally be so extensive as to render a digitized volume unusable—and folded maps and other illustrative matter are routinely scanned in their folded state, rendering them useless for research. One can reasonably expect that these flaws will be corrected over time, at least for high-demand texts: The users of the texts will insist on it and, at any rate, the libraries involved are committed to it. Measuring the extent of this problem was not within the scope of the current study, but an extremely useful future research project would try to do so.

On the positive side, Google Books provides full-text indexing, something of incalculable value that would have been inconceivable had these volumes not been scanned. This indexing allows one to search both within individual volumes and across the entire collection, facilitating text-based research in general, but especially historical research and the comparison of variant texts. While this indexing is dependent in individual cases on the quality of the original page scan and the fidelity of the OCR rendering, in the aggregate the amount of hidden content that is thus exposed far exceeds the amount that remains hidden (or imperfectly rendered via OCR). Additionally, Google Books is serving as a huge laboratory for what is called “document image understanding”: the increasingly sophisticated probabilistic analysis of page images to facilitate indexing, interpretation, and other activities.

In the past, collections of works in the public domain, especially older English-language works, were microfilmed by commercial firms in collaboration with various research libraries. The resulting microform collections have subsequently been digitized, either by the firms that did the original microfilming or by successor firms, and made available on a subscription basis. As Google Books and other mass digitization projects continue their progress through various research library collections, the viability of these preexisting collections may increasingly come into question as subscribing institutions weigh their annual use of these materials against the annual charges they pay for access.

Currently only a small fraction of the materials in Google Books—perhaps 15 percent—is thought to be in the public domain. The great bulk is still protected by copyright, including a large but unknown number of so-called orphan works for which it is difficult or impossible to locate the current copyright holder.

I think that Google Books is a great source for research, provided the scanned information is clear and legible, although it does seem to be a work in progress that still needs to be refined. After visiting the Richard Nixon Presidential Library and Museum, I learned that a significant amount of time and work goes into the inspection and scanning of paperwork and documents. I believe the overall concept of Google Books, and the access to information it provides to libraries, schools, and the general public, is of great service and outweighs any cons. An example of the benefits of Google Books is featured in the video below.
The British Library and Google to make 250,000 books available to all


WORKS CITED
Jones, E. (2010). Google Books as a General Research Collection. Library Resources & Technical Services, 54(2), 77-89.

Karen Coyle, “Google Books as Library,” online posting, Nov. 22, 2008, Coyle’s InFormation, http://kcoyle.blogspot.com/2008_11_01_archive.html (accessed Sept. 18, 2009).

The HathiTrust instructions at http://babel.hathitrust.org/cgi/mb?a=page;page=help (accessed Sept. 18, 2009).

Robert B. Townsend, “Google Books: Is It Good for History?” Perspectives Online no. 6 (Sept. 2007), www.historians.org/perspectives/issues/2007/0709/0709vie1.cfm (accessed Sept. 18, 2009); Musto, “Google Books Mutilates the Printed Past.”

Faisal Shafait et al., “Background Variability Modeling for Statistical Layout Analysis,” Proceedings of the 19th International Conference on Pattern Recognition, December 8–11, 2008, Tampa, Florida, USA, (2008) IEEE Computer Society, 2008, doi:10.1109/ICPR.2008.4760964, http://dx.doi.org/10.1109/ICPR.2008.4760964 (accessed Sept. 18, 2009)

https://www.youtube.com/watch?v=mDALzaRxWq0

Palomar College Library: Academic Search Complete



2 comments:

  1. Linda, I like that you tied in our field trip to the Nixon Presidential Library. You're right to point out the amount of time it takes to create a digital rendering worthy of being used. In addition to time though, I imagine it also takes some money too. Great writing!

  2. Thanks, Danielle. You're absolutely right; the cost is another big factor to consider.
