dc.contributor.author Weiss, Andrew en
dc.contributor.author James, Ryan en
dc.date.accessioned 2016-06-30T21:06:55Z en
dc.date.available 2016-06-30T21:06:55Z en
dc.date.issued 2015 en
dc.identifier.citation 2015 International Conference on Culture and Computing (Culture and Computing), Kyoto, p. 57-63. en
dc.identifier.uri http://hdl.handle.net/10211.3/173284 en
dc.description Proceedings for Culture and Computing 2015; includes post-print version of proceedings and slides from oral presentation. en
dc.description.abstract In previous studies, Weiss and James examined the impact of Massive Digital Libraries (MDLs) on the development of libraries in terms of copyright, metadata, accessibility and diversity. This paper continues these investigations by presenting the results of a study conducted in 2013-2014 that examines the coverage and accessibility of Japanese-language books in two MDLs, Google Books and HathiTrust. A random sample of 5000 Japanese-language books with publication dates prior to 1943 was extracted from the OCLC WorldCat database; of these, 800 were randomly selected and 400 titles were examined. The titles were queried in both Google Books and HathiTrust. The texts were then examined for their level of typical user access, the accuracy of their metadata and the quality of their scans. Despite their likely public domain status within Japan and in the United States, only 0.2% (N=1) of the sampled texts were visible in Google Books as full texts, while 12.5% (N=50) of the sample were visible in HathiTrust. Within the full-view texts, errors in scanning and metadata were identified, including problems with legibility ("moji tsubure") in 68% of visible texts; distorted content (including slanted and upside-down pages) in 90%; motion or blur of turning pages captured by digital cameras in 48%; extra-textual objects (3-D items not part of the text, e.g. fingers, hands and book holders) in 94%; and use of heavily defaced, dirty or fragile source material in 28%. The most common metadata errors were missing bibliographic information, especially missing page numbers (in 18% of texts) and incomplete tables of contents (in 22%); and problems associated with poor OCR, especially unusable keywords and common phrases (in 50% of texts) that appear to be random words, articles, and unpronounceable symbols. en
dc.format.extent 7 pages + 39 slides en
dc.language.iso en en
dc.publisher International Conference on Culture and Computing en
dc.publisher IEEE en
dc.rights Copyright 2015 en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ en
dc.subject Digital libraries en
dc.subject Google Books en
dc.subject HathiTrust en
dc.subject Internet Archive en
dc.subject Massive Digital Libraries (MDLs) en
dc.subject Japanese language en
dc.title Comparing the access to and legibility of Japanese language texts in Massive Digital Libraries en
dc.type Proceedings en
dc.type Presentation en
dc.rights.license Attribution-NonCommercial-ShareAlike 3.0 United States en
dc.identifier.orcid orcid.org/0000-0002-8900-2779 en


Copyright 2015 Except where otherwise noted, this item's license is described as Copyright 2015