Comparing the access to and legibility of Japanese language texts in Massive Digital Libraries

Andrew Weiss; Ryan James

Conference Proceeding

In previous studies, Weiss and James have examined the impact of Massive Digital Libraries (MDLs) on the development of libraries in terms of copyright, metadata, accessibility and diversity. This paper continues these investigations by presenting the results of a study, conducted in 2013-2014, that examines the coverage and accessibility of Japanese-language books in two MDLs, Google Books and HathiTrust. A random sample of 5000 Japanese-language books with publication dates prior to 1943 was extracted from the OCLC WorldCat database; of these, 800 were randomly selected, and 400 titles were examined. The titles were queried in both Google Books and HathiTrust. The texts were then examined for their level of typical user access, the accuracy of their metadata and the quality of their scans. Despite their likely public domain status in both Japan and the United States, only 0.2% (N=1) of the sampled texts were visible in Google Books as full texts, while 12.5% (N=50) were visible in HathiTrust. Within the full-view texts, errors in scanning and metadata were identified, including problems with legibility ("moji tsubure") in 68% of visible texts; distorted content (including slanted and upside-down pages) in 90%; motion or blur of turning pages captured by digital cameras in 48%; extra-textual objects (3-D items that are not part of the text, e.g. fingers, hands and book holders) in 94%; and use of heavily defaced, dirty or fragile source material in 28%. The most common metadata errors were missing bibliographic information, especially missing page numbers (in 18% of texts) and incomplete tables of contents (in 22%), and problems associated with poor OCR, especially unusable keywords and common phrases (in 50% of texts) that appear to be random words, articles and unpronounceable symbols.

Date

2015

Resource Type

Creator

Campus

Northridge

Publisher

Subjects

Date Accessioned

2016-06-30T21:06:55Z

Handle

http://hdl.handle.net/10211.3/173284

Language

English

Bibliographic Citation

2015 International Conference on Culture and Computing (Culture and Computing), Kyoto, pp. 57-63.

Notes

Conference paper or proceedings
Proceedings for Culture and Computing 2015; includes post-print version of proceedings and slides from oral presentation.

Rights Note

Title	Date Uploaded	Visibility
ICCC-weiss-james-MDLs-japan-v2.pdf	2020-09-24	Public
ICCC-2015-presentation-apweiss-v2.pptx	2020-09-24	Public
