Demonstrate understanding of basic principles and standards involved in organizing information such as classification and controlled vocabulary systems, cataloging systems, metadata schemas or other systems for making information accessible to a particular clientele
Introduction
Information organization is the process of indexing or classifying information so that it can be easily retrieved later (Detlor, 2010). Information professionals (IPs) use controlled vocabularies, such as subject headings and thesauri, to select discipline-specific terms for metadata records. These records help users find items and understand the relationships among search terms, results, and adjacent subjects. Prominent examples of controlled vocabularies are the Library of Congress Subject Headings (LCSH), the Getty’s Art and Architecture Thesaurus (AAT), and the National Library of Medicine’s Medical Subject Headings (MeSH).
Information can change a user’s relationship with knowledge (Boisot & Canals, 2004). To produce records with as little bias as possible, IPs organizing their collections must become proficient at identifying where the quality and scope of information can be altered or lost. This often happens when metadata are crosswalked from one encoding scheme to another. A common headache is transferring information from the Dublin Core Metadata Element Set (DC), a standardized set of 15 elements for describing resources, to the Metadata Object Description Schema (MODS), which has many more elements for a richer, more complex item description.
XML (Extensible Markup Language) is an open-standard markup language and file format, freely accessible to all, in which the Library of Congress develops MODS. Because XML is both human- and machine-readable, metadata creators and catalogers regularly use it for cultural heritage digital collections. A common concern is that collections tag their elements with different, locally chosen terms, which makes their XML documents difficult to use outside their own institutions and communities.
Thesauri are reference tools for finding how terms relate to one another; their content is customarily organized by taxonomic classification that expresses hierarchies and conceptual networks. Usability issues abound when a thesaurus is built for use in more than one language at a time.
Crosswalking from DC to MODS
The Library of Congress has a webpage dedicated to mappings and stylesheets for users who need to crosswalk from any version of MODS to (unqualified) DC and back again (Library of Congress, 2022). (It also includes guides for converting to and from MARC [Machine-Readable Cataloging] standards.) Whichever direction the crosswalk runs, some metadata will be lost, because the receiving schema’s description elements are either too refined or too broad to accommodate them.
Thus, IPs must gain clear insight into their collections’ or catalogs’ intended users by incorporating user research, surveys, and senior colleagues’ ongoing awareness of the user population into how they organize metadata records. Doing so tells IPs how, and how far, to manipulate terms, and when it is acceptable to remove them.
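The kind of loss described above can be sketched in a few lines of Python. The record below is a hypothetical, deliberately simplified MODS record, and the mapping is a partial one written only for illustration; it is not the Library of Congress stylesheet, and the `metadata` wrapper element and helper names are assumptions. It shows three losses: the `authority="lcsh"` attribute has nowhere to go, the topic and temporal parts of one heading are split apart, and the geographic hierarchy flattens into a single string.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
DC_NS = "http://purl.org/dc/elements/1.1/"
NS = {"mods": MODS_NS}

# Hypothetical, simplified MODS record (illustration only, not the OAC record).
MODS_RECORD = f"""
<mods xmlns="{MODS_NS}">
  <titleInfo><title>Civilian Exclusion Order No. 96</title></titleInfo>
  <subject authority="lcsh">
    <topic>Japanese Americans</topic>
    <topic>Evacuation and relocation</topic>
    <temporal>1942-1945</temporal>
  </subject>
  <subject>
    <hierarchicalGeographic>
      <country>United States</country>
      <state>California</state>
    </hierarchicalGeographic>
  </subject>
</mods>
"""


def add_dc(parent: ET.Element, name: str, text: str) -> None:
    """Append an unqualified DC element (no attributes) to parent."""
    ET.SubElement(parent, f"{{{DC_NS}}}{name}").text = text


def mods_to_dc(mods_xml: str) -> ET.Element:
    """Partial, illustrative MODS -> unqualified DC mapping."""
    mods = ET.fromstring(mods_xml.strip())
    record = ET.Element("metadata")  # hypothetical wrapper element
    for title in mods.iterfind("mods:titleInfo/mods:title", NS):
        add_dc(record, "title", title.text)
    for subject in mods.iterfind("mods:subject", NS):
        # The authority attribute (e.g. "lcsh") is dropped: unqualified DC
        # has no place to record which vocabulary a term came from.
        for topic in subject.iterfind("mods:topic", NS):
            add_dc(record, "subject", topic.text)
        for temporal in subject.iterfind("mods:temporal", NS):
            add_dc(record, "coverage", temporal.text)
        for geo in subject.iterfind("mods:hierarchicalGeographic", NS):
            # The country/state hierarchy collapses into one flat string.
            add_dc(record, "coverage", "--".join(child.text for child in geo))
    return record


if __name__ == "__main__":
    ET.register_namespace("dc", DC_NS)
    print(ET.tostring(mods_to_dc(MODS_RECORD), encoding="unicode"))
```

Running the crosswalk in the other direction would not restore what was dropped: nothing in the DC output says that "1942-1945" belonged to an LCSH heading, or that "California" sits below "United States".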
Variability in XML documents
XML documents can be usable and interchangeable among different institutions and communities if their description elements are tagged with identical sets of controlled (allowed) terms (Miller, 2011). However, as successive IPs inherit their organizations’ information systems, they naturally create new tags to match their evolving user populations. Over time, this makes data sets describing the same items increasingly unusable outside their respective institutions.
IPs can alleviate this by continually keeping up to date on metadata and encoding scheme standards.
Usability issues in multilingual thesauri
In a thesaurus, hundreds to thousands of terms are selected and then related to one another in the context of the reference work’s target audience. Establishing this framework in one language is monumental enough; attempting it across two or more languages requires IPs with strong multilingual and cross-cultural backgrounds, as well as experience researching current affairs in the disciplines their thesauri cover.
Without such a qualified team to build and maintain it, a multilingual thesaurus could fail its intended users, excluding entire populations by ignoring regional dialects or failing to integrate contrasting cultures.
Evidence
INFO 281 – Metadata for Digital Collections – Dublin Core (DC) and MODS Assignments
These were related assignments on two encoding schemes for INFO 281, Metadata for Digital Collections. Working from the record “Civilian Exclusion Order No. 96, 1942” in the Online Archive of California (OAC), I first encoded the metadata in DC, then crosswalked that record to MODS.
A significant struggle in creating the DC record was establishing the Source(s), that is, the controlled vocabularies from which the terms were selected. Input to this element gives users context for the terms and for what the record represents. Eventually, I settled on “lcsh” (Library of Congress Subject Headings), “local” (a vocabulary specific to the Japanese Internment Collection), and “tgm I” (Thesaurus for Graphic Materials I: Subject Terms).
The most difficult choices in the crosswalk involved deciding which MODS subject sub-elements to use for the terms under Subject in the DC record. The sub-elements gave the terms texture and hierarchy: they were no longer simply “topic”, but could also be “name/namePart”, “hierarchicalGeographic”, and so on. Careless pairing of terms with sub-elements could change the context of the information or keep the record from being retrieved during a search. I was close to giving up on “1942-1945”, then realized it could work as a “namePart”.
Together, these assignments are evidence that I understand how to crosswalk metadata between encoding schemes, and why it is important to carry over as many terms as possible, even if that means moving them into other elements or sub-elements.
INFO 281 – Metadata for Digital Collections – XML Schema
This was a discussion post on XML for encoding schemes for INFO 281, Metadata for Digital Collections. I explained that well-formed XML simply follows XML’s syntax rules (a single root element, properly nested and closed tags, quoted attribute values), while valid XML is well-formed XML that also conforms to a DTD (document type definition) or other established schema defining its element types and attributes. I also discussed how generic XML does not begin and end with a MODS root element, and how, unlike MODS XML, Qualified Dublin Core XML does not nest its elements but instead uses a flat list of elements prefixed with the “dc” or “dcterms” namespaces.
To create and maintain XML documents that can be used across multiple communities, IPs must not only stay current on controlled vocabulary and metadata trends but also adhere consistently to XML standards.
This discussion post is evidence that I understand how standards, both widely and locally established, can help XML documents form an integrated information network that connects institutions and communities.
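The difference between well-formed and valid XML, and the flat, prefix-based shape of Qualified DC, can be illustrated with Python's standard library. This is a sketch under stated assumptions: the three records and the `record`/`item` wrapper elements are hypothetical, and `has_mods_root` only checks the root tag. It is not schema validation; checking a document against the MODS XSD would need a schema-aware library, which is outside this example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Flat Qualified DC: elements are told apart by namespace prefix, not nesting.
QDC_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Civilian Exclusion Order No. 96</dc:title>
  <dcterms:created>1942</dcterms:created>
  <dcterms:spatial>California</dcterms:spatial>
</record>"""

# Well-formed, but uses locally invented tags that no shared schema defines.
LOCAL_RECORD = "<item><docTitle>Civilian Exclusion Order No. 96</docTitle></item>"

# Tags close in the wrong order, so the document is not even well-formed.
BROKEN_RECORD = "<item><title>Civilian Exclusion Order No. 96</item></title>"


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; this says nothing about validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def has_mods_root(xml_text: str) -> bool:
    """Rough check that the root element is <mods> in the MODS namespace.

    Not schema validation: a document can pass this and still be invalid MODS.
    """
    return ET.fromstring(xml_text).tag == f"{{{MODS_NS}}}mods"


if __name__ == "__main__":
    for name, text in [("QDC", QDC_RECORD), ("local", LOCAL_RECORD),
                       ("broken", BROKEN_RECORD)]:
        print(name, is_well_formed(text))
```

The locally tagged record parses without complaint, which is exactly why well-formedness alone cannot make documents interchangeable across institutions.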
INFO 247 – Vocabulary Design – Thesaurus Evaluation
This is an evaluation of The International Thesaurus of Refugee Terminology for INFO 247, Vocabulary Design. Parts of the thesaurus were archived by the Bodleian Libraries of the University of Oxford in the United Kingdom. It had 28 main facets and many sub-facets, and it was multilingual in English, French, and Spanish.
I then discussed issues with multilingual thesauri. One social issue is choosing which languages to include so that the reference work is accessible to interested populations. For example, a chocolate thesaurus could include Vietnamese because Vietnam has a developing chocolate industry. A cultural issue concerns dialects. Widely spoken languages such as Arabic and French vary considerably from region to region, and no single authority’s standard is followed everywhere. So when selecting terms, IPs must account for the thesaurus’s intended user population to avoid confusion or offense. There are also difficulties in integrating cultures. In an English/Chinese thesaurus, “rice” would be an NT (narrower term) of “meal” in English, whereas in Chinese, where a single word (饭, fàn) can mean both cooked rice and a meal, the relationship would be one of equivalence, expressed with UF (used for). An SN (scope note) could be used to clarify the difference.
This evaluation is evidence that I understand the many issues faced by those who create and maintain multilingual thesauri in producing a work that can grow users’ knowledge while remaining unbiased.
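The rice/meal example can be modeled as a small data structure to show where a scope note becomes necessary. The sketch below is a hypothetical, simplified model, not SKOS or the ISO 25964 thesaurus standard; the `Term` class, the `EN_TO_ZH` mapping, and the `collapsed_hierarchies` helper are illustrative names only. It flags English BT/NT pairs whose two terms map to a single Chinese term, which are the spots where the hierarchy turns into equivalence.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in one language (hypothetical, simplified model)."""
    label: str
    lang: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    scope_note: str = ""                                # SN


# English treats "rice" as a narrower term of "meal".
ENGLISH = {
    "meal": Term("meal", "en", narrower=["rice"]),
    "rice": Term("rice", "en", broader=["meal"]),
}

# Hypothetical cross-language mapping: both English terms map to one Chinese term.
EN_TO_ZH = {"meal": "饭", "rice": "饭"}


def collapsed_hierarchies(terms: dict[str, Term],
                          mapping: dict[str, str]) -> list[tuple[str, str]]:
    """Find BT/NT pairs whose two terms map to the same target-language term."""
    pairs = []
    for term in terms.values():
        for narrower in term.narrower:
            target = mapping.get(term.label)
            if target is not None and target == mapping.get(narrower):
                pairs.append((term.label, narrower))
    return pairs


if __name__ == "__main__":
    for broad, narrow in collapsed_hierarchies(ENGLISH, EN_TO_ZH):
        # In the Chinese file the two concepts become one, so record an SN.
        entry = Term(EN_TO_ZH[broad], "zh",
                     scope_note=f"Covers both '{broad}' and '{narrow}'.")
        print(entry)
```

A real multilingual thesaurus would need far richer equivalence handling, but even this toy check shows why such cases have to be found and documented rather than translated term by term.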
Conclusion
Because information can change a user’s relationship with knowledge, it is important for IPs to follow established standards, as well as principles derived from their own experience, when organizing information. The quality and scope of search results depend on, among other factors, how metadata are presented, how encoded documents are structured, and how carefully thesaurus creators select and relate terms. To keep updating myself on metadata and XML standards, I will review the Library of Congress’s standards webpages twice a year. To do the same for thesauri, I will join the Special Libraries Association’s (SLA) Taxonomy Community.
References
Boisot, M., & Canals, A. (2004). Data, information and knowledge: Have we got it right? Journal of Evolutionary Economics, 14(1), 43–67. https://doi.org/10.1007/s00191-003-0181-9
Detlor, B. (2010). Information management. International Journal of Information Management, 30(2), 103–108. https://doi.org/10.1016/j.ijinfomgt.2009.12.001
Library of Congress. (2022, September 26). Metadata Object Description Schema (MODS): Conversions. https://www.loc.gov/standards/mods/mods-conversions.html
Miller, S. J. (2011). Metadata for digital collections. Neal-Schuman Publishers.