Design, query, and evaluate information retrieval systems
Introduction
In the summer of 2001, at a camp for science and mathematics, I encountered a digital database for the first time: one serving astrophysics research at a public university in California. Unfortunately, four weeks was not enough for me to learn to use it, despite two astrophysics professors and their graduate students guiding me through defining queries, searching, and uploading data.
Unable to ask the right questions, I settled for half-heartedly calculating the trajectory of the asteroid Melpomene to see if it would hit Earth “anytime soon”. In the terms of Quintarelli et al. (2015), this was because I was overwhelmed by the database’s structure and by the deluge of results returned by its single Boolean search feature. Thus, it is important for databases to be designed to suit their target users’ investigative preferences.
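To illustrate what that single search feature demanded of a novice, here is a minimal sketch of Boolean retrieval in Python; the records and terms are invented for illustration. An AND between terms narrows the result set, while an OR floods it:

    # Minimal sketch of Boolean retrieval over a toy index.
    # Records and terms are invented for illustration only.
    records = {
        1: "melpomene orbit trajectory survey",
        2: "melpomene photometry lightcurve",
        3: "near earth asteroid trajectory models",
        4: "earth impact risk assessment",
    }

    def search(*terms, mode="AND"):
        """Return ids of records matching all (AND) or any (OR) terms."""
        sets = [{i for i, text in records.items() if t in text} for t in terms]
        return set.intersection(*sets) if mode == "AND" else set.union(*sets)

    print(search("melpomene", "trajectory"))             # AND narrows: {1}
    print(search("melpomene", "trajectory", mode="OR"))  # OR floods: {1, 2, 3}

Without someone to explain which operators the system assumed, a novice sees only the flood.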
For Hong et al. (2009), context is what represents how users manage an application, such as one used to retrieve information from a database. Had someone explained more to me about why Linux was the operating system used to access the database, who its most frequent users were, or what data types were searched, I would have had more context for what other queries I might have leveraged and how to make better use of my search results.
Stefanidis et al. (2011) defined “personalization systems” as mechanisms that give users only significant search results, a retrieval approaching 1.0 for both recall and precision. The astrophysicists at the university that summer did not need such a feature, because as subject experts they would have known if the database had stopped retrieving relevant results. And though my fellow campers and I did need it then, and current users need it now to use some of today’s rapidly growing information systems, there are data repositories designed so closely to intended users’ needs that any kind of search, from browsing to keyword, can yield satisfaction.
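For reference, that ideal can be stated with the standard retrieval measures: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. A minimal sketch in Python, with invented document sets:

    # Standard definitions of precision and recall over sets of document ids.
    def precision(retrieved: set, relevant: set) -> float:
        """Fraction of retrieved documents that are relevant."""
        return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

    def recall(retrieved: set, relevant: set) -> float:
        """Fraction of relevant documents that are retrieved."""
        return len(retrieved & relevant) / len(relevant) if relevant else 0.0

    # A "personalization system" aims to push both values toward 1.0.
    retrieved, relevant = {1, 2, 3, 4}, {2, 3, 4, 5}
    print(precision(retrieved, relevant), recall(retrieved, relevant))  # 0.75 0.75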
Design
Databases designed according to the intended user’s search preferences will keep users from feeling overwhelmed and help them develop smarter research queries. This applies to assigning appropriate encoding schemas, which determine a collection’s level of analysis and information hierarchy, and thus what search features can be provided. Depending on the context (the research task, level of subject familiarity, or comfort with technology), users may want to browse, use an advanced search, or take any approach in between. Databases should therefore be designed to provide a mix of search features that anticipate the user and the context they bring to the query.
Query
Since starting my MLIS, I have preferred to begin my research with OneSearch, the King Library’s metasearch feature, paring results down to the double digits by selecting “articles”, “full text available”, and then “peer-reviewed”. I will also scan an item’s reference list for related papers and databases.
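Conceptually, that paring down is just successive filtering, one facet at a time. A minimal sketch in Python (the record fields and values are hypothetical, not OneSearch’s actual data model):

    # Hypothetical records standing in for a metasearch result list.
    results = [
        {"title": "A", "type": "article", "full_text": True,  "peer_reviewed": True},
        {"title": "B", "type": "book",    "full_text": True,  "peer_reviewed": False},
        {"title": "C", "type": "article", "full_text": False, "peer_reviewed": True},
    ]

    # Apply each facet in turn, as the interface checkboxes do.
    for key, wanted in [("type", "article"), ("full_text", True), ("peer_reviewed", True)]:
        results = [r for r in results if r[key] == wanted]

    print([r["title"] for r in results])  # ['A']

Each facet shrinks the pool before the next is applied, which is why a few well-chosen filters can bring thousands of hits down to a readable list.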
Often, once I know which discipline or subject to research, I vet databases by reading their product pages. These usually tell me what needs the database was designed to support, what types of documents it contains, how many publications it draws from, and approximately how many records it hosts. If there are title and/or coverage lists, I look through them to get an idea of what is available. This way, I have some context for what to expect from my search results and for how the database could support a query expansion.
Evaluate
Stefanidis et al.’s (2011) experimental design to define a “personalization system” made user-centered sense. Creating 12 default profiles for users to modify served to establish which attributes were being used to create a personalized, contextual user experience, while informing users of what to expect from the database. However, not every organization can fund building such a complex instrument into its database designs, or needs to for users to easily find what they want.
Sometimes, a basic search accompanied by a list of links to topic-dedicated pages is enough to make a database feel intuitive, especially if its design already incorporates context relevant to its target audience.
Evidence
INFO 281 – Metadata for Digital Databases – Discussion: TEI (Text Encoding Initiative) and EAD (Encoded Archival Description)
This was a discussion post for INFO 281, Metadata for Digital Databases, where I explained why I felt an online archive had not assigned a proper EAD schema to its 1906 San Francisco Earthquake and Fire Digital Collection. Pointing to both the human- and machine-readable formats, I discussed how the items were neither displayed in a hierarchy nor showed any form of analytical progression; the collection looked only as organized as a shopping list. This would be no help to users who are unfamiliar with the 1906 earthquake and fire or with how natural disasters affect metropolitan areas. To fix this, I suggested the items could be further categorized by period (like “disaster, still on fire” or “disaster, fire’s out”) or by scale of destruction by subject (“entire wrecked neighborhood” or “one burning house”).
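To make the suggestion concrete, here is a minimal sketch of the kind of hierarchy I had in mind, using the period and subject labels above. The structure and item titles are hypothetical, not the collection’s actual schema:

    # A nested structure lets users drill down instead of scanning a flat list.
    # Category labels come from my discussion post; item titles are invented.
    collection = {
        "by period": {
            "disaster, still on fire": ["Market Street ablaze", "smoke over the bay"],
            "disaster, fire's out": ["crews clearing rubble", "refugee camps"],
        },
        "by scale of destruction": {
            "entire wrecked neighborhood": ["panorama of Nob Hill"],
            "one burning house": ["single residence on Howard Street"],
        },
    }

    # Browsing becomes a walk down the tree rather than a scan of one long list.
    for facet, groups in collection.items():
        for label, items in groups.items():
            print(f"{facet} > {label}: {len(items)} item(s)")

An encoding schema that captures this nesting supports both browsing by category and targeted keyword search within a category.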
A California university’s library website on the collection is easier to browse and more user-friendly for those unfamiliar with online searching. However, some of its browsing lists were long enough to intimidate even me, an experienced searcher.
This discussion post is evidence I understand the importance of designing databases with encoding schemas that can provide multiple search options. This way, common query barriers like subject familiarity or comfort with technology can be lowered.
INFO 244 – Online Searching – Database Review
I reviewed two online databases for INFO 244, Online Searching.
The first database’s product page stated that it was designed to support research in the food industry; contained journals, monographs, magazines, trade publications, and industry and market reports; and hosted over 3.4 million records across 750 publications. Its full-text subject title and database coverage lists let browsers know the wide range of topics covered and the countries participating in the research.
It took some searching to find out that San Jose State University subscribes to the second, which is more of a repository, with 35 databases under seven categories: Case Law, International Resources, International Treaties and Agreements, Journals and Periodicals, Special Collections, U.S. Federal Content, and U.S. State Content. Highlighted databases included Civil Rights and Social Justice; COVID-19: Pandemics Past and Present; Executive Privilege; Military and Government; NOMOS: American Society for Political and Legal Philosophy; and Open Society Justice Initiative.
I concluded that these two databases’ front matter (like a book’s author, publisher, copyright, and chapters) provided precise context for their contents by clearly stating who the intended users were and what information one can expect to find. This way, users can better understand how to manipulate and leverage search results into more dynamic research questions.
These reviews are evidence I have learned to first examine a database’s attributes and features for context on the information available so, after querying, I can better refine and incorporate search results.
INFO 202 – Information Retrieval System Design – Using Websites
This was a discussion post for INFO 202, Information Retrieval System Design, in which I reviewed the quality of a website’s information retrieval. I narrated my history of visiting this mostly chronologically ordered news database, which reported on pop culture, traditional headlines, shopping, animals, and science, and hosted video series covering many topics.
Its search capabilities were basic keyword search, from a magnifying-glass icon at the top right, and browsing pages accessible from a hamburger menu at the top left, beside the site logo. Each button and link made it clear where it led. It took me only seconds to find something I would like, and a little longer to find anything more specific. Each page was also laced with hyperlinks to pages within and outside the website.
I expected most users would be satisfied navigating the site without the aid of a “personalization system” (Stefanidis et al., 2011). The website seemed to understand its target audience (the online-savvy, 18 and over) and had designed its content and structure to accommodate anyone wanting information in long or short form, video or text.
This review is evidence I know how to evaluate a database by the quality and means of its information retrieval, and how its design can lend itself to an overall intuitive user experience.
Conclusion
Information is of no use if it cannot be stored and then located. As information professionals, we should proactively assess databases for how their design, query features, and ease of use lend themselves to retrieving precise, relevant search results that users understand how to incorporate into their tasks. To help with this, I will survey colleagues and clients regularly to identify problems with the databases I am tasked to maintain. And should I become a school or academic librarian, I would like to create an instrument with other staff and faculty members to gauge how our students are getting on with our subscription and in-house databases. Then, at the start of each semester, we could teach a revised information literacy course.
References
Hong, J., Suh, E., & Kim, S. (2009). Context-aware systems: A literature review and classification. Expert Systems with Applications, 36(4), 8509-8522. https://doi.org/10.1016/j.eswa.2008.10.071
Quintarelli, E., Rabosio, E., & Tanca, L. (2015). A principled approach to context schema evolution in a data management perspective. Information Systems, 49, 65-101. https://doi.org/10.1016/j.is.2014.11.008
Stefanidis, K., Pitoura, E., & Vassiliadis, P. (2011). Managing contextual preferences. Information Systems, 36(8), 1158-1180. https://doi.org/10.1016/j.is.2011.06.004