Metasearch questions

One of the objectives of the IMLS DCC project is to explore ways to integrate the metadata records we harvest from libraries, museums, and archives more effectively with other digital content resources. We are especially striving to bridge the gap between primary source digitized content (most of the metadata we aggregate describes such content) and related secondary resources available in digital format (e.g., contemporary journal literature on the same or related topics, much of which is licensed). We are exploring a variety of tools and services that can facilitate this process, with particular attention to metasearch technologies. We’d be especially interested in your advice on how to make further advances in our research on this front.

So far, building on prior UIUC Library metasearch research and applications (e.g., our EasySearch application; see
http://search.grainger.uiuc.edu/searchaid/easy_search_summary.html), we have added limited metasearch functionality to our Opening History portal
(http://imlsdcc.grainger.uiuc.edu/history/). On this portal, when you do an item-level search, we return search results for your query from Academic Search Premier, America: History and Life, (Elsevier) Scopus, and Google Book (when not blocked). A few specific issues on which we’d appreciate your comments:

1.    What other targets would be relevant for us to pull in through metasearch, especially those indexing digital secondary sources relevant to the portal’s topic thrust (American history)? We’re also very interested in other portals, repositories, or resources recognized in various communities, e.g., museums and historical societies covering American history, which we might be able to tap (even if we have to do some screen scraping).

2.    Should we consider adding metasearch functionality at other levels of the portal – e.g., to allow users looking at full records to do metasearches not of their original query, but of terms and indexing in found records?

3.     Beyond standards-based Z39.50, SRU/SRW, and XML Gateway implementations of metasearch functionality, what other similar kinds of services should we be looking to exploit?

4.    Should we look at making our metadata aggregation a metasearch target for other portals? Again, are there existing portals in this domain that seek to exploit resources like ours? If so, which metasearch service protocols would be most critical, and how might we register or otherwise advertise availability?

5.    What else, in terms of the services we support or exploit, should we be looking at to improve our integration with other relevant digital repositories and portals?
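
As a concrete reference point for questions 3 and 4, the sketch below shows what a standards-based SRU searchRetrieve request looks like when built by hand. The endpoint URL is a placeholder, and the choice of Dublin Core ("dc") as the record schema is an assumption for illustration, not a description of our current implementation. Only the parameter names (operation, version, query, startRecord, maximumRecords, recordSchema) come from the SRU 1.1 specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
SRU_ENDPOINT = "http://example.org/sru"


def build_sru_url(cql_query, start=1, max_records=10, schema="dc",
                  endpoint=SRU_ENDPOINT):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,             # CQL, e.g. dc.title = "civil war"
        "startRecord": start,           # SRU record positions start at 1
        "maximumRecords": max_records,
        "recordSchema": schema,         # assumed: target supports Dublin Core
    }
    # urlencode percent-escapes the quotes and spaces in the CQL query.
    return endpoint + "?" + urlencode(params)


url = build_sru_url('dc.title = "civil war"')
print(url)
```

A portal that exposed its aggregation as a target would answer requests of this shape; a portal acting as a metasearch client would issue them to each target and merge the XML responses.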

Interface subgroup questions

The interface group has been looking at how to support various kinds of browsing within and particularly between collections.
In this part of the work we want to focus on supporting browsing for a number of reasons:

Search technology is well advanced, but browsing technology, functionality and interfaces are relatively weak, especially with respect to the kinds of items we have in our collections.

There is evidence to indicate the importance of browsing to support scholarly activity.

We want to support the following kinds of browsing activities:
·         From collection to collection, discovering related collections
·         From item to collection
·         From item to item within a collection
·         From item in one collection to item in another collection, but making the collections explicit

From these general aims we have a few more particular ones to help us get started:

Overview Visualization

We want to provide a variety of summary views of what is in a collection, to complement and supplement the existing collection metadata, including the collection description. All these sources provide different ways to help answer the question: “What have you got?”

The idea is to give some of the qualitative ‘feel’ for a collection that traditionally can often be obtained by physically browsing shelves in a library or archive.

We suspect that even relatively simple summary statistics can usefully contribute to such an overview.

We are experimenting with a few, starting with time and space:

Where are items in the collection from, and when were the items created?

Even simple maps and temporal histograms may give a rapid and rough sense of coverage and focus within a collection.

Additionally, size and shape, based on simple counts of the number of artifacts accumulating in certain places and time periods, can aid this sense.
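
To make the idea of simple counts concrete, here is a minimal sketch of how per-decade and per-place tallies might be computed from item metadata. The sample records and the field names ("year", "place") are invented for illustration; real harvested records would need date and place normalization first.

```python
from collections import Counter

# Hypothetical item records; field names and values are assumptions.
items = [
    {"title": "Main Street, looking north", "year": 1893, "place": "Illinois"},
    {"title": "Canal boat at lock", "year": 1898, "place": "Illinois"},
    {"title": "County fair", "year": 1912, "place": "Indiana"},
    {"title": "Untitled portrait", "year": None, "place": None},
]


def decade_counts(records):
    """Count items per decade, skipping records with no year."""
    return Counter((r["year"] // 10) * 10
                   for r in records if r.get("year") is not None)


def place_counts(records):
    """Count items per place name, skipping records with no place."""
    return Counter(r["place"] for r in records if r.get("place"))


def text_histogram(counts):
    """Render counts as one bar of '#' characters per key, sorted by key."""
    return "\n".join(f"{key}: {'#' * counts[key]} ({counts[key]})"
                     for key in sorted(counts))


print(text_histogram(decade_counts(items)))
print(text_histogram(place_counts(items)))
```

The same tallies could feed a map or a timeline widget; the text bars here only stand in for whatever visual form the interface eventually uses.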

The following mockup illustrates a number of different lightweight summary visualizations that might be generated from collection and item metadata:

http://flickr.com/photos/musebrarian/3116736806/sizes/o/

This design is just a thought piece to provoke further brainstorming, discovery of possible features, functionalities, visualizations and interface elements. We do not envisage implementing them all.

We would appreciate any ideas this approach provokes, including:
·         suggestions of other projects doing something similar, either in the context of cultural heritage, or elsewhere (such as e-commerce – Amazon we know provides various mechanisms to support browsing, though not a good sense of “what have you got?” – other than “everything!”)
·         descriptions of how similar low cost qualitative awareness activities are done in *physical* museums, libraries and archives, such as walking the stacks.
·         suggestions of other measures that might have a good cost/impact ratio

Derived and inferred Metadata
Prior work has shown that harvested and aggregated metadata inevitably suffers from various quality problems. One in particular is missing data in certain fields. We believe it may be possible to derive at least good guesses for certain values using a number of relatively simple inference applications.

As before, we will prioritize place and time, trying to fill in missing spatial and temporal fields using information from similar, ‘nearby’ items.
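
One very simple form such inference could take is sketched below. It assumes that ‘nearby’ means "in the same collection" and that the median of the known years is a reasonable guess; both are illustrative assumptions, not decisions the project has made, and the field names are hypothetical.

```python
from statistics import median


def infer_missing_years(records, min_neighbors=2):
    """Return copies of records with missing years filled from neighbors.

    'Neighbors' are items in the same collection (an assumption). Every
    returned record carries a 'year_inferred' flag so that guesses are
    never confused with harvested values.
    """
    known_by_collection = {}
    for r in records:
        if r.get("year") is not None:
            known_by_collection.setdefault(r["collection"], []).append(r["year"])

    result = []
    for r in records:
        filled = dict(r)
        filled["year_inferred"] = False
        if filled.get("year") is None:
            known = known_by_collection.get(r["collection"], [])
            # Only guess when there is enough evidence to guess from.
            if len(known) >= min_neighbors:
                filled["year"] = int(median(known))  # truncates any .5
                filled["year_inferred"] = True
        result.append(filled)
    return result


records = [
    {"id": "a1", "collection": "A", "year": 1890},
    {"id": "a2", "collection": "A", "year": 1894},
    {"id": "a3", "collection": "A", "year": None},
    {"id": "b1", "collection": "B", "year": None},
]
for r in infer_missing_years(records):
    print(r["id"], r["year"], r["year_inferred"])
```

Item a3 receives a guess because two dated neighbors exist; item b1 stays undated because its collection offers nothing to infer from.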

Once done, we will face the challenge of how to represent this inferred data so that users realize that these are just good guesses at other things that might be of interest.
Implying definitiveness or greater confidence than is warranted risks leading users to mistrust the accuracy of the entire dataset.

One analogy from prior, pre-computational work in this area is the use of the concept and term ‘circa’ to signal doubt while still offering a potentially useful informed guess.
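
Carried over to a display layer, the ‘circa’ convention might look like the small sketch below. It assumes each record can tell the interface whether its year was harvested or guessed (the "inferred" argument is a hypothetical flag), and the wording of the labels is only a suggestion.

```python
def format_year(year, inferred=False):
    """Return a display label that marks inferred years with 'circa'."""
    if year is None:
        return "undated"
    return f"circa {year}" if inferred else str(year)


print(format_year(1892))                 # harvested value
print(format_year(1892, inferred=True))  # informed guess
print(format_year(None))                 # nothing known
```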

Suggestions of use scenarios to support in functionality and interface
In describing this project, a common initial response is: “Oh, so that will enable scholars to find appropriate illustrations for their work?” That is certainly one plausible use scenario. But what are others?

It seems that the items in our collection have a different status and granularity than a book or a research paper accessed from a digital library or repository. If a retrieved item from our system is a digitized textual document, then it contains material that can be read. But what if the retrieved item is a photograph (with accompanying metadata)? What are some ways of ‘reading’ photographs and the like, and what requirements do they create for supporting finding?

We would really appreciate use scenarios from physical libraries, museums, and archives that can inform our design of interfaces and functionality to support such use.
These might be browsing and use behaviors in the physical world that it would be desirable to support explicitly in the digital world, or to have equivalents for.
There might also be behaviors that are problematic in the physical world but that a digital environment could explicitly support. For example, thirty years ago, a special collection might have had a card catalogue organized by author and by subject, but lacked the resources to also support search by title. Computerization makes adding title search trivial. Whole-text scanning also makes full-text search possible.
What are desirable but currently unobtainable or expensive browsing options?

Opening History

The IMLS Digital Collections and Content project announces a new portal of digital collections and items: Opening History: U.S. History Resources from Libraries, Museums, and Archives.

Opening History is a registry of digital collections of United States history-related content and a repository of items harvested from these collections. The related Sowing Culture blog will highlight individual items found within the portal in order to increase awareness of these resources.