22 August 2007

Related article searching

A study on PubMed related article searching, which I learned about from a post by David Rothman last month, got me excited again about a feature of the database that I use frequently but unreflectively. Like many of my colleagues, I welcomed the introduction of related articles linking in PubMed and rejoiced at the inclusion of the top five related links in the AbstractPlus display. As I work away at my literature searches, I have often wondered how this feature is being used, what is driving the sophisticated software gears turning in the background, and how successful it is at retrieving truly relevant material.

A recent technical report from the University of Maryland, "Exploring the effectiveness of related article search in PubMed" (1), explains how "related article links create dense clusters of relevance that make for a potentially fruitful browsing experience" (p. 9). Query logs created for the study show that PubMed searchers often click on related articles suggested by the system. This enhanced feature of PubMed, the authors conclude, forms an integral part of the information-seeking experience of many users.

During a period of one week the authors observed 35,136,632 page views across 7,964,643 sessions. Many of these had to be discounted because they represented bots or direct access to MEDLINE via embedded links or another search engine. The final data set was whittled down to a mere 1,941,329 sessions (24.4% of the total). These sessions included at least one PubMed search query and at least one view of an abstract - in other words, "actual attempts at addressing information needs in PubMed," as the authors put it (pp. 7-8).
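For anyone who wants to check the proportion, here is a minimal sketch of the arithmetic. The two session counts are the figures quoted above from the report; the variable names are only illustrative.

```python
# Session counts as quoted in the post (from the technical report).
total_sessions = 7_964_643      # all sessions observed during the week
retained_sessions = 1_941_329   # sessions with at least one query and one abstract view

# Share of sessions kept after discarding bots and direct-access traffic.
share = retained_sessions / total_sessions
print(f"{share:.1%} of sessions retained")
```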

From this subset it was observed that roughly a fifth of all "non-trivial" PubMed sessions (18.5%) involved examination of related articles. Subsequent actions of users were also analyzed after they clicked on a related article link. Once they started browsing related articles they were likely to continue doing so more than 40% of the time. The study demonstrates quite clearly that related article search receives "high sustained usage" (p. 9).


Points to remember about related article searching

More than two years ago Jan's Search Tips was recommending adding related article searching to our arsenal of strategies, and she listed briefly the advantages and disadvantages. To review the topic again and refresh my own memory, I would like to restate here the relevant points about the Related Articles feature (you can also review PubMed's help file under "Finding articles related to a citation"):
  • The AbstractPlus display automatically includes the first five Related Links citations.
  • Each citation in PubMed includes a link that retrieves a pre-calculated set of citations that are closely related to the selected article. PubMed creates this set by comparing words from the title, abstract, and MeSH terms using a word-weighted algorithm.
  • If you select Related Articles from the Display menu without selecting specific citations, PubMed will retrieve the related articles for the citations displayed on the page.
  • Related article searching is not comprehensive.
  • Any limits you applied to your original search are NOT in effect when you use a Related Articles link, even if Limits are selected.
  • By using History you can reapply limits (e.g., English, dates) but this removes the ranked order and may remove citations that are most relevant.
  • Relevancy can drop off quickly. The list you get with a Related Articles link is displayed in ranked order from most to least relevant.
  • And, of course, you need to start with a relevant citation for the feature to work at all.
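The general idea of ranking citations by weighted word overlap can be sketched in a few lines. To be clear, this is not PubMed's actual algorithm: it is a generic TF-IDF and cosine similarity illustration, and the toy "citations" and function names below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Build a word -> weight mapping per document (term count x smoothed IDF)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    # Words that occur in fewer documents receive a higher weight.
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}
    return {d: {w: tf * idf[w] for w, tf in c.items()} for d, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(seed_id, docs, top_n=5):
    """Return up to top_n (doc_id, score) pairs, most similar to the seed first."""
    vectors = tfidf_vectors(docs)
    seed = vectors[seed_id]
    scored = [(cosine(seed, v), d) for d, v in vectors.items() if d != seed_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(d, round(score, 3)) for score, d in scored[:top_n]]


# Hypothetical toy records standing in for title/abstract/MeSH text.
docs = {
    "A": "cancer symptom assessment instruments systematic review",
    "B": "symptom assessment scale for cancer patients",
    "C": "query log analysis of search engine sessions",
    "D": "pain measurement questionnaire in palliative care",
}

print(related("A", docs))
```

In this toy set, record B shares several words with the seed record A and so ranks first, while C and D share none and score zero. That mirrors two of the points above: the list is ranked, and relevancy can drop off quickly.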

Serendipity

Related Articles searching is undoubtedly a valuable addition to even the most carefully planned PubMed run. For all my sedulous crafting of comprehensive professional search strategies, I still find myself stumbling over citations that have eluded my detection methods. So I almost never complete a search without taking the extra step of checking related articles on the most pertinent citations I have found. The serendipitous benefits of these links have been highlighted once again for me in a recent paper on searching for systematic reviews in palliative medicine, which appeared in the American Journal of Hospital & Palliative Medicine (2).

The authors of this study report that PubMed related articles links yielded 15% of all the citations finally included in their systematic literature review on cancer symptom assessment instruments. Although the related articles link is not included in methodologic recommendations for systematic literature reviews, the experience of the authors suggests that it is a useful tool in PubMed for reviewing complex evidence. They recommend related links searches for any systematic PubMed literature review on a complex topic.

The conclusions of this study make good sense, but take a look at the authors' search strategy in Table 2 (p. 182):
First search
Neoplasms AND signs and symptoms AND scale, OR instrument, OR checklist, OR inventory, AND evaluation OR assessment, OR rating, OR distress, OR severity, OR frequency

Second search
Incorporating (prevalence) and (incidence)

No wonder they had trouble finding good stuff without relying on related articles searching. The strategy looks sketchy and incomplete to me. There isn't any indication of how or whether they balanced MeSH and text string searches. Some obvious approaches seem to have been omitted. The MeSH heading Questionnaires was not used - even though it is to be found in the PubMed record for the actual systematic review of which this is a companion article (3). And what about other appropriate headings like Pain Measurement or Nursing Assessment? I'll wager that contributions from readers of this post, based on just a few minutes of analysis, could have greatly improved the results of the original project, or at least reduced the work of this team of scientists and physicians in producing their results. But perhaps I am being presumptuous.
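As a rough illustration of balancing MeSH and text-word searching, the sketch below assembles a PubMed query string that pairs the headings mentioned above with a few of the authors' free-text terms. The choice and grouping of terms are assumptions made for illustration only; this is neither a validated strategy nor what the authors ran. The `[mh]` and `[tiab]` tags are PubMed's field tags for MeSH Terms and Title/Abstract.

```python
def or_group(terms, tag):
    """Join terms with OR, quoting each one and appending a PubMed field tag."""
    return "(" + " OR ".join(f'"{term}"[{tag}]' for term in terms) + ")"


# MeSH headings named in the post (illustrative selection, not a tested strategy).
mesh_headings = ["Questionnaires", "Pain Measurement", "Nursing Assessment"]

# A few free-text terms taken from the authors' first search.
text_words = ["scale", "instrument", "checklist", "inventory"]

# Topic heading AND (any measurement heading OR any free-text term).
query = (
    '"Neoplasms"[mh] AND ('
    + or_group(mesh_headings, "mh")
    + " OR "
    + or_group(text_words, "tiab")
    + ")"
)

print(query)
```

The resulting string can be pasted into the PubMed search box, with limits such as language or dates applied afterwards in the usual way.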

The use of sophisticated computer algorithms to extract knowledge from the MEDLINE database is a feature of the third-party PubMed alternatives that have come along lately. For an overview see the recent article by David Rothman (4) and numerous references on his blog. There is a comprehensive list on the Arrowsmith website, and I surveyed a few myself earlier this year in my library's Info-RX newsletter. (I like PubFocus for its cool way of incorporating journal ranking into searches.)

There is a considerable literature on MEDLINE data mining. I looked at a few articles with various proposed methods of citation relevance scoring (5,6,7), but found no further details on PubMed's own related articles algorithm. This research is interesting if you can get past the mathematical formulas, but I would like to see more work assessing the value of what we have now and suggesting practical methods for search enhancement.

I to the world am like a drop of water
That in the ocean seeks another drop.

Comedy of Errors 1.2

References:

1. Lin J, DiCuccio M, Grigoryan V, Wilbur WJ. Exploring the effectiveness of related article search in PubMed. Technical report LAMP-TR-145/CS-TR-4877/UMIACS-TR-2007-36/HCIL-2007-10, University of Maryland, College Park. 2007 Jul:1-10 [cited 2007 Aug 20]. Available from: http://hcil.cs.umd.edu/trs/2007-10/2007-10.pdf

2. O'Leary N, Tiernan E, Walsh D, Lucey N, Kirkova J, Davis MP. The pitfalls of a systematic MEDLINE review in palliative medicine: symptom assessment instruments. Am J Hosp Palliat Care. 2007 Jun-Jul;24(3):181-4.

3. Kirkova J, Davis MP, Walsh D, Tiernan E, O'Leary N, LeGrand SB, et al. Cancer symptom assessment instruments: a systematic review. J Clin Oncol. 2006 Mar 20;24(9):1459-73.

4. Rothman D. A selection of useful third-party PubMed tools. MLA News. 2007 Jun-Jul;(397):12,24.

5. Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007 Mar;33(1):63-103.

6. Tbahriti I, Chichester C, Lisacek F, Ruch P. Using argumentation to retrieve articles with similar citations: an inquiry into improving related articles search in the MEDLINE digital library. Int J Med Inform. 2006;75(6):488-95.

7. Ruch P, Boyer C, Chichester C, Tbahriti I, Geissbühler A, Fabry P, et al. Using argumentation to extract key sentences from biomedical abstracts. Int J Med Inform. 2007 Feb-Mar;76(2-3):195-200.
