28 January 2008

The Scarers in Print: analyzing a poop test brochure for readability

Now, it's too late for me to begin shovelling and sifting at alphabeds and grammar-books. I'm getting to be a old bird, and I want to take it easy. But I want some reading—some fine bold reading, some splendid book in a gorging Lord-Mayor's-Show of wollumes. ~ Charles Dickens, Our Mutual Friend, Chapt. 5
Those of us who take our ability to read for granted, who live and breathe letters as dogs do feces and effluvia, are sometimes forgetful of the difficulties experienced by so many in the face of nothing more formidable than the back of a cereal box, let alone a "chapter book." In Our Mutual Friend the illiterate dustman Mr. Boffin hires that "ligneous sharper," the peg-legged Silas Wegg, to read to him his newly acquired and highly treasured book Decline-And-Fall-Off-The-Rooshan-Empire ("Eight wollumes. Red and gold. Purple ribbon in every wollume, to keep the place where you leave off.") Because all print is shut to them, he and Mrs. Boffin are willing to endure night after night of Mr. Wegg's garbled delivery of Gibbon's prose, just for the feeling of participating in what they consider proper culture. They want some fine bold reading, in some splendid book. Instead they have to put up with the malapropisms, mispronunciations and petulant malingering of the scheming Silas Wegg.

In 21st century Canada there are still many Mr. and Mrs. Boffins. According to ABC Canada:
  • Four out of 10 adults aged 16 to 65, representing 9 million Canadians, struggle with low literacy. (Adult Literacy and Life Skills (ALL) Survey, Statistics Canada and the Organization for Economic Co-operation and Development, 2005)
  • Nearly 15 per cent of Canadians can't understand the writing on simple medicine labels such as on an Aspirin bottle.
  • An additional 27 per cent can't figure out simple information like the warnings on a hazardous materials sheet.
  • In total, 42 per cent of Canadians are semi-illiterate. The proportion is even worse for those in middle age. And even when new immigrants are excluded, the numbers remain pretty much the same.
The statistics for our neighbours to the south are roughly the same, nor has there been much improvement over the last generation in either country. This surprises me, for reading is ostensibly such a popular pastime and a prominent feature of our popular culture, certainly much more so than in that now distant, pre-postmodern period when there were few book clubs, no lit blogs, no Chapters-Indigo, no Amazon, no Oprah, no Harry Potter. In the United States, a 2004 NEA report on Reading at Risk lamented the sharp decline in the reading of books of "literature." Some hapless bloke's complaint is recorded there for posterity: "I just get sleepy when I read." To which Ursula Le Guin, in a feature article in the February Harper's, replies resignedly, yes, but there are actually many people who read wide awake. Le Guin thinks books are here to stay. "It’s just that not all that many people ever did read them. Why should we think everybody ought to now?" (Warning: the Harper's website only provides a teaser version of the article to non-subscribers.)

I have to agree. Books are not a threatened species. However, when it comes to health information, the large ranks of the functionally illiterate, possessing a merely nodding acquaintance with the printed word, face a very real threat. In English-speaking North America, as more than 300 studies indicate, health-related materials cannot be understood by most of the intended audience [1]. Now that two thirds to three quarters of our populations are seeking out consumer health information on the Internet, one wonders how this massive group of so-so readers is dealing with the often challenging vocabulary and syntax to be found on sites like MedlinePlus and the Canadian Health Network, not to mention the huckstering puffery of Health 2.0.
'Why, truly, sir,' Mr Wegg admitted, with modesty; 'I believe you couldn't show me the piece of English print, that I wouldn't be equal to collaring and throwing.' 'On the spot?' said Mr Boffin. 'On the spot.' 'I know'd it! Then consider this. Here am I, a man without a wooden leg, and yet all print is shut to me.'

Trying to digest the ColoScreen brochure
My own recent experience with "patient information" has convinced me that the literate need to learn how to write as badly as the illiterate need to learn how to read. Not long ago I had occasion to do some stooping and scooping for my own, not my Retriever's. A lab test was required, and for the first time in my life I was introduced to the mysteries of ColoScreen. My doctor gave me a package including a special envelope, three sample collection areas with fold-over flaps, and three small wooden "applicator sticks" with which to provide "specimen" smears. The ColoScreen kit came with a 600-word set of instructions: everything I needed to know about the poop test. To this extraordinary work they give the title ColoScreen: a test for fecal occult blood. Now there was some fine bold reading!

Talk about occult. Here is a representative selection of the kind of prose some committee probably laboured over for hours in order to ensure I would make no error in carrying out what any two-year-old can do with the greatest of ease all over the nursery wallpaper:
Two days prior to, and including the test period, a red-meat free, high-residue diet should be followed ... Do not ingest high doses of aspirin or other anti-inflammatory drugs, for 7 days prior to and during testing ... However, consult a medical professional before discontinuing any prescribed medication ... Discontinue the use of toilet tank/bowl cleaners or deodorizers throughout the test period to avoid interference ... Flush tissue with stool, and discard stick in waste container ... On the next two subsequent bowel movements, repeat above steps ...
What possible excuse could there be for this laughable effort? Precision? There certainly is lots of that. Ass-covering? No one can say they weren't told in excessive detail how to scrape their own excrement into an envelope. Just who was this written for? The patients, their health care professionals, the government, or the lawyers? The strenuous, stilted syntax, the jargon, the Latinisms, the pathological avoidance of common idiom — all this adds up in my mind to a truly deplorable effort at communication. What were the authors thinking? Half their potential readership is left to puzzle at expressions like "discontinuing any prescribed medication" and "subsequent bowel movements." And couldn't they have found a simpler way to say "waste container?" Do any of us use such language in our daily lives? "That's alright, Junior. Just make sure you throw any of Fido's subsequent evacuations into the waste container."

Helena Laboratories in Beaumont, Texas, is the manufacturer of the fecal occult blood test package, and, I assume, responsible for the accompanying patient instruction. Their website advertises a number of educational brochures, and I shudder to think of the squinting and squirming and cocking of heads going on right at this moment as millions of helpless Boffins decline and fall under these heavy catapults of English prose.

Different versions of the ColoScreen brochure may be found on the Helena.com website. I checked to see how they compare to the print version I received. Sad to say, they are even more prolix and impenetrable. Here is a not untypical excerpt (perhaps it reads more easily in the Spanish version):
Because of the nonhomogeneity of the stool, it is recommended that the test be performed on three (3) consecutive evacuations, or as close together as possible.
Testing for readability
In the face of such a frontal assault one can do little else but evacuate the wounded and regroup. I wanted to find out just how bad the ColoScreen brochure really was. I needed to see some data that would allow me to compare it against a benchmark. So I resolved to analyze the text using a number of standard readability measures: Coleman-Liau, Flesch-Kincaid, ARI, SMOG, Gunning Fog, and Lexile. Microsoft Word does a basic readability test as part of its word count feature, and I started there.

According to Word's word count my printed ColoScreen brochure scored 57.2 in the Flesch Reading Ease measure, and 8.8 in the Flesch-Kincaid Grade Level. Not satisfied to rely on a single test, I used a number of freely available online readability analyzers. The sites I used are stored under my readability tag on del.icio.us.
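For the curious, the two Flesch measures are simple arithmetic on sentence length and syllables per word. What follows is only a rough sketch, not what Word or the online analyzers actually run: the syllable counter is a crude vowel-group heuristic of my own, and the function names are made up for illustration, so its scores will drift from theirs.

```python
import re

def count_syllables(word):
    """Rough estimate: one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final "e" as silent ("close"), except in "-le" endings ("sample").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

# A sentence from the online ColoScreen brochure, and a plain-language
# paraphrase invented here purely for comparison.
brochure = ("Because of the nonhomogeneity of the stool, it is recommended that "
            "the test be performed on three (3) consecutive evacuations, or as "
            "close together as possible.")
plain = "Do the test three times in a row. Use a new sample each time."

for label, text in (("brochure", brochure), ("plain", plain)):
    ease, grade = flesch_scores(text)
    print(f"{label}: reading ease {ease:.1f}, grade level {grade:.1f}")
```

Long sentences and long words both push the grade level up and the reading ease down, which is why one 25-word sentence stuffed with "nonhomogeneity" and "evacuations" fares so much worse than two short ones.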

Running my brochure through these online tools produced slightly varying readability results, but they averaged out at roughly Grade 9 or higher. Testing against the Lexile measure returned a Grade 10 (1100L to 1200L). By way of comparison, the Harry Potter series measures between 880L and 950L; Don Quixote (in English translation presumably) rates a fairly high 1410L.

The scores were much too high. For consumer health information, the literature is full of admonitions to employ plain language at the fifth grade level or lower to accommodate differing literacy levels [2,3,4]. The ColoScreen instructions fail to inform patients in an appropriate manner. Not only is this preposterous brochure well-nigh impenetrable to the semi-literate, it is prudish and officious to boot. In fact, it stinks.

Everyone advocates health literacy. Library shelves sag with literature on the subject and the web does the same digitally. Barbara Nail-Chiwetalu reviews the issue of health literacy in a way I found useful [5]:
Health literacy may be defined as the ability to obtain, read, comprehend, and use health information to make appropriate health decisions. The development of appropriate and effective health communication is an initiative recognized in Healthy People 2010. To this end, improving health communication may call for use of a variety of approaches, which may include:

• Improving the accessibility of appropriate health materials in communities [6]
• Emphasizing readability and comprehension of health communication materials (e.g., pamphlets, instruction guides, package inserts, books, Web sites) by
  o acquiring materials that are written in conversational style (active voice) [2]
  o using short sentences of ten to fifteen words [2]
  o translating complex medical terms [3]
  o using plain language at the fifth grade level or lower to accommodate differing literacy levels [3]
  o using caution with medical textbooks written for physicians or other health professions with consumer due to the high readability level and comprehension of terms [2]
  o considering use of nonwritten materials (e.g., charts, diagrams, photographs, picture books, videotapes, audiotapes, multimedia presentations) with persons having limited literacy [3,4]
• Showing sensitivity to language and cultural needs by
  o providing materials that are culturally relevant [3]
  o translating materials into different languages [3]
  o using interpreter services to provide direct translations of what is said [4]
• Adjusting oral communication of health information by
  o slowing down the rate of speech when delivering health information [3]
  o using a “teach back” or “show me” approach to ensure understanding [3]
  o including important family members or close friends in discussions including “surrogate” readers [3]
I don't claim expertise in health literacy, and I realize that simple readability scores are not the last word in assessing the quality of consumer health information [7,8,9]. A recent study by Rosemblat et al. [10] enlightened me as to the importance of the "main point" for readability measurement, while at the same time admitting the difficulty of measuring it:
Only two features, "Vocabulary" and "Main Point," significantly predict whether the annotators rated consumer health texts as readable for general audiences. Traditional readability formulas incorporate syntactic (words per sentence) and semantic (vocabulary) features to predict readability. While the annotators verified familiarity with vocabulary as a predictor, they also found that effective communication of the main point is a significant attribute. These results may contribute to understanding consumer seeking and browsing health information online. For example, eye-tracking studies indicate that users typically scan a Web page for the "take-home" message and move on to another page if not found in a few seconds. However, "ability to communicate the main point" is difficult to define operationally and measure.
So with all this theory and the outstanding efforts of experts and advocates, how did I end up with the ColoScreen atrocity? What applicator stick smeared this rank specimen of English prose onto my slide? Here is the take-home message I extract from my experience. We all must work a little harder at communicating to the system that it must serve real people by promoting reading and assisting the non-readers among us. That is a professional and a political commitment. If the main goals of a system of public health are to increase quality and years of healthy life and to eliminate health disparities, then we health librarians must re-dedicate ourselves to the dissemination of health information and the promotion of healthy lifestyles, not just for the educated and comfortable, but for everyone who would not be able to read this sentence. More simply put, let's get rid of the Scarers in Print.

'... upon-my-soul to a old bird like myself these are scarers. And even now that Commodious is strangled, I don't see a way to our bettering ourselves. ... I didn't think this morning there was half so many Scarers in Print. But I'm in for it now!' ~ Mr. Boffin, after an evening of "declining and falling" with Mr. Wegg (Our Mutual Friend, Chap. 5)


24 January 2008

Medline copycats found out

Nature News (published online 23 January 2008 | doi:10.1038/news.2008.520) has revealed that as many as 200,000 of the 17 million articles in the Medline database might be duplicates, either plagiarized or republished by the same author in different journals. The full article is available here (Errami M, Garner H. A tale of two citations. Nature 2008 Jan 24;451:397-399.)

Analysis with text-matching software produced estimates that 0.04% of a random sample of 62,000 articles might be plagiarized, and 1.35% might be duplicates with the same author. Employing a clever shortcut, the researchers examined more than 7 million Medline abstracts with listed related articles, running their algorithm against just the original abstract and its "most related" abstract. This method revealed 70,000 potential duplicates, which have been loaded onto a publicly accessible database called Déjà vu. It is likely that tools such as Déjà vu and text-comparison software will act as future deterrents to plagiarism.
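To give a feel for what "text-matching" can mean, here is a toy sketch. It is not the researchers' software; it merely illustrates one common approach, which is to break each abstract into overlapping runs of words ("shingles") and measure how many the two texts share. The function names, the three-word shingle size, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def flag_pairs(pairs, threshold=0.5):
    """Keep only (abstract, most_related_abstract) pairs that look alike.

    Mirrors the shortcut described above: each abstract is compared with
    just one partner instead of with every other abstract in the database.
    """
    return [(a, b) for a, b in pairs if similarity(a, b) >= threshold]

# Invented snippets, for illustration only.
original = "cold storage leads to gradual stiffening of natural rubber samples"
copycat = "cold storage leads to gradual stiffening of natural rubber films"
unrelated = "a highly sensitive search strategy retrieves reports of controlled trials"

print(round(similarity(original, copycat), 2))
print(round(similarity(original, unrelated), 2))
print(len(flag_pairs([(original, copycat), (original, unrelated)])))
```

Anything flagged this way is only a candidate. A human still has to read both texts before crying plagiarism.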

Publishers are already taking part in tests of anti-plagiarism tools. One of these, CrossCheck, compares new manuscripts against already published materials in its database. CrossCheck searches for similar or identical parts of manuscripts, and when it detects questionable text, it highlights those sections for a suspicious editor to scrutinize.

iParadigms, in Oakland, California, is working with the IEEE and other publishers to develop CrossCheck. This is the same company that developed Turnitin.com, an online resource that helps university educators detect plagiarism in student papers. The program has been very successful as a deterrent, although not without stirring up controversy. CrossCheck is expected to be its equivalent for publishers.

As Oliver Obst (from whom this post was plagiarized) comments in his blog medinfo, this kind of text mining would be easier and more useful generally if all articles were Open Access. It would then be possible to compare more than abstracts, which are brief and not as textually significant; plagiarism could also be determined at the syntax level of an article's full text, truly an alarming scenario for the cribber.

20 January 2008

Shout with the largest: violent disagreement about Iraqi mortality rates

'It's always best on these occasions to do what the mob do.' 'But suppose there are two mobs?' suggested Mr. Snodgrass. 'Shout with the largest,' replied Mr. Pickwick.
~ Charles Dickens, The Pickwick Papers, Chapt. 13

An article in the New England Journal of Medicine was published on January 9, 2008 with new mortality statistics compiled by the Iraqi Ministry of Health under the sponsorship of the World Health Organization [1]. The study, by the innocuously named Iraq Family Health Survey Study Group (IFHSSG), concludes that 151,000 Iraqis suffered violent deaths between March 2003 and June 2006. A previous estimate, the highly controversial study published in the Lancet in October 2006 [2], suggested a much higher number: more than 600,000 deaths.

Les Roberts, a co-author of the Lancet survey (and an earlier one in 2004 [3]), has offered his response on Tim Lambert's Deltoid blog, which generated an enormous number of comments. The entire post reflects very well the politically charged controversy surrounding the issue of Iraqi mortality since the American-led invasion in March 2003. Roberts claims that there is more in common in the results than appears at first glance, and he continues to defend the conclusions of the two Lancet studies.

In an angry attack on the NEJM study in Counterpunch [4], Andrew Cockburn notes how their final tally of "only" 151,000 deaths has been greeted with respectful attention in US press reports, along with swipes at the Lancet effort for having, as the New York Times reminded readers, "come under criticism for its methodology." Cockburn argues forcefully that the IFHSSG study is guilty of sloppy methodology and tendentious reporting. He criticizes the NEJM for "lending its imprimatur to this farrago."

On September 14, 2007, ORB (Opinion Research Business), an independent UK-based polling agency, published a startling estimate of the total casualties of the Iraq war that has received little mention in the mainstream press. The figure suggested by ORB, which was based on survey responses from 1,499 adults, stands at more than 1.2 million deaths. The horrifying figures tallied by ORB — although its survey was conducted independently, using a different polling methodology — are consistent with the Lancet findings. One of the shocking results of the ORB analysis was that almost one in two households in Baghdad have lost a family member, significantly higher than in any other area of the country. The governorates of Diyala (42%) and Ninewa (35%) were next.

The reaction to the ORB report in the US political and media establishment was virtual silence. There was no comment from the Bush White House, the Pentagon, or the State Department; not a single Republican or Democratic presidential candidate or congressional leader made an issue of it; nor was the subject raised on the Sunday morning TV talk shows. Perhaps their attention was diverted by events in Iraq itself, for it was at this same time that eight civilians were reported killed by private US "security contractors" in a ghastly Baghdad shootout. That story was heavily covered by all the media for days afterward.

At the other end of the Iraq mortality scale, the Iraq Body Count website, which has strongly criticized the Lancet studies, estimates the death toll by violence at between 80,000 and 88,000. IBC's results are based on English-language media reports only, and are accurate as far as they go. They do not even attempt to estimate the number of deaths resulting from the dreadful conditions prevailing in Iraq.

As Stalin is said to have observed: A single death is a tragedy; a million deaths is a statistic. Aside from moral considerations and the ideological wars well documented in various sources [5-8], the difficulty with Iraq mortality numbers seems to involve methodological differences with regard to non-violent death. The Lancet studies and the ORB poll include estimates of mortality occasioned by other means than bombs or bullets. Hence the enormous casualty rates reported in those surveys. The problem with this month's NEJM study is that only violent deaths "count," as if people dying from poverty, lamentable public health conditions, poor nutrition, or terrible health care are somehow less dead, or as if the increase in their numbers is any less attributable to the invasion.

Illustrative of how the Iraq mortality debate has become a touchstone for broad ideological differences is that the media reaction to the Lancet studies is even being used by statistics teachers to highlight the role of politics in the framing of statistical findings. Les Roberts himself has commented on the controversy surrounding his research: "It is odd that the logic of epidemiology embraced by the press every day regarding new drugs or health risks somehow changes when the mechanism of death is their armed forces" [5]. De Maio [9] discusses how class discussions can be based around this quotation to explore the complex interplay between hierarchies of credibility, claims of scientific precision, and political standpoints. It is interesting to note the contrasting receptions given in the media to Roberts's Iraq surveys and a study that Roberts led in the Congo using a very similar methodology [7]. The conclusions of the latter were accepted unquestioningly by the press and political leaders alike [6].

What exactly the intentions of the authors of the NEJM article were when they undertook their survey is a matter for debate. While to some their results will look like a whitewash, it does not appear that their work was dishonest or deliberate propaganda. Unfortunately, it has entered public discourse primarily in terms of its disapprobation of the Lancet survey (the ORB report is still under the radar). That is the way it is going to be used, and almost all the attention given to the problems and complications pointed out by Roberts and others will be confined to a small number of commentators and scholars. Lenin's Tomb commented: "Whatever the intentions of the ministry of health workers who carried out this study, its findings are now out of their hands. It is now a weapon for neutralising the findings of the Lancet survey."

Counterspin on the Lancet studies is still in full swing. Witness the oddball critique just published in the National Journal [10], which goes so far as to suggest that The Lancet was a victim of "wartime fraud." The article's author, Neil Munro, also claims that jihadists "used this research as a justification for killing Americans." (In 2001 Munro looked forward to the destruction of Iraq in an enthusiastic opinion piece for the right-wing National Review Online entitled "The Iraqi opportunity: Berlin '45. Tokyo '45. Baghdad '02." He was off by a year.)

Meanwhile, in numbers that still remain undetermined — but that everyone agrees are horribly excessive — Iraqis continue to die. Kieran Healy at Crooked Timber has nicely summed up the disagreements among epidemiologists, politicians, statisticians, ideologues, intellectuals, and wing-nuts:

The main challenge facing those doing this sort of research is that there is a war going on, and wars kill a lot of people, bring about the dissolution of households, and compel very large numbers of people to flee the region. All of this makes the machinery of statistical science rather difficult to apply. None of the available numbers look any good, both on their own and given what they imply about what’s happening in Iraqi society. If you find yourself really delighted that a war of choice has resulted in the deaths of a population the size of Jersey City, or maybe Oakland, instead of one the size of Baltimore, you probably need to rethink your priorities.


13 January 2008

Stiffening in the cold: more on condoms in the Canadian winter

"The elastomers—natural rubber and polychloroprene— ... are susceptible to crystallization during prolonged exposure to low temperatures. This leads to a gradual long-term stiffening. ... Stress–strain measurements have confirmed the extremely large increase (up to 100-fold) in the initial stiffness that crystallization produces." (Fuller KNG, et al. The effect of low-temperature crystallization on the mechanical behavior of rubber. Journal of polymer science: Part B: Polymer physics. 2004;42:2181-90.)

The rubber librarian has been at it again, straining hard to stretch a bit more evidence over a vast gap in the condom literature. Last month I posted on my failure to find any scientific literature on the effect of extreme cold on latex condoms. But the thrust of my investigation didn't stop there. Not trusting my own abilities to probe the strange literature of latex, I consulted with a colleague in the Sciences and Technology Library who knows well the ins and outs of the relevant databases, just to make sure that I hadn't missed anything vital. After a prolonged search he found a number of seminal papers on the influence of low-temperature crystallization on the tensile elastic modulus of natural rubber. My enthusiasm for the subject momentarily bounced back, until I realized that once again the condom was getting no respect. All that rubber research and not one mention of condoms. Undeflated however, I carried on. Here is a brief survey of the science of gelid latex, and anything useful pertaining to condoms that can be extracted from this small body of knowledge. A full bibliography with abstracts is given below as an appendage.

To get any insight into what "tensile elastic modulus" exactly means, think stretchability or "elongatability." Modulus is a mathematical term that was appropriated by the British scientist Thomas Young in the early 19th century to express the physical measure of stiffness: the ratio of applied load (stress) to the resultant deformation of the material (strain). (A high modulus indicates a stiff material.) Having thus stretched my high school chemistry to the snapping point by reading through a torrent of exceedingly dull prose, I finally reached a partial understanding of inspissation and cold crystallization and their effect on tensile elastic modulus. To avoid undue rigidity of language, let us translate this jargon-splotched no-man's-land of technolinguistic barbed wire and chevaux-de-frise into a more flexible dube-ological vernacular (kondomswissenschaftliche Umgangssprache). The upshot of seventy years of low-temperature rubber research is that it gets hard in the cold. The non-scientific majority of humanity must be truly grateful for this remarkable advance in rubber research.
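For those who prefer their stiffness in numbers, the ratio can be worked through in a few lines. This is a back-of-the-envelope sketch only: the load, cross-section, and lengths are invented for illustration and come from none of the papers, and the 100-fold factor is simply the upper bound quoted from Fuller et al. at the top of this post.

```python
import math

def tensile_modulus(force_n, area_m2, original_length_m, stretch_m):
    """Modulus = stress / strain, in pascals."""
    stress = force_n / area_m2               # load per unit cross-section (Pa)
    strain = stretch_m / original_length_m   # fractional elongation (no units)
    return stress / strain

# Hypothetical warm rubber strip: a 10 N pull on a 1e-5 m^2 cross-section
# stretches a 0.10 m strip by 0.05 m.
force, area, length = 10.0, 1e-5, 0.10
warm = tensile_modulus(force, area, length, 0.05)

# Apply the "up to 100-fold" stiffening that crystallization can produce.
cold = warm * 100

# Under the same pull, the stretch shrinks in proportion.
cold_stretch = (force / area) / cold * length

print(f"warm modulus: {warm:.3g} Pa")
print(f"cold modulus: {cold:.3g} Pa")
print(f"stretch under the same load when cold: {cold_stretch:.3g} m")
```

The same pull that stretches the warm strip by half its length barely budges the crystallized one.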

What does this all mean for the hardy condom user? Because cold tends to "crystallize" rubber, this leads to a progressive increase in density, gradual long-term stiffening, and a doubling of tensile elastic modulus ... of the condom, not its wearer. None of the literature discovered by my research actually concerns itself with the common condom, but all the science points to a Canadian winter's ability to make rubbers slightly brittle, which could possibly — and I emphasize possibly — lead to leaking or breakage. Not to elongate this explanation more than the kinetic measurements allow, it seems clear that the effect of arctic air on a condom's stress-strain characteristics, in reverse proportion to its effect on the body part for which the condom is designed, is one of stiffening and tensile swelling. Furthermore, as Natarajan cogently reminds us [6], free radicals formed during tensile testing at low temperatures are stable below the glass transition temperature of the material. (These radicals arise from main-chain fracture occurring during yielding of the material — and too-frequent reading of Bakunin in unheated garrets. Natarajan also suggests that yielding of the material which gives rise to these characteristics occurs by crazing of the material — reading Bakunin in an unheated garret during a Winnipeg winter.)

The existing research suggests that public health officials might consider ensuring that condoms for distribution by clinics and street health workers are not stored at extreme winter temperatures. Individuals should not keep their condoms in glove compartments, unheated back porches, or hidden behind the snow blower in the garage. Most package directions already recommend a normal range of acceptable temperatures for safe storage. Maybe they are right.

A condom manufacturing company with whom my local health authority has dealings responded by email to an official request for their position on condom storage. The company's reply stated that in their opinion there is no risk in storing condoms in extreme cold, as long as they are not thawed out with the application of high heat, but are allowed to come gradually to a normal temperature. A follow-up email was sent to the company asking why, this being the case, large boxes bought at wholesale containing hundreds of condoms have printed instructions not to store their contents in extreme heat or cold. To date no reply has been received. This anecdote is no proof that condom manufacturers have no answers, but it does demonstrate how the lack of research on this issue means that the concerns of public health departments cannot be resolved by resorting to corporate public relations offices.

Other questions come to mind. Even if the storage of condoms in extremely cold environments, caeteris paribus, has no effect on their integrity, what guarantee is there that the cold would never contribute to damage caused by the often imperfect conditions that pertain in warehouses? What if, for example, a large box full of condoms were dropped from a truck or a fork lift at a temperature well below zero, or were otherwise jostled, jounced or dented? Might the cold, having stiffened the latex, not contribute further to any resulting damage to individual condoms? Is it possible that the increased modulus and crystallization of the latex might contribute to minute tears that could cause leakage or breakage when the condom is eventually used? What is the effect of extremely low humidity and excessive cold on condom integrity? Could this combination further contribute to damage from being bumped or dropped in storage?

This will have to be our last word for now on condoms and low temperatures, until a free radical bounces upon the scene to answer all our questions, electrify the rubber world, and warm the hearts of Canadian street health workers with a path-breaking, definitive study.

Flecte quod est rigidum,
Fove quod est frigidum,
Rege quod est devium.

Bend what is stiff,
Warm what is cold,
Guide what goes off the road.

Archbishop Stephen Langton, d. 1228


10 January 2008

"Filter, flavor, flip-top box": getting PubMed filters to work good and draw easy

While attempting to assist a medical resident with a difficult search, I was trying out the Cochrane filter (revised strategy) as discussed in an article by Robinson & Dickersin in the International Journal of Epidemiology [1]. They present a highly sensitive search strategy to retrieve reports of controlled trials using PubMed. In short, you get a lot to like. Excellent for use in systematic review searching, the authors' filter creates a subset much larger than what you would get from clicking on the various clinical trial check boxes in the Limits menu.

This is the search string in all its wonky glory:

(randomized controlled trial[pt] OR controlled clinical trial[pt] OR randomized controlled trials[mh] OR random allocation[mh] OR double-blind method[mh] OR single-blind method[mh] OR clinical trial[pt] OR clinical trials[mh] OR ("clinical trial"[tw]) OR ((singl*[tw] OR doubl*[tw] OR trebl*[tw] OR tripl*[tw]) AND (mask*[tw] OR blind*[tw])) OR ("latin square"[tw]) OR placebos[mh] OR placebo*[tw] OR random*[tw] OR research design[mh:noexp] OR comparative study[mh] OR evaluation studies[mh] OR follow-up studies[mh] OR prospective studies[mh] OR cross-over studies[mh] OR control*[tw] OR prospectiv*[tw] OR volunteer*[tw]) NOT (animal[mh] NOT human[mh])

I have been saving this article for years, but never got round to trying the filter out. Upon entering the strategy directly into the PubMed search box I was alerted by a pink-banded message telling me that two of the descriptors above, comparative study[mh] and evaluation studies[mh], were not found. Of the two problems, the first is an error in the search strategy itself (I found no correction in the literature). The second is the result of a recent "major change" in PubMed.

This was more filter flavour than I had counted on. Now I was really huffing and puffing as I inhaled a man-size portion of PubMed technical detail in order to discover what had gone wrong with my peer-reviewed, much treasured filter in the flip-top box of my knowledge base. Here is what I found:

1. "Comparative Study" is used only as a publication type. The field delimiter or tag [mh] must be replaced with [pt]. This appears to be an error on the part of the creators of the filter. Comparative Study has only ever been a publication type since 1966. (An aside: this term has never found its way into the PubMed Help list of publication types. However, it does show up on the official NLM Publication Characteristics (Publication Types) - Scope Notes web page.)

2. "Evaluation Studies" was once a MeSH heading but is now a publication type. The tag must be changed to [pt] or PubMed gets tetchy. More on this change below.

Here is a corrected version of the filter (it works in PubMed without error reports):

(randomized controlled trial[pt] OR controlled clinical trial[pt] OR randomized controlled trials[mh] OR random allocation[mh] OR double-blind method[mh] OR single-blind method[mh] OR clinical trial[pt] OR clinical trials[mh] OR ("clinical trial"[tw]) OR ((singl*[tw] OR doubl*[tw] OR trebl*[tw] OR tripl*[tw]) AND (mask*[tw] OR blind*[tw])) OR ("latin square"[tw]) OR placebos[mh] OR placebo*[tw] OR random*[tw] OR research design[mh:noexp] OR comparative study[pt] OR evaluation studies[pt] OR follow-up studies[mh] OR prospective studies[mh] OR cross-over studies[mh] OR control*[tw] OR prospectiv*[tw] OR volunteer*[tw]) NOT (animal[mh] NOT human[mh])
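The two tag changes are mechanical enough to script. Here is a minimal Python sketch, assuming the strategy is held as a plain string. The names FIXES, FRAGMENT and retag are my own illustrations and belong neither to PubMed nor to the Cochrane strategy, and only a fragment of the filter is shown to keep things short:

```python
# Terms PubMed rejected, mapped to the same term with the tag that works.
# (Illustrative table; extend it if NLM changes further headings.)
FIXES = {
    "comparative study[mh]": "comparative study[pt]",
    "evaluation studies[mh]": "evaluation studies[pt]",
}

# A fragment of the original strategy, enough to show both problem terms.
FRAGMENT = (
    "research design[mh:noexp] OR comparative study[mh] OR "
    "evaluation studies[mh] OR follow-up studies[mh]"
)


def retag(strategy: str) -> str:
    """Return the strategy with each rejected term swapped for its fixed form."""
    for old, new in FIXES.items():
        strategy = strategy.replace(old, new)
    return strategy


if __name__ == "__main__":
    print(retag(FRAGMENT))
```

The replacement is a plain substring swap, so terms that were already correct (follow-up studies[mh], for instance) pass through untouched.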

So I had stubbed out the problem filter and made my corrections, but what had happened to require the change to Evaluation Studies?

Not that many of us took much notice in the annual yuletide neuronal storm, but the National Library of Medicine announced a major revision of publication types and corresponding subject descriptors in a Technical bulletin dated 26 Nov 2007 (final update 13 Dec 2007). These were bundled with the usual announcements of new MeSH headings. You know the type: score-settlings amongst the specialists (Coronary Occlusion, not to be confused with Coronary Stenosis); the exotic and somewhat frightening (Leukemia, Myeloid, Chronic, Atypical, BCR-ABL Negative; Shiga-Toxigenic Escherichia coli; Weapons of Mass Destruction); and the when-would-I-ever-use-this? puzzlers (Pollination; Muscle, Striated).

PubMed now distinguishes between articles ABOUT evaluative studies and articles that ARE actually evaluative studies. For the former you must use the brand new MeSH heading Evaluation Studies as Topic. For the latter you use the publication type delimiter: Evaluation Studies [pt]. A useful distinction. Makes sense when you think about it. Direct from NLM's last Technical bulletin, here is a list of the new MeSH Headings that correspond to the Publication Types used for journal article indexing:

Bibliography as Topic
Biography as Topic
Clinical Trials as Topic
Clinical Trials, Phase I as Topic
Clinical Trials, Phase II as Topic
Clinical Trials, Phase III as Topic
Clinical Trials, Phase IV as Topic
Congresses as Topic
Consensus Development Conferences as Topic
Consensus Development Conferences, NIH as Topic
Controlled Clinical Trials as Topic
Correspondence as Topic
Dictionaries as Topic
Directories as Topic
Duplicate Publication as Topic
Evaluation Studies as Topic
Government Publications as Topic
Guidelines as Topic
Interviews as Topic
Legislation as Topic
Meta-Analysis as Topic
Multicenter Studies as Topic
Patient Education as Topic
Practice Guidelines as Topic
Randomized Controlled Trials as Topic
Retraction of Publication as Topic
Review Literature as Topic
Twin Studies as Topic
Validation Studies as Topic (New for 2008)

In what looks like a gaffe or an oversight, the indexers at NLM did not see fit to create a MeSH heading Comparative Study as Topic, nor have they come up with rationalizations for the following: Follow-up Studies and Prospective Studies. Nor have they resolved the singular/plural confusion in headings of this type. Perhaps they wish to spare us too much excitement at once. Leave room in your Xmas stocking next December.

If you enter the revised Cochrane filter for controlled trials into PubMed, the database puffs out more than 3,450,000 hits. This creates a nice subset of the PubMed database consisting of controlled trials (or, more precisely, consisting of articles that at least contain terms that would lead one to suspect that they might be controlled trials of some sort). It works good and draws easy. You'd expect it to cost more, but it doesn't.

As Robinson and Dickersin state in concluding their article, "To continue to be an effective and efficient strategy, the revised strategy should be examined periodically to take into account new features available on PubMed, as well as developments in indexing by the National Library of Medicine." I can vouch for that. Pass me that flip-top box.


06 January 2008

Why can't del.icio.us do A to Z?

I struggled through the alphabet as if it had been a bramble-bush; getting considerably worried and scratched by every letter. After that, I fell among those thieves, the nine figures, who seemed every evening to do something new to disguise themselves and baffle recognition. But, at last I began, in a purblind groping way, to read, write, and cipher, on the very smallest scale. ~ Charles Dickens, Great Expectations, Chapter 7
My alphabet starts with this letter called yuzz. It's the letter I use to spell yuzz-a-ma-tuzz. You'll be sort of surprised what there is to be found once you go beyond 'Z' and start poking around! ~ Dr. Seuss, On Beyond Zebra
Librarians are perhaps overly given to the love of alphabetical order, and many of us have a hard time with today's casual lack of concern about whether McTavish should precede or follow Macdonald, or whether St. Boniface should interfile with Saint Boniface. We still cringe internally when we watch an analphabetic patron fumbling through a dictionary or encyclopedia. Anti-alphabeticists like Mortimer Adler have complained that resorting to the alphabet is an evasion of intellectual responsibility. In his new book, Everything is miscellaneous, David Weinberger fondly recollects Adler's pique in his description of the digitized and "miscellanized" third order of information, in which alphabetization is quickly going the way of Ptolemaic cosmography.

Everyone still likes a good list, and the fact remains that many lists are easier to use when they are alphabetized, like the phone book (despite its making a hash of initialisms), or Wikipedia's List of countries, or Google Reader's list of subscriptions. For the users of these tools it is second nature to find items according to the arbitrary order of our alphabet. del.icio.us itself recognizes this archaic institution, by default providing its users with its famous alphabetical array of tags. The mega-cool tag cloud is fundamentally alphabetical. Finding a tag is much simpler when you can do it at a glance rather than typing blindly into a search box. That is why I find so irksome the complete inability of del.icio.us to alpha sort links within tags or without. When del.icio.us was being coded it was decided that the list default would be reverse chronological order, with no alternatives allowed. It's absolutely crazy-making. Alphabetical sorting is on most users' wish lists, and it has been promised by the folks at del.icio.us. Perhaps 2008 will see the release of the new Delicious 2.0.

What to do in the meantime? I have a simple need. I want to be able to sort a list of blogs by student nurses that I have gathered under the tag blogs.nurschool. There are about 50 items that I would like to list alphabetically by blog title. A simple matter it seems, but impossible for del.icio.us. I could copy the chronological list I get when I click on the tab and do the sort manually. But that's silly. Surely there is some other way, I thought; and I set out to find it.

A Greasemonkey approach to sorting del.icio.us bookmarks: Del.icio.us alpha sort

As a user of the Firefox add-on Greasemonkey, I immediately started searching for a script that might do the job. Greasemonkey allows users to install scripts that add new functionality to web pages. (If scripts are not your thing, move on to the next section.) I quickly found one called Del.icio.us alpha sort, which places a control at the top of the page permitting the user to toggle between alphabetical and chronological sorting of links, something del.icio.us programmers should have added in the first place. The problem is that it doesn't work properly. When I tried the script it would do the alphabetical sort only for one page, which works up to a point, that is, if you have 25 items or fewer in that tag. But, in all fairness, should the user be required continually to adjust the number of items appearing on a del.icio.us page to accommodate this programming deficiency? The maximum number of bookmarks on one page is one hundred. What if you have more than one hundred bookmarks to put in a list? I quickly uninstalled Del.icio.us alpha sort without experimenting further. This script is not ready for prime time (it also slows down del.icio.us) and I don't have the hacking ability to improve on it. It really is time for del.icio.us programmers to incorporate alphabetical sorting into their code.

It occurred to me later that Del.icio.us alpha sort might conflict with Pagerization, my favourite Greasemonkey script, which turns a website's page-by-page results into an unbroken scrolling list with no annoying "next page" links needing to be clicked in order to proceed. Some of the sites that can be "pagerized" are Google (Search, Image, News, Group, Video), Yahoo, Wikipedia, YouTube, del.icio.us, Twitter, and digg. Sadly, PubMed does not allow itself to be pagerized. For my tenosynovitic forearm's sake I would never give up Pagerization. This extraordinary bit of code is worth the small effort of installing Greasemonkey. I did try Del.icio.us alpha sort with Pagerization disabled; but it still failed to alphabetize more than the first page. Having wasted an hour or two on this futile experiment, I moved on.

Turning to the del.icio.us Help pages

I went to the Developer section of del.icio.us Help. There we are promised the ability to "access data and build cool stuff." I must say, in most cases the del.icio.us help pages truly are helpful. Sure enough, under the HTML section I found something I could work with.

A simple URL with a few added "arguments" creates a reasonably attractive bulleted list in reverse chronological order of your most recent del.icio.us bookmarks. In the following simply replace USERNAME with your del.icio.us identity, and replace XXX with the number of links you want to list:


The count is limited by default to 15. To list more items add the appropriate number after "?count=".

I used these arguments to refine my results:

&extended=body : includes the full descriptive note for each bookmark
&extendeddiv=yes : starts the note on a separate line under the bookmark link
&rssbutton=no : removes the default orange RSS button at the bottom of the bookmark list
&bullet=bull : adds a standard round bullet before each bookmark link (I prefer this to the del.icio.us help site's suggestion of "&bullet=raquo", which produces a right-pointing quotation mark or guillemet (»))
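For anyone who would rather assemble the address programmatically than by hand, here is a minimal Python sketch of how those arguments string together. The base address is a placeholder of my own and not the real del.icio.us endpoint, the placement of the user name in the path is an assumption, and the function name linkroll_url is purely illustrative:

```python
from urllib.parse import urlencode

# Placeholder only: substitute the address given in del.icio.us Help.
BASE = "http://example.invalid/html"


def linkroll_url(username: str, count: int = 15) -> str:
    """Build a list URL using the arguments described above.

    Assumes the user name follows the base address as a path segment.
    """
    params = {
        "count": count,          # number of links to list (default 15)
        "extended": "body",      # include the full descriptive note
        "extendeddiv": "yes",    # start the note on its own line
        "rssbutton": "no",       # drop the orange RSS button
        "bullet": "bull",        # standard round bullet
    }
    return f"{BASE}/{username}?{urlencode(params)}"


if __name__ == "__main__":
    print(linkroll_url("USERNAME", 50))
```

Because urlencode joins the pairs with ampersands in dictionary order, count comes first, just as in the "?count=" form mentioned above.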

Using the above URL you get a result that looks like this:

BioMed Central | Abstract | 1471-2105-8-487 | Userscripts for the Life Sciences
Discusses userscripts that aggregate information from web resources. Examples of enriching web pages from other resources, and how information from web pages can be used to link to, search, and process information in other resources.
2008 Web Predictions - ReadWriteWeb
Like, yeah! "People engaged in the new web will do some really awesome stuff that we'll all be in awe of."
M A B - Mozilla Amazon Browser
Can now be run as a remote application. http://www.faser.net/mab/chrome/content/mab.xul
Partners in a pandemic - www.universityaffairs.ca
May 2005. Medical researchers from Winnipeg helped discover how the AIDS virus is transmitted in Kenya because they were in the right place at the right time.
Top 10 Medical Breakthroughs - 50 Top 10 Lists of 2007 - TIME
University of Manitoba researchers involved in the #1 medical breakthrough. Circumcized men in Africa 50% less likely to be infected by HIV.
MedlinePlus: Health Literacy
"... the ability to understand health information and to use that information to make good decisions about your health and medical care."
Show me the data -- Rossner et al. -- The Journal of Cell Biology 17 Dec 2007
Critique of impact factors. "We hope this account will convince some scientists and funding organizations to revoke their acceptance of impact factors as an accurate representation of the quality—or impact—of a paper published in a given journal."
Web 3.0 and medicine -- Giustini 335 (7633): 1273 -- BMJ
Logically, web 3.0 should bring order to the 21st century web in the same way that Dr John Shaw Billings’s Index Medicus brought order to medical research back in the 19th century.

If you want a list of bookmarks for a particular tag, then simply add the tag name after your user name:


This hack allowed me to gather all 50 or so nursing school blogs together in an unalphabetized list. Not bad, but still offensive to my instinctive need for this list to be alphabetized.

No help from del.icio.us Export (but good for a complete backup)

Another way to get a list of links is the standard export procedure found in del.icio.us Settings. Unfortunately, you can only create an enormous list of all your bookmarks, and there is no option as to their order, which is reverse chronological. This feature is useful if you ever want a good backup of your bookmarks, however, or if you want to import them back into a browser (heaven forbid). Go to Settings, and click on "export/backup" under the Bookmarks heading. You are given the option of including your tags and notes. del.icio.us will generate an HTML file of all your bookmarks that you can save to your computer. Eminently simple, but frustrating for my purposes.

Finally getting an alphabetical list (but is it worth the trouble?)

The URL for producing a bulleted list was helpful, but del.icio.us provided no method for getting my list in alphabetical order. I remarked above how in our "miscellanized" world alphabetical order is terribly passé. I can only continue to demur. I live by the English alphabet, and could not tolerate the loss of the time-honoured sequence of its 26 letters. I cannot fathom why the del.icio.us designers left out the alpha sort. Of course, I can alphabetize any list manually, including the tag-delimited list that del.icio.us provides. But what if I had hundreds of items, instead of fifty? Surely there is some way out of the tedious omnium-gatherum of just-one-damn-link-after-another.
After considerable effort, here is one method I found to get a set of my del.icio.us bookmarks into a usable alphabetical list. This goes beyond the hobbled and unavailing alphabetical listings for tags allowed by the new del.icio.us Firefox extension.

In del.icio.us Help, under the section for Developers, click on API, then click on posts. This section is marked by a rather ascetic and elitist assumption that the intrepid enquirer who has ventured this far needs no further condescending explanations or instructions. We are sternly informed that del.icio.us APIs are done over https and require HTTP-Auth. Furthermore, "this document and the APIs herein are subject to change at any time." Accepting my subordinate status and having informed myself a bit better about APIs, I persisted and found something I could use.

The first https URL in the posts section allows you to "get" all the bookmarks for a certain tag. (At least I assume it gets all of them. There is another URL that also returns all your bookmarks, but which can be filtered by tag. I used both and the results were the same for my selected tag. However, each URL returns information in a slightly different order. The "get" URL works better for my purposes, which I will explain below.)

Use this URL to create an XML file for your specified tag, adding your chosen tag name after the equals sign:


Type the URL into your browser. After you press Enter you will be prompted to enter your del.icio.us ID and password. The next step is to save the resulting XML file to your hard disk and then import it into Excel. (I used my old home-computer version of Excel XP for this experiment, but any version of Excel will do the job.) Using the Data|Import External Data|Import Data command, I brought the contents of the XML file, correctly separated into columns for each data field, into a spreadsheet.

Once the title of each bookmark is in an Excel cell, it is a simple matter to use Excel's Sort command to alphabetize the list. Delete columns with extraneous material. I chose only three columns to use: bookmark title, URL, and accompanying note.

Highlight the data and copy it to the Clipboard. You can then paste the spreadsheet data into most word processors. In Microsoft Word use Paste Special to copy the data into a formatted table. A little more fussing will be required, but at least the list is alphabetical. The downside is that your bookmark titles are no longer hyperlinked, but have their URL hived off into a separate cell. To get anything like the bulleted list illustrated above, you must now highlight each bookmark title and manually create the link by pasting in the appropriate URL.

That seems like a lot of trouble. In fact, it's hardly worth the effort, and actually makes manual alphabetization of the HTML output start to look attractive. With all my best efforts exhausted, I found I really hadn't come very far. For all its benefits, del.icio.us makes it extremely difficult to create alphabetized link lists. If I have missed a glaringly obvious hack or a deus ex machina solution, will some kind reader let me know? Otherwise, I shall just have to wait until del.icio.us delivers on its promise to include alphabetical sorting in its long awaited new user interface.
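One possible way around the Excel round trip, offered as a sketch rather than a tested recipe: a few lines of Python can sort the saved XML by title and write the bulleted, hyperlinked list directly. This assumes each bookmark arrives as a post element carrying href, description (the title) and extended (the note) attributes; those attribute names, the SAMPLE data and the function name alphabetical_list are assumptions for illustration, so check them against your own file before relying on this.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up sample standing in for the XML file saved from the browser.
SAMPLE = """<posts user="USERNAME" tag="blogs.nurschool">
  <post href="http://example.org/b" description="beta blog" extended="Second note"/>
  <post href="http://example.org/a" description="Alpha Blog" extended=""/>
</posts>"""


def alphabetical_list(xml_text: str) -> str:
    """Return an HTML bulleted list of bookmarks sorted by title, ignoring case."""
    posts = ET.fromstring(xml_text).findall("post")
    posts.sort(key=lambda p: p.get("description", "").casefold())
    items = []
    for post in posts:
        url = escape(post.get("href", ""))
        title = escape(post.get("description", ""))
        note = escape(post.get("extended", ""))
        item = f'<a href="{url}">{title}</a>'
        if note:
            item += f"<br>{note}"  # note goes on its own line under the link
        items.append(f"<li>{item}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(alphabetical_list(SAMPLE))
```

The sort key uses casefold so that "beta blog" does not jump ahead of "Alpha Blog" merely because of capitalization, and each title keeps its link, which is the step the spreadsheet route loses.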

Ten enhancements on my del.icio.us wish list

It has been four months since the new version of Delicious (finally dumping the trendy but annoying internal punctuation) was announced. Alphabetical sorting was promised and is eagerly awaited by the faithful. Here is my wish list for ten more improvements to what has become an indispensable tool. Do you have any others to add?

1. Increase the character limit on descriptive notes. They are never quite long enough to hold the text I want to include. I have to waste time trimming and editing. And fix the bugs, please. My cursor occasionally disappears in a note field.

2. Number all pages instead of only including 'earlier' and 'later', which forces us to click through every page. In the alphabetical sort, provide a clickable alphabet selection list for quick navigation.

3. Allow unfettered scrolling through all bookmarks, with no page divisions.

4. On-the-fly bundle creation and allocation. Call them categories instead. We're grown up now.

5. We should be permitted to set permanent or session-defined defaults: for example, a default for not sharing our links as well as for sharing them. Sometimes I want to bookmark a series of daily-use links that would be of no use or interest to anyone else. It's tiresome to be forced to click the do not share checkbox for each. In the same way, we should be able to set permanent or session-only default tags that are applied to each new bookmark.

6. Deleting items should be easier. Allow users to delete with only one click, or with a keyboard shortcut. (In the meantime I am using an excellent Greasemonkey script called del.icio.us delete now.) Highly recommended if you are the type who likes to clean house from time to time.

7. Importing any del.icio.us list into a spreadsheet or word processor should be simplified.

8. Allow users to set desired text, background and highlighting colours.

9. Let's see contextual search implemented in the new version. If I don't specify what I want, del.icio.us should search whatever I am currently looking at — my own bookmarks, bookmarks from my network, or all bookmarks.

10. Reduce the amount of clicking required by introducing a range of keyboard shortcuts for searching, bundle creation, tag renaming, easy deletion, etc.

Clover learnt the whole alphabet, but could not put words together. Boxer could not get beyond the letter D. He would trace out A, B, C, D, in the dust with his great hoof, and then would stand staring at the letters with his ears back, sometimes shaking his forelock, trying with all his might to remember what came next and never succeeding. On several occasions, indeed, he did learn E, F, G, H, but by the time he knew them, it was always discovered that he had forgotten A, B, C, and D. Finally he decided to be content with the first four letters, and used to write them out once or twice every day to refresh his memory. Mollie refused to learn any but the six letters which spelt her own name. She would form these very neatly out of pieces of twig, and would then decorate them with a flower or two and walk round them admiring them. None of the other animals on the farm could get further than the letter A. ~ George Orwell, Animal Farm, Chapt. 3. Penguin Books; 1968.