<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1742-5581-2-6</ui>
   <ji>1742-5581</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Relevance similarity: an alternative means to monitor information retrieval systems</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Dong</snm>
               <fnm>Peng</fnm>
               <insr iid="I1"/>
               <email>cindy_dongpeng@yahoo.com</email>
            </au>
            <au id="A2">
               <snm>Loh</snm>
               <fnm>Marie</fnm>
               <insr iid="I1"/>
               <email>marie_lohcs@yahoo.com</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Mondry</snm>
               <fnm>Adrian</fnm>
               <insr iid="I1"/>
               <email>mondry@hotmail.com</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Medical Statistics and Epidemiology Group, Bioinformatics Institute, BMRC, A*STAR, Singapore</p>
            </ins>
         </insg>
         <source>Biomedical Digital Libraries</source>
         <issn>1742-5581</issn>
         <pubdate>2005</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>6</fpage>
         <url>http://www.bio-diglib.com/content/2/1/6</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16029513</pubid>
               <pubid idtype="doi">10.1186/1742-5581-2-6</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>2</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>20</day>
               <month>7</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>20</day>
               <month>7</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Dong et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Relevance assessment is a major problem in the evaluation of information retrieval systems. The work presented here introduces a new parameter, "Relevance Similarity", for measuring the variation in relevance assessment. In a situation where individual assessments can be compared with a gold standard, this parameter is used to study the effect of such variation on the measured performance of a medical information retrieval system. In such a setting, <it>Relevance Similarity</it> is the ratio of the number of assessors who rate a given document the same as the gold standard to the total number of assessors in the group.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>The study was carried out on a collection of Critically Appraised Topics (CATs). Twelve volunteers were divided into two groups according to their domain knowledge. They assessed the relevance of the topics retrieved by querying a meta-search engine with ten keywords related to medical science. Their assessments were compared to the gold standard assessment, and <it>Relevance Similarity</it> was calculated for each topic as the proportion of assessors in a group whose rating agreed with the gold standard.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The similarity comparison among groups showed that a higher degree of agreement exists among evaluators with more subject knowledge. In this particular query set, the variation in relevance assessment did not significantly change the measured performance of the retrieval system.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>In assessment situations where evaluators can be compared to a gold standard, <it>Relevance Similarity </it>provides an alternative evaluation technique to the commonly used kappa scores, which may give paradoxically low scores in highly biased situations such as document repositories containing large quantities of relevant data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The advent of the Internet has changed the way both professionals and consumers look for health information <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Abbott <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> found that the existing general public search engines have a high penetration into even restricted-access data repositories, yielding quality information as an alternative to traditional primary sources. Recently, Google has launched a beta version of its <it>Google Scholar</it> search engine, Nature Publishing Group has changed its search engine to allow deep penetration, and Elsevier has created another specialised search engine for scientific literature, <it>Scopus</it>, which is available at a cost <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. All of these widen the general public's access to high-quality health information. However, Peterson <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> showed that most consumers' generally low level of skill in search strategies can lead to the retrieval of inadequate information, which raises anxiety and decreases compliance. In response to this, Curro <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> has suggested a simple methodology to assess the quality of medical information retrieved on the Internet, but the impact of this strategy remains to be seen. In the meantime, the medical professional is certainly better advised to look for information that has appraised content. Such sources include online repositories of Critically Appraised Topics (CATs). CATs are short summaries of current medical literature addressing specific clinical questions and are frequently used by clinicians who try to implement principles of Evidence Based Medicine (EBM) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Although some CAT libraries exist, a peer-to-peer sharing network as proposed by Castro <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> is not yet available. CAT Crawler <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, an online search engine, provides access to a number of public online CAT repositories and is the focus of the present study on retrieval quality.</p>
         <p>Two commonly used evaluation parameters are recall and precision <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The former measures the comprehensiveness of a search and the latter measures the accuracy of a search. Relevance is the key concept in the calculation of recall and precision but poses problems of multidimensionality and of dynamic quality. Schamber <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> has emphasized that relevance assessment differs between judges and for the same judge at different times or in different environments. Barry <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and Schamber <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> have studied the factors affecting relevance assessments. Both studies agreed that relevance assessments depend on evaluators' perceptions of the problem situation and the information environment, and that these perceptions encompass many factors beyond information content <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Only a few studies have directly addressed the effect of the variation in relevance assessments on the evaluation of information retrieval systems <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. All of these studies varied relevance assessments by using evaluators with different domain knowledge backgrounds, and all of them concluded that variation in relevance assessments among judges has no significant effect on measures of retrieval effectiveness. However, Harter <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> has questioned this conclusion because none of these studies employed real users who approached the system with an information need, although some of them tried to simulate this condition. He also highlighted the need to develop measurement instruments that are sensitive to variations in relevance assessments. A common statistical method used in this context is the kappa score, which, in principle, is a contingency-table-based method that can eliminate chance concordance from the assessment. However, modern search engines usually have filter systems <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, which lead to a selection bias towards relevant documents. Feinstein et al. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> observed that in situations with high imbalance, the paradox of high agreement but low kappa scores can arise. Better filters create more bias, thus increasing the tendency to find such paradoxical results. In such a situation, a performance assessment based on kappa scores may become meaningless.</p>
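         <p>The paradox of high agreement but low kappa can be illustrated with a small worked calculation. The counts below are hypothetical and are not taken from this study: two assessors rate 100 documents, both calling 90 of them relevant, each calling 5 relevant that the other rejects, and none being rejected by both. Here p_o denotes the observed agreement and p_e the agreement expected by chance from the marginal totals.</p>
         <p><![CDATA[
```latex
% Hypothetical 2x2 table for two assessors rating 100 documents:
% both relevant = 90, only assessor 1 relevant = 5,
% only assessor 2 relevant = 5, both irrelevant = 0.
\[ p_o = \frac{90 + 0}{100} = 0.90 \]
\[ p_e = \frac{95}{100}\cdot\frac{95}{100} + \frac{5}{100}\cdot\frac{5}{100} = 0.9025 + 0.0025 = 0.905 \]
\[ \kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.90 - 0.905}{1 - 0.905} = \frac{-0.005}{0.095} \approx -0.053 \]
```
]]></p>
         <p>In this illustrative case the two assessors agree on 90% of the documents, yet the kappa score is slightly negative because almost all of that agreement is attributed to chance.</p>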
         <p>The work presented here introduces a new parameter, <it>Relevance Similarity</it>, to address this problem. Based on this parameter, the effect of inter-evaluator variation in relevance assessment on the evaluation of information retrieval performance was studied. The experiment was carried out on a collection of CATs. Two groups of evaluators assessed the relevance of a set of topics retrieved from the medical meta-search engine, CAT Crawler.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>The retrieval system used in the study is the CAT Crawler meta-search engine. In brief, CAT Crawler is a one-stop search engine for CATs stored in numerous online repositories. It has its own search engine, which allows the user to do a specific search rather than simply browse the repositories' contents. The CAT Crawler's standard setting has been shown to yield search results of equal quantity and enhanced quality compared to the original search engines available at some of the repositories <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The detailed structural design of CAT Crawler <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> has been described previously. The workflow of the CAT Crawler's evaluation is summarized in Figure <figr fid="F1">1</figr>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Workflow for analysing the effect of the inter-evaluator variation on CAT Crawler information retrieval system</p>
            </caption>
            <text>
               <p>Workflow for analysing the effect of the inter-evaluator variation on CAT Crawler information retrieval system.</p>
            </text>
            <graphic file="1742-5581-2-6-1"/>
         </fig>
         <sec>
            <st>
               <p>Relevance assessment of CATs in the test document set</p>
            </st>
            <p>Ten keywords (Table <tblr tid="T1">1</tblr>) related to medicine were chosen as the test seed and submitted to the search engine. Altogether, 132 CAT links were retrieved and then evaluated for their relevance by 13 people, who were categorized into three groups according to their level of medical training. One physician represented medical professionals; for the purpose of this study, the physician's evaluation of each topic was taken as the gold standard, or 'true' relevance, of each retrieval result. The six evaluators in Group A were trained in biology or medicine, while the six evaluators in Group B had no medical or biological background.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>CAT Link retrieval details. The numbers indicate how many documents were retrieved by the CAT Crawler meta-search engine.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Keyword</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number of retrieved links</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Appendicitis</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Colic</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Intubation</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ketoacidosis</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Octreotide</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Palsy</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Prophylaxis</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sleep</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tape</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ultrasound</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>132</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Computation of <it>Relevance Similarity</it></p>
            </st>
            <p>For each retrieved CAT, the evaluation by every participant in Group A and B was compared with the gold standard set by the medical professional. The <it>Relevance Similarity </it>is defined as:</p>
            <p>
               <graphic file="1742-5581-2-6-i1.gif"/>
            </p>
            <p><it>Relevance Similarity</it> was computed for each of the 132 retrieved links. To compare the relevance assessments of Group A and Group B, a Chi-square test on the contingency table of all calculated <it>Relevance Similarity</it> values was carried out using the statistics software SPSS 11.0 (SPSS Inc., Chicago, IL, USA). In addition, kappa scores among the evaluators within Group A and within Group B were calculated.</p>
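            <p>As an illustration only, the computation for a single CAT can be sketched in Python, assuming that <it>Relevance Similarity</it> is the percentage of evaluators in a group whose rating equals the gold standard. The function name and the ratings are hypothetical; the ratings mirror the worked Appendicitis example reported in the Results.</p>
            <p><![CDATA[
```python
def relevance_similarity(gold, ratings):
    """Percentage of evaluators whose rating equals the gold standard."""
    if not ratings:
        raise ValueError("at least one evaluator rating is required")
    matches = sum(1 for rating in ratings if rating == gold)
    return 100.0 * matches / len(ratings)


# Hypothetical ratings for one CAT: True = relevant, False = irrelevant.
gold_standard = True
group_a = [True, True, True, True, True, True]
group_b = [True, True, True, True, True, False]

similarity_a = relevance_similarity(gold_standard, group_a)
similarity_b = relevance_similarity(gold_standard, group_b)
print(round(similarity_a, 2), round(similarity_b, 2))  # prints: 100.0 83.33
```
]]></p>
            <p>Repeating this call for every retrieved link and both groups would give one pair of values per link.</p>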
         </sec>
         <sec>
            <st>
               <p>Computation of recall and precision</p>
            </st>
            <p>In this study, the retrieval system performance is quantified by recall and precision. CATs containing a particular keyword are defined as "technically relevant" documents for that keyword <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In the first step, for each keyword, technically relevant documents were identified from the experimental document set and individual recall was computed for every evaluator accordingly. In the next step, the recall was averaged over all evaluators in a single group. Finally, the recall was averaged over the ten keyword queries. The average precision was calculated following the same process.</p>
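            <p>The averaging order described above (first over evaluators, then over keywords) can be sketched as follows. This is a minimal illustration under stated assumptions: recall and precision use the textbook set definitions, each evaluator contributes a hypothetical set of documents regarded as relevant for a keyword, and all names and document identifiers are invented for the example rather than taken from the study.</p>
            <p><![CDATA[
```python
from statistics import mean


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(retrieved)


def group_average(per_keyword_scores):
    """Average over evaluators within each keyword, then over keywords."""
    return mean(mean(scores) for scores in per_keyword_scores.values())


# Hypothetical data: document ids retrieved per keyword, and the ids that
# each of two evaluators in one group regards as relevant for that keyword.
retrieved = {"colic": {1, 2, 3, 4}, "tape": {5, 6}}
judged_relevant = {
    "colic": [{1, 2, 3, 9}, {1, 2}],
    "tape": [{5, 6}, {5}],
}

recalls = {kw: [recall(retrieved[kw], rel) for rel in sets]
           for kw, sets in judged_relevant.items()}
precisions = {kw: [precision(retrieved[kw], rel) for rel in sets]
              for kw, sets in judged_relevant.items()}

avg_recall = group_average(recalls)
avg_precision = group_average(precisions)
print(avg_recall, avg_precision)  # prints: 0.9375 0.6875
```
]]></p>
            <p>Averaging within each keyword first gives every keyword equal weight, regardless of how many links it retrieved.</p>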
         </sec>
         <sec>
            <st>
               <p>Computation of kappa score</p>
            </st>
            <p>To verify the physician's suitability as the gold standard, the physician re-evaluated the same document set one year after the initial assessment. The kappa score, the observed agreement, and the positive and negative specific agreements between the two evaluations were calculated <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. The inter-evaluator kappa scores within each group were computed for comparison.</p>
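            <p>A minimal sketch of these four agreement measures for two binary assessments of the same documents is given below. It assumes the standard two-by-two definitions of Cohen's kappa and of positive and negative specific agreement; the function name and the ratings are hypothetical and do not reproduce the study data.</p>
            <p><![CDATA[
```python
def agreement_statistics(first, second):
    """Agreement measures for two binary rating lists of equal length."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    pairs = list(zip(first, second))
    a = sum(1 for x, y in pairs if x and y)          # relevant both times
    b = sum(1 for x, y in pairs if x and not y)      # relevant, then irrelevant
    c = sum(1 for x, y in pairs if not x and y)      # irrelevant, then relevant
    d = sum(1 for x, y in pairs if not x and not y)  # irrelevant both times
    n = a + b + c + d
    nan = float("nan")
    observed = (a + d) / n
    # Chance agreement expected from the marginal totals of both assessments.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return {
        "observed": observed,
        "positive": 2 * a / (2 * a + b + c) if (a + b + c) else nan,
        "negative": 2 * d / (2 * d + b + c) if (d + b + c) else nan,
        "kappa": (observed - expected) / (1 - expected) if expected != 1 else nan,
    }


# Hypothetical ratings of ten documents at two points in time.
first = [True] * 7 + [False] * 3
second = [True] * 6 + [False] * 3 + [True]
stats = agreement_statistics(first, second)
print({name: round(value, 4) for name, value in stats.items()})
```
]]></p>
            <p>Reporting the specific agreements alongside kappa helps to show whether a low kappa score stems from genuine disagreement or from a strong imbalance between relevant and irrelevant ratings.</p>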
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Analysis of the inter-evaluator variation</p>
            </st>
            <p>For each of the 132 retrieved links, <it>Relevance Similarity</it> was calculated for both Group A and Group B (Table <tblr tid="T2">2</tblr>). For instance, one CAT, "<it>Plain Abdominal Radiographs of No Clinical Utility in Clinically Suspected Appendicitis</it>", was retrieved from <url>http://www.med.umich.edu/pediatrics/ebm/cats/radiographs.htm</url> upon querying the meta-search engine with the keyword <it>Appendicitis</it>. The gold standard rated it as relevant, and all six evaluators in Group A rated it as relevant too, whereas one of the six evaluators in Group B rated it as irrelevant. The corresponding similarity for this particular CAT is computed as:</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Relevance Similarity for the 132 retrieved CAT links. For each of the 132 documents retrieved by the CAT Crawler meta-search engine, Relevance Similarity (in %) was calculated for both Group A and Group B. <it>Link S/N</it> is the serial number assigned to each document.</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Link S/N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group A (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group B (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Link S/N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group A (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group B (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Link S/N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group A (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group B (%)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>45</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>89</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>46</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>90</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>47</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>91</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>48</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>92</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>49</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>93</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>94</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>51</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>95</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>52</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>96</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>53</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>16.67</p>
                     </c>
                     <c ca="right">
                        <p>97</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>54</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>98</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>11</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>55</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>99</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>12</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>56</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>13</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>57</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>101</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>14</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>58</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>102</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>15</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>59</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>103</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>16</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>60</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>104</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>17</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>61</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>105</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>18</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>62</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>106</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>19</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>63</p>
                     </c>
                     <c ca="right">
                        <p>16.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>107</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>64</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>108</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>21</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>65</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>109</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>22</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>66</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>110</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>23</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>67</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>111</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>24</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>68</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>112</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>25</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>69</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>16.67</p>
                     </c>
                     <c ca="right">
                        <p>113</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>26</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>70</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>114</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>27</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>71</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>115</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>28</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>72</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>116</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>29</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>73</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>117</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>30</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>74</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>118</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>31</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>75</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>119</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>32</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>76</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>120</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>77</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>121</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>34</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>78</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>122</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>35</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>79</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>123</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>36</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>80</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>124</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>37</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>16.67</p>
                     </c>
                     <c ca="right">
                        <p>81</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>125</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>38</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>82</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>126</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>39</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>83</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>127</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>40</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>84</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>128</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>41</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>85</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>129</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>42</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>86</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>130</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>43</p>
                     </c>
                     <c ca="right">
                        <p>16.67</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>87</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>131</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>66.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="right">
                        <p>44</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                     <c ca="right">
                        <p>88</p>
                     </c>
                     <c ca="right">
                        <p>33.33</p>
                     </c>
                     <c ca="right">
                        <p>50</p>
                     </c>
                     <c ca="right">
                        <p>132</p>
                     </c>
                     <c ca="right">
                        <p>83.33</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>
               <graphic file="1742-5581-2-6-i2.gif"/>
            </p>
            <p>Figure <figr fid="F2">2</figr> shows the frequency analysis of <it>Relevance Similarity </it>for every retrieved CAT. For around 90% of the retrieved CATs, the evaluations of both Group A and Group B showed more than 50% similarity to the gold standard. Each of the two groups made exactly the same relevance assessment as the gold standard on about half of the retrieved CATs. As shown in the last two columns of Figure <figr fid="F2">2</figr>, participants in Group A evaluated 65 CATs (49%) with the same relevance as the gold standard; those in Group B evaluated 59 CATs (45%) with the same relevance as the gold standard. A Chi-square test between these two categories, performed using SPSS, resulted in a <it>p-value </it>of 0.713.</p>
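            <p>As a minimal sketch of how such a comparison of proportions can be set up, the following Python fragment computes a Pearson chi-square statistic for a 2 x 2 table built from the counts quoted above (65 and 59 of 132 CATs in full agreement with the gold standard). The function name chi_square_2x2 and the table layout are assumptions made for illustration. The text does not state which contingency table or correction was entered into SPSS, so the value computed here need not match the reported p-value.</p>

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Hypothetical helper for illustration; not the SPSS procedure used in the study.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: CATs rated exactly like the gold standard.
total_cats = 132
same_a, same_b = 65, 59

# Assumed layout: rows are groups, columns are "same as gold standard" and "different".
chi2, p = chi_square_2x2(same_a, total_cats - same_a, same_b, total_cats - same_b)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

            <p>The closed form for the p-value holds only for one degree of freedom; a comparison across all similarity levels would need a larger table and a general chi-square distribution function.</p>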
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Frequency analysis of evaluation similarity of Group A and B versus the gold standard for all 132 CATs</p>
               </caption>
               <text>
                  <p>Frequency analysis of evaluation similarity of Group A and B versus the gold standard for all 132 CATs. Compared to the gold standard, the blue bars indicate the number of CATs evaluated by Group A at each similarity level; the red bars indicate the number of CATs evaluated by Group B at each similarity level.</p>
               </text>
               <graphic file="1742-5581-2-6-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Evaluation of the retrieval system</p>
            </st>
            <p>Average recall and precision were computed for each keyword query. All numerical data are listed in Tables <tblr tid="T4">4</tblr> and <tblr tid="T5">5</tblr>, respectively, while Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr> provide a more intuitive view of the recall and precision evaluation of retrieval.</p>
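            <p>The computation behind these averages can be sketched as follows. This is a minimal illustration, assuming that recall is taken relative to the number of technically relevant CATs in the repository, that precision is taken relative to the number of retrieved CATs, and that a group's value is the mean over its evaluators. The function names and all counts are hypothetical and are not the study's raw data.</p>
            <p>
```python
def recall_precision(judged_relevant, retrieved_total, relevant_total):
    """Return (recall, precision) in percent for one query and one evaluator.

    judged_relevant: retrieved documents the evaluator marked relevant
    retrieved_total: all documents returned by the system for the query
    relevant_total:  all technically relevant documents in the repository
    """
    if retrieved_total == 0 or relevant_total == 0:
        raise ValueError("both totals must be non-zero")
    recall = 100.0 * judged_relevant / relevant_total
    precision = 100.0 * judged_relevant / retrieved_total
    return recall, precision


def group_average(judged_counts, retrieved_total, relevant_total):
    """Average recall and precision over the evaluators of one group."""
    pairs = [recall_precision(c, retrieved_total, relevant_total)
             for c in judged_counts]
    avg_recall = sum(r for r, _ in pairs) / len(pairs)
    avg_precision = sum(p for _, p in pairs) / len(pairs)
    return round(avg_recall, 2), round(avg_precision, 2)


# Illustrative counts only (hypothetical, not the study's raw data):
# 9 CATs retrieved, 15 technically relevant CATs in the repository,
# six evaluators marking 8 or 9 of the retrieved CATs as relevant.
print(group_average([9, 9, 9, 9, 9, 8], retrieved_total=9, relevant_total=15))
```
            </p>
            <p>With these illustrative counts, a single evaluator who rejects one retrieved CAT lowers both group averages slightly, which is the kind of per-keyword variation visible in the tables.</p>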
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Average recall for the gold standard and the two groups of evaluators</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gold Standard</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group A</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group B</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Appendicitis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>100.00</p>
                     </c>
                     <c ca="center">
                        <p>97.92</p>
                     </c>
                     <c ca="center">
                        <p>93.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Colic</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>53.33</p>
                     </c>
                     <c ca="center">
                        <p>58.89</p>
                     </c>
                     <c ca="center">
                        <p>58.89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Intubation</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>37.84</p>
                     </c>
                     <c ca="center">
                        <p>41.44</p>
                     </c>
                     <c ca="center">
                        <p>40.09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ketoacidosis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>33.33</p>
                     </c>
                     <c ca="center">
                        <p>50.00</p>
                     </c>
                     <c ca="center">
                        <p>50.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Octreotide</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>75.00</p>
                     </c>
                     <c ca="center">
                        <p>54.17</p>
                     </c>
                     <c ca="center">
                        <p>62.50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Palsy</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>54.55</p>
                     </c>
                     <c ca="center">
                        <p>65.15</p>
                     </c>
                     <c ca="center">
                        <p>65.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Prophylaxis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>64.86</p>
                     </c>
                     <c ca="center">
                        <p>69.82</p>
                     </c>
                     <c ca="center">
                        <p>56.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Sleep</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>43.75</p>
                     </c>
                     <c ca="center">
                        <p>59.38</p>
                     </c>
                     <c ca="center">
                        <p>51.04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Tape</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>50.00</p>
                     </c>
                     <c ca="center">
                        <p>44.44</p>
                     </c>
                     <c ca="center">
                        <p>47.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ultrasound</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>36.17</p>
                     </c>
                     <c ca="center">
                        <p>38.30</p>
                     </c>
                     <c ca="center">
                        <p>39.36</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Average</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>54.88</p>
                     </c>
                     <c ca="center">
                        <p>57.95</p>
                     </c>
                     <c ca="center">
                        <p>56.48</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Average precision for the gold standard and the two groups of evaluators</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Gold Standard</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group A</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Group B</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Appendicitis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>100.00</p>
                     </c>
                     <c ca="center">
                        <p>97.92</p>
                     </c>
                     <c ca="center">
                        <p>93.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Colic</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>88.89</p>
                     </c>
                     <c ca="center">
                        <p>98.15</p>
                     </c>
                     <c ca="center">
                        <p>98.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Intubation</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>63.64</p>
                     </c>
                     <c ca="center">
                        <p>69.70</p>
                     </c>
                     <c ca="center">
                        <p>67.42</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ketoacidosis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>50.00</p>
                     </c>
                     <c ca="center">
                        <p>75.00</p>
                     </c>
                     <c ca="center">
                        <p>75.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Octreotide</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>100.00</p>
                     </c>
                     <c ca="center">
                        <p>72.22</p>
                     </c>
                     <c ca="center">
                        <p>83.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Palsy</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>60.00</p>
                     </c>
                     <c ca="center">
                        <p>71.67</p>
                     </c>
                     <c ca="center">
                        <p>71.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Prophylaxis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>80.00</p>
                     </c>
                     <c ca="center">
                        <p>86.11</p>
                     </c>
                     <c ca="center">
                        <p>70.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Sleep</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>43.75</p>
                     </c>
                     <c ca="center">
                        <p>59.38</p>
                     </c>
                     <c ca="center">
                        <p>51.04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Tape</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>100.00</p>
                     </c>
                     <c ca="center">
                        <p>88.89</p>
                     </c>
                     <c ca="center">
                        <p>94.44</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ultrasound</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>58.62</p>
                     </c>
                     <c ca="center">
                        <p>62.07</p>
                     </c>
                     <c ca="center">
                        <p>63.79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Average</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>74.49</p>
                     </c>
                     <c ca="center">
                        <p>78.11</p>
                     </c>
                     <c ca="center">
                        <p>76.86</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Recall comparison</p>
               </caption>
               <text>
                  <p>Recall comparison. The bars indicate each of the three groups' recall (in %) for the ten keywords.</p>
               </text>
               <graphic file="1742-5581-2-6-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Precision comparison</p>
               </caption>
               <text>
                  <p>Precision comparison. The bars indicate each of the three groups' precision (in %) for the ten keywords.</p>
               </text>
               <graphic file="1742-5581-2-6-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Kappa scores</p>
            </st>
            <p>The two evaluations of the document set carried out by the physician who served as the "gold standard" have a high concordance with a kappa score of 0.879. The inter-evaluator kappa scores ranged from 0.136 to 0.713 (0.387 &#177; 0.165) within Group A, and from -0.001 to 0.807 within Group B (0.357 &#177; 0.218) (Table <tblr tid="T3">3</tblr>).</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Kappa scores within Group A and Group B, demonstrating the paradoxically low kappa scores despite high agreement.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="center">
                        <p>
                           <b>Group A</b>
                        </p>
                     </c>
                     <c cspan="5" ca="center">
                        <p>
                           <b>Group B</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Evaluator</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.404</p>
                     </c>
                     <c ca="center">
                        <p>0.426</p>
                     </c>
                     <c ca="center">
                        <p>0.136</p>
                     </c>
                     <c ca="center">
                        <p>0.258</p>
                     </c>
                     <c ca="center">
                        <p>0.656</p>
                     </c>
                     <c ca="center">
                        <p>0.208</p>
                     </c>
                     <c ca="center">
                        <p>0.670</p>
                     </c>
                     <c ca="center">
                        <p>0.410</p>
                     </c>
                     <c ca="center">
                        <p>0.807</p>
                     </c>
                     <c ca="center">
                        <p>0.352</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.461</p>
                     </c>
                     <c ca="center">
                        <p>0.259</p>
                     </c>
                     <c ca="center">
                        <p>0.713</p>
                     </c>
                     <c ca="center">
                        <p>0.520</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.257</p>
                     </c>
                     <c ca="center">
                        <p>0.135</p>
                     </c>
                     <c ca="center">
                        <p>0.125</p>
                     </c>
                     <c ca="center">
                        <p>-0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.180</p>
                     </c>
                     <c ca="center">
                        <p>0.438</p>
                     </c>
                     <c ca="center">
                        <p>0.439</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.440</p>
                     </c>
                     <c ca="center">
                        <p>0.643</p>
                     </c>
                     <c ca="center">
                        <p>0.353</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.241</p>
                     </c>
                     <c ca="center">
                        <p>0.270</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.370</p>
                     </c>
                     <c ca="center">
                        <p>0.250</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.404</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.330</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Recall and precision remain standard evaluation parameters for the effectiveness evaluation of an information retrieval system. Both depend on the concept of relevance, i.e. the answer to the question whether the retrieved information is useful or not. A major problem lies in the fact that this answer may vary depending on multiple factors <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. The perception of variance tempts one to assume that it must influence the assessment of retrieval efficiency, yet the small number of studies addressing this problem <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, including the one presented here, come to a different conclusion. This conclusion has been challenged <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and the need to find measurement criteria for variance impact was recognized.</p>
         <p>Three decades ago, Saracevic <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> suggested conducting more experiments on various systems with differently obtained assessments in the research of relevance variation. In contrast to previous studies, the present one investigates the effect of relevance assessment on the performance of a specialized retrieval system, developed specifically for physicians trying to implement EBM in daily routine. The test collection is a set of around 1000 CATs. The variance of evaluator behavior is directly addressed by measuring <it>Relevance Similarity</it>. The concept of <it>Relevance Similarity </it>is strongly dependent on the knowledge of "true relevance".</p>
         <p>It may be impossible to establish the true relevance of a given document. Whoever assesses a document may make an error. As soon as the document is assessed by another, the relevance may be attributed differently. For this reason, the "true relevance" is usually decided by expert committees, e.g. a group of specialists. Documents they assess in unison are assumed to be truly relevant or truly irrelevant; documents with variations in the assessment are either judged according to the majority's decision or following a brief decision rule.</p>
         <p>In the present study, this problem was solved differently. According to the domain knowledge disparity between the evaluators, they could be categorized as: one medical professional, six life scientists and six IT scientists. From the training point of view, the physician is most closely related to the medical field and his judgement was therefore used as the gold standard or "true relevance". While one may (or may not) doubt his qualification to assign true relevance, his re-assessment of the same document set one year after his initial evaluation shows a good correlation. Using kappa statistics, a kappa score of 0.879 indicated an "excellent" concordance <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
         <p>Kappa statistics are a standard measure of inter-evaluator agreement. In the present study, kappa scores for Group A evaluators ranged from 0.136 to 0.713, and from -0.001 to 0.807 for Group B (Table <tblr tid="T3">3</tblr>). Kappa statistics are based on the assumption that a "true" value is not known beforehand, and that a higher level of concordance signifies a higher probability of having arrived at the "truth". However, in situations where there is a strong bias towards either true or false positive, or true or false negative, high concordance can yield a low kappa score <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Positive and negative agreements have been suggested as an additional quality measurement in such cases. In the present study, we calculated positive and negative agreements <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> (P<sub>pos</sub>: 0.74&#8211;0.93; P<sub>neg</sub>: 0.15&#8211;0.82), but this does not give any information beyond that derived from kappa scores. While the calculation of kappa scores does have its value, albeit not undisputed <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B25">25</abbr></abbrgrp>, relying on this calculation misses a philosophical point: human evaluators may assess as true or false a statement that is not so for reasons that depend on external factors ("philosophies of life", political, theological etc.) and err with high concordance because they have concordance on the external factors. By assessing the documents using a gold standard considered to stand for the "true relevance", the method of <it>Relevance Similarity </it>overcomes this problem. Internal concordance of the gold standard evaluator is demonstrated by his excellent kappa score, and his study subject of medicine as opposed to life sciences/computer sciences qualifies him for this position.</p>
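         <p>The situation in which high concordance yields a low kappa score can be illustrated with a small sketch. It computes the observed agreement, Cohen's kappa, and the positive and negative agreements (twice the number of concordant judgments in a category, divided by that number plus the total of that category's votes from both evaluators) from a two-by-two table. The function name and the counts are hypothetical and chosen only to show the effect.</p>
         <p>
```python
def agreement_measures(both_pos, only_first_pos, only_second_pos, both_neg):
    """Agreement statistics for two evaluators from a 2x2 table of counts.

    The sketch assumes a table in which both categories occur, so that
    no denominator below is zero.
    """
    n = both_pos + only_first_pos + only_second_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Share of "relevant" votes for each evaluator.
    first_pos = (both_pos + only_first_pos) / n
    second_pos = (both_pos + only_second_pos) / n
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    kappa = (observed - expected) / (1 - expected)
    disagreements = only_first_pos + only_second_pos
    p_pos = 2 * both_pos / (2 * both_pos + disagreements)
    p_neg = 2 * both_neg / (2 * both_neg + disagreements)
    return {
        "observed": round(observed, 3),
        "kappa": round(kappa, 3),
        "p_pos": round(p_pos, 3),
        "p_neg": round(p_neg, 3),
    }


# Hypothetical counts with a strong bias towards "relevant":
# 85 documents judged relevant by both, 5 by the first evaluator only,
# 5 by the second only, and 5 judged irrelevant by both.
print(agreement_measures(85, 5, 5, 5))
```
         </p>
         <p>In this hypothetical table the evaluators agree on 90% of the documents, yet kappa stays below 0.5; the positive agreement is high while the negative agreement is only moderate, mirroring the wide P<sub>neg</sub> range reported above.</p>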
         <p>With the physician as the gold standard, the <it>Relevance Similarity </it>for Groups A and B was computed for the analysis of these groups' agreement with the gold standard (Figure <figr fid="F2">2</figr>). For a high similarity level, Group A has more agreements with the gold standard than Group B. For example, for a relevance similarity level of 83.33%, Group A and the gold standard have evaluated 24 CATs with the same relevance. By comparison, Group B and the gold standard have an agreement over 21 CATs only. The same phenomenon occurs at a relevance similarity level of 100%. As the gold standard and Group A represent people with professional or some relevant medical domain knowledge, the result is consistent with what has been reported by Cuadra and Katter <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and Rees and Schultz <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> that the agreement among evaluators with more subject knowledge is higher. On the other hand, a <it>p-value </it>of 0.713 shows there is no significant difference between the mean relevance assessment of Group A and B as compared to the gold standard.</p>
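         <p>A frequency analysis of this kind can be sketched as follows. The sketch assumes that the <it>Relevance Similarity </it>of a CAT is the percentage of a group's evaluators whose judgment matches the gold standard, which is consistent with the levels quoted above (83.33% corresponds to five of six evaluators). The function names and the example judgments are hypothetical.</p>
         <p>
```python
from collections import Counter


def relevance_similarity(group_judgments, gold_judgment):
    """Percent of a group's evaluators whose judgment equals the gold standard."""
    matches = sum(1 for j in group_judgments if j == gold_judgment)
    return round(100.0 * matches / len(group_judgments), 2)


def similarity_frequencies(group_matrix, gold):
    """Count how many documents fall at each similarity level."""
    return Counter(
        relevance_similarity(row, g) for row, g in zip(group_matrix, gold)
    )


# Hypothetical data: four documents, six evaluators (1 = relevant).
gold = [1, 1, 0, 1]
group = [
    [1, 1, 1, 1, 1, 1],  # all six agree with the gold standard
    [1, 1, 1, 1, 1, 0],  # five of six agree
    [0, 0, 0, 0, 0, 0],  # all six agree (document judged irrelevant)
    [1, 0, 1, 0, 1, 0],  # three of six agree
]
freq = similarity_frequencies(group, gold)
for level in sorted(freq):
    print(level, freq[level])
```
         </p>
         <p>Counting documents per similarity level in this way gives one bar per level for each group, as in the frequency figure.</p>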
         <p>Since the time of the Cranfield experiment <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, researchers have been aware of the difficulty of calculating the exact recall as this requires the true knowledge of the total number of relevant documents in the entire database. Even in the relatively small document repository used here that consists of around 1000 CATs in total, a visual control of all documents is unlikely to produce a reliable result in finding all files that contain the keywords, i.e. "technically relevant" documents. Using PERL scripts as described previously <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, this task is achieved reliably. The recall is computed accordingly.</p>
         <p>The average recall and precision over all queries (Tables <tblr tid="T4">4</tblr> and <tblr tid="T5">5</tblr>) show that people with different domain knowledge have evaluated the retrieval system similarly. This supports the hypothesis of Lesk and Salton <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> that variations in relevance assessments do not cause substantial variations in retrieval performance. Their explanation is based on the fact that average recall and precision are obtained by averaging over many search requests. Concurring with this explanation, the average recall and precision for each keyword query in the present study (Tables <tblr tid="T4">4</tblr>, <tblr tid="T5">5</tblr> and Figures <figr fid="F3">3</figr>, <figr fid="F4">4</figr>) do vary between the gold standard, Group A and Group B in response to variations in relevance assessments for each keyword by different evaluators.</p>
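         <p>The smoothing effect of averaging over queries can be checked directly against the recall values of Table 4. The short sketch below recomputes the column averages and compares the spread between the three evaluator columns for each keyword with the spread of the averages; the variable names are illustrative.</p>
         <p>
```python
from statistics import mean

groups = ("Gold Standard", "Group A", "Group B")

# Average recall in percent per keyword, copied from Table 4.
recall_by_keyword = {
    "Appendicitis": (100.00, 97.92, 93.75),
    "Colic": (53.33, 58.89, 58.89),
    "Intubation": (37.84, 41.44, 40.09),
    "Ketoacidosis": (33.33, 50.00, 50.00),
    "Octreotide": (75.00, 54.17, 62.50),
    "Palsy": (54.55, 65.15, 65.15),
    "Prophylaxis": (64.86, 69.82, 56.76),
    "Sleep": (43.75, 59.38, 51.04),
    "Tape": (50.00, 44.44, 47.22),
    "Ultrasound": (36.17, 38.30, 39.36),
}

# Mean over the ten keyword queries for each evaluator column.
averages = {
    g: round(mean(v[i] for v in recall_by_keyword.values()), 2)
    for i, g in enumerate(groups)
}

# Difference between the highest and lowest column for each keyword.
spread = {k: round(max(v) - min(v), 2) for k, v in recall_by_keyword.items()}
widest = max(spread, key=spread.get)
average_spread = round(max(averages.values()) - min(averages.values()), 2)

print(averages)
print(widest, spread[widest], average_spread)
```
         </p>
         <p>The recomputed averages reproduce the last row of Table 4, and the spread of the averages is far smaller than the largest spread for a single keyword.</p>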
         <p>In this study, documents are judged for binary relevance, i.e. either relevant or irrelevant. Kek&#228;l&#228;inen and J&#228;rvelin <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> have highlighted the multilevel phenomenon of relevance. The binary evaluation technique used in many studies is not able to represent the degree of relevance and hence leads to the difficulty of ranking a set of relevant documents. Recognizing the problem, many studies on information seeking and retrieval used multi-degree relevance assessments <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. It would be worthwhile to consider the effect of multi-level relevance rating scales on the performance evaluation of the retrieval system.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The present study directly addresses the question of whether variability in relevance assessment has an impact on the evaluation of the efficiency of a given information retrieval system. In the present setting, using a highly specialized search program exclusively targeting Critically Appraised Topics <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the answer to that question is a clear "no" &#8211; the effectiveness of the CAT Crawler can be evaluated in an objective way.</p>
         <p>The extent to which the subject knowledge of end-users influences their perception of the relevance of the retrieved information is certainly important from an economic point of view, as it will affect how they use information retrieval systems.</p>
         <p>The results presented here demonstrate, however, that a reliable evaluation of the retrieval quality of a given information retrieval system is indeed possible. While this does not allow for quality control of the information content on the plethora of websites dedicated to medical knowledge (or, in some cases, ignorance), the good news is that at least the technical quality of medical search engines can be evaluated.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The author(s) declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>Author 1 (PD) participated in the design of the study, performed the data analysis and drafted the manuscript. Author 2 (ML) contributed to the statistical analysis of the raw data. Author 3 (AM) participated in the design of the study and the drafting of the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank the staff and students of the Bioinformatics Institute who volunteered to evaluate the performance of the medical meta-search engine. We are also grateful for the help given by A. Ramasamy from the University of Oxford, UK and A. L. Zhu from the National University of Singapore with the statistical analysis of the computed data. The manuscript was revised with the help of Dr. F. Tang and F. Mondry.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>How do consumers search for and appraise information on medicines on the Internet? A qualitative study using focus groups</p>
            </title>
            <aug>
               <au>
                  <snm>Peterson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Aslani</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>J Med Internet Res</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <fpage>e33</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2196/jmir.5.4.e33</pubid>
                  <pubid idtype="pmpid" link="fulltext">14713661</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>International use of an academic nephrology World Wide Web site: from medical information resource to business tool</p>
            </title>
            <aug>
               <au>
                  <snm>Abbott</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Boal</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Gadiyak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Boocks</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Welch</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Poropatich</snm>
                  <fnm>RK</fnm>
               </au>
            </aug>
            <source>Mil Med</source>
            <pubdate>2002</pubdate>
            <volume>167</volume>
            <fpage>326</fpage>
            <lpage>330</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11977886</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The ultimate search engine?</p>
            </title>
            <source>Nat Cell Biol</source>
            <pubdate>2005</pubdate>
            <volume>7</volume>
            <fpage>1</fpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A quality evaluation methodology of health web-pages for non-professionals</p>
            </title>
            <aug>
               <au>
                  <snm>Curro</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Buonuomo</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Onesimo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>de Rose</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vituzzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>di Tanna</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>D'Atri</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Med Inform Internet Med</source>
            <pubdate>2004</pubdate>
            <volume>29</volume>
            <fpage>95</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/14639230410001684396</pubid>
                  <pubid idtype="pmpid" link="fulltext">15370990</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Evidence-Based Medicine: How to practice and teach EBM</p>
            </title>
            <aug>
               <au>
                  <snm>Sackett</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Straus</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Rosenberg</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <publisher>London, Churchill Livingstone</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Critically Appraised Topics (CAT) peer-to-peer network</p>
            </title>
            <aug>
               <au>
                  <snm>Castro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Karras</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tolentino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Marcelo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Maramba</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>AMIA Annu Symp Proc</source>
            <pubdate>2003</pubdate>
            <fpage>806</fpage>
            <xrefbib>
               <pubid idtype="pmpid">14728311</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Enhanced quality and quantity of retrieval of Critically Appraised Topics using the CAT Crawler</p>
            </title>
            <aug>
               <au>
                  <snm>Dong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mondry</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Med Inform Internet Med</source>
            <pubdate>2004</pubdate>
            <volume>29</volume>
            <fpage>43</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/14639230310001655849</pubid>
                  <pubid idtype="pmpid" link="fulltext">15204609</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Information-Retrieval Systems</p>
            </title>
            <aug>
               <au>
                  <snm>Hersh</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Detmer</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Frisse</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Medical Informatics</source>
            <publisher>New York, Springer</publisher>
            <editor>Shortliffe EH and Perreault LE</editor>
            <pubdate>2001</pubdate>
            <fpage>539</fpage>
            <lpage>572</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Relevance and information behavior</p>
            </title>
            <aug>
               <au>
                  <snm>Schamber</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Annual Review of Information Science and Technology</source>
            <publisher>Medford, NJ, Learned Information</publisher>
            <editor>Williams ME</editor>
            <pubdate>1994</pubdate>
            <volume>29</volume>
            <fpage>33</fpage>
            <lpage>48</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>User-defined relevance criteria: an exploratory study</p>
            </title>
            <aug>
               <au>
                  <snm>Barry</snm>
                  <fnm>CL</fnm>
               </au>
            </aug>
            <source>Journal of the American Society for Information Science</source>
            <pubdate>1994</pubdate>
            <volume>45</volume>
            <fpage>149</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/(SICI)1097-4571(199404)45:3&lt;149::AID-ASI5>3.0.CO;2-J</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Users' criteria for evaluation in a multimedia environment</p>
            </title>
            <aug>
               <au>
                  <snm>Schamber</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <publisher>Medford, NJ, Learned Information</publisher>
            <editor>Griffiths JM</editor>
            <pubdate>1991</pubdate>
            <volume>28</volume>
            <fpage>126</fpage>
            <lpage>133</lpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Users' criteria for relevance evaluation: a cross-situational comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Barry</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Schamber</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Information Processing &amp; Management</source>
            <pubdate>1998</pubdate>
            <volume>34</volume>
            <fpage>219</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0306-4573(97)00078-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Relevance assessments and retrieval system evaluation</p>
            </title>
            <aug>
               <au>
                  <snm>Lesk</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Salton</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Information Storage and Retrieval</source>
            <pubdate>1968</pubdate>
            <volume>4</volume>
            <fpage>343</fpage>
            <lpage>359</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0020-0271(68)90029-6</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The effect of variations in relevance assessments in comparative experimental tests of index languages</p>
            </title>
            <aug>
               <au>
                  <snm>Cleverdon</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <publisher>Cranfield, UK, Cranfield Institute of Technology</publisher>
            <pubdate>1970</pubdate>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Effects of subjective expert evaluation of relevance on the performance parameters of a document-based information-retrieval system</p>
            </title>
            <aug>
               <au>
                  <snm>Kazhdan</snm>
                  <fnm>TV</fnm>
               </au>
            </aug>
            <source>Nauchno-Tekhnicheskaya Informatsiya</source>
            <pubdate>1979</pubdate>
            <volume>2</volume>
            <fpage>21</fpage>
            <lpage>24</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Variations in relevance judgments and evaluation of retrieval performance</p>
            </title>
            <aug>
               <au>
                  <snm>Burgin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Information Processing &amp; Management</source>
            <pubdate>1992</pubdate>
            <volume>28</volume>
            <fpage>619</fpage>
            <lpage>627</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0306-4573(92)90031-T</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Variations in relevance judgments and the measurement of retrieval effectiveness</p>
            </title>
            <aug>
               <au>
                  <snm>Voorhees</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Information Processing &amp; Management</source>
            <pubdate>2000</pubdate>
            <volume>36</volume>
            <fpage>697</fpage>
            <lpage>716</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0306-4573(00)00010-8</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Variations in relevance assessments and the measurement of retrieval effectiveness</p>
            </title>
            <aug>
               <au>
                  <snm>Harter</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <source>Journal of the American Society for Information Science</source>
            <pubdate>1996</pubdate>
            <volume>47</volume>
            <fpage>37</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/(SICI)1097-4571(199601)47:1&lt;37::AID-ASI4>3.0.CO;2-3</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>High agreement but low kappa: I. The problems of two paradoxes</p>
            </title>
            <aug>
               <au>
                  <snm>Feinstein</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Cicchetti</snm>
                  <fnm>DV</fnm>
               </au>
            </aug>
            <source>J Clin Epidemiol</source>
            <pubdate>1990</pubdate>
            <volume>43</volume>
            <fpage>543</fpage>
            <lpage>549</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0895-4356(90)90158-L</pubid>
                  <pubid idtype="pmpid">2348207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Quantitative evaluation of recall and precision of CAT Crawler, a search engine specialized on retrieval of Critically Appraised Topics</p>
            </title>
            <aug>
               <au>
                  <snm>Dong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mondry</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Medical Informatics and Decision Making</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <fpage>21</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/1472-6947-4-21</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A coefficient of agreement for nominal scales</p>
            </title>
            <aug>
               <au>
                  <snm>Cohen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Educ Psychol Meas</source>
            <pubdate>1960</pubdate>
            <volume>20</volume>
            <fpage>37</fpage>
            <lpage>46</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Measuring agreement between two judges on the presence or absence of a trait</p>
            </title>
            <aug>
               <au>
                  <snm>Fleiss</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1975</pubdate>
            <volume>31</volume>
            <fpage>651</fpage>
            <lpage>659</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1174623</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The concept of relevance in information science: a historical view</p>
            </title>
            <aug>
               <au>
                  <snm>Saracevic</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Introduction to information science</source>
            <publisher>New York, R.R. Bowker</publisher>
            <editor>Saracevic T</editor>
            <pubdate>1970</pubdate>
            <fpage>111</fpage>
            <lpage>151</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Measuring agreement in medical informatics reliability studies</p>
            </title>
            <aug>
               <au>
                  <snm>Hripcsak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heitjan</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>J Biomed Inform</source>
            <pubdate>2002</pubdate>
            <volume>35</volume>
            <fpage>99</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1532-0464(02)00500-2</pubid>
                  <pubid idtype="pmpid">12474424</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>High agreement but low kappa: II. Resolving the paradoxes</p>
            </title>
            <aug>
               <au>
                  <snm>Cicchetti</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Feinstein</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>J Clin Epidemiol</source>
            <pubdate>1990</pubdate>
            <volume>43</volume>
            <fpage>551</fpage>
            <lpage>558</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0895-4356(90)90159-M</pubid>
                  <pubid idtype="pmpid">2189948</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Experimental studies of relevance judgments</p>
            </title>
            <aug>
               <au>
                  <snm>Cuadra</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Katter</snm>
                  <fnm>RV</fnm>
               </au>
            </aug>
            <publisher>Santa Monica, CA, Systems Development Corporation</publisher>
            <pubdate>1967</pubdate>
         </bibl>
         <bibl id="B27">
            <title>
               <p>A field experimental approach to the study of relevance assessments in relation to document searching</p>
            </title>
            <aug>
               <au>
                  <snm>Rees</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <publisher>Cleveland, OH, Center for Documentation and Communication Research, School of Library Science, Case Western University</publisher>
            <pubdate>1967</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Cranfield tests on index language devices</p>
            </title>
            <aug>
               <au>
                  <snm>Cleverdon</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Aslib Proceedings</source>
            <pubdate>1967</pubdate>
            <volume>19</volume>
            <fpage>173</fpage>
            <lpage>193</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Using graded relevance assessments in IR evaluation</p>
            </title>
            <aug>
               <au>
                  <snm>Kek&#228;l&#228;inen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>J&#228;rvelin</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Journal of the American Society for Information Science and Technology</source>
            <pubdate>2002</pubdate>
            <volume>53</volume>
            <fpage>1120</fpage>
            <lpage>1129</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/asi.10137</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>From highly relevant to non-relevant: Examining different regions of relevance</p>
            </title>
            <aug>
               <au>
                  <snm>Spink</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Greisdorf</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Information Processing &amp; Management</source>
            <pubdate>1998</pubdate>
            <volume>34</volume>
            <fpage>599</fpage>
            <lpage>622</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0306-4573(98)00025-9</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Changes in relevance criteria and problem stages in task performance</p>
            </title>
            <aug>
               <au>
                  <snm>Vakkari</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hakala</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Journal of Documentation</source>
            <pubdate>2000</pubdate>
            <volume>56</volume>
            <fpage>540</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1108/EUM0000000007127</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
