Methods: The authors developed two methods for mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined abbreviations are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules in a software program, AbbRE (for “abbreviation recognition and extraction”). Using the opinions of domain experts as the reference standard, they evaluated AbbRE's recall and accuracy for defined abbreviations in ten biomedical articles randomly selected from the ten most frequently cited medical and biological journals. They also measured the percentage of undefined abbreviations in the same set of articles and examined whether undefined abbreviations could be mapped to any of four public abbreviation databases (GenBank LocusLink, SWISS-PROT, LRABR of the UMLS lexicon, and Bioabacus).
In Part B of the evaluation, we ran AbbRE on the remaining 40 articles (20 medical articles from five medical journals and 20 biological articles from five biological journals). The AbbRE output consisted of the defined abbreviations, their full forms, and the unique article identification numbers, together with the sentences containing the abbreviations and full forms. We asked the experts to judge whether each abbreviation and full form pair listed in the AbbRE output was correct. The reference standard consisted of the abbreviations agreed on by two or three experts. We computed AbbRE's accuracy for the medical and biological journals separately as well as in aggregate.
CEP applies an approximate match (i.e.,
if the string formed from the first letters of a word sequence matches more than 70% of the abbreviation, CEP accepts the word sequence as its full form), and the approximation may indirectly admit matches on middle letters. However, it is not clear how well this approach suits the biomedical domain.
All three systems have limitations that may affect their use in the biomedical domain. The approach of Hisamitsu and Niwa depends on the statistical significance of the two terms associated by parentheses, so it may miss abbreviations and full forms that have only recently been introduced into the literature. CEP treats as abbreviations only words in which every letter is uppercase, and it matches letters only (not other symbols, such as numbers). These restrictions do not hold for many biomedical abbreviations, which often mix uppercase and lowercase letters (e.g., Ab for antibody) and include numbers (e.g., lg1 for lateral gastrocnemius 1). Pnad-css was built on PROPER and may miss paired abbreviations and full forms that PROPER did not recognize.
We developed a set of rules that define a well-formed abbreviation.
The rules were generalized from an examination of all abbreviations and their full forms in 200 scientific articles, a randomly selected subset of articles related to signal transduction pathways. Table 1 summarizes these rules. AbbRE starts with the string with the fewest words (e.g., domain) and maps the string to its potential abbreviation by applying rules 2 through 7. If the abbreviation does not match the full form (i.e., rules 2 through 7 do not apply), the next longer string, death domain, is processed. Once the abbreviation matches its full form, AbbRE outputs the matched abbreviation and its full form and moves on to the next matched sentence. The output has the form “abbreviation|full form|article identification number”.
The approaches of Hisamitsu and Niwa, CEP, and Pnad-css all apply pattern-matching rules to map an abbreviation to its full form. However, the matching rules of Hisamitsu and Niwa are preliminary and can produce false matches. For example, “column” would be mistakenly recognized as an abbreviation of “Columbia University” because the letters of “column” appear in order in “Columbia University”.
Other researchers have developed automatic methods to identify abbreviations and associate them with a definition.15-17 Hisamitsu and Niwa15 identified technical terms, including company names, organization names, names of laws, and names of theories, in Japanese newspaper articles. They first selected phrases associated by parentheses through bigram statistics (the phrase in parentheses and the outer phrase co-occur more often than expected by chance); they then applied a set of simple rules to determine whether the phrase in parentheses was an abbreviation of the outer phrase. For example, one rule stated that a phrase is an abbreviation of a full form when the letters of the phrase appear in the full form in order.
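The in-order letter rule, and the false match it produces for “column”, can be made concrete with a minimal sketch. The function name `letters_appear_in_order` and the case-insensitive comparison are assumptions made for illustration; this is not Hisamitsu and Niwa's implementation, only one plausible reading of the rule as stated.

```python
def letters_appear_in_order(short_form: str, full_form: str) -> bool:
    """Return True if every letter of short_form occurs in full_form in the same order.

    Hypothetical helper: a plain subsequence test, ignoring case.
    """
    remaining = iter(full_form.lower())
    # `ch in remaining` consumes the iterator up to the first match, so each
    # later letter has to be found after the position of the previous one.
    return all(ch in remaining for ch in short_form.lower())


print(letters_appear_in_order("column", "Columbia University"))  # True (a false match)
print(letters_appear_in_order("DD", "death domain"))             # True
print(letters_appear_in_order("DNA", "death domain"))            # False
```

Because the test accepts any ordered subsequence, it cannot tell a genuine abbreviation from an ordinary word whose letters happen to occur in a longer phrase, which is the weakness the “column” example points to.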
Hisamitsu and Niwa's evaluation of this approach showed an accuracy of 97%.
The AbbRE program we developed differs from the three approaches just described. AbbRE is designed to process full biomedical articles. It searches parenthetical expressions for paired abbreviations and full forms. AbbRE does not break words down into components; it relies only on a set of pattern-matching rules to map an abbreviation to its full form. The pattern-matching rules were generalized from the common conventions by which people create abbreviations.
As described in this article, AbbRE was evaluated by domain experts. In Part A of the evaluation, ten articles were used; each was randomly selected from the five articles downloaded from one of the ten selected journals. We gave each expert five articles in his or her field (medical or biological). We e-mailed every expert the articles to review (in HTML format) together with their article identification numbers.
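The search procedure described for AbbRE (find a parenthetical expression, try the shortest preceding string first, and lengthen it until a rule matches) can be sketched roughly as follows. This is an illustration under stated assumptions, not the AbbRE implementation: rules 2 through 7 of Table 1 are not reproduced in this section, so a single first-letter rule (`matches_first_letters`) stands in for them, and the function names, the example sentence, the article identifier `A01`, and the choice to let the candidate grow back to the start of the sentence are all hypothetical.

```python
import re


def matches_first_letters(abbreviation: str, words: list[str]) -> bool:
    """Hypothetical stand-in for rules 2 through 7: compare the abbreviation to the word initials."""
    initials = "".join(word[0] for word in words)
    return initials.lower() == abbreviation.lower()


def map_defined_abbreviations(sentence: str, article_id: str) -> list[str]:
    """Pair each parenthetical expression with the shortest preceding word string that matches."""
    records = []
    for found in re.finditer(r"\(([^()]+)\)", sentence):
        abbreviation = found.group(1).strip()
        preceding = sentence[:found.start()].split()
        # Try the shortest candidate first (the last word), then grow leftward.
        # Assumption: the candidate may grow back to the start of the sentence.
        for size in range(1, len(preceding) + 1):
            candidate = preceding[-size:]
            if matches_first_letters(abbreviation, candidate):
                # Output format taken from the text: abbreviation|full form|article id
                records.append(f"{abbreviation}|{' '.join(candidate)}|{article_id}")
                break
    return records


# Illustrative input only; the sentence and the identifier are made up.
print(map_defined_abbreviations("Fas contains the death domain (DD).", "A01"))
# prints ['DD|death domain|A01']
```

In this run the one-word candidate “domain” fails the stand-in rule, so the two-word candidate “death domain” is tried next and accepted, mirroring the order of attempts described in the text.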