Download MutationFinder

  MutationFinder is available for download from  
  MutationFinder license information.  

MutationFinder supplementary data

  MutationFinder supplementary data for our 2007 JBCB paper is here.  
  MutationFinder supplementary data for our 2008 PSB paper is here.  

About MutationFinder

  MutationFinder (MF) is an open-source, high-performance information extraction system for identifying mentions of point mutations in free text. MutationFinder was developed in the Center for Computational Pharmacology, part of the Computational Bioscience Program at the University of Colorado Health Sciences Center. On blind test data, MF achieves a precision of 98.4% and a recall of 81.9% when extracting point mutation mentions. MF is available both as a stand-alone Python script that can be applied to input text and as an API. For accessibility, three functionally identical implementations of the API are provided, in Python, Java, and Perl. Full documentation, along with usage examples, is provided in a README file packaged with the software.

Along with MF, we have published a gold standard corpus for mutation extraction systems, consisting of 1515 human-annotated mutation mentions in 813 MEDLINE abstracts. This corpus is divided into development and test subsets. Interannotator agreement on this corpus, measured on fifty abstracts, was 94%.
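Precision and recall, the measures quoted above, are computed by comparing extracted mentions against gold-standard annotations such as those in this corpus. As a rough illustration (not the evaluation code distributed with MutationFinder), the sketch below assumes each mention is represented as a hypothetical `(abstract_id, mutation)` pair and scores a set of extracted mentions against a set of gold-standard mentions. The toy data are invented and are not drawn from our corpus.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (abstract identifier, normalized mutation).
Mention = Tuple[str, str]


def precision_recall(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float]:
    """Score predicted mentions against gold-standard mentions."""
    true_positives = len(predicted & gold)
    # Precision: fraction of extracted mentions that appear in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard mentions that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy data, not taken from the MutationFinder corpus.
gold = {("abs1", "A42G"), ("abs1", "W102R"), ("abs2", "E6V"), ("abs2", "R175H")}
predicted = {("abs1", "A42G"), ("abs2", "E6V"), ("abs2", "K9Q")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
# → precision=0.667 recall=0.500
```

Treating mentions as sets means a mutation mentioned twice in the same abstract is counted once; this sketch does not attempt to reproduce the exact counting scheme behind our published figures.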

MF applies a set of approximately 700 regular expressions to identify mutation mentions in input text. These regular expressions differ from those used by previous mutation extraction systems in that they were generated automatically, in a process informed by MEDLINE in its entirety. Automating pattern generation allows less commonly used formats for describing mutations to be matched directly (rather than via heuristics), which improves recall over prior systems while maintaining high precision.

Our Bioinformatics Applications Note (Caporaso et al., 2007) and the software documentation present MF and our corpus, and provide the information needed to use the system. Our article in the Journal of Bioinformatics and Computational Biology (December 2007) describes in detail our approach to automatic pattern generation and the development of our gold standard corpus. Our Pacific Symposium on Biocomputing article (January 2008) compares the performance of MutationFinder in intrinsic and extrinsic evaluations, applying MutationFinder in an attempt to recreate the mutation data in the Protein Data Bank (PDB).
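To make the idea of generated patterns concrete, here is a minimal Python sketch under a toy assumption: two residue formats (one-letter and three-letter amino acid codes) are combined with two hypothetical separators to produce four regular expressions, and each match is normalized to a one-letter wild-type/position/mutant form such as `W102R`. This is not MutationFinder's pattern set, generation process, or API. The real system uses approximately 700 patterns generated in a process informed by MEDLINE, and the names `generate_patterns` and `extract_mutations` here are hypothetical.

```python
import re
from itertools import product

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))

# Hypothetical residue formats: (regex fragment, function normalizing a match to one letter).
RESIDUE_FORMATS = [
    ("[" + ONE_LETTER + "]", lambda s: s),
    ("(?:" + "|".join(THREE_TO_ONE) + ")", lambda s: THREE_TO_ONE[s]),
]

# Hypothetical separators between position and mutant residue: "A42G", "Ala42-->Gly".
SEPARATORS = ["", "-->"]


def generate_patterns():
    """Combine every residue format with every separator into a compiled regex."""
    patterns = []
    for (residue, normalize), sep in product(RESIDUE_FORMATS, SEPARATORS):
        regex = re.compile(
            r"\b(?P<wt>" + residue + r")(?P<pos>\d+)"
            + re.escape(sep)
            + r"(?P<mut>" + residue + r")\b"
        )
        patterns.append((regex, normalize))
    return patterns


def extract_mutations(text, patterns):
    """Return the set of normalized point mutations (e.g. 'W102R') found in text."""
    found = set()
    for regex, normalize in patterns:
        for match in regex.finditer(text):
            wild_type = normalize(match.group("wt"))
            mutant = normalize(match.group("mut"))
            found.add(wild_type + match.group("pos") + mutant)
    return found


patterns = generate_patterns()
text = "The A42G substitution and the Trp102-->Arg mutant were compared with E6V."
print(sorted(extract_mutations(text, patterns)))
# → ['A42G', 'E6V', 'W102R']
```

Because every pattern is built from the same components, adding one new residue format or separator extends coverage to all of its combinations at once. A permissive pattern like the one-letter form will also match non-mutation strings (for example `T4L`, an abbreviation for T4 lysozyme), so in a sketch like this precision depends on which generated patterns are kept.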


Feature requests or bug reports

  We are very interested in improving MutationFinder, both in terms of its performance and its usability. If you have feature requests or bug reports, the most reliable way to get them to us is via our SourceForge feature request and bug report trackers. Thanks for your input!

Citing MutationFinder

  Please cite MutationFinder with the following reference:
MutationFinder: A high-performance system for extracting point mutation mentions from text
J. Gregory Caporaso, William A. Baumgartner Jr., David A. Randolph, K. Bretonnel Cohen, and Lawrence Hunter; Bioinformatics, 2007;23(14):1862-1865. doi:10.1093/bioinformatics/btm235.
Abstract PDF

Other peer-reviewed articles about MutationFinder

  Rapid pattern development for concept recognition systems: application to point mutations
J. Gregory Caporaso, William A. Baumgartner Jr., David A. Randolph, K. Bretonnel Cohen, and Lawrence Hunter; Journal of Bioinformatics and Computational Biology, 2007 Dec;5(6):1233-59. doi:10.1142/S0219720007003144. PDF
(A description of our approach to automatically generating the patterns that MutationFinder uses to extract mutation mentions from text, along with an analysis of MutationFinder's performance on a full-text mutation corpus developed outside our lab.)

Intrinsic evaluation of text mining tools may not predict performance on realistic tasks
J. Gregory Caporaso, Nita Deshpande, J. Lynn Fink, Philip E. Bourne, K. Bretonnel Cohen, and Lawrence Hunter; Pacific Symposium on Biocomputing, 2008;13:640-651. PDF
(A discussion of how the performance of text mining systems on gold standard data relates to their usefulness in real-world tasks, such as automated database construction.)


Contact Information

  Project administrator: J. Gregory Caporaso e-mail web
Other projects from the Biomedical Text Mining Group at the Center for Computational Pharmacology are available at