Institute of Information Science, Academia Sinica, Taipei, Taiwan
*To whom correspondence should be addressed. Tel: +886 2 27883799, ext. 1804; Fax: +886 2 27824814; Email: [email protected]
NMR data from different experiments often contain errors; thus, automated backbone resonance assignment is a very challenging problem. In this paper, we present a method called GANA that uses a genetic algorithm to automatically perform backbone resonance assignment with a high degree of precision and recall. Precision is the number of correctly assigned residues divided by the number of assigned residues, and recall is the number of correctly assigned residues divided by the number of residues with known human-curated answers. GANA takes spin systems as input data and uses two data structures, candidate lists and adjacency lists, to assign the spin systems to each amino acid of a target protein. Using GANA, almost all spin systems can be mapped correctly onto a target protein, even if the data are noisy. We use the BioMagResBank (BMRB) dataset (901 proteins) to test the performance of GANA. To evaluate the robustness of GANA, we generate four additional datasets from the BMRB dataset: three simulate data errors of false positives, false negatives and linking errors, respectively, and the fourth combines these three error types to examine the fault tolerance of our method. The average precision rates of GANA on BMRB and the four simulated test cases are 99.61, 99.55, 99.34, 99.35 and 98.60%, respectively. The average recall rates of GANA on BMRB and the four simulated test cases are 99.26, 99.19, 98.85, 98.87 and 97.78%, respectively. We also test GANA on two real wet-lab datasets, hbSBD and hbLBD. The precision and recall rates of GANA on hbSBD are 95.12 and 92.86%, respectively, and those on hbLBD are 100 and 97.40%, respectively.
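The precision and recall definitions given above can be illustrated with a minimal sketch. This is not code from GANA. The function name `precision_recall`, the dictionary layout (residue index mapped to a spin-system label, with `None` meaning "left unassigned") and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    predicted: Dict[int, Optional[str]],
    reference: Dict[int, str],
) -> Tuple[float, float]:
    """Compute precision and recall of a backbone assignment.

    predicted: residue index -> assigned spin-system label, or None if the
               residue was left unassigned (hypothetical layout).
    reference: residue index -> human-curated spin-system label, given only
               for residues whose answer is known.
    """
    # Residues that received some assignment.
    assigned = {res for res, label in predicted.items() if label is not None}
    # Assigned residues whose label agrees with the curated answer.
    correct = sum(
        1 for res in assigned
        if res in reference and predicted[res] == reference[res]
    )
    # Precision: correct / assigned; recall: correct / residues with answers.
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy example: four residues with curated answers; one is misassigned
# and one is left unassigned.
reference = {1: "s1", 2: "s2", 3: "s3", 4: "s4"}
predicted = {1: "s1", 2: "s2", 3: "s9", 4: None}

p, r = precision_recall(predicted, reference)
print(f"precision {p:.3f}, recall {r:.3f}")  # precision 0.667, recall 0.500
```

In this toy case three residues are assigned and two of them are correct, so precision is 2/3, while recall is 2/4 because four residues have known answers. Leaving a doubtful residue unassigned lowers recall but not precision, which is why the two rates are reported separately.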