Institute of Information Science, Academia Sinica, Taipei, Taiwan
*To whom correspondence should be addressed. Tel: +886 2 27883799, ext. 1804; Fax: +886 2 27824814; Email: [email protected]
NMR data from different experiments often contain errors; thus, automated backbone resonance assignment is a very challenging problem. In this paper, we present a method called GANA that uses a genetic algorithm to automatically perform backbone resonance assignment with a high degree of precision and recall. Precision is the number of correctly assigned residues divided by the number of assigned residues, and recall is the number of correctly assigned residues divided by the number of residues with known human-curated answers. GANA takes spin systems as input data and uses two data structures, candidate lists and adjacency lists, to assign the spin systems to each amino acid of a target protein. Using GANA, almost all spin systems can be mapped correctly onto a target protein, even if the data are noisy. We use the BioMagResBank (BMRB) dataset (901 proteins) to test the performance of GANA. To evaluate the robustness of GANA, we generate four additional datasets from the BMRB dataset: three simulate false positives, false negatives and linking errors, respectively, and the fourth combines all three error types to examine the fault tolerance of our method. The average precision rates of GANA on BMRB and the four simulated test cases are 99.61, 99.55, 99.34, 99.35 and 98.60%, respectively. The average recall rates of GANA on BMRB and the four simulated test cases are 99.26, 99.19, 98.85, 98.87 and 97.78%, respectively. We also test GANA on two real wet-lab datasets, hbSBD and hbLBD. The precision and recall rates of GANA on hbSBD are 95.12 and 92.86%, respectively, and those on hbLBD are 100 and 97.40%, respectively.
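The precision and recall definitions above can be made concrete with a minimal sketch. The function name, the dictionary representation (residue index mapped to a spin-system label) and the toy data are illustrative assumptions, not part of GANA or its published evaluation code.

```python
from typing import Dict, Tuple


def precision_recall(
    predicted: Dict[int, str], reference: Dict[int, str]
) -> Tuple[float, float]:
    """Return (precision, recall) for a backbone assignment.

    predicted: residue index -> spin system chosen by the method
               (hypothetical representation; unassigned residues are absent).
    reference: residue index -> spin system in the human-curated answer.
    """
    # A residue counts as correctly assigned when the predicted spin system
    # matches the curated one for that residue.
    correct = sum(
        1 for residue, spin in predicted.items() if reference.get(residue) == spin
    )
    # Precision: correct assignments over all assignments made.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: correct assignments over all residues with a known answer.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy example (made-up data): five residues have curated answers,
# four were assigned, and two of those assignments are correct.
reference = {1: "s1", 2: "s2", 3: "s3", 4: "s4", 5: "s5"}
predicted = {1: "s1", 2: "s2", 3: "s4", 4: "s3"}
precision, recall = precision_recall(predicted, reference)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
```

In this toy case precision is 2/4 and recall is 2/5, which shows how a method that leaves residues unassigned can keep precision high while recall drops.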
NMR provides an alternative to X-ray diffraction for determining the 3D structures of proteins at atomic resolution. NMR is also a powerful analytical tool for studying protein-ligand binding, protein-nucleic acid interactions and protein dynamics because it can probe protein molecules in a liquid environment. The first requirement for these studies is sequential resonance assignment on backbone structures. Researchers usually conduct several 3D NMR experiments, such as CBCANH, CBCA(CO)NH or HN(CO)CA, on 13C/15N/1HN-labeled proteins, and 2D NMR experiments, such as HSQC, on 15N/1HN-labeled proteins. These experiments are combined to construct sequential assignments. The multi-dimensional NMR spectra contain a large number of peaks, each of which carries chemical shifts and a corresponding intensity. Different kinds of NMR experiments provide different partial resonance information about particular atom groups on the backbone structure. For example, the 2D HSQC experiment is used to detect whether there is a covalent bond between N and HN. If such a bond exists, a corresponding peak should appear in the spectrum, thereby showing the chemical shifts of the two atoms. The backbone resonance assignment problem is to identify the chemical shifts of particular atoms on the backbone structure from the connectivity information among this large set of isolated peaks. In the past, biologists had to make tedious backbone assignments manually or semi-manually during the spectra analysis process, but many automated tools using computational technologies are now available for the task. Even so, backbone resonance assignment is still very difficult in practice owing to noise and errors in experimental NMR data.