Friday, December 15, 2017

computational chemistry - Why can't we see 3D structures of certain compounds on PubChem?





It's common to see statements like the one below on PubChem:



Conformer generation is disallowed since MMFF94s unsupported element.



What does this statement mean, and why are certain compounds unsupported? And what is MMFF94s? This might be something really basic, but I have never come across it before, so I'm asking.



Answer



Most classical molecular force fields are parameterized for a specific set of elements and atom types. The MMFF94 force field (and its MMFF94s variant, which uses the same atom types) was designed for standard organic, drug-like small molecules, so it covers a limited set of elements (H, Li, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Fe, Cu, Zn, Br, I), categorized into 99 atom types (see, e.g., mmffprop.par in the Open Babel implementation).


The metals are supported only as ions, not covalently bonded to any other atom.
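As a rough sketch of the element rule above, the Python below flags atoms whose element is outside the MMFF94 list, as well as metals that are covalently bonded rather than present as free ions. The input format (a list of element symbols plus index pairs for covalent bonds) and the function name `mmff94_element_problems` are hypothetical; real implementations such as Open Babel assign full MMFF94 atom types rather than checking elements alone.

```python
# Element list for MMFF94 as given above.
MMFF94_ELEMENTS = {
    "H", "Li", "C", "N", "O", "F", "Na", "Mg", "Si", "P",
    "S", "Cl", "K", "Ca", "Fe", "Cu", "Zn", "Br", "I",
}
# Metals in that list; these are only handled as free ions.
MMFF94_METALS = {"Li", "Na", "Mg", "K", "Ca", "Fe", "Cu", "Zn"}


def mmff94_element_problems(atoms, bonds):
    """Return reasons a molecule falls outside MMFF94's element coverage.

    atoms: element symbols, indexed by atom number (hypothetical format).
    bonds: (i, j) index pairs of covalently bonded atoms.
    """
    # Collect every atom index that takes part in at least one covalent bond.
    bonded = set()
    for i, j in bonds:
        bonded.add(i)
        bonded.add(j)

    problems = []
    for idx, element in enumerate(atoms):
        if element not in MMFF94_ELEMENTS:
            problems.append(f"atom {idx}: unsupported element {element}")
        elif element in MMFF94_METALS and idx in bonded:
            problems.append(f"atom {idx}: {element} is covalently bonded, not an ion")
    return problems
```

For example, sodium acetate written as an acetate anion plus a separate Na+ ion (hydrogens omitted) passes, while BF3 is flagged for boron and a covalent Fe-C bond is flagged for iron.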


As in many molecular force fields, bond types, angle types, etc. are defined as combinations of the atom types. Molecules that contain elements outside the parameterization, or atom-type combinations not covered by the bond, angle, or other parameters, are rejected.
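The parameter lookup can be sketched the same way: every bond's pair of atom types must have an entry in the bond table, or the molecule is rejected. The table below is a hypothetical, tiny stand-in using MMFF-style symbolic type names purely for illustration; it does not reproduce the real MMFF94 parameter files, and a real force field performs the same kind of lookup for angles, torsions, and so on.

```python
# Hypothetical, heavily truncated stand-in for a force-field bond table.
# Keys are unordered pairs of symbolic atom types; real parameter files use
# numeric types and store force constants and reference lengths as values.
KNOWN_BOND_TYPES = {
    frozenset(("CR", "CR")),  # collapses to frozenset({"CR"}), matching the lookup key
    frozenset(("CR", "HC")),
    frozenset(("CR", "OR")),
    frozenset(("OR", "HOR")),
}


def missing_bond_parameters(atom_types, bonds):
    """Return the atom-type pairs of bonds that have no entry in the table."""
    missing = []
    for i, j in bonds:
        key = frozenset((atom_types[i], atom_types[j]))
        if key not in KNOWN_BOND_TYPES:
            missing.append((atom_types[i], atom_types[j]))
    return missing
```

An empty result means every bond is parameterized; any returned pair is exactly the kind of gap that makes a force field refuse the molecule.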


For example, the MMFF94 validation set is available at: http://server.ccl.net/cca/data/MMFF94/MMFF94_dative.mol2.shtml



Beyond the limits of MMFF94 and MMFF94s themselves, PubChem3D imposed several further limits, described in the accompanying paper (Bolton et al., "PubChem3D: a new resource for scientists", J. Cheminf. (2011) v. 3, art. 32). A compound is only processed if it meets all of the following:



  • Not too large (with ≤ 50 non-hydrogen atoms).

  • Not too flexible (with ≤ 15 rotatable bonds).

  • Consists of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I).

  • Has only a single covalent unit (i.e., not a salt or a mixture).

  • Contains only atom types recognized by the MMFF94s force field.

  • Has fewer than six undefined atom or bond stereo centers.
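The six criteria above translate directly into a filter. In this sketch, `CompoundSummary` and its fields are hypothetical: it assumes the counts (non-hydrogen atoms, rotatable bonds, covalent units, undefined stereo centers) and the MMFF94s typing result have already been computed by some cheminformatics toolkit.

```python
from dataclasses import dataclass

# Element list used by PubChem3D, as given above (narrower than MMFF94's).
PUBCHEM3D_ELEMENTS = {"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}


@dataclass
class CompoundSummary:
    """Hypothetical precomputed summary of one compound record."""
    heavy_atoms: int
    rotatable_bonds: int
    elements: set
    covalent_units: int
    all_mmff94s_types_recognized: bool
    undefined_stereo_centers: int


def pubchem3d_rejections(c: CompoundSummary) -> list:
    """Return the listed PubChem3D criteria that the compound fails."""
    reasons = []
    if c.heavy_atoms > 50:
        reasons.append("more than 50 non-hydrogen atoms")
    if c.rotatable_bonds > 15:
        reasons.append("more than 15 rotatable bonds")
    unsupported = c.elements - PUBCHEM3D_ELEMENTS
    if unsupported:
        reasons.append("unsupported elements: " + ", ".join(sorted(unsupported)))
    if c.covalent_units != 1:
        reasons.append("not a single covalent unit")
    if not c.all_mmff94s_types_recognized:
        reasons.append("atom types not recognized by MMFF94s")
    if c.undefined_stereo_centers >= 6:
        reasons.append("six or more undefined stereo centers")
    return reasons
```

For example, a sodium chloride record would fail twice: Na is not in the PubChem3D element list, and the record has two covalent units.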


For molecules with undefined atom or bond stereo (e.g., E/Z), multiple stereoisomers were generated. Personally, I'd use these compounds with extreme care; it's not always obvious what the original PubChem entry represents.
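At its simplest, generating multiple stereoisomers for undefined centers is a Cartesian product over the possible descriptors at each center. The sketch below assumes each undefined center is given as a hypothetical label with its two options (R/S or E/Z); it does not remove duplicates such as meso forms, and it is not PubChem's actual procedure. With fewer than six undefined centers, the naive count is at most 2^5 = 32 assignments.

```python
from itertools import product


def enumerate_stereo_assignments(centers):
    """Yield every assignment of descriptors to the undefined stereo centers.

    centers: list of (label, options) pairs, e.g. ("C2", ("R", "S")) for an
    atom center or ("C3=C4", ("E", "Z")) for a double bond (hypothetical labels).
    """
    labels = [label for label, _ in centers]
    # One tuple of descriptors per combination, in the order of `centers`.
    for combo in product(*(options for _, options in centers)):
        yield dict(zip(labels, combo))
```

Two undefined centers, one R/S and one E/Z, give four candidate structures, each of which would then need its own conformer model.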



Thus, there are other records without 3D versions even when MMFF94s could handle their atoms (e.g., they're too large, have multiple covalent units, etc.).

