Java Assignment Writing: CS106RealMolecular


Implement a molecular calculator and practise using third-party Java libraries.

Requirement

This is the last practical exercise and will continue over the remaining weeks
of the course.
In this practical you will implement a real molecular similarity method,
Ultrafast Shape Recognition (USR), and use it to search compound databases for
molecules of similar shape.
The program reads one reference molecule from a file and calculates a
descriptor for it. It then reads a series of molecules from a second file,
computes the descriptor for each molecule, and quantifies the difference
between that descriptor and the reference descriptor. At the end of the run the
program should report the closest molecule and the magnitude of its difference
from the reference. All files will be in SD format, and hydrogens should be
completely ignored throughout the procedure.
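Because hydrogens play no part in the descriptor, it is convenient to drop them before any distances are computed. The sketch below is a minimal, hypothetical helper (`HeavyAtoms` and `heavyAtomCoords` are illustrative names, not part of CDK). It assumes the atom symbols and coordinates have already been copied into plain arrays, for example from `getSymbol()` and `getPoint3d()`, and keeps every atom whose symbol is not "H".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filters out hydrogens so only heavy atoms are used.
class HeavyAtoms {

    // Returns the {x, y, z} rows of every atom whose symbol is not "H".
    static double[][] heavyAtomCoords(String[] symbols, double[][] xyz) {
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < symbols.length; i++) {
            if (!"H".equals(symbols[i])) {
                kept.add(xyz[i]);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        String[] symbols = {"C", "H", "O", "H"};
        double[][] xyz = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {3, 0, 0}};
        double[][] heavy = heavyAtomCoords(symbols, xyz);
        System.out.println("heavy atoms kept: " + heavy.length);
    }
}
```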
The descriptor we will calculate consists of 4 triples of numbers (12 numbers
in all). Each triple consists of 3 statistical measures of the distances from a
given point to every atom.
The measures are

  1. The mean distance from the point: the sum of all distances divided by the number of distances, n.
  2. The variance of the distances: the sum of the squares of (distance - mean), divided by n - 1.
  3. The skew of the distances: the sum of the cubes of ((distance - mean) / standard deviation), divided by n. The standard deviation is the square root of the variance.
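The three measures translate almost line for line into Java. The following is a minimal sketch of the definitions as stated above (sample variance divided by n - 1, skew of the standardised distances divided by n); the class and method names are illustrative and are not the example methods supplied with the practical. It assumes at least two distances, since with a single distance the variance divides by zero. As a hypothetical convention it returns a skew of 0 when all distances are equal.

```java
// Sketch of the three statistics exactly as defined in this practical.
class UsrStats {

    // Mean: sum of all distances divided by the number of distances.
    static double mean(double[] d) {
        double sum = 0.0;
        for (double x : d) sum += x;
        return sum / d.length;
    }

    // Sample variance: sum of (d_i - mean)^2 divided by (n - 1).
    static double variance(double[] d) {
        double m = mean(d);
        double sum = 0.0;
        for (double x : d) sum += (x - m) * (x - m);
        return sum / (d.length - 1);
    }

    // Skew: sum of ((d_i - mean) / sd)^3 divided by n, where sd = sqrt(variance).
    static double skew(double[] d) {
        double m = mean(d);
        double sd = Math.sqrt(variance(d));
        if (sd == 0.0) return 0.0; // assumption: zero spread is reported as zero skew
        double sum = 0.0;
        for (double x : d) {
            double z = (x - m) / sd;
            sum += z * z * z;
        }
        return sum / d.length;
    }

    public static void main(String[] args) {
        double[] d = {1.0, 2.0, 3.0, 10.0};
        System.out.printf("mean=%.4f variance=%.4f skew=%.4f%n",
                mean(d), variance(d), skew(d));
    }
}
```

A long tail of large distances (the 10.0 in the usage example) gives a positive skew, which is what lets the third number in each triple capture shape asymmetry.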
The four points from which we calculate these measures are

  1. The centre of gravity (COG)
  2. The atom position closest to the COG
  3. The atom position furthest from the COG
  4. The atom position furthest from point 3 above
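Finding the four points takes one average and three nearest/furthest searches. This sketch assumes the heavy-atom coordinates are held in a `double[][]` of `{x, y, z}` rows, and it treats the centre of gravity as the unweighted centroid of those positions (an assumption; weighting by atomic mass would give a true centre of mass). Ties are resolved in favour of the first atom found. `UsrPoints` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Sketch: the four USR reference points from heavy-atom coordinates.
class UsrPoints {

    // Euclidean distance between two {x, y, z} points.
    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Unweighted centroid of the atom positions (assumed stand-in for the COG).
    static double[] centroid(double[][] xyz) {
        double[] c = new double[3];
        for (double[] p : xyz) {
            for (int k = 0; k < 3; k++) c[k] += p[k];
        }
        for (int k = 0; k < 3; k++) c[k] /= xyz.length;
        return c;
    }

    // Index of the atom closest to (closest = true) or furthest from
    // (closest = false) the reference point; the first atom wins ties.
    static int extremeAtom(double[][] xyz, double[] ref, boolean closest) {
        int best = 0;
        double bestD = dist(xyz[0], ref);
        for (int i = 1; i < xyz.length; i++) {
            double d = dist(xyz[i], ref);
            if (closest ? d < bestD : d > bestD) {
                best = i;
                bestD = d;
            }
        }
        return best;
    }

    // Returns {COG, closest to COG, furthest from COG, furthest from point 3}.
    // Points 2 to 4 are the caller's own coordinate rows, not copies.
    static double[][] referencePoints(double[][] xyz) {
        double[] cog = centroid(xyz);
        double[] closestToCog = xyz[extremeAtom(xyz, cog, true)];
        double[] furthestFromCog = xyz[extremeAtom(xyz, cog, false)];
        double[] furthestFromThat = xyz[extremeAtom(xyz, furthestFromCog, false)];
        return new double[][] {cog, closestToCog, furthestFromCog, furthestFromThat};
    }

    public static void main(String[] args) {
        double[][] xyz = {{0, 0, 0}, {1, 0, 0}, {4, 0, 0}};
        for (double[] p : referencePoints(xyz)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Each point then yields one triple: compute its distance to every heavy atom and pass that list of distances to the mean, variance and skew calculations.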
To calculate the difference between one 12-number descriptor and another,
simply do the equivalent of a distance calculation over all 12 numbers: take
the square root of the sum of the squared differences between corresponding
entries.
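A sketch of this comparison, together with the end-of-run search for the closest molecule, might look like the following. It assumes each descriptor has already been computed as a `double[12]`; `UsrCompare`, `difference` and `closest` are hypothetical names, and `closest` returns -1 when there are no candidates.

```java
// Sketch: Euclidean difference between descriptors and a closest-match scan.
class UsrCompare {

    // Square root of the sum of squared differences over all entries.
    static double difference(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("descriptor lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Index of the candidate with the smallest difference to the reference,
    // or -1 if there are no candidates.
    static int closest(double[] ref, double[][] candidates) {
        int best = -1;
        double bestDiff = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double d = difference(ref, candidates[i]);
            if (d < bestDiff) {
                bestDiff = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] ref = new double[12];
        double[][] db = {new double[12], new double[12]};
        db[0][0] = 3.0; db[0][1] = 4.0;   // difference from ref is 5.0
        db[1][0] = 1.0;                   // difference from ref is 1.0
        int i = closest(ref, db);
        System.out.println("closest=" + i + " difference=" + difference(ref, db[i]));
        // prints: closest=1 difference=1.0
    }
}
```

The same running-minimum logic can also sit directly inside the loop that reads the second SD file, so no descriptors need to be stored; only the best name and difference seen so far are kept.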
Remember that we already know how to read SD files from a previous practical;
here is a reminder anyway.
To access the CDK library you will need some import statements:

    import org.openscience.cdk.CDKConstants;
    import org.openscience.cdk.Molecule;
    import org.openscience.cdk.DefaultChemObjectBuilder;
    import org.openscience.cdk.io.iterator.IteratingMDLReader;
    import org.openscience.cdk.io.MDLWriter;
    import org.openscience.cdk.interfaces.*;
    import java.io.FileInputStream; // needed to open the SD files for the reader
To read a single molecule (for example the reference) from an SD file you
could use something like:

    IteratingMDLReader MDLReader = new IteratingMDLReader(new FileInputStream(RefFile), DefaultChemObjectBuilder.getInstance());
    Molecule mymol = null;
    if (MDLReader.hasNext()) {
        mymol = (Molecule) MDLReader.next(); // first molecule in the file
    }
    MDLReader.close();
To read a sequence of molecules from an SD file:

    MDLReader = new IteratingMDLReader(new FileInputStream(ScrFile), DefaultChemObjectBuilder.getInstance());
    while (MDLReader.hasNext()) {
        mymol = (Molecule) MDLReader.next();
        // compute the descriptor for mymol and compare it with the reference here
    }
    MDLReader.close();
To get the name of a Molecule object (here called m1):

    String name = String.valueOf(m1.getProperty(CDKConstants.TITLE));

To get its number of atoms:

    int natoms = m1.getAtomCount();

You can get each atom in a molecule with:

    IAtom myatom = m1.getAtom(i);

where i is the index of the atom, counting from 0.
You can get the chemical symbol of each atom:

    String s1 = myatom.getSymbol();

You can get the coordinates as a Point3d object with:

    Point3d mypoint = myatom.getPoint3d();

(to use the Point3d class you have to import javax.vecmath.Point3d)
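The Point3d class has a method called distance that returns the distance
between the calling instance and its argument, so:

    // in practice a and b would be atom positions from getPoint3d()
    Point3d a = new Point3d(0.0, 0.0, 0.0);
    Point3d b = new Point3d(1.0, 2.0, 2.0);
    double d = a.distance(b); // d is 3.0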
In addition to the usual criteria of functionality, readability, comments and
a readme file, I ask that you prepare a document called plan.txt in which you
write a simple logic plan for the program.
So that you don't get bogged down in the statistics, I have given you a set of
example methods to calculate the mean, variance and skew.

Author: SafePoker
Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit SafePoker when reposting!