Creating a Training Data Set

[1]:
import os
import ase
from ase.io.trajectory import Trajectory
[2]:
from qmlearn.drivers.mol import QMMol
from qmlearn.preprocessing import get_train_atoms
from qmlearn.io.hdf5 import DBHDF5

Engine properties

  1. Define the basis set, level of theory, exchange correlation functional and the total charge of your system:

    Basis set available in PySCF : https://pyscf.org/_modules/pyscf/gto/basis.html

    Level of Theory available in PySCF : https://pyscf.org/user.html

    Exchange Correlation Functional available in PySCF : https://pyscf.org/_modules/pyscf/dft/xcfun.html

[3]:
basis = 'cc-pvTZ'
method = 'rks'
xc = 'lda,vwn_rpa'
charge = 0

Training set

  1. From a molecular dynamics trajectory 'h2o_vib.traj' remove similar structures using the get_train_atoms function and keep them as train_atoms

[4]:
nsamples=10000
tol = 1E-3
mdtraj = 'h2o_vib.traj'
[5]:
train_atoms=get_train_atoms(mdtraj, nsamples=nsamples, tol=tol)
nsamples = len(train_atoms)
WARN : Only get 36 samples at 90 step. Maybe you can reduce the 'tol'.

Reference structure

  1. Using train_atoms define the reference structure.

[6]:
refqmmol = QMMol(atoms = train_atoms[0],  method = method, basis=basis, xc = xc)

Properties definition

  1. Create a list prop with the properties you wish to store after running PySCF engine using qmmol.run().

  2. Iterate over each geometry in train_atoms, run PySCF engine using qmmol.run() and keep the results in properties dictionary.

[7]:
prop = ['vext', 'gamma', 'energy', 'forces']
properties= { k: [] for k in prop}

for atoms in train_atoms:
    qmmol = refqmmol.duplicate(atoms)
    qmmol.run()
    properties['vext'].append(qmmol.engine.vext)
    properties['gamma'].append(qmmol.engine.gamma)
    properties['energy'].append(qmmol.engine.etotal)
    properties['forces'].append(qmmol.engine.forces)

Database init

  1. Initialize the hdf5 database name and create the object db.

[8]:
dbfile = os.path.splitext(mdtraj)[0]+'_QML_set.hdf5'
db = DBHDF5(dbfile, qmmol=refqmmol)

Writing Database

  1. Write the database object db and close it.

[9]:
db.write_qmmol(refqmmol)
db.write_images(train_atoms, prefix='train')
db.write_properties(properties, prefix='train')
print(db.names)
db.close()
['rks', 'rks/qmmol', 'rks/train_atoms_36', 'rks/train_props_36']

Check Database

  1. Check the database by oppening and reading it.

[10]:
db = DBHDF5(dbfile, qmmol=refqmmol)
data = db.read_properties(db.get_names('*/train_props_*')[0])
db.close()
data['energy'][0]
[10]:
-76.09382403593892