Creating a Training Data Set
[1]:
import os
import ase
from ase.io.trajectory import Trajectory
[2]:
from qmlearn.drivers.mol import QMMol
from qmlearn.preprocessing import get_train_atoms
from qmlearn.io.hdf5 import DBHDF5
Engine properties
Define the basis set, level of theory, exchange correlation functional and the total charge of your system:
Basis set available in PySCF : https://pyscf.org/_modules/pyscf/gto/basis.html
Level of Theory available in PySCF : https://pyscf.org/user.html
Exchange Correlation Functional available in PySCF : https://pyscf.org/_modules/pyscf/dft/xcfun.html
[3]:
basis = 'cc-pvTZ'
method = 'rks'
xc = 'lda,vwn_rpa'
charge = 0
Training set
From a molecular dynamics trajectory
'h2o_vib.traj'
remove similar structures using theget_train_atoms
function and keep them astrain_atoms
[4]:
nsamples=10000
tol = 1E-3
mdtraj = 'h2o_vib.traj'
[5]:
train_atoms=get_train_atoms(mdtraj, nsamples=nsamples, tol=tol)
nsamples = len(train_atoms)
WARN : Only get 36 samples at 90 step. Maybe you can reduce the 'tol'.
Reference structure
Using
train_atoms
define the reference structure.
[6]:
refqmmol = QMMol(atoms = train_atoms[0], method = method, basis=basis, xc = xc)
Properties definition
Create a list
prop
with the properties you wish to store after running PySCF engine usingqmmol.run()
.Iterate over each geometry in
train_atoms
, run PySCF engine usingqmmol.run()
and keep the results inproperties
dictionary.
[7]:
prop = ['vext', 'gamma', 'energy', 'forces']
properties= { k: [] for k in prop}
for atoms in train_atoms:
qmmol = refqmmol.duplicate(atoms)
qmmol.run()
properties['vext'].append(qmmol.engine.vext)
properties['gamma'].append(qmmol.engine.gamma)
properties['energy'].append(qmmol.engine.etotal)
properties['forces'].append(qmmol.engine.forces)
Database init
Initialize the hdf5 database name and create the object
db
.
[8]:
dbfile = os.path.splitext(mdtraj)[0]+'_QML_set.hdf5'
db = DBHDF5(dbfile, qmmol=refqmmol)
Writing Database
Write the database object
db
and close it.
[9]:
db.write_qmmol(refqmmol)
db.write_images(train_atoms, prefix='train')
db.write_properties(properties, prefix='train')
print(db.names)
db.close()
['rks', 'rks/qmmol', 'rks/train_atoms_36', 'rks/train_props_36']
Check Database
Check the database by oppening and reading it.
[10]:
db = DBHDF5(dbfile, qmmol=refqmmol)
data = db.read_properties(db.get_names('*/train_props_*')[0])
db.close()
data['energy'][0]
[10]:
-76.09382403593892