Construction of graph_data.npz File
graph_data.npz is the core input format for HamGNN: it stores the material structure together with the Hamiltonian matrix data. The following sections describe how to construct it.
General Process
Regardless of which DFT software is used, the basic process for constructing graph_data.npz is as follows:
Structure File Conversion: Convert material structure files (such as POSCAR, CIF) to the corresponding DFT software input format
Non-self-consistent Hamiltonian Calculation: Use a post-processing tool to generate files containing the non-self-consistent Hamiltonian H0, i.e., the matrices that do not depend on the self-consistent charge density (such as the kinetic-energy matrix)
DFT Calculation (Optional): Run a full DFT calculation to obtain the self-consistent Hamiltonian (required for building a training set; can be skipped for pure prediction)
Data Packaging: Call the appropriate graph_data_gen script to package the structure and Hamiltonian matrix data into graph_data.npz
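Building training data and building prediction data differ only in whether the full DFT step runs. The short Python sketch below encodes that rule; the stage names are hypothetical labels for the four steps above, not identifiers used by HamGNN.

```python
# Hypothetical labels for the four stages of the general process.
STAGES = (
    "structure_conversion",
    "non_scf_hamiltonian",
    "dft_calculation",  # optional: only needed for training data
    "data_packaging",
)


def plan_stages(mode: str) -> list[str]:
    """Return the stages to run, in order, for "train" or "predict" mode."""
    if mode not in ("train", "predict"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "train":
        # Training needs the self-consistent Hamiltonian from a full DFT run.
        return list(STAGES)
    # Prediction only needs the structure and H0, so the DFT stage is skipped.
    return [stage for stage in STAGES if stage != "dft_calculation"]


if __name__ == "__main__":
    print(plan_stages("predict"))
    # -> ['structure_conversion', 'non_scf_hamiltonian', 'data_packaging']
```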
OpenMX Process
Structure Conversion: Edit the poscar2openmx.yaml configuration file, setting appropriate paths and DFT parameters. For non-SOC data, set scf.SpinPolarization to Off and scf.SpinOrbit.Coupling to off; for SOC data, set scf.SpinPolarization to nc and scf.SpinOrbit.Coupling to on. For example:
system_name: 'Si'
poscar_path: "/path/to/poscar/*.vasp"   # Path to POSCAR or CIF files
filepath: '/path/to/save/dat/file'      # Directory to save OpenMX files
basic_command: |+                       # OpenMX calculation parameters
  #
  #      File Name
  #
  System.CurrrentDirectory    ./                   # default=./
  System.Name                 Si
  DATA.PATH                   /path/to/DFT_DATA    # default=../DFT_DATA19
  # ...other OpenMX parameters...
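Because the SOC and non-SOC settings differ only in two OpenMX keywords, they can be switched programmatically. The following Python sketch uses hypothetical helper names (it is not part of poscar2openmx) and assumes the two keywords appear as whitespace-separated "keyword value" lines among the OpenMX parameters in basic_command, matched exactly and case-sensitively.

```python
SPIN_KEYWORDS = ("scf.SpinPolarization", "scf.SpinOrbit.Coupling")


def openmx_spin_keywords(soc: bool) -> str:
    """Spin settings from the text: Off/off for non-SOC data, nc/on for SOC data."""
    polarization, coupling = ("nc", "on") if soc else ("Off", "off")
    return (
        f"scf.SpinPolarization        {polarization}\n"
        f"scf.SpinOrbit.Coupling      {coupling}\n"
    )


def with_spin_keywords(basic_command: str, soc: bool) -> str:
    """Drop any existing spin keyword lines and append the ones for `soc`.

    Keywords are matched exactly on the first whitespace-separated token;
    blank lines and comment lines are kept unchanged.
    """
    kept = []
    for line in basic_command.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in SPIN_KEYWORDS:
            continue  # replaced by the lines appended below
        kept.append(line)
    return "\n".join(kept) + "\n" + openmx_spin_keywords(soc)


if __name__ == "__main__":
    block = "System.Name                 Si\nscf.SpinPolarization        Off\n"
    print(with_spin_keywords(block, soc=True))
```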
Non-self-consistent Hamiltonian Calculation: Run openmx_postprocess on the generated .dat files:
mpirun -np ncpus ./openmx_postprocess openmx.dat
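When many structures are converted, the command has to be run once per .dat file. Below is a minimal Python sketch of a dry-run batch driver. It assumes a hypothetical layout with one subdirectory per structure under filepath, and that openmx_postprocess writes its output to the working directory, so each command is paired with the directory of its .dat file.

```python
import glob
import os
import shlex
from collections.abc import Iterable


def postprocess_jobs(dat_paths: Iterable[str], ncpus: int, exe: str) -> list[tuple[str, list[str]]]:
    """Pair each .dat file's directory with an openmx_postprocess command.

    Mirrors `mpirun -np ncpus ./openmx_postprocess openmx.dat`. Each command is
    meant to run with cwd set to its directory, so `exe` should be an absolute path.
    """
    jobs = []
    for dat in sorted(dat_paths):
        workdir, dat_name = os.path.split(os.path.abspath(dat))
        jobs.append((workdir, ["mpirun", "-np", str(ncpus), exe, dat_name]))
    return jobs


if __name__ == "__main__":
    # Hypothetical layout: one subdirectory per structure under `filepath`.
    dat_files = glob.glob("/path/to/save/dat/file/*/openmx.dat")
    for workdir, argv in postprocess_jobs(dat_files, ncpus=8, exe="/path/to/openmx_postprocess"):
        # Dry run: print each command. To execute instead, call
        # subprocess.run(argv, cwd=workdir, check=True).
        print(f"cd {shlex.quote(workdir)} && {shlex.join(argv)}")
```

Sorting the paths keeps the job order reproducible across runs, which makes partially completed batches easier to resume.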
openmx_postprocess generates an overlap.scfout file containing the overlap matrices and the non-self-consistent Hamiltonian.
DFT Calculation (Optional, required for training set):
mpirun -np ncpus openmx openmx.dat > openmx.std
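If the calculations are launched from Python rather than a shell, the "> openmx.std" redirection has to be reproduced explicitly, since graph_data_gen reads that file (std_file_name in graph_data_gen.yaml). This sketch uses a hypothetical helper name; passing an open file object as stdout to subprocess.run is the standard-library equivalent of the shell redirect.

```python
import subprocess
from pathlib import Path


def run_with_stdout(argv: list[str], workdir: str, std_name: str = "openmx.std") -> Path:
    """Run `argv` in `workdir`, writing its standard output to `std_name`.

    Python equivalent of `mpirun -np ncpus openmx openmx.dat > openmx.std`.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    std_path = Path(workdir) / std_name
    with open(std_path, "w") as std:
        subprocess.run(argv, cwd=workdir, stdout=std, check=True)
    return std_path
```

For example, run_with_stdout(["mpirun", "-np", "8", "openmx", "openmx.dat"], "/path/to/calc") mirrors the shell command above.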
The full OpenMX run generates a .scfout file containing the self-consistent Hamiltonian.
Data Packaging: Configure graph_data_gen.yaml:
nao_max: 26                                        # Maximum number of atomic orbitals
graph_data_save_path: '/path/to/save/graph_data'
read_openmx_path: '/path/to/HamGNN/interfaces.openmx/read_openmx'
max_SCF_skip: 200
scfout_paths: '/path/to/scfout/files'              # Directory of .scfout files
dat_file_name: 'openmx.dat'
std_file_name: 'openmx.std'                        # Set to null if no DFT calculation
scfout_file_name: 'openmx.scfout'                  # Use 'overlap.scfout' for prediction
soc_switch: False                                  # Set to True for SOC data
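nao_max can be derived from the basis sets used in the dataset. The sketch below assumes that nao_max is the largest number of atomic orbitals on any single atom, and that each basis is written in OpenMX notation such as Si7.0-s2p2d1, where each letter is an angular-momentum channel with 2l+1 orbitals and the following digits give the number of radial functions. The helper names and the second basis string in the test are illustrative.

```python
import re

# Number of orbitals (2l + 1) in each angular-momentum channel.
ORBITALS_PER_CHANNEL = {"s": 1, "p": 3, "d": 5, "f": 7}


def count_orbitals(basis: str) -> int:
    """Count atomic orbitals for an OpenMX-style basis such as 'Si7.0-s2p2d1'.

    Only the part after the last '-' is parsed, so element symbols like 'Pd'
    in the prefix are not mistaken for channels.
    """
    spec = basis.rsplit("-", 1)[-1]
    return sum(
        ORBITALS_PER_CHANNEL[channel] * int(count)
        for channel, count in re.findall(r"([spdf])(\d+)", spec)
    )


def nao_max(bases: dict[str, str]) -> int:
    """Largest per-atom orbital count over all species in the dataset."""
    return max(count_orbitals(basis) for basis in bases.values())
```

With this counting, s2p2d1 gives 2*1 + 2*3 + 1*5 = 13 orbitals, and s3p2d2f1 gives 3 + 6 + 10 + 7 = 26, the value used in the example configuration.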
Then run:
graph_data_gen --config graph_data_gen.yaml
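As the comments in the configuration note, the training and prediction variants of graph_data_gen.yaml differ only in std_file_name and scfout_file_name. The Python sketch below renders either variant as plain YAML text. The function name is hypothetical, max_SCF_skip is copied unchanged from the example, and values are emitted with simple quoting rather than through a YAML library.

```python
def _yaml_scalar(value) -> str:
    """Format a flat config value as a YAML scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "True" if value else "False"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # YAML single-quote escaping


def graph_data_gen_config(mode: str, soc: bool, scfout_dir: str, save_dir: str,
                          read_openmx_path: str, nao_max: int = 26) -> str:
    """Render graph_data_gen.yaml for "train" or "predict" data."""
    if mode not in ("train", "predict"):
        raise ValueError(f"unknown mode: {mode!r}")
    train = mode == "train"
    values = {
        "nao_max": nao_max,
        "graph_data_save_path": save_dir,
        "read_openmx_path": read_openmx_path,
        "max_SCF_skip": 200,
        "scfout_paths": scfout_dir,
        "dat_file_name": "openmx.dat",
        # Without a full DFT run there is no .std file to read.
        "std_file_name": "openmx.std" if train else None,
        # Prediction packages the overlap/H0 file from openmx_postprocess.
        "scfout_file_name": "openmx.scfout" if train else "overlap.scfout",
        "soc_switch": soc,
    }
    return "".join(f"{key}: {_yaml_scalar(value)}\n" for key, value in values.items())
```

Write the returned string to graph_data_gen.yaml and pass that file to graph_data_gen --config.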
SIESTA/HONPAS Process
Structure Conversion: Edit the poscar_path and filepath parameters in the poscar2siesta.py script, then run:
python poscar2siesta.py
DFT Calculation: Use HONPAS to perform DFT calculations to obtain the .HSX Hamiltonian matrix files.
Non-self-consistent Hamiltonian Calculation: Run the following command to generate the non-self-consistent Hamiltonian and overlap matrices:
mpirun -np Ncores honpas_1.2_H0 < input.fdf
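The "< input.fdf" redirection feeds the fdf file to honpas_1.2_H0 on standard input. When driving many structures from Python, the same effect comes from passing an open file as stdin to subprocess.run, as in this sketch (the helper name is hypothetical).

```python
import subprocess
from pathlib import Path


def run_with_stdin(argv: list[str], workdir: str, fdf_name: str = "input.fdf") -> None:
    """Run `argv` in `workdir` with `fdf_name` fed on standard input.

    Python equivalent of `mpirun -np Ncores honpas_1.2_H0 < input.fdf`.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    with open(Path(workdir) / fdf_name) as fdf:
        subprocess.run(argv, cwd=workdir, stdin=fdf, check=True)
```

For example, run_with_stdin(["mpirun", "-np", "16", "honpas_1.2_H0"], "/path/to/calc") mirrors the shell command above.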
Running honpas_1.2_H0 generates an overlap.HSX file.
Data Packaging: Modify the parameters in the graph_data_gen_siesta.py script, then run:
python graph_data_gen_siesta.py
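Before packaging, it can help to confirm that every calculation directory holds the expected HSX files. This Python sketch assumes a hypothetical layout with one subdirectory per structure containing overlap.HSX and, for training data, the HONPAS DFT output as another *.HSX file; graph_data_gen_siesta.py itself may expect a different layout.

```python
from pathlib import Path


def siesta_ready_dirs(root: str, training: bool = True) -> list[Path]:
    """List structure directories under `root` that have the needed HSX files.

    Every directory must contain overlap.HSX; for training data it must also
    contain at least one other *.HSX file (assumed to be the DFT Hamiltonian).
    """
    ready = []
    for d in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        has_overlap = (d / "overlap.HSX").is_file()
        has_dft = any(f.name != "overlap.HSX" for f in d.glob("*.HSX"))
        if has_overlap and (has_dft or not training):
            ready.append(d)
    return ready
```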
ABACUS Process
Structure Conversion: Use poscar2abacus.py to generate ABACUS input files.
DFT Calculation and Post-processing: Run the ABACUS calculations and use abacus_postprocess to extract the H0 matrices.
Data Packaging: Use the graph_data_gen_abacus.py script to integrate the data and generate graph_data.npz.