which is the most important file in the work unit results?

whocrazy
Posts: 96
Joined: Thu Mar 27, 2008 9:09 pm

which is the most important file in the work unit results?

Post by whocrazy »

Hi.
Just out of curiosity: I remember that with the first version of FAH I tried, back in the old Windows 9x and XP days, whenever a work unit finished it would create a big text .gro file, then somehow compress it into a binary file and send the results. When work units are submitted now, however, the results include all the files: the .EDR, .XTC and all the log files, as well as the .gro file.
Which one is the most important file, and why are all the files included?
Thanks.
muziqaz
Posts: 1324
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: which is the most important file in the work unit results?

Post by muziqaz »

whocrazy wrote: Sat Mar 01, 2025 10:08 am Hi.
Just out of curiosity: I remember that with the first version of FAH I tried, back in the old Windows 9x and XP days, whenever a work unit finished it would create a big text .gro file, then somehow compress it into a binary file and send the results. When work units are submitted now, however, the results include all the files: the .EDR, .XTC and all the log files, as well as the .gro file.
Which one is the most important file, and why are all the files included?
Thanks.
Can I ask why you are asking this? Which of your children is the most important to you?
I know that while folding, wudata_01.dat is your WU. It is possible that this file is sent back alongside science.log and other files. However, it is also possible that wudata_01.dat is extracted into other files. This process happens quickly, and since no one who folds really cares about it as long as the results reach the server, it is not well documented.
FAH Omega tester
whocrazy

Re: which is the most important file in the work unit results?

Post by whocrazy »

I only ask because I am interested and like to know how it all works. There is no ulterior motive; I am not trying to hack anything or gain an unfair advantage. I am autistic and just very curious. I would also like to know what all the other files do, like the .XTC and .EDR files.
PS: I have no children, and I am a little confused by your query.
muziqaz

Re: which is the most important file in the work unit results?

Post by muziqaz »

To be fair, I don't know exactly what each of the files does.
checkpt.crc is probably an error-checking file, which checks that the simulation is not looking for aliens
core.xml contains the fahcore arguments
dhdl.xvg is, I think, a simulation type file
ener.edr has something to do with the energy field (probably)
frame22 files probably hold a snapshot of the frame (1%)
md.log is the molecular dynamics log from Gromacs/OpenMM (the software used for simulations)
science.log has some extra simulation-related logs which are not included in fahlog
state files are again snapshots of the current state, or something like that.
I'm sure someone will correct me on most of them.
The point is that a lot of the files are just proprietary files needed for the simulation, and none of them is more or less important than the others. In the same way, we cannot say which file is the most important in a game or operating system. All of them are, because if you delete one random file your program might not start or work correctly :)
Joe_H
Site Admin
Posts: 8050
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: which is the most important file in the work unit results?

Post by Joe_H »

And some of the files may be used during the validation process on WUs received back by the WS before it creates the next Gen WU to send out. If something is missing, the WU may fail validation, not earn credit, and be dumped by the server.
arisu
Posts: 92
Joined: Mon Feb 24, 2025 11:11 pm

Re: which is the most important file in the work unit results?

Post by arisu »

Note and disclaimer: None of the information below was obtained through reverse engineering the proprietary FAH cores. All of it comes from public sources pertaining to the open source version of GROMACS or from simple observation. There may be subtle differences between the public GROMACS and the version that FAH uses, but the below should be substantially correct.

The WU that you receive is a single file, wudata_01.dat. The FAH core processes it, creating some files in the process, and then packages some of those files into wuresults_01.dat which is then sent to the collection server, to be converted automatically into a new wudata_01.dat which will be sent to the next person to continue the simulation from where you left off. The .dat files are .tar files (a file archive format similar to zip that is not unique to GROMACS) with a special header prepended to it.
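
As a minimal sketch of what that archive layout means in practice, the shell function below lists the files inside a .dat file by skipping the header and handing the rest to tar. It assumes the header is exactly 512 bytes, which matches the `tail -c+513` commands used later in this thread but is an observation rather than a documented guarantee. The name list_wu_archive is made up for this example.

```shell
#!/bin/sh
# List the files stored inside a FAH .dat archive.
# ASSUMPTION: the header is exactly 512 bytes, so the tar data starts at byte 513.
# list_wu_archive is a hypothetical helper name, not part of any FAH tool.
list_wu_archive() {
    tail -c +513 "$1" | tar -t -f -
}
```

Calling `list_wu_archive wudata_01.dat` in a work directory should then print one archived file name per line, provided the assumption about the header holds.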

Files received from the work server:
wudata_01.dat which contains core.xml and frameN.tpr

Files created and sent to the collection server:
wuresults_01.dat which contains dhdl.xvg, frameN.gro, frameN.xtc, state.cpt, logfile_01.txt, science.log, and md.log

Files created but used only temporarily:
checkpt.crc, ener.edr, state_prev.cpt, and state_stepN.cpt

The N in frameN refers to the Generation of the WU, so a project with Run 10, Clone 14, Gen 26 would have frameN files named frame26.

Here is a description of what those files are for, as best I understand them:

Configuration and core parameters (core.xml)
The .xml file (core.xml) contains parameters for the simulation that are passed to mdrun (the command that reads the files and writes output files and does the simulation magic). If core.xml contains different parameters, then files different than the ones described below may be used.

Molecular structure (frameN.gro)
The .gro file (frameN.gro) contains the molecular structure in a GROMACS-specific format. By concatenating multiple .gro files, you have a trajectory file (I don't think FAH uses it that way though). Each line describes one atom, names the molecule that it's a part of, and specifies its position and its velocity vectors. This seems to be the largest file.
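
Because a .gro file is plain text, it can be inspected with ordinary tools. The sketch below relies on the layout given in the public GROMACS file-format documentation: the first line is a free-form title, the second line is the number of atoms, then comes one line per atom, and the last line holds the box vectors. The function name gro_summary is hypothetical, and FAH's files could in principle deviate from the public format.

```shell
#!/bin/sh
# Print the title and atom count of a .gro file.
# Layout assumed from the public GROMACS docs: line 1 is a free-form title,
# line 2 is the number of atoms, then one line per atom, then a box line.
# gro_summary is a hypothetical helper name for this sketch.
gro_summary() {
    title=$(sed -n '1p' "$1")
    atoms=$(sed -n '2p' "$1" | tr -d ' ')   # the count is usually space-padded
    printf '%s: %s atoms\n' "$title" "$atoms"
}
```

Running `gro_summary frame26.gro` would print the title followed by the atom count, which is a quick way to see why this file tends to be the largest one.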

Energy information (ener.edr)
The .edr file (ener.edr) contains the energy of the system in a portable (machine-independent) format. The binary equivalent is .ene and they can be inter-converted but a .ene file created on one architecture might not be readable if moved to another architecture. I don't think FAH uses .ene files.

Trajectory information (frameN.xtc and frameN.trr)
The .xtc file (frameN.xtc) is a portable (machine-independent) file containing low-precision trajectories. It stores a list of steps (with their timestamps) and the atoms' coordinates at each step. The .trr file is also a trajectory file but, unlike the .xtc file, it contains full-precision trajectories as well as velocities, forces, and energies. I don't see the .trr file on any WUs I am running, but some FAH projects use it in addition to the reduced-precision .xtc file.

Simulation state checkpoints (state.cpt, state_prev.cpt, and state_stepN.cpt)
The .cpt file (state.cpt) is just a checkpoint file. It contains everything necessary to resume the simulation. It also contains the checksums of other important files to protect from silent data corruption (including the .edr file even though it's not uploaded, apparently). There is also a state_stepN.cpt (N is the step number that the checkpoint is created for). When a checkpoint is made, the new state is written to state_stepN.cpt. Once that file is written, state.cpt is renamed to state_prev.cpt and state_stepN.cpt is renamed to state.cpt.
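
The rename sequence described above can be mimicked with a few lines of shell. This is only a sketch of the observed behaviour, not code taken from FAH or GROMACS; rotate_checkpoint and its two arguments (a work directory and a step number) are hypothetical.

```shell
#!/bin/sh
# Rotate checkpoint files in the order described in the post.
# ASSUMPTION: this only mimics the observed renames; it is not FAH or GROMACS code.
# rotate_checkpoint, dir and step are hypothetical names for this sketch.
rotate_checkpoint() {
    dir=$1
    step=$2
    new="$dir/state_step$step.cpt"
    # Do nothing unless the new checkpoint has already been written.
    [ -f "$new" ] || return 1
    # Keep the old checkpoint around as state_prev.cpt.
    if [ -f "$dir/state.cpt" ]; then
        mv -f "$dir/state.cpt" "$dir/state_prev.cpt"
    fi
    # The freshly written file becomes the current checkpoint.
    mv -f "$new" "$dir/state.cpt"
}
```

Presumably the point of this ordering is that a complete checkpoint exists on disk at every moment, so a crash in the middle of writing costs at most one checkpoint interval.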

Checkpoint checksums?? (checkpt.crc)
I don't know what the .crc file (checkpt.crc) is. I can't find anything about it in the GROMACS code so I guess it is something proprietary and custom for FAH. All I can tell is that it's 1760 bytes long and every few minutes, all but the initial 168 bytes are updated. I suspect it contains CRC checksums for components of the checkpoint files based on the name and the fact that it gets updated every time a new checkpoint is made. It doesn't look like it gets sent to the server in the WU results file, anyhow.

Molecular topology and simulation parameters (frameN.tpr)
The .tpr file (frameN.tpr) contains the starting state of the simulation (atom positions, energies, velocities, etc) and simulation parameters. It can be thought of as the "initial checkpoint" that is used when you first start running the WU. If there are no checkpoint files, then this file is used instead and the simulation starts at 0%. I'm guessing that the .cpt file from a WU result is converted into the .tpr file for the new WU for the next Generation somehow.

Log files (logfile_01.txt, science.log, and md.log)
The log files contain human-readable log information. The logfile_01.txt file contains the core's output log. The client's log that you can view through the web app contains this log (as well as lines from other running cores and the client itself). The science.log file contains some information about mdrun (the command within GROMACS that actually reads and processes all the files described here). The md.log file contains information about the state of the simulation at various points in time as well as information about the GROMACS build that is being used and its configuration.

Graphical plot (dhdl.xvg)
The .xvg file (dhdl.xvg) is a plot data file in the format used by the Grace plotting tool, containing values of various components of the simulation over time. For FAH core a8, at least with the WU I looked at, it contains a graph of dH/dλ (dH/dL, hence the name dhdl.xvg) over time. In the GROMACS documentation, H here is the Hamiltonian (the system's total energy function) rather than enthalpy, and λ is a coupling parameter used in free-energy calculations to move the system between two states, so dH/dλ describes how the energy changes as λ changes. Don't ask me exactly what a lambda state is, but apparently that rate of change is important enough to save. Here's an example plot generated from a .xvg file on a WU I completed recently:

[Image: example plot of dH/dλ over time, generated from dhdl.xvg]

I created that image from dhdl.xvg with the following commands (on Linux, this needs the packages grace, ghostscript, and imagemagick):

Code:

gracebat dhdl.xvg
gs -dQUIET -dBATCH -dTextAlphaBits=4 -sDEVICE=pnggray -r96 -odhdl.png dhdl.ps
mogrify -transverse -flip -trim -strip -border 16 -bordercolor white -quality 100 dhdl.png
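
If only the numbers are needed rather than a picture, a .xvg file can also be read as plain text. The sketch below assumes the usual xvg layout, where lines starting with # are comments, lines starting with @ are plot directives, and the remaining lines are whitespace-separated columns with time in the first column. The helper name xvg_column_mean is hypothetical, and which quantity sits in which column depends on the project.

```shell
#!/bin/sh
# Print the mean of one data column of a .xvg file.
# ASSUMPTION: '#' lines are comments, '@' lines are plot directives,
# everything else is whitespace-separated numeric columns.
# xvg_column_mean is a hypothetical helper name for this sketch.
xvg_column_mean() {
    # $1 = file, $2 = column number (defaults to 2, the first value after time)
    awk -v col="${2:-2}" '
        /^[#@]/   { next }                 # skip comments and directives
        NF >= col { sum += $col; n++ }     # accumulate the requested column
        END       { if (n) printf "%.3f\n", sum / n; else print "no data" }
    ' "$1"
}
```

For example, `xvg_column_mean dhdl.xvg 2` would print the average of the second column over the whole WU.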

Different FAH GROMACS cores may use slightly different files. Even individual projects may use different files (it is probably up to the researcher, who can adjust that by changing core.xml) or different file names (some projects use mdN instead of frameN). The GPU cores, which are based on OpenMM instead of GROMACS, are very different (I think the only file type they have in common is .xtc). The OpenMM cores mostly use .xml files instead of arcane custom formats. Interestingly, OpenMM seems to support encrypted input files. I don't know what the purpose of that would be, since the process would need access to the key to make use of the input files in the first place...

I can't really say what "the most important" file is because they are all important. I suppose the .xvg file and log files aren't strictly necessary for the next Generation WU to be created, but they are important to the researchers. I don't know if any of the files hold redundant information. I suspect the .cpt file can be used to re-generate a missing .gro file because the .gro looks like it is only created at the end of the simulation (unlike the other files which are updated as the simulation progresses). Uploading the .gro file doesn't make too much sense to me. If it can be generated at the end of the simulation from other existing files in the work directory, why aren't those files uploaded to the collection server instead, and have the collection server generate the .gro if it's really needed? That would save more than 60% on upload bandwidth, probably more for large WUs.
arisu

Re: which is the most important file in the work unit results?

Post by arisu »

The answer might simply be everything but the frameN.gro file. I tried vanilla GROMACS and found I could recreate it from the .cpt and .xtc files, so it holds 100% redundant information. Although the collection server would probably reject an upload without the file, there's no reason why the collection server needs it.

Code:

$ mkdir wudata wuresults
$ tail -c+513 wudata_01.dat | tar -C wudata -x
$ tail -c+513 wuresults_01.dat | tar -C wuresults -x
$ gmx convert-trj -f wuresults/state.cpt -s wudata/frame4.tpr -o out.gro 2>/dev/null
$ diff wuresults/frame4.gro out.gro
1c1
< FOO in water
---
> frame t= 25000.000
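
The diff output above shows that only line 1, the free-form title, differs. To check the atom data alone, the title can be skipped before comparing. A minimal sketch follows; gro_same_atoms is a hypothetical helper name, and it treats everything after the first line (atom count, atom lines, box line) as the data to compare.

```shell
#!/bin/sh
# Compare two .gro files while ignoring the free-form title on line 1.
# gro_same_atoms is a hypothetical helper name for this sketch.
# Returns 0 if everything after the title is byte-identical, non-zero otherwise.
gro_same_atoms() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    tail -n +2 "$1" > "$a"     # drop the title line of the first file
    tail -n +2 "$2" > "$b"     # drop the title line of the second file
    rc=0
    cmp -s "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

With the files from the commands above, `gro_same_atoms wuresults/frame4.gro out.gro && echo identical` would confirm that the regenerated file matches apart from its title.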