SMP unit just hangs



Postby dmearns » Tue Dec 04, 2007 7:22 pm

One of my Linux boxes is trying to process Project: 2605 (Run 7, Clone 506, Gen 8). I have tried twice and each time it hangs right at the start of processing.
Code:
[16:10:10] + Processing work unit
[16:10:10] Core required: FahCore_a1.exe
[16:10:10] Core found.
[16:10:10] Working on Unit 07 [December 4 16:10:10]
[16:10:10] + Working ...
[16:10:10] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 15 -forceasm -verbose -lifeline 21965 -version 600'

[16:10:10]
[16:10:10] *------------------------------*
[16:10:10] Folding@Home Gromacs SMP Core
[16:10:10] Version 1.74 (November 27, 2006)
[16:10:10]
[16:10:10] Preparing to commence simulation
[16:10:10] - Ensuring status. Please wait.
[16:10:27] - Assembly optimizations manually forced on.
[16:10:27] - Not checking prior termination.
[16:10:28] - Expanded 2419998 -> 12897049 (decompressed 532.9 percent)
[16:10:28] - Starting from initial work packet
[16:10:28]
[16:10:28] Project: 2605 (Run 7, Clone 506, Gen 8)
[16:10:28]
[16:10:28] Assembly optimizations on if available.
[16:10:28] Entering M.D.
[16:10:34] Finalizing output

I don't remember seeing the "Finalizing output" message before. Does anyone know what it means? This box has been folding without problems for several weeks, so I really don't think this is a hardware issue. Should I just delete this work unit and move on?
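For anyone who wants to spot this kind of silence automatically, here is a minimal Python sketch that reads the `[HH:MM:SS]` timestamps in a log like the one above and reports whether the last line is older than some threshold. The function names and the one-hour threshold are hypothetical choices for illustration, not part of the client. It also assumes the caller supplies the current time on the same clock the log uses and that less than 24 hours have passed since the last line.

```python
import re
from datetime import timedelta

# Matches lines such as "[16:10:34] Finalizing output" (format taken from the log above).
LOG_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")


def last_log_entry(lines):
    """Return (time of day as timedelta, message) for the last timestamped line, or None."""
    last = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            hours, minutes, seconds, message = match.groups()
            stamp = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
            last = (stamp, message.strip())
    return last


def seconds_silent(last_stamp, now_stamp):
    """Seconds from last_stamp to now_stamp, wrapping past midnight (assumes < 24 h elapsed)."""
    return (now_stamp - last_stamp).total_seconds() % 86400


def looks_stalled(lines, now_stamp, threshold=timedelta(hours=1)):
    """True if the newest log line is at least `threshold` old (threshold is an assumption)."""
    entry = last_log_entry(lines)
    if entry is None:
        return False
    return seconds_silent(entry[0], now_stamp) >= threshold.total_seconds()
```

With the log above, calling `looks_stalled(log_lines, timedelta(hours=21, minutes=10, seconds=34))` would flag the unit, since the last line was written at 16:10:34. A sensible threshold depends on how often the core normally writes progress lines.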

Thanks.

- Dave
dmearns
 
Posts: 11
Joined: Tue Dec 04, 2007 6:29 pm
Location: Columbia MD USA

Postby toTOW » Tue Dec 04, 2007 8:10 pm

How long did you wait?

What happens if you restart the client?

If the WU didn't make any progress before, you can delete it. There is a good chance that it's a bad WU.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 5652
Joined: Sun Dec 02, 2007 11:38 am
Location: Bordeaux, France

Postby dmearns » Tue Dec 04, 2007 9:12 pm

The first time, I waited a little over 5 hours. At the four-hour mark the client tried to send unsent work units, so that part of the program was still alive. There was only one FahCore_a1.exe process running, which usually means trouble. The processes did not respond to normal kill commands, so I had to use kill -9. On a restart, it did exactly the same thing. That run has been going for about 4 hours now and has produced no output after the "Finalizing output" line.
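The "only one FahCore_a1.exe process" symptom can be checked with a short script. The sketch below counts core processes in `ps` output and compares the count with the `-np 4` value from the mpiexec line in the log. The function names are hypothetical, and the assumption that one core process should be visible per `-np` rank comes from the observation above, not from any client documentation.

```python
import subprocess


def find_pids(ps_output, name="FahCore_a1.exe"):
    """Return PIDs whose command name equals `name`, given 'PID COMMAND' lines."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == name:
            pids.append(int(parts[0]))
    return pids


def missing_processes(ps_output, expected=4, name="FahCore_a1.exe"):
    """How many core processes are missing relative to `expected` (assumed to be the -np value)."""
    return max(0, expected - len(find_pids(ps_output, name)))


def current_ps_output():
    """Run ps with header-less PID and command-name columns."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `missing_processes(current_ps_output())` on the affected box would presumably return 3 in the situation described here. Note that `ps` may truncate long command names, so a different core name might need a prefix match instead of an exact one.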

I will kill it again, and delete the unit.

Thanks

- Dave

Postby toTOW » Tue Dec 04, 2007 11:32 pm

Yes, the WU seems to have a problem.

