Project 3043: Run 4, Clone 45, Gen 13 - FAH client hang

Ivoshiee
Site Moderator
Posts: 822
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Project 3043: Run 4, Clone 45, Gen 13 - FAH client hang

Post by Ivoshiee

For some reason the FAH client hangs whenever it encounters the WU from project 3043 (Run 4, Clone 45, Gen 13).

Code:

[ivo@ragana ~]$ /etc/init.d/folding status 1 viewlog

['folding' ver. 5.9]

Status of running FAH client(s) on 4 processor(s):
Status on all possible FAH client(s):
fah6 (pid 2042) is running
Status on all possible FAH cores (FahCore_a1.exe):
FahCore_a1.exe (pid 2239 2240) is running
Status on all possible 'FaH' scripts:
FaH (pid 2040) is running


Processes running from /home/ivo/foldingathome/CPU1 directory:
FAH client pids: 2042
FAH core pids: 2239 2240
FaH pids: 2040

FAH client flags: '-advmethods -forceasm -smp -verbosity 9'

 Index 2: folding now 1461.00 pts
  server: 171.64.65.63:8080; project: 3043, "9684 p3029_SProtein: 9684 p3029_SMP-emsv-03Extra SSE boost OK."
  Folding: run 4, clone 45, generation 13; benchmark 0; misc: 500, 200
  issue: Fri Apr 25 07:07:07 2008; begin: Tue Apr 29 13:12:28 2008
  due: Fri May  2 13:12:28 2008 (3 days)
  preferred: Thu May  1 01:12:28 2008 (36 hours)
  core URL: http://www.stanford.edu/~pande/Linux/x86/Core_a1.fah (V1.74)
  CPU: 1,0 x86; OS: 4,0 Linux
  smp cores: 4
  tag: P3043R4C45G13
  memory: 2014 MB
  assignment info (le): Tue Apr 29 13:12:24 2008; A4F0B7C8
  CS: 171.64.122.76; P limit: 524286976
  user: Ivoshiee; team: 385; ID: 532C760102AAA70A; mach ID: 1
  work/wudata_02.dat file size: 289920; WU type: Folding@Home
Average download rate 384.921 KB/s (u=4); upload rate 1074.352 KB/s (u=4)
Performance fraction 0.773749 (u=4)
Average pph: 88.990, ppd: 2135.76, ppw: 14950.3, ppy: 780066

Last 10 lines of /home/ivo/foldingathome/CPU1/FAHlog.txt:

[10:12:45] Project: 3043 (Run 4, Clone 45, Gen 13)
[10:12:45] 
[10:12:45] Assembly optimizations on if available.
[10:12:45] Entering M.D.
[10:12:51] Protein: 9684 p3029_SProtein: 9684 p3029_SMP-emsv-03Extra SSE boost OK.
[10:12:51] Finalizing output
[10:12:51] K.
[10:12:51] Warning:  long 1-4 interactions
[10:12:51] Writing local files
[10:12:51] Completed 0 out of 10000000 steps  (0 percent)



Status of FAH client(s): OK 
[ivo@ragana ~]$ 
One more symptom:

Code:

[ivo@ragana ~]$ dmesg
FahCore_a1.exe[2245] general protection rip:69b2f4 rsp:409fe210 error:0
A .tgz archive of the FAH directory:
http://ra.vendomar.ee/~ivo/p3043_hang.tgz
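
For anyone who wants to scan their own dmesg output for the same symptom, here is a minimal Python sketch that pulls the fields out of a general protection line. The regular expression is inferred only from the single line quoted above, so other kernels may format the message differently; the function name parse_gp_fault is made up for this example.

```python
import re

# Pattern inferred from the one dmesg line in this thread (assumption:
# other kernel versions may print the fault in a different layout).
GP_FAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"rip:(?P<rip>[0-9a-f]+) rsp:(?P<rsp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
)


def parse_gp_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    m = GP_FAULT.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),          # process name as printed by the kernel
        "pid": int(m.group("pid")),       # id printed in the square brackets
        "rip": int(m.group("rip"), 16),   # instruction pointer (hex in the log)
        "rsp": int(m.group("rsp"), 16),   # stack pointer (hex in the log)
        "error": int(m.group("error"), 16),
    }


line = "FahCore_a1.exe[2245] general protection rip:69b2f4 rsp:409fe210 error:0"
fault = parse_gp_fault(line)
print(fault["comm"], fault["pid"], hex(fault["rip"]))
```

Note that the id in the fault line (2245) is not one of the core pids (2239, 2240) listed by the status script. It may belong to another thread of the same core, but nothing in the output confirms that.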

Code:

[ivo@ragana CPU1]$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9500 Quad-Core Processor
stepping        : 2
cpu MHz         : 2200.155
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips        : 4552.25
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9500 Quad-Core Processor
stepping        : 2
cpu MHz         : 2200.155
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips        : 4400.39
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9500 Quad-Core Processor
stepping        : 2
cpu MHz         : 2200.155
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips        : 4400.29
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9500 Quad-Core Processor
stepping        : 2
cpu MHz         : 2200.155
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips        : 4400.28
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

[ivo@ragana CPU1]$ cat /proc/meminfo 
MemTotal:      2062352 kB
MemFree:       1436464 kB
Buffers:         16536 kB
Cached:         276400 kB
SwapCached:          0 kB
Active:         360948 kB
Inactive:       179956 kB
SwapTotal:     2008116 kB
SwapFree:      2008116 kB
Dirty:             888 kB
Writeback:           0 kB
AnonPages:      248020 kB
Mapped:          68716 kB
Slab:            33836 kB
SReclaimable:    11920 kB
SUnreclaim:      21916 kB
PageTables:      22192 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   3039292 kB
Committed_AS:   565692 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     18380 kB
VmallocChunk: 34359719899 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
[ivo@ragana CPU1]$ cat /etc/redhat-release 
Fedora release 8 (Werewolf)
[ivo@ragana CPU1]$ uname -a
Linux ragana 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:15:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[ivo@ragana CPU1]$ 
A reboot does not help; it is time to dump the WU.
Flathead74
Posts: 266
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York

Re: Project 3043: Run 4, Clone 45, Gen 13 - FAH client hang

Post by Flathead74

This WU was also bad when I tried to run it on 12/1/2007.

Project: 3043 (Run 4, Clone 45, Gen 13)
[12:03:25] Warning: long 1-4 interactions
[12:03:25] Writing local files
[12:03:25] Completed 0 out of 10000000 steps (0 percent)
[12:03:29] CoreStatus = 0 (0)
[12:03:29] Client-core communications error: ERROR 0x0
[12:03:29] Deleting current work unit & continuing...
Flathead74 - 12/1/2007
*I ran qfix on this WU, to no avail.
qfix reported that the files were ok.
Q6600 @ 3.2GHz : Gigabyte 965P-DS3 : 2MB PC6400 : Suse 10.2
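
To check whether two reports really concern the same work unit, the project/run/clone/gen numbers can be compared directly. The Python sketch below is only an illustration: it assumes the "Project: N (Run N, Clone N, Gen N)" log format and the PnRnCnGn tag format seen in this thread, and the helper names prcg_from_log and prcg_from_tag are hypothetical.

```python
import re

# Log format as it appears in FAHlog.txt in this thread. The closing
# parenthesis is optional so that hand-pasted, truncated lines still match.
LOG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)?"
)

# Tag format as shown in the queue dump ("tag: P3043R4C45G13").
TAG_RE = re.compile(r"P(\d+)R(\d+)C(\d+)G(\d+)")


def prcg_from_log(line):
    """Extract (project, run, clone, gen) from a FAHlog line, or None."""
    m = LOG_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def prcg_from_tag(tag):
    """Extract (project, run, clone, gen) from a queue tag, or None."""
    m = TAG_RE.fullmatch(tag.strip())
    return tuple(int(g) for g in m.groups()) if m else None


first = prcg_from_log("[10:12:45] Project: 3043 (Run 4, Clone 45, Gen 13)")
tagged = prcg_from_tag("P3043R4C45G13")
print(first == tagged)
```

Comparing tuples rather than raw strings avoids false mismatches caused by timestamps or spacing differences between pasted logs.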