Project: 7611 (Run 1, Clone 0, Gen 294)

Moderators: Site Moderators, FAHC Science Team

professorvorston
Posts: 2
Joined: Tue Nov 20, 2012 1:31 pm

Project: 7611 (Run 1, Clone 0, Gen 294)

Post by professorvorston »

I recently picked up P7611, R1, C0, G294 and it seems to cause an invalid memory access after only a few seconds.

Here is the relevant section from the log:

Code:

23/11/12:08:17:16:WU01:FS00:Connecting to assign3.stanford.edu:8080
23/11/12:08:17:17:WU01:FS00:News: Welcome to Folding@Home
23/11/12:08:17:17:WU01:FS00:Assigned to work server 171.64.65.104
23/11/12:08:17:17:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:4 from 171.64.65.104
23/11/12:08:17:17:WU01:FS00:Connecting to 171.64.65.104:8080
23/11/12:08:17:18:WU01:FS00:Downloading 29.65KiB
23/11/12:08:17:18:WU01:FS00:Download complete
23/11/12:08:17:18:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:7611 run:1 clone:0 gen:294 core:0xa4 unit:0x00000185664f2dd04df0f4fbabb702b0
23/11/12:08:17:18:WU01:FS00:Downloading project 7611 description
23/11/12:08:17:18:WU01:FS00:Connecting to fah-web.stanford.edu:80
23/11/12:08:17:19:WU01:FS00:Project 7611 description downloaded successfully
23/11/12:08:17:46:WU01:FS00:Starting
23/11/12:08:17:46:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:17:46:WU01:FS00:Started FahCore on PID 8393
23/11/12:08:17:46:Started thread 128 on PID 14985
23/11/12:08:17:46:WU01:FS00:Core PID:8397
23/11/12:08:17:46:WU01:FS00:FahCore 0xa4 started
23/11/12:08:17:47:WU01:FS00:0xa4:
23/11/12:08:17:47:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:17:47:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:17:47:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:17:47:WU01:FS00:0xa4:
23/11/12:08:17:47:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:17:47:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:17:47:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:17:47:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:17:47:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:17:47:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:17:47:WU01:FS00:0xa4:
23/11/12:08:17:47:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:17:47:WU01:FS00:0xa4:
23/11/12:08:17:47:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:17:47:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:17:53:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:17:53:WU01:FS00:Starting
23/11/12:08:17:53:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:17:53:WU01:FS00:Started FahCore on PID 8404
23/11/12:08:17:53:Started thread 129 on PID 14985
23/11/12:08:17:53:WU01:FS00:Core PID:8408
23/11/12:08:17:53:WU01:FS00:FahCore 0xa4 started
23/11/12:08:17:53:WU01:FS00:0xa4:
23/11/12:08:17:53:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:17:53:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:17:53:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:17:53:WU01:FS00:0xa4:
23/11/12:08:17:53:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:17:53:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:18:02:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:18:02:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:18:03:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:18:03:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:18:03:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:18:03:WU01:FS00:0xa4:
23/11/12:08:18:03:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:18:03:WU01:FS00:0xa4:
23/11/12:08:18:03:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:18:03:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:18:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:18:53:WU01:FS00:Starting
23/11/12:08:18:53:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:18:53:WU01:FS00:Started FahCore on PID 8420
23/11/12:08:18:53:Started thread 130 on PID 14985
23/11/12:08:18:53:WU01:FS00:Core PID:8424
23/11/12:08:18:53:WU01:FS00:FahCore 0xa4 started
23/11/12:08:18:53:WU01:FS00:0xa4:
23/11/12:08:18:53:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:18:53:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:18:53:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:18:53:WU01:FS00:0xa4:
23/11/12:08:18:53:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:18:53:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:19:03:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:19:03:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:19:03:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:19:03:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:19:03:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:19:03:WU01:FS00:0xa4:
23/11/12:08:19:03:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:19:03:WU01:FS00:0xa4:
23/11/12:08:19:03:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:19:03:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:19:09:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:20:30:WU01:FS00:Starting
23/11/12:08:20:30:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:20:30:WU01:FS00:Started FahCore on PID 8451
23/11/12:08:20:30:Started thread 131 on PID 14985
23/11/12:08:20:30:WU01:FS00:Core PID:8455
23/11/12:08:20:30:WU01:FS00:FahCore 0xa4 started
23/11/12:08:20:31:WU01:FS00:0xa4:
23/11/12:08:20:31:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:20:31:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:20:31:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:20:31:WU01:FS00:0xa4:
23/11/12:08:20:31:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:20:31:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:20:40:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:20:40:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:20:40:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:20:40:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:20:40:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:20:40:WU01:FS00:0xa4:
23/11/12:08:20:40:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:20:40:WU01:FS00:0xa4:
23/11/12:08:20:40:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:20:40:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:20:46:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:23:07:WU01:FS00:Starting
23/11/12:08:23:07:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:23:08:WU01:FS00:Started FahCore on PID 8474
23/11/12:08:23:08:Started thread 132 on PID 14985
23/11/12:08:23:08:WU01:FS00:Core PID:8478
23/11/12:08:23:08:WU01:FS00:FahCore 0xa4 started
23/11/12:08:23:08:WU01:FS00:0xa4:
23/11/12:08:23:08:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:23:08:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:23:08:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:23:08:WU01:FS00:0xa4:
23/11/12:08:23:08:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:23:08:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:23:17:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:23:17:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:23:17:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:23:17:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:23:17:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:23:17:WU01:FS00:0xa4:
23/11/12:08:23:17:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:23:17:WU01:FS00:0xa4:
23/11/12:08:23:17:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:23:17:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:23:24:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:27:22:WU01:FS00:Starting
23/11/12:08:27:22:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:27:22:WU01:FS00:Started FahCore on PID 8514
23/11/12:08:27:22:Started thread 133 on PID 14985
23/11/12:08:27:22:WU01:FS00:Core PID:8518
23/11/12:08:27:22:WU01:FS00:FahCore 0xa4 started
23/11/12:08:27:22:WU01:FS00:0xa4:
23/11/12:08:27:22:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:27:22:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:27:22:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:27:22:WU01:FS00:0xa4:
23/11/12:08:27:22:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:27:22:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:27:32:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:27:32:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:27:32:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:27:32:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:27:32:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:27:32:WU01:FS00:0xa4:
23/11/12:08:27:32:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:27:32:WU01:FS00:0xa4:
23/11/12:08:27:32:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:27:32:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:27:38:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:34:13:WU01:FS00:Starting
23/11/12:08:34:13:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 14985 -checkpoint 15 -np 4 -forceasm
23/11/12:08:34:13:WU01:FS00:Started FahCore on PID 8572
23/11/12:08:34:13:Started thread 134 on PID 14985
23/11/12:08:34:13:WU01:FS00:Core PID:8576
23/11/12:08:34:13:WU01:FS00:FahCore 0xa4 started
23/11/12:08:34:14:WU01:FS00:0xa4:
23/11/12:08:34:14:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:34:14:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:34:14:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:34:14:WU01:FS00:0xa4:
23/11/12:08:34:14:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:34:14:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:34:23:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
23/11/12:08:34:23:WU01:FS00:0xa4:- Not checking prior termination.
23/11/12:08:34:23:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:34:23:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:34:23:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:34:23:WU01:FS00:0xa4:
23/11/12:08:34:23:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:34:23:WU01:FS00:0xa4:
23/11/12:08:34:23:WU01:FS00:0xa4:Assembly optimizations on if available.
23/11/12:08:34:23:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:34:29:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:44:15:WARNING:Caught signal SIGINT(2) on PID 14985
23/11/12:08:44:15:Exiting, please wait. . .
23/11/12:08:44:16:Clean exit
I tried it again with -forceasm removed and it did not seem to behave much differently:

Code:

23/11/12:08:56:34:WU01:FS00:Starting
23/11/12:08:56:34:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 8780 -checkpoint 15 -np 4
23/11/12:08:56:34:WU01:FS00:Started FahCore on PID 8788
23/11/12:08:56:34:WU01:FS00:Core PID:8792
23/11/12:08:56:34:WU01:FS00:FahCore 0xa4 started
23/11/12:08:56:34:WU01:FS00:0xa4:
23/11/12:08:56:34:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:56:34:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:56:34:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:56:34:WU01:FS00:0xa4:
23/11/12:08:56:34:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:56:34:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:56:44:WU01:FS00:0xa4:- Looking at optimizations...
23/11/12:08:56:44:WU01:FS00:0xa4:- Working with standard loops on this execution.
23/11/12:08:56:51:WU01:FS00:0xa4:- Created dyn
23/11/12:08:56:51:WU01:FS00:0xa4:- Files status OK
23/11/12:08:56:51:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:56:51:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:56:51:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:56:51:WU01:FS00:0xa4:
23/11/12:08:56:51:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:56:51:WU01:FS00:0xa4:
23/11/12:08:56:51:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:56:57:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
23/11/12:08:56:57:WU01:FS00:Starting
23/11/12:08:56:57:WU01:FS00:Running FahCore: /home/folder/FAHCoreWrapper /home/folder/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 8780 -checkpoint 15 -np 4
23/11/12:08:56:57:WU01:FS00:Started FahCore on PID 8801
23/11/12:08:56:57:WU01:FS00:Core PID:8805
23/11/12:08:56:57:WU01:FS00:FahCore 0xa4 started
23/11/12:08:56:58:WU01:FS00:0xa4:
23/11/12:08:56:58:WU01:FS00:0xa4:*------------------------------*
23/11/12:08:56:58:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23/11/12:08:56:58:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23/11/12:08:56:58:WU01:FS00:0xa4:
23/11/12:08:56:58:WU01:FS00:0xa4:Preparing to commence simulation
23/11/12:08:56:58:WU01:FS00:0xa4:- Ensuring status. Please wait.
23/11/12:08:57:07:WU01:FS00:0xa4:- Looking at optimizations...
23/11/12:08:57:07:WU01:FS00:0xa4:- Working with standard loops on this execution.
23/11/12:08:57:07:WU01:FS00:0xa4:- Previous termination of core was improper.
23/11/12:08:57:07:WU01:FS00:0xa4:- Files status OK
23/11/12:08:57:07:WU01:FS00:0xa4:- Expanded 29850 -> 644556 (decompressed 2159.3 percent)
23/11/12:08:57:07:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29850 data_size=644556, decompressed_data_size=644556 diff=0
23/11/12:08:57:07:WU01:FS00:0xa4:- Digital signature verified
23/11/12:08:57:07:WU01:FS00:0xa4:
23/11/12:08:57:07:WU01:FS00:0xa4:Project: 7611 (Run 1, Clone 0, Gen 294)
23/11/12:08:57:07:WU01:FS00:0xa4:
23/11/12:08:57:07:WU01:FS00:0xa4:Entering M.D.
23/11/12:08:57:13:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Here is the output of Linux's kernel log, showing the segmentation faults:

Code:

[12741583.086434] FahCore_a4[8398]: segfault at 7f4fdc0d1d30 ip 00000000004aaf36 sp 00007f51ee1853a8 error 4 in FahCore_a4[400000+5e9000]
[12741599.250069] FahCore_a4[8410]: segfault at 7f1a140912c0 ip 00000000004aaf36 sp 00007f1c2167b3a8 error 4 in FahCore_a4[400000+5e9000]
[12741659.221647] FahCore_a4[8426]: segfault at 7fe2480d6910 ip 00000000004aaf36 sp 00007fe459a133a8 error 4 in FahCore_a4[400000+5e9000]
[12741756.462775] FahCore_a4[8457]: segfault at fffffffe023bfa40 ip 00000000004aaf36 sp 00007ffa3810c3a8 error 4 in FahCore_a4[400000+5e9000]
[12741913.694341] FahCore_a4[8479]: segfault at 7f643c0d2130 ip 00000000004aaf36 sp 00007f664c6ce3a8 error 4 in FahCore_a4[400000+5e9000]
[12742167.910641] FahCore_a4[8520]: segfault at 7f9d1c0ec520 ip 00000000004aaf36 sp 00007f9f2cc693a8 error 4 in FahCore_a4[400000+5e9000]
[12742579.109308] FahCore_a4[8584]: segfault at 7efd000b21f0 ip 00000000004aaf36 sp 00007eff0c1bb3a8 error 4 in FahCore_a4[400000+5e9000]
[12743926.744753] FahCore_a4[8794]: segfault at 7fd8d40f8260 ip 00000000004aaf36 sp 00007fdade33f3a8 error 4 in FahCore_a4[400000+5e9000]
[12743942.740007] FahCore_a4[8806]: segfault at fffffffe01169870 ip 00000000004aaf36 sp 00007f2e02f763a8 error 4 in FahCore_a4[400000+5e9000]
Should I keep trying, investigate further or try to dump the work unit?
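
For reference, the kernel lines above can be summarised mechanically. This is only a minimal sketch, not part of the original report: the names `parse_segfault` and `decode_error` are made up for illustration, and the bit meanings are the standard x86 page-fault error code bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode). It uses three of the lines verbatim and checks that they share one instruction pointer and that "error 4" corresponds to a user-mode read of an unmapped page.

```python
import re

# Three lines copied verbatim from the kernel log above.
LOG = """\
[12741583.086434] FahCore_a4[8398]: segfault at 7f4fdc0d1d30 ip 00000000004aaf36 sp 00007f51ee1853a8 error 4 in FahCore_a4[400000+5e9000]
[12741756.462775] FahCore_a4[8457]: segfault at fffffffe023bfa40 ip 00000000004aaf36 sp 00007ffa3810c3a8 error 4 in FahCore_a4[400000+5e9000]
[12743942.740007] FahCore_a4[8806]: segfault at fffffffe01169870 ip 00000000004aaf36 sp 00007f2e02f763a8 error 4 in FahCore_a4[400000+5e9000]
"""

PATTERN = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\] (?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at "
    r"(?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the numeric fields of one kernel segfault line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "err": int(m["err"], 16),  # the kernel prints this field in hexadecimal
        "base": int(m["base"], 16),
    }


def decode_error(code):
    """Split an x86 page-fault error code into its three low bits."""
    return {
        "page_present": bool(code & 1),
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


records = [parse_segfault(line) for line in LOG.splitlines()]
unique_ips = {hex(r["ip"]) for r in records}
offset_in_binary = records[0]["ip"] - records[0]["base"]

print(unique_ips)               # one entry if every crash is at the same instruction
print(hex(offset_in_binary))    # offset of the faulting instruction inside FahCore_a4
print(decode_error(records[0]["err"]))
```

Every crash hitting the same instruction while the faulting address changes from run to run is what one would expect from a bad index or pointer computed at run time, although the log alone does not prove that.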
professorvorston
Posts: 2
Joined: Tue Nov 20, 2012 1:31 pm

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by professorvorston »

This probably isn't worthwhile, but in case it is useful: using the Linux core FahCore_a4 with SHA1 checksum 9dd5643da2519b06773e09b88b483e0644a55f77, this is GDB's output at the crash:

Code:

23/11/12:09:59:18:WU01:FS00:0xa4:Entering M.D.
[New LWP 9638]

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9638]
0x00000000004aaf36 in ?? ()
(gdb) bt
#0  0x00000000004aaf36 in ?? ()
#1  0x00000000004b77a4 in ?? ()
#2  0x0000000000444cba in ?? ()
#3  0x00000000004e9d38 in ?? ()
#4  0x000000000041c7db in ?? ()
#5  0x0000000000410a6d in ?? ()
#6  0x000000000040fd07 in ?? ()
#7  0x00000000004090f2 in ?? ()
#8  0x00000000004054eb in ?? ()
#9  0x0000000000405d81 in ?? ()
#10 0x0000000000405e1d in ?? ()
#11 0x0000000000770a5d in ?? ()
#12 0x00000000008c56b9 in ?? ()
(gdb) disassemble 0x4aaf28,0x4aaf48
Dump of assembler code from 0x4aaf28 to 0x4aaf48:
   0x00000000004aaf28:  subss  %xmm13,%xmm15
   0x00000000004aaf2d:  subss  %xmm1,%xmm0
   0x00000000004aaf31:  subss  %xmm2,%xmm14
=> 0x00000000004aaf36:  addss  (%r14,%r9,4),%xmm15
   0x00000000004aaf3c:  mov    0x0(%r13,%r9,4),%r9d
   0x00000000004aaf41:  mov    %r9d,(%rsi,%rax,1)
   0x00000000004aaf45:  mov    (%r12,%r8,4),%r10d
End of assembler dump.
(gdb) info all-registers
rax            0x0      0
rbx            0x7fffec0bd830   140737153587248
rcx            0x7fffec9b6160   140737162994016
rdx            0x1      1
rsi            0x7fffec9e5ac0   140737163188928
rdi            0x7fffec76b390   140737160590224
rbp            0x80000000       0x80000000
rsp            0x7ffff7eb83a8   0x7ffff7eb83a8
r8             0xffffffff80000000       -2147483648
r9             0xffffffff80000000       -2147483648
r10            0x80000000       2147483648
r11            0x7fffec0bea20   140737153591840
r12            0x7fffec0be4e0   140737153590496
r13            0x7fffec0bdfc0   140737153589184
r14            0x7fffec0be250   140737153589840
r15            0x7fffec0be770   140737153591152
rip            0x4aaf36 0x4aaf36
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
st0            -nan(0x000000034)        (raw 0xffff0000000000000034)
st1            -nan(0x00000000c)        (raw 0xffff000000000000000c)
st2            -inf     (raw 0xffff0000000000000000)
st3            -inf     (raw 0xffff0000000000000000)
st4            0        (raw 0x00000000000000000000)
st5            0        (raw 0x00000000000000000000)
st6            -nan(0x66e3000000000000) (raw 0xffff66e3000000000000)
st7            -inf     (raw 0xffff0000000000000000)
fctrl          0x37f    895
fstat          0x20     32
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x868db7 8818103
foseg          0x7fff   32767
fooff          0xf7ebbc58       -135545768
fop            0x0      0
mxcsr          0x1fb1   [ IE UE PE IM DM ZM OM UM PM ]
ymm0           *value not available*
ymm1           *value not available*
ymm2           *value not available*
ymm3           *value not available*
ymm4           *value not available*
ymm5           *value not available*
ymm6           *value not available*
ymm7           *value not available*
ymm8           *value not available*
ymm9           *value not available*
ymm10          *value not available*
ymm11          *value not available*
ymm12          *value not available*
ymm13          *value not available*
ymm14          *value not available*
ymm15          *value not available*
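
One possible reading of the register dump, offered as a sketch rather than a diagnosis: the faulting `addss (%r14,%r9,4),%xmm15` reads a 4-byte float at `r14 + 4*r9`, and `r9` holds -2147483648 (the 32-bit value 0x80000000, sign-extended). That is the value the x86 SSE float-to-integer conversions return for NaN or out-of-range input, so the index may have come from an invalid floating-point result. This is an assumption, since the binary has no symbols. The arithmetic below uses only values from the dump above and from the kernel log in the first post; `effective_address` and `implied_base` are hypothetical helper names, and the second helper assumes the index had the same value in the earlier crashes.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, index, scale):
    """Address of an x86-64 memory operand (base,index,scale), wrapped to 64 bits."""
    return (base + index * scale) & MASK64


def implied_base(fault_addr, index, scale):
    """Base register value that would produce fault_addr for the given index and scale."""
    return (fault_addr - index * scale) & MASK64


# Register values from "info all-registers" above.
r14 = 0x7FFFEC0BE250
r9 = -2147483648  # shown by GDB as 0xffffffff80000000

fault = effective_address(r14, r9, 4)
print(hex(fault))  # 0x7ffdec0be250, i.e. 8 GiB below r14

# Two fault addresses from the kernel log in the first post,
# ASSUMING the index was the same -2147483648 in those runs.
print(hex(implied_base(0xFFFFFFFE023BFA40, r9, 4)))  # 0x23bfa40
print(hex(implied_base(0x7F4FDC0D1D30, r9, 4)))      # 0x7f51dc0d1d30
```

Under that assumption the odd-looking `fffffffe...` addresses become ordinary low heap addresses and the `7f4f...` ones land close to the logged stack pointer, which is consistent with a valid table pointer combined with an invalid index.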
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by bollix47 »

Welcome to the folding support forum professorvorston.

Have you run memtest on that computer to determine if there are any memory problems?

Unfortunately we can't tell if the work unit is a bad one at this time. That's not to say it isn't, just that there are no returns for it yet. Bad WUs are rare but they do happen.
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by bollix47 »

Project: 7611 (Run 1, Clone 0, Gen 294) has now been returned by 2 folders. It looks like the work unit took a long time for both of them, with a small credit at the end. I have no way of knowing their computer specs or their folding habits, so I can't say for sure whether it was one of the 7611s that people are reporting as having longer than usual frame times, as reported in this thread.
leibold
Posts: 3
Joined: Sun Dec 09, 2012 6:15 pm

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by leibold »

I'm having the very same issue as professorvorston with a different workunit in the same project: Project: 7611 (Run 3, Clone 56, Gen 180)

I'm running version 7.1.52 on SuSE Linux and, as you can see from the system messages, I'm getting the same segmentation violations in the FahCore_a4 executable at the same address.

Code:

[17186452.307159] FahCore_a4[15577]: segfault at 7f9778145400 ip 00000000004aaf36 sp 00007f997e5fa3a8 error 4 in FahCore_a4[400000+5e9000]
[17186468.811458] FahCore_a4[15607]: segfault at fffffffe00d4f020 ip 00000000004aaf36 sp 00007fb02a5353a8 error 4 in FahCore_a4[400000+5e9000]
[17186528.825196] FahCore_a4[15627]: segfault at fffffffe00d4e930 ip 00000000004aaf36 sp 00007fd4841e03a8 error 4 in FahCore_a4[400000+5e9000]
[17186626.116805] FahCore_a4[15655]: segfault at fffffffe00e297e0 ip 00000000004aaf36 sp 00007fbf693ad3a8 error 4 in FahCore_a4[400000+5e9000]
[17186783.414954] FahCore_a4[15689]: segfault at 7f86700d8c00 ip 00000000004aaf36 sp 00007f8877a533a8 error 4 in FahCore_a4[400000+5e9000]
[17187037.737651] FahCore_a4[15760]: segfault at fffffffe00e06710 ip 00000000004aaf36 sp 00007f89a21c93a8 error 4 in FahCore_a4[400000+5e9000]
[17187449.119709] FahCore_a4[15877]: segfault at fffffffe00d4ec50 ip 00000000004aaf36 sp 00007f5d029193a8 error 4 in FahCore_a4[400000+5e9000]
[17188114.594475] FahCore_a4[16018]: segfault at 7fea5409b870 ip 00000000004aaf36 sp 00007fec5dd6c3a8 error 4 in FahCore_a4[400000+5e9000]
[17189191.449464] FahCore_a4[16279]: segfault at fffffffe00d4ed60 ip 00000000004aaf36 sp 00007f919832d3a8 error 4 in FahCore_a4[400000+5e9000]
[17190933.713845] FahCore_a4[17031]: segfault at fffffffe00d4e930 ip 00000000004aaf36 sp 00007f11663f03a8 error 4 in FahCore_a4[400000+5e9000]
[17193752.572033] FahCore_a4[17670]: segfault at 7fe014096b40 ip 00000000004aaf36 sp 00007fe21a2973a8 error 4 in FahCore_a4[400000+5e9000]
[17198313.474348] FahCore_a4[19286]: segfault at fffffffe00d4ed20 ip 00000000004aaf36 sp 00007f3917b263a8 error 4 in FahCore_a4[400000+5e9000]
[17205693.115175] FahCore_a4[21998]: segfault at fffffffe00d54820 ip 00000000004aaf36 sp 00007f3dc77fd3a8 error 4 in FahCore_a4[400000+5e9000]
[17217633.465297] FahCore_a4[24885]: segfault at fffffffe00d4eda0 ip 00000000004aaf36 sp 00007f286eb7e3a8 error 4 in FahCore_a4[400000+5e9000]
[17236953.316580] FahCore_a4[30216]: segfault at 7f60580d9ad0 ip 00000000004aaf36 sp 00007f6261d123a8 error 4 in FahCore_a4[400000+5e9000]
[17258553.353533] FahCore_a4[3586]: segfault at 7fda90148850 ip 00000000004aaf36 sp 00007fdc9634c3a8 error 4 in FahCore_a4[400000+5e9000]
[17280153.540098] FahCore_a4[9593]: segfault at fffffffe00e152e0 ip 00000000004aaf36 sp 00007f160da4e3a8 error 4 in FahCore_a4[400000+5e9000]
[17301753.685597] FahCore_a4[16678]: segfault at 7f7e3c144670 ip 00000000004aaf36 sp 00007f80433003a8 error 4 in FahCore_a4[400000+5e9000]
[17323353.900047] FahCore_a4[26729]: segfault at 7fb110096bb0 ip 00000000004aaf36 sp 00007fb319a8a3a8 error 4 in FahCore_a4[400000+5e9000]
[17337928.085190] FahCore_a4[671]: segfault at 7fe378140f90 ip 00000000004aaf36 sp 00007fe5809543a8 error 4 in FahCore_a4[400000+5e9000]
[17337944.451452] FahCore_a4[684]: segfault at fffffffe00e010c0 ip 00000000004aaf36 sp 00007f3b61f9c3a8 error 4 in FahCore_a4[400000+5e9000]
[17338004.490726] FahCore_a4[707]: segfault at fffffffe00da6440 ip 00000000004aaf36 sp 00007f6e4665a3a8 error 4 in FahCore_a4[400000+5e9000]
[17338101.758161] FahCore_a4[756]: segfault at 7f379c07b6e0 ip 00000000004aaf36 sp 00007f39a38063a8 error 4 in FahCore_a4[400000+5e9000]
[17338259.062045] FahCore_a4[834]: segfault at fffffffe00da6460 ip 00000000004aaf36 sp 00007fbf32fb13a8 error 4 in FahCore_a4[400000+5e9000]
[17338513.391671] FahCore_a4[900]: segfault at 7fda780fddc0 ip 00000000004aaf36 sp 00007fdc7ffb13a8 error 4 in FahCore_a4[400000+5e9000]
[17338924.796472] FahCore_a4[1017]: segfault at 7fe26c09b780 ip 00000000004aaf36 sp 00007fe4735fa3a8 error 4 in FahCore_a4[400000+5e9000]
I'm posting this to point out that this is not an isolated incident but a reproducible problem affecting Linux users who get one of those "bad" Project 7611 workunits (it appears that those workunits continue to run on Windows, however very slowly).

After the initial quick restarts of the crashing A4 core the client continued to try the same workunit over and over again in 4 hour intervals (the F@H client being idle in between). At one point I did shut down FAHControl and FAHClient and restarted them with no change (FAHClient continued to restart FahCore_a4 on the same bad workunit). After more than 20 crashes of the A4 core I moved the workunit aside and restarted FAHClient, which then fetched a new workunit (different project) that is working fine.

It seems to me that unattended Linux computers receiving one of these workunits will become permanently disabled from further participation in Folding at Home.

fahlog is available if needed but looks the same as the ones posted by professorvorston. After 8 crashes it added the message "Examination of work files indicates 8 consecutive improper terminations of core." (same message on all subsequent retries).
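
The retry spacing can also be read off the kernel timestamps in the log above. The following is a rough sketch, not taken from the client's source, and the variable names are illustrative. It uses the timestamps of the first eight segfaults and of five later ones, copied from the log, and prints the gaps in whole seconds. On these numbers the delay grows with each failure (after the first quick restarts each gap is roughly the sum of the previous two), and the later retries are about 21,600 s apart by the kernel clock.

```python
# Kernel timestamps (seconds) copied from the segfault lines above.
early = [
    17186452.307159, 17186468.811458, 17186528.825196, 17186626.116805,
    17186783.414954, 17187037.737651, 17187449.119709, 17188114.594475,
]
late = [
    17236953.316580, 17258553.353533, 17280153.540098,
    17301753.685597, 17323353.900047,
]


def gaps(timestamps):
    """Differences between consecutive timestamps, rounded to whole seconds."""
    return [round(b - a) for a, b in zip(timestamps, timestamps[1:])]


early_gaps = gaps(early)
late_gaps = gaps(late)

print(early_gaps)  # [17, 60, 97, 157, 254, 411, 665]
print(late_gaps)   # [21600, 21600, 21600, 21600]

# From the fourth gap on, each gap is within a second of the sum of the two before it.
for i in range(3, len(early_gaps)):
    assert abs(early_gaps[i] - early_gaps[i - 1] - early_gaps[i - 2]) <= 1
```

The gaps measure time between crashes, so each one includes the few seconds the core runs before it faults.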
Member of dslreports.com Team Helix (Folding@Home team number 4)
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by codysluder »

How many SMP threads are being used?
leibold
Posts: 3
Joined: Sun Dec 09, 2012 6:15 pm

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by leibold »

In my case 2 SMP threads: Processor is Intel Core2Duo 6300
Member of dslreports.com Team Helix (Folding@Home team number 4)
mmonnin
Posts: 324
Joined: Wed Dec 05, 2007 1:27 am

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by mmonnin »

I've returned 5 of these WUs myself.
1,30,220
1,50,282
0,0,229
0,13,266
4,23,242

i7 3770 in Linux VM. 8 threads. Seems normal for me.
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 2600K@4.2 GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 HT@3.2 GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by GreyWhiskers »

My HFM.net records (maybe incomplete) show one 7611: 1, 56, 45 returned on May 18, 2012 on a Uniprocessor client using Core A4 2.27. This was run on a Pentium 4/HT 3.2 GHz single core, two thread CPU as a Uniprocessor with a 1 hour 15 minute 15 second TPF ( :roll: ) taking over 4 days to complete. :|

Don't know what to conclude since there has been so much time since this was completed.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by bruce »

GreyWhiskers wrote: My HFM.net records (maybe incomplete) show one 7611: 1, 56, 45 returned on May 18, 2012 on a Uniprocessor client using Core A4 2.27. This was run on a Pentium 4/HT 3.2 GHz single core, two thread CPU as a Uniprocessor with a 1 hour 15 minute 15 second TPF ( :roll: ) taking over 4 days to complete. :|

Don't know what to conclude since there has been so much time since this was completed.
Hi GreyWhiskers (team 0),
Your WU (P7611 R1 C56 G45) was added to the stats database on 2012-05-18 02:07:48 for 1461.74 points of credit.
Days taken to complete WU: 4.27
leibold
Posts: 3
Joined: Sun Dec 09, 2012 6:15 pm

Re: Project: 7611 (Run 1, Clone 0, Gen 294)

Post by leibold »

mmonnin wrote: I've returned 5 of these WUs myself.
I'm aware that not all project 7611 workunits are bad. The same workstation processed another 7611 workunit in October: Project: 7611 (Run 1, Clone 75, Gen 98)

That one ran for 1 1/2 days and completed successfully. Since that is the only successful 7611 workunit on this machine I have no reference to tell whether or not that is the normal speed.

In that other thread about slow 7611 workunits (linked above) it was mentioned that less than 0.1% of the 7611 workunits are problematic.
Member of dslreports.com Team Helix (Folding@Home team number 4)