Project 2682 malloc error

Postby stevew » Fri Aug 06, 2010 12:39 pm

98 Bigadv WUs completed and I didn't save the terminal output, dumb. Rebooted and only have FAHlog.txt which does not show the error.

OS X 10.6.4. 8-core, 12 GB

Server 171.67.108.22, Project 2682, 19 retries so far since 2010-08-06 03:12:25. "Loaded queue successfully" each time and then tries again.

From memory then, first 2682 WU throws a malloc error "line 22", "cannot allocate region".

Code:
[09:24:47] Project: 2682 (Run 3, Clone 7, Gen 18)
[09:24:47]
[09:24:49] Entering M.D.
[09:25:22] mdrun returned 12
[09:25:22] Going to send back what have done -- stepsTotalG=250000
[09:25:22] Work fraction=0.0000 steps=250000.
[09:25:26] logfile size=11881 infoLength=11881 edr=0 trr=25
[09:25:26] logfile size: 11881 info=11881 bed=0 hdr=25
[09:25:26] - Writing 12419 bytes of core data to disk...
[09:25:26]   ... Done.
[09:25:31]
[09:25:31] Folding@home Core Shutdown: EARLY_UNIT_END
[09:25:31] CoreStatus = 72 (114)
[09:25:31] Sending work to server
[09:25:31] Project: 2682 (Run 3, Clone 7, Gen 18)

=== cut ====

[09:27:05] Project: 2682 (Run 3, Clone 7, Gen 18)
[09:27:05]
[09:27:07] Assembly optimizations on if available.
[09:27:07] Entering M.D.
[09:27:39] mdrun returned 12
[09:27:39] Going to send back what have done -- stepsTotalG=250000
[09:27:39] Work fraction=0.0000 steps=250000.
[09:27:43] logfile size=11881 infoLength=11881 edr=0 trr=25
[09:27:43] logfile size: 11881 info=11881 bed=0 hdr=25
[09:27:43] - Writing 12419 bytes of core data to disk...
[09:27:43]   ... Done.
[09:27:49]
[09:27:49] Folding@home Core Shutdown: EARLY_UNIT_END
[09:27:49] CoreStatus = 72 (114)
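For anyone who wants to check how often a given WU has failed without reading the log by eye, here is a minimal sketch. It only assumes the line formats quoted in this thread ("Project: ... (Run, Clone, Gen)", "EARLY_UNIT_END", "Client-core communications error"); the function name and the choice of failure markers are illustrative, not anything the client itself provides.

```python
import re
from collections import Counter

# Patterns are based only on the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Assumption: these two strings are treated as "the unit failed" markers.
FAILURE_MARKERS = ("EARLY_UNIT_END", "Client-core communications error")


def count_failures(log_lines):
    """Count failure markers per (project, run, clone, gen) tuple."""
    failures = Counter()
    current = None  # the most recently announced work unit
    for line in log_lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
        elif current is not None and any(m in line for m in FAILURE_MARKERS):
            failures[current] += 1
    return failures


sample = [
    "[09:24:47] Project: 2682 (Run 3, Clone 7, Gen 18)",
    "[09:25:22] mdrun returned 12",
    "[09:25:31] Folding@home Core Shutdown: EARLY_UNIT_END",
    "[09:27:05] Project: 2682 (Run 3, Clone 7, Gen 18)",
    "[09:27:42] mdrun returned 12",
    "[09:27:49] Folding@home Core Shutdown: EARLY_UNIT_END",
]
result = count_failures(sample)
for unit, count in result.items():
    print(unit, count)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `count_failures(open("FAHlog.txt"))`.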
stevew
 
Posts: 143
Joined: Mon Dec 03, 2007 11:53 pm
Location: Team Hack-A-Day

Re: Project 2682 malloc error

Postby Parja » Fri Aug 06, 2010 4:31 pm

FahCore_A3.exe keeps crashing on me on the exact same WU 2682 (R3,C7,G18) in Win7 Ultimate x64.

Here's what I keep getting in the terminal...

Code:
[16:07:56] Project: 2682 (Run 3, Clone 7, Gen 18)
[16:07:56]
[16:07:58] Assembly optimizations on if available.
[16:07:58] Entering M.D.
[16:08:14] Gromacs cannot continue further.
[16:08:14] Going to send back what have done -- stepsTotalG=250000
[16:08:14] Work fraction=0.0000 steps=250000.
[16:08:20] CoreStatus = C0000005 (-1073741819)
[16:08:20] Client-core communications error: ERROR 0xc0000005
[16:08:20] Deleting current work unit & continuing...
[16:08:40] Trying to send all finished work units
[16:08:40] + No unsent completed units remaining.
[16:08:40] - Preparing to get new work unit...
Parja
 
Posts: 22
Joined: Sat Jun 28, 2008 1:38 am

Re: Project 2682 malloc error

Postby toTOW » Fri Aug 06, 2010 4:36 pm

Count me in ...

Same WU (P/R/C/G) as both of you, and same error as Parja's. On XP SP3 32 bits.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8763
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 2682 malloc error

Postby stevew » Fri Aug 06, 2010 5:04 pm

It would be good to find a way to break off from that assignment server and get WUs to fold. My box could have finished a 6701 in the down time. . . Will wait patiently and admire my cpus' temps which haven't been this low in ages, 98% idle instead of 2% idle :)
stevew
 
Posts: 143
Joined: Mon Dec 03, 2007 11:53 pm
Location: Team Hack-A-Day

Re: Project 2682 malloc error

Postby zero2dash » Fri Aug 06, 2010 5:40 pm

Just picked up one of these myself after finishing a 2685....my 6.30 drop in/no MPICH isn't having any trouble working on it - no crash for me. :?:
Same one too. Project: 2682 (Run 3, Clone 7, Gen 18)
Code:
[17:27:01] + Processing work unit
[17:27:01] Core required: FahCore_a3.exe
[17:27:01] Core found.
[17:27:01] Working on queue slot 03 [August 6 17:27:01 UTC]
[17:27:01] + Working ...
[17:27:01] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 03 -np 8 -checkpoint 15 -verbose -lifeline 5432 -version 630'

[17:27:01]
[17:27:01] *------------------------------*
[17:27:01] Folding@Home Gromacs SMP Core
[17:27:01] Version 2.22 (Mar 12, 2010)
[17:27:01]
[17:27:01] Preparing to commence simulation
[17:27:01] - Looking at optimizations...
[17:27:01] - Created dyn
[17:27:01] - Files status OK
[17:28:08] - Expanded 30329586 -> 159726549 (decompressed 101.8 percent)
[17:28:08] Called DecompressByteArray: compressed_data_size=30329586 data_size=159726549, decompressed_data_size=159726549 diff=0
[17:28:08] - Digital signature verified
[17:28:08]
[17:28:08] Project: 2682 (Run 3, Clone 7, Gen 18)
[17:28:08]
[17:28:10] Assembly optimizations on if available.
[17:28:10] Entering M.D.
[17:28:26] Completed 0 out of 250000 steps  (0%)

Checked my cpu utilization, I'm at 100% across all 8 cores, so it is working. :o
i7 920 @ 4.011, 12gb ram, 7x64
Last edited by zero2dash on Fri Aug 06, 2010 5:42 pm, edited 1 time in total.
Working on GROwing Monsters And Cloning Shrimps
[H] is for hex. Got yours? -tjmagneto
zero2dash
 
Posts: 19
Joined: Tue Mar 23, 2010 1:43 pm
Location: Fenton, MO USA

Re: Project 2682 malloc error

Postby kasson » Fri Aug 06, 2010 5:41 pm

Sounds like this may have been a tricky work unit. Glad someone's able to crunch it--otherwise we'd tag and remove it.
kasson
Pande Group Member
 
Posts: 1906
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project 2682 malloc error

Postby toTOW » Fri Aug 06, 2010 6:40 pm

The WU might have been bad, but something is messing up the AS ... I still can't get a WU:

Code:
[18:38:16] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[18:38:18] - Attempt #3  to get work failed, and no other work to do.
Waiting before retry.
[18:38:47] + Attempting to get work packet
[18:38:47] Passkey found
[18:38:47] - Will indicate memory of 2037 MB
[18:38:47] - Connecting to assignment server
[18:38:47] Connecting to http://assign.stanford.edu:8080/
[18:38:48] Posted data.
[18:38:48] Initial: 43AB; - Successful: assigned to (171.67.108.22).
[18:38:48] + News From Folding@Home: Welcome to Folding@Home
[18:38:48] Loaded queue successfully.
[18:38:49] Sent data
[18:38:49] Connecting to http://171.67.108.22:8080/
[18:38:49] Posted data.
[18:38:49] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[18:38:50] - Attempt #4  to get work failed, and no other work to do.
Waiting before retry.


:(
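A rough sketch for summarizing this kind of retry loop from a log, assuming only the line formats shown in the excerpt above. The function name and the returned dictionary keys are made up for illustration.

```python
import re

# Patterns are based only on the log lines quoted in this post.
ATTEMPT_RE = re.compile(r"Attempt #(\d+)\s+to get work failed")
ASSIGNED_RE = re.compile(r"assigned to \((\d{1,3}(?:\.\d{1,3}){3})\)")


def summarize_fetch_failures(log_lines):
    """Report the highest failed attempt number, servers assigned, and bad packets."""
    highest_attempt = 0
    servers = set()
    bad_packets = 0
    for line in log_lines:
        attempt = ATTEMPT_RE.search(line)
        if attempt:
            highest_attempt = max(highest_attempt, int(attempt.group(1)))
        assigned = ASSIGNED_RE.search(line)
        if assigned:
            servers.add(assigned.group(1))
        if "Bad packet type from server" in line:
            bad_packets += 1
    return {
        "highest_attempt": highest_attempt,
        "servers": servers,
        "bad_packets": bad_packets,
    }


sample = [
    "[18:38:16] Initial: 0000; - Error: Bad packet type from server, expected work assignment",
    "[18:38:18] - Attempt #3  to get work failed, and no other work to do.",
    "[18:38:48] Initial: 43AB; - Successful: assigned to (171.67.108.22).",
    "[18:38:49] Initial: 0000; - Error: Bad packet type from server, expected work assignment",
    "[18:38:50] - Attempt #4  to get work failed, and no other work to do.",
]
summary = summarize_fetch_failures(sample)
print(summary)
```

If every failure traces back to the same assigned server, as it does here, that points at that one server rather than at the client.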
toTOW
Site Moderator
 
Posts: 8763
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 2682 malloc error

Postby kasson » Fri Aug 06, 2010 6:55 pm

It's actually a WS rather than an AS issue. It's trying to remove the previous job it gave you. Try deleting your machinedependent.dat file.
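A cautious way to try this is to set the file aside rather than delete it outright. The sketch below is only an illustration: it assumes machinedependent.dat sits in the client's working directory and that the client has been stopped first. The function name and the `.bak` suffix are hypothetical choices, not part of the client.

```python
from pathlib import Path
import tempfile


def set_aside_machine_dependent(client_dir):
    """Rename machinedependent.dat so the client no longer finds it.

    Returns the backup path, or None if the file was not present.
    Assumption: the client is not running while this is called.
    """
    target = Path(client_dir) / "machinedependent.dat"
    if not target.is_file():
        return None
    backup = target.with_name(target.name + ".bak")
    target.replace(backup)  # overwrites any older backup of the same name
    return backup


# Demonstration in a throwaway directory, not a real client folder.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "machinedependent.dat").write_bytes(b"demo")
    backup_path = set_aside_machine_dependent(demo_dir)
    backup_name = backup_path.name
    still_present = (Path(demo_dir) / "machinedependent.dat").exists()
    second_call = set_aside_machine_dependent(demo_dir)

print(backup_name, still_present, second_call)
```

Renaming instead of deleting keeps a copy around in case the file turns out to be needed; remove the backup once the client is fetching work again.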
kasson
Pande Group Member
 
Posts: 1906
Joined: Thu Nov 29, 2007 9:37 pm

Re: Project 2682 malloc error

Postby stevew » Fri Aug 06, 2010 7:06 pm

That's got it going again. Thank you.
Code:
 Index 4: folding now
  server: 171.67.108.22:8080; project: 2686
  Folding: run 2, clone 2, generation 7; benchmark 0; misc: 500, 200
stevew
 
Posts: 143
Joined: Mon Dec 03, 2007 11:53 pm
Location: Team Hack-A-Day

Re: Project 2682 malloc error

Postby toTOW » Fri Aug 06, 2010 10:35 pm

I got that WU again 30 minutes ago ... :?

I guess I'll have to wait for 6 days before getting something else ... or change my flags ... :(
toTOW
Site Moderator
 
Posts: 8763
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 2682 malloc error

Postby Parja » Fri Aug 06, 2010 10:57 pm

toTOW wrote:I got that WU again 30 minutes ago ... :?

I guess I'll have to wait for 6 days before getting something else ... or change my flags ... :(


Yup, I just kept getting the same WU over and over again, so I wiped out all of my files and started over without the -bigadv flag.
Parja
 
Posts: 22
Joined: Sat Jun 28, 2008 1:38 am

Re: Project 2682 malloc error

Postby GeneralRavel » Fri Aug 06, 2010 11:31 pm

I'm running a P2682 in Windows 7 x64.....
It seems to use a lot more memory than the 2685s. Current usage is over 2,000,000 K of RAM.
Not sure why, but total system usage is about 5.2Gigs and I am not running anything besides the SMP and 2 GPU clients.
You guys have plenty of RAM available for this one?
Anyone remember Marty's Quake II Playhouse? :)
GeneralRavel
 
Posts: 59
Joined: Sun May 23, 2010 10:18 am
Location: Ohio

Re: Project 2682 malloc error

Postby stevew » Fri Aug 06, 2010 11:40 pm

@GeneralRavel, Glad that you have a 2682 running. I took kasson's advice and got rid of machinedependent.dat, restarted fah6 and immediately got a different WU, a 2686. As for RAM, there was more than 10 GB free when I started the 2682 WU so it should have had room.
stevew
 
Posts: 143
Joined: Mon Dec 03, 2007 11:53 pm
Location: Team Hack-A-Day

Re: Project 2682 malloc error

Postby Grandpa_01 » Sat Aug 07, 2010 12:44 am

I have this WU running also, and it seems to be running OK so far on a 920 @ 4.3GHz, so I would say it most likely isn't a WU issue. I would start looking for something in common among the machines that have been having problems with it. My specs are as follows.
Windows 7 64bit
I7 920@4.3Ghz @ 1.34v running at 88C to 92C
6GB SuperTalent 1800 @ 1700MHz @ 1.64v 8-8-8-21-60-1T
Dropped 6.30 into the previous MPICH folder, did not remove the previous install, and running as a service.
It is using 3471MB, about 1GB more RAM than previous 26XX WUs.
29min 25sec frame times currently at 5%
Running ATI GPU 4870 X2 with latest drivers but not folding with them.

Code:
[21:52:16] *------------------------------*
[21:52:16] Folding@Home Gromacs SMP Core
[21:52:16] Version 2.22 (Mar 12, 2010)
[21:52:16]
[21:52:16] Preparing to commence simulation
[21:52:16] - Looking at optimizations...
[21:52:16] - Created dyn
[21:52:16] - Files status OK
[21:53:17] - Expanded 30329586 -> 159726549 (decompressed 101.8 percent)
[21:53:17] Called DecompressByteArray: compressed_data_size=30329586 data_size=159726549, decompressed_data_size=159726549 diff=0
[21:53:19] - Digital signature verified
[21:53:19]
[21:53:19] Project: 2682 (Run 3, Clone 7, Gen 18)
[21:53:19]
[21:53:19] Assembly optimizations on if available.
[21:53:19] Entering M.D.
[21:53:34] Completed 0 out of 250000 steps  (0%)
[22:22:58] Completed 2500 out of 250000 steps  (1%)
[22:52:21] Completed 5000 out of 250000 steps  (2%)
[23:21:49] Completed 7500 out of 250000 steps  (3%)
[23:51:16] Completed 10000 out of 250000 steps  (4%)
[00:20:57] Completed 12500 out of 250000 steps  (5%)
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Grandpa_01
 
Posts: 1757
Joined: Wed Mar 04, 2009 7:36 am

Re: Project 2682 malloc error

Postby tear » Sat Aug 07, 2010 3:26 am

The WU isn't the issue, but the decompressor code probably is. The fact that the issue occurs only with some RTLs suggests a buffer overrun, an uninitialized variable, or similar.

I've seen one today with P2682 (don't remember the RCG and I'm too lazy to turn the machine back on); it consistently failed a number (5+) of times.
Every one looked the same. Decompression ate the CPU for ~eight minutes and then... KABOOM.

Given the number of field reports, there should be no problems with lab reproduction.


tear
One man's ceiling is another man's floor.
tear
 
Posts: 857
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains
