How many -bigadv A2's are left? [Ans: for Linux, NONE]

The most demanding Projects are only available to a small percentage of very high-end servers.


Re: How many -bigadv A2's are left?

Postby 7im » Wed Jun 02, 2010 8:02 pm

Yes, thank you for detailing the potential downsides that OTHER users might run into: users who aren't already running a VM and who do have more than 8 cores, neither of which applies to this OP. ;)
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
Posts: 15147
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: How many -bigadv A2's are left?

Postby zero2dash » Tue Jun 08, 2010 5:20 am

The thing that's killing me is the instability of the A3 bigadv. The other day I closed the VM as I should have (Control+C, wait, then shut down the VM). Upon restarting (after a system restart; that's why I had to close it down temporarily), it flushed itself and downloaded another A3. Granted, it wasn't a huge loss, since it was only around 12% complete... but still, that's over 12 hrs of time flushed down the toilet. (CoreStatus=FF, checkpoint out of sync, or whatever: it's a flush to me. :lol:)

A2's seem pretty hardy; they take quite a beating (IME). I've had system crashes and improper VM shutdowns, and the A2 still goes about its business, chugging along merrily. ;)
A3's scare the heck out of me. :o I finished an A2 earlier on i7-1 and it downloaded an A3... so I ended up switching over to Win SMP on that machine. It's kind of gotten to the point where I weigh the pros and cons, and there's really no benefit to keep chugging A3 bigadv in the VM. Even in a worst-case scenario, I still break even on 4 days of processing time with normal A3's plus bonus. In the best case, the A3 bigadv finishes and I may get a few K more points in a few hours less time. It's like being on the 'folding fence'. :o
Working on GROwing Monsters And Cloning Shrimps
[H] is for hex. Got yours? -tjmagneto
zero2dash
Posts: 19
Joined: Tue Mar 23, 2010 1:43 pm
Location: Fenton, MO USA

Re: How many -bigadv A2's are left?

Postby k1wi » Tue Jun 08, 2010 6:27 am

What VM image were you using?

Edit: I see you're using LinuxFAH's VM...

I've experienced a number of unsuccessful restarts with that VM on A2s, so I'm not sure it's valid to point to the core as the problem.

k1wi
Posts: 1321
Joined: Tue Sep 22, 2009 10:48 pm

Re: How many -bigadv A2's are left?

Postby toTOW » Tue Jun 08, 2010 8:36 am

When VMs are involved, it's always hard to pin the fault on the core :(
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Super Moderator
Posts: 8798
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: How many -bigadv A2's are left?

Postby Wrish » Tue Jun 08, 2010 11:38 am

My only A3 bigadv to be shut down also flushed and restarted... and I'm on native Ubuntu 8.04. It's not very stable compared to A2, which withstood two dozen shutdowns without a loss. I lost 1.5 days of work on that A3.
Wrish
Posts: 390
Joined: Thu Jan 28, 2010 5:09 am

Re: How many -bigadv A2's are left?

Postby zero2dash » Tue Jun 08, 2010 3:59 pm

Sorry, I didn't mean to blame the core as the problem; I just sort everything into two categories, flush and no flush. :lol:
The fear of A3 bigadv instability has just gotten to me at this point. The other reason I switched to native Win SMP2 now is that my primary machine will be taken down for an hour or two later: I have a few things to do, including an HD swap, an HSF swap, and some general maintenance and cable work. I know I could have let that machine chug on the bigadv A3 from last night until tonight, when I shut it down for a bit, but honestly I'm too scared that once I bring the machine back up, the A3 will flush and I'll lose a day's worth of bigadv A3 work, when I could have gotten the 15k normal A3 credit (according to HFM.net) for chugging out a few normal A3 WU's throughout the day.

I will say in all honesty that once this quick cleaning is over with and the normal A3 running at that time finishes, I will *probably* switch the VM back on and let it run. At that point I'll know the machine won't have to be taken down anytime soon, so I can let the A3 run and run and run without worrying about a crash and flush from closing it. I want to help the cause and help finish the A2's that are out there; I just hope I grab some. :)
zero2dash
Posts: 19
Joined: Tue Mar 23, 2010 1:43 pm
Location: Fenton, MO USA

Re: How many -bigadv A2's are left?

Postby leexgx » Tue Jun 08, 2010 4:44 pm

A3's seem less robust than A2's. I tend to get random failures on A3 work units (I just wish they'd fail at 1-3%, not at a time-wasting 35%), and that's during normal running. I don't dare CTRL+C it or use the backup and restore option that LinuxFAH 1.2 has.

The failure seems to be something like "can't find end of file".

It's better to suspend and power off the VM, as that seems to work flawlessly.
leexgx
Posts: 562
Joined: Mon Dec 03, 2007 8:05 am

Re: How many -bigadv A2's are left?

Postby bruce » Tue Jun 08, 2010 6:29 pm

I don't know how to identify the reasons why native A3 may be more fragile than other SMP cores. Personally, I have had very few problems with any SMP WUs, even the A1's, probably because I typically run 24x7 and either use no overclocking or am very conservative with the settings. When I do shut down (often to add -oneunit so I can switch to a different client or different configuration or test something new), it has always restarted reliably for me. That's also true on a couple of systems where I have a scheduled task that kills fah6.exe early in the morning.

If there's a specific combination of events that makes A3 less reliable, other than issues that can be attributed to filesystems that don't flush their buffers to disk (such as improper shutdowns), we need to figure out what those conditions are.
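For anyone wondering what "flushing the buffers to disk" means in practice, here is a minimal Python sketch of the general atomic-checkpoint pattern: write the new checkpoint to a temporary file, fsync it, then rename it over the old one. It only illustrates the technique, assuming a POSIX-style filesystem; it is not how the FahCores actually write their checkpoints, and the function name is hypothetical.

```python
import os
import tempfile


def write_checkpoint_atomically(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so a crash leaves either the
    old contents or the new contents on disk, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory, so the final rename stays
    # on one filesystem (a requirement for an atomic rename).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> OS buffer
            os.fsync(tmp.fileno())  # OS buffer -> disk
        os.replace(tmp_path, path)  # atomic swap on POSIX and Windows
    except BaseException:
        # Remove the partial temp file if we failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # On POSIX, also fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The key is the ordering: the old checkpoint is never touched until the new one is completely on disk, so an improper shutdown at any point leaves one consistent copy.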
bruce
Site Admin
Posts: 20180
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: How many -bigadv A2's are left?

Postby autogrog » Tue Jun 08, 2010 6:56 pm

P5-133XL wrote: One thing to note about the unstable nature of the new A3 -bigadv's concerns those who are using -smp #. It was specifically mentioned in a long thread about i980x's that were using -smp 11 and could not complete a WU, but -smp 12 worked fine. Kasson said that it was a known problem and suggested using a non-prime number of cores.


Would you post the link to this thread? I haven't been able to find it.
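The advice quoted above is easy to check before picking an -smp value. This is a minimal Python sketch that assumes the rule is simply "avoid a prime -smp count" as relayed in the quote (the thread doesn't explain why prime counts misbehave). The helper names are hypothetical and not part of any F@H client. It steps down rather than up so it never exceeds the threads you chose to give F@H; on the 12-thread machines from that thread you would simply use -smp 12.

```python
def is_prime(n: int) -> bool:
    """Trial division; plenty fast for thread counts."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def largest_non_prime_at_most(threads: int) -> int:
    """Step down from `threads` to the nearest non-prime value.

    Hypothetical helper: it only encodes the "avoid prime core counts"
    advice relayed in this thread, nothing more.
    """
    if threads < 4:
        raise ValueError("expects at least 4 threads")
    while is_prime(threads):
        threads -= 1
    return threads


for requested in (11, 12, 13):
    print(f"-smp {requested} -> -smp {largest_non_prime_at_most(requested)}")
```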
autogrog
Posts: 48
Joined: Mon Aug 18, 2008 3:38 pm
Location: Halifax, Nova Scotia

Re: How many -bigadv A2's are left?

Postby radekboktor » Wed Jun 09, 2010 1:54 am

Hey bruce, I'm having the exact same shutdown/restart problem on a MacPro (stable and obviously no OC), except that it's always been A2 units. I think out of five stops, only one has restarted successfully. I don't know if this helps with the debugging, but I had no trouble restarting it from a backup, it read the checkpoint data just fine... the corruption or whatever seems to occur when using kill -15 to stop the cores.

I wanna point out that the standard A2 cores on OSX 10.5 have always left phantom processes out there that had to be killed manually. However, they didn't corrupt the wuresults.dat file. And the bigadv units aren't leaving phantom cores behind. Nonetheless, manual stopping has always been problematic with the A2 core. Not sure what an improper shutdown would be... I usually use Increase.
radekboktor
Posts: 29
Joined: Tue May 11, 2010 5:34 am

Re: How many -bigadv A2's are left?

Postby bruce » Wed Jun 09, 2010 7:07 am

radekboktor wrote:...the corruption or whatever seems to occur when using kill -15 to stop the cores.


I don't run a Mac personally, so this is a general statement that might not apply to you but I think it does.

In a normal shutdown, the client receives a CTRL-C signal (or someone picks Quit on the menu of a Windows Systray client) and the client shuts down. The FahCore(s) continue to run until they detect that the client is no longer running and they shut themselves down. The FahCore(s) have segments of code that should not be interrupted followed by points at which they can shut themselves down cleanly. They only check for the client at clean-shutdown points.
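To make the "clean-shutdown points" idea concrete, here is a toy Python model of a core loop that only checks whether its client is still alive right after writing a checkpoint. It is a sketch under stated assumptions: `client_running` stands in for however a real core detects its client (the real mechanism isn't described here), and `run_core` and `quit_at_step` are hypothetical names used only for illustration.

```python
import threading
from typing import Optional


def run_core(client_running: threading.Event, total_steps: int,
             checkpoint_every: int, quit_at_step: Optional[int] = None) -> int:
    """Toy core loop that only honours a shutdown at clean checkpoints.

    `client_running` is an assumed stand-in for client detection, not the
    FahCore mechanism. `quit_at_step` simulates the client being closed
    partway through. Returns the last step that was saved.
    """
    saved_step = 0
    for step in range(1, total_steps + 1):
        # ... one chunk of work that must not be interrupted midway ...
        if step == quit_at_step:
            client_running.clear()  # user hits CTRL-C on the client here
        if step % checkpoint_every == 0:
            saved_step = step  # checkpoint written: state is consistent
            # Only at this clean point does the core look for its client.
            if not client_running.is_set():
                break
    return saved_step
```

With `quit_at_step=25` and a checkpoint every 10 steps, the loop keeps working until step 30 and only then exits, which mirrors the core continuing to run for a while after the client has quit.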

A forced kill of the FahCore(s) at the wrong time might be the cause of your problems.

Different cores have different frequencies of clean-shutdown points and it probably also depends on the characteristics of the WU. I have no way of knowing how long you should wait.
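For people who script their shutdowns (as with the kill -15 above), a gentler pattern is to send the same signal CTRL-C sends (SIGINT) to the client and then wait, rather than killing the cores. This is a minimal POSIX-only Python sketch that assumes you started the client yourself via `subprocess`. `stop_gracefully` is a hypothetical helper, and since nobody knows how long a given core needs, the timeout is yours to choose; err on the generous side.

```python
import signal
import subprocess
import time


def stop_gracefully(proc: subprocess.Popen, timeout_s: float,
                    poll_s: float = 0.5) -> bool:
    """Send SIGINT (what CTRL-C sends) and wait up to timeout_s for exit.

    Returns True if the process exited on its own. It deliberately does not
    escalate to SIGKILL, because a forced kill at the wrong moment is the
    suspected cause of corrupted work files. POSIX only.
    """
    proc.send_signal(signal.SIGINT)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(poll_s)
    return proc.poll() is not None
```

Note that kill -15 sends SIGTERM, not SIGINT; the sketch uses SIGINT because that is what the post above describes as the normal shutdown path.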

In the MPI cores (a1/a2) with V5, there were cases where the FahCore hung and never shut down. That was a bug, and conceivably it may still be a bug in a3 with v6. Diagnosing a core that hangs is very difficult, particularly if it always shuts down properly in the lab but hangs for an unknown reason on your system.

Do you (or anybody else) know how to reproduce a hung core, and what might be done to avoid this situation?
bruce
Site Admin
Posts: 20180
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: How many -bigadv A2's are left?

Postby PantherX » Wed Jun 09, 2010 11:42 am

autogrog wrote:
P5-133XL wrote: One thing to note about the unstable nature of the new A3 -bigadv's concerns those who are using -smp #. It was specifically mentioned in a long thread about i980x's that were using -smp 11 and could not complete a WU, but -smp 12 worked fine. Kasson said that it was a known problem and suggested using a non-prime number of cores.


Would you post the link to this thread? I haven't been able to find it.

Link: (http://foldingforum.org/viewtopic.php?f=58&t=14423&p=143042#p143042)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Chrome Folding App (Beta) Ӂ Troubleshooting "Bad WUs" Ӂ Troubleshooting Server Connectivity Issues
PantherX
Posts: 6614
Joined: Wed Dec 23, 2009 9:33 am

Re: How many -bigadv A2's are left?

Postby DrSpalding » Tue Jun 29, 2010 5:24 am

Bump.

I haven't gotten an A2 bigadv WU for a couple of days now. I also haven't gotten any A3 bigadv WUs for 24 hours or so now either. It may be that the project 2684 instabilities that have evidently caused many of the last week's WUs to be flushed and either restarted or EUE returns have put me below the 80% line and I am not getting any -bigadv WUs. Can someone tell us if the A2 -bigadv pool is now empty or not or am I just having to build back up to the 80%+ success rate?
Not a real doctor, I just play one on the 'net!
DrSpalding
Posts: 177
Joined: Wed May 27, 2009 4:48 pm

Re: How many -bigadv A2's are left?

Postby PantherX » Tue Jun 29, 2010 6:15 am

DrSpalding wrote:It may be that the project 2684 instabilities that have evidently caused many of the last week's WUs to be flushed and either restarted or EUE returns have put me below the 80% line and I am not getting any -bigadv WUs. Can someone tell us if the A2 -bigadv pool is now empty or not or am I just having to build back up to the 80%+ success rate?

AFAIK, the 80% completion rate only affects your bonus points. It doesn't affect which WUs are available for you to fold. Also, if the machine connects to the servers and uploads some data, I think it would be counted in your 80% rate, since you processed the WU. Dumping a WU would mean that the servers never got any response from your machine about that WU, which will affect your 80% rate.
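As a rough illustration of the rule as described in the post above, here is a minimal Python sketch that counts any uploaded result (a full completion or an EUE) as reported and a dumped WU as not reported, then compares the fraction to 80%. This is an assumption based only on that description; the actual server-side formula isn't given in this thread, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    COMPLETED = "completed"  # full result uploaded
    EUE = "eue"              # early-unit-end result uploaded
    DUMPED = "dumped"        # server never heard back about the WU


@dataclass
class WorkUnit:
    project: int
    outcome: Outcome


def reporting_rate(history: List[WorkUnit]) -> float:
    """Fraction of assigned WUs for which the server received *some* result.

    Assumption: uploads (including EUEs) count and dumps do not; the real
    server-side formula is not described in this thread.
    """
    if not history:
        return 1.0  # no history yet, so nothing counts against the machine
    reported = sum(1 for wu in history if wu.outcome is not Outcome.DUMPED)
    return reported / len(history)


def meets_threshold(history: List[WorkUnit], threshold: float = 0.80) -> bool:
    return reporting_rate(history) >= threshold
```

Under this model, three completions, one EUE and one dump give exactly 80%, and a second dump drops the machine below the line.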
PantherX
Posts: 6614
Joined: Wed Dec 23, 2009 9:33 am

Re: How many -bigadv A2's are left?

Postby kasson » Tue Jun 29, 2010 9:42 am

We've turned off bigadv on linux for the time being. We're switching bigadv over to A3, but as you may have noticed there's a bug in the linux A3 core that affects bigadv stability. We've been working on the bug, but it's still in hiding at this time. If you're running VM's under windows, I'd suggest trying the native windows client, which is still enabled for bigadv. OS/X is also still enabled.
kasson
Pande Group Member
Posts: 1909
Joined: Thu Nov 29, 2007 9:37 pm
