Post by mc84ss » Sun Mar 04, 2012 11:33 am

This is what I'm getting on Ubuntu 10.04 LTS, running Project 6098 (run 6, clone 77, gen 80).
This happens for some of the units I finish. Others show around 2%, which is normal.
I looked for more information on load balancing and found that some imbalance is expected and OK, but I could only find reports of 1-8% imbalance, not 151245856%.
I'm still getting about 14k PPD and the units keep chugging along.
I'm running an AMD X6 1055T.

Average load imbalance: -100 %
Part of the total run time spent waiting due to load imbalance: 151245856.0%
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
NOTE: 151245856.0 % performance was lost due to load imbalance
in the domain decomposition.

Is there something I need to change? Are the units getting done correctly?

Posts: 4
Joined: Fri Apr 22, 2011 7:23 pm

Post by codysluder » Sun Mar 04, 2012 1:13 pm

One suggestion: Are you using a realistic number of threads when you set -smp N? If you're running applications which continuously use CPU time at a nice level that causes them to preempt the FahCore, there will be a large imbalance, as n-1 of FAH's threads complete but then have to wait for the last thread to catch up. I've seen recommendations for Windows clients that running -smp 6 on an 8-thread hyperthreaded i7 can easily be faster than running -smp 8 while a long-running virus scan is in progress, a filesystem is being compressed, or something like that. Nobody has ever reported the imbalance numbers, but I'll bet they're horrible if someone tries to run BOINC and FAH at the same time.

Automatic load rebalancing of the domain decomposition can only help if thread N is consistently slower than the others, but the task scheduler is going to pick a different thread every time.
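To make the "n-1 threads wait for the last one" effect concrete, here is a toy Python sketch. It is not how the FahCore or GROMACS computes its imbalance figure; the function name, the fixed work of 1.0 per step, and the `slowdown` and `preempt_prob` parameters are all assumptions made up for illustration. It only models threads that synchronize after every step, with a randomly chosen thread occasionally being slowed down.

```python
import random


def waiting_fraction(n_threads, steps, slowdown=2.0, preempt_prob=0.1, seed=0):
    """Estimate the share of total thread time spent idle at per-step barriers.

    Toy model (all numbers are assumptions): every thread needs 1.0 time
    units of work per step. With probability preempt_prob, one randomly
    chosen thread takes `slowdown` times longer, standing in for being
    preempted by another process. All threads wait for the slowest one.
    """
    rng = random.Random(seed)
    busy = 0.0  # time threads spent doing useful work
    wall = 0.0  # time threads spent in total (work + waiting)
    for _ in range(steps):
        times = [1.0] * n_threads
        if rng.random() < preempt_prob:
            # A different thread may be hit each step, so static
            # rebalancing toward one "slow" thread would not help.
            times[rng.randrange(n_threads)] *= slowdown
        busy += sum(times)
        wall += max(times) * n_threads
    return 1.0 - busy / wall


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 1.0):
        frac = waiting_fraction(6, 10_000, preempt_prob=p)
        print(f"preempt_prob={p:.1f}: {100 * frac:.1f}% of thread time spent waiting")
```

In this model the waiting share is by construction between 0% and 100%, which is one reason a figure in the millions of percent looks odd rather than merely large.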

Use only CPU resources that will be free almost all of the time.
Posts: 2128
Joined: Sun Dec 02, 2007 12:43 pm

Post by mc84ss » Mon Mar 05, 2012 2:01 am

I did not set the -smp flag myself; the client just detects 6 processors, so it uses all 6. The computer sat idle with almost no use during the 17 hours it spent on that unit. It also posted my highest point value for that unit during that run, so it's not like the computer was being used. Also, on the block before, the load imbalance was about 2%, and on the block after it was 95940840.0%. So when it says something like

"NOTE: 95940840.0% performance was lost due to load imbalance
in the domain decomposition."

I just wanted to make sure that with wacky numbers like that nothing was wrong with the units that I had and, more importantly, the data that I was returning.
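For anyone who wants to scan their own logs for these lines, here is a rough Python sketch. It assumes the log text contains the exact phrase quoted earlier in this thread; the function name and the 0-100% plausibility range are my own choices, not anything the client provides. It only flags values that cannot be a literal share of run time; it cannot tell whether the work unit itself is fine.

```python
import re

# Matches the summary line quoted in this thread and captures the number.
WAIT_RE = re.compile(
    r"Part of the total run time spent waiting due to load imbalance:"
    r"\s*(\d+(?:\.\d+)?)\s*%"
)


def check_wait_percent(log_text):
    """Return a list of (value, plausible) pairs, one per matching log line.

    `plausible` is True when the value lies in [0, 100], since a share of
    total run time cannot exceed 100%. This threshold is an assumption of
    this sketch, not something taken from the client's documentation.
    """
    results = []
    for match in WAIT_RE.finditer(log_text):
        value = float(match.group(1))
        results.append((value, 0.0 <= value <= 100.0))
    return results


if __name__ == "__main__":
    sample = (
        "Part of the total run time spent waiting due to load imbalance: 2.0%\n"
        "Part of the total run time spent waiting due to load imbalance: 151245856.0%\n"
    )
    for value, plausible in check_wait_percent(sample):
        status = "plausible" if plausible else "not a plausible share of run time"
        print(f"{value}%: {status}")
```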

