WU reassigned before timeout or after "Ok" completion?

Postby Hopfgeist » Fri Jul 24, 2020 11:57 am

I sometimes look up the stats of individual work units I have done, and occasionally discover that they had previously been assigned to someone else but returned "Faulty 2" (sounds almost like "42" :lol: ), or, vice versa, that my client had returned a fault or had let a WU run past the timeout, so it got reassigned. I fully understand that, so please don't explain the normal procedure to me.

But take a look at this work unit: it has been reassigned three times within roughly one hour or less after the previous assignment, long before any timeout.

What makes this even weirder is that the units got reassigned shortly after the "Credited" time (sometimes less than a minute after it), which is when the unit was actually uploaded to the work server, but before the "Returned" time, which I figure is when a server one rung up from the download/upload servers looks at it, updates the database, and creates the follow-up work units (next "Gen").

How and why does this happen? It seems like a big waste of computing power if this happens regularly (I found at least two instances just searching for a minute through random permutations of RCG parameters for Project 14717).
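To make the pattern concrete, here is a minimal Python sketch that flags it in a WU's assignment history: a new assignment issued before the previous one timed out, even though the previous one had already been credited. It assumes a hypothetical record with assigned, timeout and credited timestamps; these field names are illustrative and are not the format of the F@h stats page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Assignment:
    # Hypothetical record; field names are illustrative, not the F@h stats schema.
    assigned: datetime
    timeout: datetime
    credited: Optional[datetime]  # None if no result was ever uploaded


def premature_reassignments(history: List[Assignment]) -> List[int]:
    """Return indices i where assignment i+1 was issued before assignment i
    timed out, even though assignment i had already been credited."""
    flagged = []
    for i in range(len(history) - 1):
        prev, nxt = history[i], history[i + 1]
        if prev.credited is not None and nxt.assigned < prev.timeout:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Four assignments one hour apart, each credited after 55 minutes,
    # each with a (made-up) one-day timeout.
    t0 = datetime(2020, 7, 24, 9, 0)
    history = [
        Assignment(t0 + timedelta(hours=k),
                   t0 + timedelta(hours=k, days=1),
                   t0 + timedelta(hours=k, minutes=55))
        for k in range(4)
    ]
    print(premature_reassignments(history))
```

Under a normal timeout-driven reissue the previous assignment would have no credit, or the new assignment would come after the timeout, so nothing would be flagged.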


Cheers,
HG
Sun Fire X2270 M2: 2x Xeon X5675

Re: WU reassigned before timeout or after "Ok" completion?

Postby Neil-B » Fri Jul 24, 2020 1:43 pm

So, first thing: the Credited and Received columns are labeled incorrectly and should be the other way round. This is a known issue.

As to rapid reissues of the same WU: there has occasionally been a bug in the generation scripts of the WS where the Gen fails to increment. Reports such as yours allow these to be spotted (if the researcher has not already spotted it and sorted out the server). In the past this has happened when projects have been moved from one server to another (a fairly rare occurrence), which sometimes breaks the scripts, and I believe this may have happened to some projects as a result of the recent issues with various servers being down.
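A toy Python model of this failure, assuming hypothetical bookkeeping in which the work server keeps the current Gen per (Run, Clone) and should advance it when a result for that Gen comes back. This is not the actual F@h work-server code; the hypothetical `increment_ok` flag simply simulates a broken script, in which case the same Gen is handed out again even though a result was already accepted.

```python
from typing import Dict, Tuple

# (Run, Clone) pair; hypothetical bookkeeping, not F@h's actual data model.
Key = Tuple[int, int]


def next_assignment(current_gen: Dict[Key, int], key: Key, returned_gen: int,
                    increment_ok: bool = True) -> int:
    """Record a returned result for `key` and return the Gen to hand out next.

    With increment_ok=False the counter is not advanced, mimicking the
    broken-script case: the Gen that was just returned is issued again.
    """
    if returned_gen == current_gen[key] and increment_ok:
        current_gen[key] = returned_gen + 1
    return current_gen[key]


if __name__ == "__main__":
    gens = {(0, 7): 12}
    print(next_assignment(gens, (0, 7), 12))                      # healthy: next Gen
    broken = {(0, 7): 12}
    print(next_assignment(broken, (0, 7), 12, increment_ok=False))  # same Gen again
```

In the broken case every accepted upload triggers another copy of the same Gen, which would match a WU being reassigned several times within an hour of each upload.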

Hopefully a message will be forwarded to the researcher concerned to check the scripts on the server.
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13

Re: WU reassigned before timeout or after "Ok" completion?

Postby Hopfgeist » Fri Jul 24, 2020 2:21 pm

Thanks, Neil-B. I thought it might have to do with a slight hiccup when shuffling projects between servers in the middle of a run.

I don't mind how the columns are labelled; I just noticed that the "Credited" time coincided exactly with the time my client reported a successful upload. It makes sense to award points at that exact moment, because any subsequent server-side processing is outside the client's control and should not be penalised when calculating the bonus. So if that is how it works, I consider the "Credited" column correctly labelled, although "Returned" would certainly be accurate as well.

Maybe "Returned" should be labelled "Processed" or something, but I don't really know enough of the internals to say for sure what that time represents.

I'm not really in it for the points anyway (although I like to keep track), but to do my little part in supporting science and fighting disease.

Is there another place (besides this forum) to report such occurrences?

Cheers,
HG

Re: WU reassigned before timeout or after "Ok" completion?

Postby bruce » Sat Jul 25, 2020 3:20 am

Hopfgeist wrote: I thought it might have to do with a slight hiccup when shuffling projects between servers in the middle of a run.


That would be my guess.

If a project is on server A, which is either running out of disk space or seeing too much traffic, changes need to be made. Moving it to server B would seem to be simple (it's not), but a number of WUs already being processed by you folks are pre-programmed to be returned to server A. The project owner needs to capture the WUs that arrive at A and manually transfer them to B.