Wither the bigadv WUs?

Postby DrSpalding » Sat Oct 29, 2011 2:45 am

Starting yesterday (Thursday, 27 October) my SMP -bigadv clients started getting standard SMP WUs. I looked around and didn't see anyone else asking about it, so I am wondering if it is just me (unlikely) or if there are too few -bigadv WUs out there for too many -bigadv clients. Just wondering mostly, since the machines are still busy with standard SMP WUs instead of stalled completely as they have been when server problems prevented them from moving to the standard WUs.
Not a real doctor, I just play one on the 'net!
DrSpalding
 
Posts: 177
Joined: Wed May 27, 2009 4:48 pm

Re: Wither the bigadv WUs?

Postby Grandpa_01 » Sat Oct 29, 2011 2:54 am

I saw over at the [H]ard Forums yesterday where some people were saying they were getting -smp WU's on bigadv rigs.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Grandpa_01
 
Posts: 1863
Joined: Wed Mar 04, 2009 7:36 am

Re: Wither the bigadv WUs?

Postby bollix47 » Sat Oct 29, 2011 3:16 am

I received a regular smp on one of my bigadv clients earlier today but then it went back to bigadv and another client went from bigadv to bigadv ... so not too serious a problem if indeed one exists.
bollix47
 
Posts: 3345
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Wither the bigadv WUs?

Postby kasson » Sat Oct 29, 2011 6:23 am

We're shifting some servers around, so you may see a slight decrease in bigadv WU availability over the next ~2 weeks. The clients should roll over to standard SMP in the meantime if we run short.
kasson
Pande Group Member
 
Posts: 1909
Joined: Thu Nov 29, 2007 9:37 pm

Re: Wither the bigadv WUs?

Postby Nathan_P » Sat Oct 29, 2011 11:50 am

Grandpa_01 wrote:I saw over at the [H]ard Forums yesterday where some people were saying they were getting -smp WU's on bigadv rigs.


Yeah, I have not had a bigadv on my dual 5670 machine since the 25th.
Nathan_P
 
Posts: 1584
Joined: Wed Apr 01, 2009 9:22 pm
Location: Jersey, Channel islands

Re: Wither the bigadv WUs?

Postby DrSpalding » Sat Oct 29, 2011 1:55 pm

kasson wrote:We're shifting some servers around, so you may see a slight decrease in bigadv WU availability over the next ~2 weeks. The clients should roll over to standard SMP in the meantime if we run short.


Thanks kasson.

Overnight, two of the three machines picked up P6900 WUs.
DrSpalding
 
Posts: 177
Joined: Wed May 27, 2009 4:48 pm

Re: Wither the bigadv WUs?

Postby arvidab » Wed Nov 02, 2011 9:26 pm

Could this have something to do with the difficulties I have uploading finished -bigadvs? It takes several retries by the client over several hours. The last one was completed 15h ago and hasn't been successfully sent yet, but I got another one sent away 30min before that, which had completed ~7h earlier (and picked up a 6900). It uploads at ~100 KiB/s and stops uploading after ~40MiB has been sent.
arvidab
 
Posts: 10
Joined: Tue Oct 12, 2010 9:06 am

Re: Wither the bigadv WUs?

Postby Grandpa_01 » Thu Nov 03, 2011 1:42 am

Sounds like you have a connection speed problem and you are timing out before it uploads. What is your internet connection speed?
Grandpa_01
 
Posts: 1863
Joined: Wed Mar 04, 2009 7:36 am

Re: Wither the bigadv WUs?

Postby Jesse_V » Thu Nov 03, 2011 2:44 am

Yes, the Internet speed would help. Please go to speedtest.net and give us the number, since oftentimes that's more accurate than the figure your ISP sells you. :)
Jesse_V
 
Posts: 2893
Joined: Mon Jul 18, 2011 4:44 am
Location: Logan, Utah, USA

Re: Wither the bigadv WUs?

Postby GreyWhiskers » Thu Nov 03, 2011 7:31 am

I also noted a slowdown in the -bigadv uploads this afternoon. I completed one just after 0000GMT (1720 PDT - Stanford local). I've completed about 100 -bigadv WUs since April this year, and through my Comcast cable internet connection I have a typical 6 minute upload time for the ~100Mbyte product. That's just about the size of the one it uploaded this afternoon.

Today, it took 16m 48s to upload the P6900 WU. Thereafter, it downloaded a non-bigadv WU (p6099) which it is currently chewing on.

I just ran a DSLReports speed test (Silicon Valley, CA to Los Angeles) which showed 11,477 Kbps download and 1,864 Kbps upload. Of course, Comcast lets you have a superfast "turbo boost" for the first couple of seconds of your upload/download, and reverts to a slower speed after that, but the connection seems to be working well. I also executed a tracert to the WS, which is one of those in Sweden. See below.

EDIT: here's the speedtest.net on my line to a server in Sweden, so it should be roughly comparable.
[Image: speedtest.net result]

Code:
[00:20:51] Folding@home Core Shutdown: FINISHED_UNIT
[00:21:08] CoreStatus = 64 (100)
[00:21:08] Unit 7 finished with 60 percent of time to deadline remaining.
[00:21:08] Updated performance fraction: 0.781201
[00:21:08] Sending work to server
[00:21:08] Project: 6900 (Run 28, Clone 21, Gen 69)

[00:21:08] + Attempting to send results [November 3 00:21:08 UTC]
[00:21:08] - Reading file work/wuresults_07.dat from core
[00:21:08]   (Read 100191549 bytes from disk)
[00:21:08] Connecting to http://130.237.232.141:8080/
[00:37:56] Posted data.
[00:37:56] Initial: 0000; - Uploaded at ~97 kB/s
[00:37:56] - Averaged speed for that direction ~286 kB/s
[00:37:56] + Results successfully sent
[00:37:56] Thank you for your contribution to Folding@Home.
[00:37:56] + Number of Units Completed: 120
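
As a rough cross-check of the figures in the log above, here is a minimal Python sketch that derives the average upload rate from the "Attempting to send" and "Posted data" timestamps and the byte count the client reported. The function name upload_rate is made up for illustration, and the sketch assumes the client's "kB/s" means 1024 bytes per second.

```python
from datetime import datetime


def upload_rate(start: str, end: str, size_bytes: int) -> float:
    """Average bytes per second between two HH:MM:SS client log timestamps."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if elapsed <= 0:
        # The log timestamps carry no date, so assume the upload crossed midnight UTC.
        elapsed += 24 * 3600
    return size_bytes / elapsed


# Values copied from the log: upload started 00:21:08, "Posted data" at 00:37:56,
# and the client read 100191549 bytes from disk.
rate = upload_rate("00:21:08", "00:37:56", 100191549)
print(f"{rate / 1024:.0f} KiB/s over 16m 48s")
```

That works out to roughly 97 KiB/s, in line with the "Uploaded at ~97 kB/s" line the client printed.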



Code:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\xxx>tracert 130.237.232.141

Tracing route to folding-4.pdc.kth.se [130.237.232.141]
over a maximum of 30 hops:

  1     *        4 ms     1 ms  192.168.1.1
  2    44 ms    27 ms    28 ms  24.6.168.1
  3    12 ms    13 ms    12 ms  te-2-3-ur04.santaclara.ca.sfba.comcast.net [68.87.198.165]
  4    16 ms    13 ms    12 ms  te-1-11-0-3-ar01.oakland.ca.sfba.comcast.net [68.86.143.98]
  5    15 ms    18 ms    17 ms  pos-2-2-0-0-cr01.sacramento.ca.ibone.comcast.net [68.86.90.137]
  6    16 ms    17 ms    16 ms  pos-0-8-0-0-cr01.sanjose.ca.ibone.comcast.net [68.86.85.78]
  7    25 ms    23 ms    20 ms  pos-0-5-0-0-pe01.11greatoaks.ca.ibone.comcast.net [68.86.87.162]
  8    35 ms    18 ms    21 ms  xe-9-3-0.sjc10.ip4.tinet.net [213.200.80.165]
  9   185 ms     *      209 ms  xe-0-0-0.cph10.ip4.tinet.net [89.149.185.181]
 10   205 ms   204 ms   202 ms  nordunet-gw.ip4.tinet.net [77.67.73.78]
 11   207 ms   208 ms   210 ms  se-tug.nordu.net [109.105.97.9]
 12   210 ms   227 ms   244 ms  t1tug.sunet.se [109.105.102.18]
 13   235 ms   211 ms   210 ms  t1fre-ae0-v1.sunet.se [130.242.83.37]
 14   210 ms   208 ms   221 ms  m1fre-ae1-v1.sunet.se [130.242.83.45]
 15   225 ms   205 ms   205 ms  ls-kth-br1.sunet.se [193.11.0.194]
 16   209 ms   208 ms   206 ms  pdc1-br1g-p2p.gw.kth.se [130.237.0.3]
 17   209 ms   210 ms   213 ms  pdc-juniper03-juniper01.pdc.kth.se [192.36.253.233]
 18   212 ms   213 ms   212 ms  folding-4.pdc.kth.se [130.237.232.141]

Trace complete.

C:\Users\xxxx>
GreyWhiskers
 
Posts: 792
Joined: Mon Oct 25, 2010 5:57 am
Location: Saratoga, California USA

Re: Wither the bigadv WUs?

Postby [Ars] For Caitlin » Thu Nov 03, 2011 6:21 pm

I had a slow-uploading -bigadv that I noticed because I happened to be watching the console when it uploaded. After about 15 minutes, I figured that I was having a network switch misconfiguration problem on my end that I had encountered previously, but ifconfig did not show the high numbers of errors, overruns, and collisions. I cranked up tcpdump and watched for a while, and it looked to my untrained eye like 1) Do Not Fragment was turned on, and 2) packet sizes were very small. Perhaps with the server movement at Stanford, something has been misconfigured on their end.
[Ars] For Caitlin
 
Posts: 40
Joined: Sun Jan 06, 2008 11:06 pm

Re: Wither the bigadv WUs?

Postby [Ars] For Caitlin » Thu Nov 03, 2011 6:52 pm

Upon further review of my logs, it looks like upload speed has really been affected over the last few days. I take about 48 hours to do a WU, so each one of those uploads is about 2 days apart. The last upload was today. You (Stanford) may want to check your switch configurations and take a look at dmesg and see what is going on when the interfaces come up.

Code:
[folding@adcrac2q3 folding]$ cat nohup.out | grep "Uploaded at"
[06:23:08] Initial: 0000; - Uploaded at ~3242 kB/s
[21:00:43] Initial: 0000; - Uploaded at ~2385 kB/s
[06:02:15] Initial: 0000; - Uploaded at ~1825 kB/s
[20:40:07] Initial: 0000; - Uploaded at ~3154 kB/s
[11:15:13] Initial: 0000; - Uploaded at ~3259 kB/s
[02:04:27] Initial: 0000; - Uploaded at ~1995 kB/s
[02:06:45] Initial: 0000; - Uploaded at ~2748 kB/s
[02:09:28] Initial: 0000; - Uploaded at ~1644 kB/s
[01:57:10] Initial: 0000; - Uploaded at ~2616 kB/s
[02:07:09] Initial: 0000; - Uploaded at ~582 kB/s
[02:29:58] Initial: 0000; - Uploaded at ~445 kB/s
[02:55:16] Initial: 0000; - Uploaded at ~147 kB/s
[03:11:31] Initial: 0000; - Uploaded at ~502 kB/s
[03:14:58] Initial: 0000; - Uploaded at ~2649 kB/s
[03:23:14] Initial: 0000; - Uploaded at ~558 kB/s
[03:36:55] Initial: 0000; - Uploaded at ~119 kB/s
[04:35:02] Initial: 0000; - Uploaded at ~267 kB/s
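
To put numbers on the drop, here is a small Python sketch that parses the grep output above and counts how many uploads fell below a cutoff. The 1000 kB/s cutoff (SLOW_KBPS) is an arbitrary assumption chosen only to separate the two groups, not a figure from the client or the servers.

```python
import re
from statistics import median

# The grep output above, pasted verbatim.
LOG = """\
[06:23:08] Initial: 0000; - Uploaded at ~3242 kB/s
[21:00:43] Initial: 0000; - Uploaded at ~2385 kB/s
[06:02:15] Initial: 0000; - Uploaded at ~1825 kB/s
[20:40:07] Initial: 0000; - Uploaded at ~3154 kB/s
[11:15:13] Initial: 0000; - Uploaded at ~3259 kB/s
[02:04:27] Initial: 0000; - Uploaded at ~1995 kB/s
[02:06:45] Initial: 0000; - Uploaded at ~2748 kB/s
[02:09:28] Initial: 0000; - Uploaded at ~1644 kB/s
[01:57:10] Initial: 0000; - Uploaded at ~2616 kB/s
[02:07:09] Initial: 0000; - Uploaded at ~582 kB/s
[02:29:58] Initial: 0000; - Uploaded at ~445 kB/s
[02:55:16] Initial: 0000; - Uploaded at ~147 kB/s
[03:11:31] Initial: 0000; - Uploaded at ~502 kB/s
[03:14:58] Initial: 0000; - Uploaded at ~2649 kB/s
[03:23:14] Initial: 0000; - Uploaded at ~558 kB/s
[03:36:55] Initial: 0000; - Uploaded at ~119 kB/s
[04:35:02] Initial: 0000; - Uploaded at ~267 kB/s
"""

SLOW_KBPS = 1000  # hypothetical cutoff for a "slow" upload

# Pull the reported speed out of every "Uploaded at" line, in log order.
speeds = [int(s) for s in re.findall(r"Uploaded at ~(\d+) kB/s", LOG)]
slow = [s for s in speeds if s < SLOW_KBPS]

print(f"{len(speeds)} uploads, {len(slow)} below {SLOW_KBPS} kB/s")
print(f"median of all uploads: {median(speeds)} kB/s")
print(f"median of slow uploads: {median(slow)} kB/s")
```

With that cutoff, none of the first nine uploads counts as slow, while seven of the last eight do.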
 
[Ars] For Caitlin
 
Posts: 40
Joined: Sun Jan 06, 2008 11:06 pm

Re: Wither the bigadv WUs?

Postby bruce » Thu Nov 03, 2011 7:26 pm

I'm not a network expert, but doesn't the setting for Packet size and for Do_not_fragment get adjusted based on any kind of errors that are encountered between point A and point B? You seem to be assuming that the problem is in the server -- and it certainly might be -- but isn't there also the possibility of a problem in any router or any segment between you and the server?

Does traceroute (or some other similar tool, if such exists) shed any additional light on the problem?

Kasson's earlier comment about "...shifting some servers around..." might also be related. What are the IP addresses associated with your grep "Uploaded at" data?
bruce
Site Admin
 
Posts: 20181
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Wither the bigadv WUs?

Postby [Ars] For Caitlin » Thu Nov 03, 2011 9:54 pm

Sorry, yes, those are all to the -bigadv server at 130.237.232.237. In my limited understanding, Do Not Fragment gets negotiated between the endpoints, and then if some hop along the path has to fragment because of MTU, the packets get silently dropped. I've worked an issue where a router got misconfigured to force DNF, and a hop somewhere along the way did not properly report its MTU (it was some bizarre intermediate service provider in Europe), and if the application sent more than 1480-something bytes including headers in a request, it disappeared. It took us a while to figure that out, and we finally just set an artificially low MTU on that interface... but I digress. It was just odd to see DNF.
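
For anyone following the arithmetic, here is a minimal Python sketch of where those sizes come from. It assumes a standard 1500-byte Ethernet MTU and IPv4 and TCP headers without options; the helper names are made up for illustration, and the 1400-byte hop is purely hypothetical.

```python
ETHERNET_MTU = 1500  # typical Ethernet MTU in bytes (assumed, not measured)
IPV4_HEADER = 20     # IPv4 header without options
TCP_HEADER = 20      # TCP header without options
ICMP_HEADER = 8      # ICMP echo header


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmenting(packet_len: int, path_mtu: int) -> bool:
    """With Do Not Fragment set, a packet longer than the path MTU is dropped."""
    return packet_len <= path_mtu


print(max_tcp_payload(ETHERNET_MTU))         # 1460
print(max_icmp_payload(ETHERNET_MTU))        # 1472
print(fits_without_fragmenting(1500, 1400))  # False: a 1400-byte hop would drop it
```

When a hop drops such a packet it is supposed to send back an ICMP "fragmentation needed" message so the sender can shrink its packets. If that message never arrives, large transfers stall while small requests still get through.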

A traceroute shows things disappearing in Sweden, so I can't tell what is happening as it gets close to the work server.

Code:
 tracert -F 130.235.237.235 1472
traceroute to 130.235.237.235 (130.235.237.235), 30 hops max, 1472 byte packets
 <snip>
 5  POS2-2.GW2.ATL4.ALTER.NET (157.130.90.249)  3.812 ms  3.814 ms  3.852 ms
 6  0.so-2-0-1.XT1.ATL4.ALTER.NET (152.63.86.150)  4.457 ms  4.095 ms  3.278 ms
 7  TenGigE0-4-0-2.GW7.ATL4.ALTER.NET (152.63.86.178)  3.387 ms TenGigE0-4-0-0.GW7.ATL4.ALTER.NET (152.63.86.162)  2.906 ms  4.946 ms
 8  teliasonera-gw.customer.alter.net (157.130.90.238)  4.808 ms  4.913 ms  4.542 ms
 9  ash-bb1-link.telia.net (80.91.252.213)  26.783 ms  24.253 ms  24.282 ms
10  nyk-bb2-link.telia.net (80.91.245.99)  84.341 ms  82.119 ms  82.151 ms
11  kbn-bb2-link.telia.net (80.91.254.90)  117.677 ms  116.531 ms  116.570 ms
12  s-bb2-link.telia.net (213.155.130.174)  126.191 ms  125.672 ms  125.596 ms
13  s-b3-link.telia.net (80.91.247.107)  125.478 ms  125.480 ms  124.570 ms
14  nordunet-113055-s-b3.c.telia.net (213.248.97.18)  118.608 ms  118.528 ms  118.556 ms
15  t1fre.sunet.se (109.105.102.10)  125.481 ms  119.034 ms  119.062 ms
16  m1fre-ae1-v1.sunet.se (130.242.83.45)  120.639 ms  120.590 ms  120.898 ms
17  lu-br1-xe-1-2-0.sunet.se (130.242.85.2)  134.663 ms  133.984 ms  133.974 ms
18  lu-g.sunet.se (193.11.20.10)  147.095 ms  144.638 ms  145.125 ms
19  c002--x001.net.lu.se (130.235.217.13)  133.398 ms  133.433 ms  131.593 ms
20  d001a--c002.net.lu.se (130.235.217.34)  140.298 ms  137.724 ms  139.006 ms
21  d001a--c002.net.lu.se (130.235.217.34)  4241.373 ms !H  4651.388 ms !H *
[Ars] For Caitlin
 
Posts: 40
Joined: Sun Jan 06, 2008 11:06 pm

Re: Wither the bigadv WUs?

Postby arvidab » Sat Nov 05, 2011 9:04 am

I'm having problems with normal SMP WUs too, so it might be my connection. The symptoms are that it starts to upload a completed WU and gets to somewhere around 40-60% done, and then the network activity drops to zilch (looking in Ubuntu System Monitor) and the log doesn't say anything until after some time. On the bigadv it uploads ~40MB, and on the latest SMP WU it got to 8MB (12MB total) and then nothing.

My connection speed:
[Image: connection speed test result]

Against a server in Virginia, which is where the server the client tries to upload to is located (if I read it correctly):
[Image: speed test result against a server in Virginia]

Latest faillog of a SMP WU:
Code:
Executable: ./fah6
Arguments: -smp 6 -advmethods -verbosity 9 -send all

[08:21:10] - Ask before connecting: No
[08:21:10] - User name: arvidab (Team 37451)
[08:21:10] - User ID: 750EA39A3F31F478
[08:21:10] - Machine ID: 8
[08:21:10]
[08:21:10] Loaded queue successfully.
[08:21:10] Attempting to return result(s) to server...
[08:21:10] Trying to send all finished work units
[08:21:10] Project: 7511 (Run 0, Clone 111, Gen 25)


[08:21:10] + Attempting to send results [November 5 08:21:10 UTC]
[08:21:10] - Reading file work/wuresults_02.dat from core
[08:21:10]   (Read 12784618 bytes from disk)
[08:21:10] Connecting to http://128.143.199.97:8080/
[08:39:05] - Couldn't send HTTP request to server
[08:39:05] + Could not connect to Work Server (results)
[08:39:05]     (128.143.199.97:8080)
[08:39:05] + Retrying using alternative port
[08:39:05] Connecting to http://128.143.199.97:80/
[08:56:53] - Couldn't send HTTP request to server
[08:56:53] + Could not connect to Work Server (results)
[08:56:53]     (128.143.199.97:80)
[08:56:53] - Error: Could not transmit unit 02 (completed November 4) to work server.
[08:56:53] - 4 failed uploads of this unit.


[08:56:53] + Attempting to send results [November 5 08:56:53 UTC]
[08:56:53] - Reading file work/wuresults_02.dat from core
[08:56:53]   (Read 12784618 bytes from disk)
[08:56:53] Connecting to http://130.237.165.141:8080/
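
To see how long each attempt hung before the client gave up, here is a minimal Python sketch that pairs every "Connecting to" line with the next "Couldn't send HTTP request" line and reports the gap. Only the four relevant lines from the log above are pasted in; the function name failed_attempts is made up for illustration, and the sketch assumes an attempt does not cross midnight.

```python
import re
from datetime import datetime

# The four relevant lines from the log above.
LOG = """\
[08:21:10] Connecting to http://128.143.199.97:8080/
[08:39:05] - Couldn't send HTTP request to server
[08:39:05] Connecting to http://128.143.199.97:80/
[08:56:53] - Couldn't send HTTP request to server
"""


def failed_attempts(log: str) -> list:
    """Return (url, seconds_until_failure) for each failed upload attempt."""
    results = []
    pending = None  # (url, start time) of the attempt currently in flight
    for line in log.splitlines():
        match = re.match(r"\[(\d\d:\d\d:\d\d)\]\s+(.*)", line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Connecting to "):
            pending = (message[len("Connecting to "):], stamp)
        elif "Couldn't send HTTP request" in message and pending is not None:
            url, started = pending
            results.append((url, int((stamp - started).total_seconds())))
            pending = None
    return results


for url, seconds in failed_attempts(LOG):
    print(f"{url} failed after {seconds // 60}m {seconds % 60}s")
```

Both attempts ran for nearly 18 minutes before failing, which fits the upload stalling partway through rather than being refused at the start.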
arvidab
 
Posts: 10
Joined: Tue Oct 12, 2010 9:06 am
