Donor & Team lists issues

Moderators: Site Moderators, FAHC Science Team

pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Donor & Team lists issues

Post by pafka »

Downloading the daily_user_summary.txt and daily_team_summary.txt files is OK.

Issues:

1. Importing them has revealed that non-alphanumeric (even non-ASCII) names exist, which does not match the site's naming requirements?
- probably not really an issue, since I can still store those as binary;
- but it would still help to know how F@H handles these names.

2. There are non-existent teams (e.g. 1565204469), which confronts "Default (includes all those WU returned without valid team number) (0)"?!

3. There are a number of repeated records for a given name and team (e.g. andy from team 0), which implies a few issues.
- The F@H site does not show more than one record, which obviously cannot be right?!
- If the F@H site has any ID for those users, that would probably help a little!
- It is also obvious that there is no way for an end user to tell which of the 8 records corresponds to which user in the previous list. I'd really appreciate any help from the F@H side on that one!!
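Finding the repeated records from item 3 can be sketched in a few lines. This is a minimal sketch, assuming each data row of daily_user_summary.txt is tab-separated as name, credit, work units, team (the column order used in the examples later in this thread; check it against the real file). The function name duplicate_keys is hypothetical.

```python
from collections import Counter


def duplicate_keys(lines):
    """Return the (name, team) pairs that occur more than once.

    Assumes tab-separated rows: name, credit, work units, team.
    Rows without exactly four fields, or whose team column is not a number
    (such as a column-header row), are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue
        name, _credit, _wus, team = fields
        if not team.isdigit():
            continue  # e.g. a header row ending in "team"
        counts[(name, team)] += 1
    return {key: n for key, n in counts.items() if n > 1}


sample = [
    "name\tnewcredit\tsum(total)\tteam",
    "andy\t100\t3\t0",
    "andy\t50\t1\t0",
    "bob\t10\t1\t0",
]
print(duplicate_keys(sample))  # {('andy', '0'): 2}
```

This only shows that duplicates exist; as the thread goes on to discuss, nothing in the rows themselves says which duplicate is which donor.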
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Donor & Team lists issues

Post by codysluder »

Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.
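To make the collision concrete, here is a sketch in which the published name is everything before the first "@". That truncation rule is an assumption for illustration (the thread does not say exactly how Stanford strips the email part), and public_name is a hypothetical helper.

```python
def public_name(donor_name: str) -> str:
    """Hypothetical display rule: drop everything from the first '@' onward."""
    return donor_name.split("@", 1)[0]


accounts = ["andy@domain1.com", "andy@domain2.com", "andy"]
print([public_name(a) for a in accounts])  # ['andy', 'andy', 'andy']
# Three distinct accounts, one visible name:
print(len(set(accounts)), len({public_name(a) for a in accounts}))  # 3 1
```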

Regarding items 1 and 2: Even though new information is probably checked for conformance with the current rules, that was not always true. The fundamental statement that data will not be adjusted after the results are uploaded applies to everything, even data that was improperly accepted before the incoming data was checked for conformance.
pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Re: Donor & Team lists issues

Post by pafka »

codysluder wrote:Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.
Well, I am familiar with the way emails are treated when used as names.

Still, that does not answer the question of how F@H officially treats those names, and how they are used to display user stats!

So, there might be an Andy who wants his certificate for, let's say, 1234 WUs he has truly done, but receives the other andy's certificate, which states 1 WU.
Of course, the reverse situation is possible too.
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Donor & Team lists issues

Post by codysluder »

To get a certificate, you have to use the official Stanford stats. If andy enters his name as andy@domain1.com, the stats will know exactly who he is and give him the correct certificate. This cannot be done on a third-party stats site that relies on the donor and team lists as its source of information.
pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Re: Donor & Team lists issues

Post by pafka »

codysluder wrote: To get a certificate, you have to use the official Stanford stats. If andy enters his name as andy@domain1.com, the stats will know exactly who he is and give him the correct certificate. This cannot be done on a third-party stats site that relies on the donor and team lists as its source of information.
That does not look like a solved problem; it only shows that you understand the issue. I would appreciate it if you stuck to solutions next time.
Well, I expect we'd soon agree that F@H should assign some IDs to those users, wouldn't we?
We also know it would not be smart to expect F@H to publish the emails as IDs... at least I don't.
Any integer or hash value, etc. would do just fine.
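One way to get such a hash value without exposing emails is a keyed hash computed on the server. This is a sketch of that idea, not anything F@H is known to use: SECRET_KEY and opaque_donor_id are hypothetical, and HMAC-SHA256 is swapped in instead of a plain hash so that nobody can confirm a guessed address by hashing it themselves.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never be published.
SECRET_KEY = b"replace-with-a-long-random-secret"


def opaque_donor_id(full_donor_name: str, team: int) -> str:
    """Derive a stable public ID from the private full donor name and team.

    The same (name, team) pair always yields the same ID, so it survives
    from update to update, but the email cannot be read back out of it.
    """
    message = f"{full_donor_name}\t{team}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; collisions are unlikely, not impossible


a = opaque_donor_id("andy@domain1.com", 0)
b = opaque_donor_id("andy@domain2.com", 0)
print(a == opaque_donor_id("andy@domain1.com", 0))  # True: same input, same ID
```

Including the team in the hashed message keeps one ID per (name, team) row, which matches how the lists are keyed.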
anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Donor & Team lists issues

Post by anandhanju »

If you look at the MyFolding file (which is the correct place to see a person's stats), you'll see that a donor's page is determined by the username AND the team. If Andy from Team FF wished to see his stats, he'd access them via this link. If you search for donors by name and open the page for Andy, you'll see the rolled-up credits for all users with that ID. FAH has no way of determining whether it was one andy who worked on all of these or several different donors.

The passkey functionality in the new clients was introduced with this enhancement in mind. There is no estimate of when it will be enabled on the server side.

For 1), names can contain special characters. Non-alphanumeric and non-ASCII characters are represented as URL-safe Unicode. For example, see this URL.
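If "URL-safe Unicode" means percent-encoded UTF-8 (an assumption; the post does not name the exact scheme), a stats importer can turn such names back into readable text with the standard library. decode_donor_name is a hypothetical helper.

```python
from urllib.parse import quote, unquote


def decode_donor_name(raw: str) -> str:
    # Assumption: names are percent-encoded UTF-8; plain ASCII names pass through unchanged.
    # Undecodable byte sequences are replaced rather than raising an error.
    return unquote(raw, encoding="utf-8", errors="replace")


print(decode_donor_name("J%C3%BCrgen"))  # Jürgen
print(decode_donor_name("andy"))         # andy
print(quote("Jürgen"))                   # J%C3%BCrgen
```

Note that unquote leaves "+" alone (unquote_plus would turn it into a space), and a name that genuinely contains "%" followed by two hex digits would be mis-decoded, so it is worth checking the real files before relying on this.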

For 2), I'm not sure what you mean by "confronts", but I think these "non-existent" teams are GAH (Genome@home) teams that were transitioned over to FAH but abandoned when the team numbering was restarted.
pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Re: Donor & Team lists issues

Post by pafka »

anandhanju wrote: If you look at the MyFolding file (which is the correct place to see a person's stats), you'll see that a donor's page is determined by the username AND the team. If Andy from Team FF wished to see his stats, he'd access them via this link. If you search for donors by name and open the page for Andy, you'll see the rolled-up credits for all users with that ID. FAH has no way of determining whether it was one andy who worked on all of these or several different donors.

The passkey functionality in the new clients was introduced with this enhancement in mind. There is no estimate of when it will be enabled on the server side.

For 1), names can contain special characters. Non-alphanumeric and non-ASCII characters are represented as URL-safe Unicode. For example, see this URL.

For 2), I'm not sure what you mean by "confronts", but I think these "non-existent" teams are GAH (Genome@home) teams that were transitioned over to FAH but abandoned when the team numbering was restarted.
You are so wrong about that!
- F@H would not show you rolled-up credits for all the "andy"s!
- F@H DOES have a way to identify and distinguish users by email! (One can check that on the user stats pages. I hope I'm not revealing some sort of inside secret here.)

F@H uses some sort of encryption of the user names, including the email part. Those encrypted emails look like a win-win solution, at least until the passkey starts working (which in fact may never happen).
Well, I am too old to reverse-engineer the encryption used, only to find out it's some sort of one-way encryption (which, at first glance, does not seem to be the case).
Frankly, I am confident it would be better to ask for a third file, like daily_user_summary.txt but with the encrypted names included, than to scrape the site for them.

And again:
for 1) I have a solution for that... forget about it, or just consider it a note.
for 2) "confront" should be self-explanatory when you look at the description of the Default team, which is supposed to collect all the WUs returned with an invalid team number.

PS: I do hope a member of the F@H development team will finally look here and provide some useful hints/information, even by private message, because I really want to get those thoughts about reverse-engineering and parsing the F@H site out of my head!
Thanks.
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Donor & Team lists issues

Post by VijayPande »

pafka wrote:
codysluder wrote: Regarding item 3: FAH accepts email addresses as names but does not publish email addresses in the interest of minimizing spam. Thus andy@domain1.com and andy@domain2.com and andy will appear as three distinct records but will all show only "andy" for the name. There is no way to tell them apart from the external data provided by Stanford.
Well, I am familiar with the way emails are treated when used as names.

Still, that does not answer the question of how F@H officially treats those names, and how they are used to display user stats!

So, there might be an Andy who wants his certificate for, let's say, 1234 WUs he has truly done, but receives the other andy's certificate, which states 1 WU.
Of course, the reverse situation is possible too.
When a donor enters their full donor name (including email), we parse it on our side and make sure it goes to the right account, even if that isn't visible in the text pages. The stats are more complex than what's in the text pages, since we must hide emails there.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
MtM
Posts: 1579
Joined: Fri Jun 27, 2008 2:20 pm
Hardware configuration: Q6600 - 8gb - p5q deluxe - gtx275 - hd4350 ( not folding ) win7 x64 - smp:4 - gpu slot
E6600 - 4gb - p5wdh deluxe - 9600gt - 9600gso - win7 x64 - smp:2 - 2 gpu slots
E2160 - 2gb - ?? - onboard gpu - win7 x32 - 2 uniprocessor slots
T5450 - 4gb - ?? - 8600M GT 512 ( DDR2 ) - win7 x64 - smp:2 - gpu slot
Location: The Netherlands

Re: Donor & Team lists issues

Post by MtM »

pafka wrote: PS: I do hope a member of the F@H development team will finally look here and provide some useful hints/information, even by private message, because I really want to get those thoughts about reverse-engineering and parsing the F@H site out of my head!
Thanks.
Which isn't allowed :?: ;)
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Donor & Team lists issues

Post by codysluder »

VijayPande wrote: When a donor enters their full donor name (including email), we parse it on our side and make sure it goes to the right account, even if that isn't visible in the text pages. The stats are more complex than what's in the text pages, since we must hide emails there.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form)? Microsoft figured out how to do that with the old DOS 8.3 filenames.
pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Re: Donor & Team lists issues

Post by pafka »

codysluder wrote:
VijayPande wrote: When a donor enters their full donor name (including email), we parse it on our side and make sure it goes to the right account, even if that isn't visible in the text pages. The stats are more complex than what's in the text pages, since we must hide emails there.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form)? Microsoft figured out how to do that with the old DOS 8.3 filenames.
Well, I wouldn't have come this far if that were an option.
You see, it looks like F@H provides the exports ordered by credit (descending), which, you have to agree, ruins the idea over time.
It would be an option if the lists were ordered by the time a name was first seen.

I am so sorry we're still missing a valuable opinion from an F@H member.
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Donor & Team lists issues

Post by VijayPande »

pafka wrote: I am so sorry we're still missing a valuable opinion from an F@H member.
Sorry, with literally hundreds of threads, it can take a while before we get to all of them, especially if we have recently replied to a given thread.

We'll look into this, but with all that we have going on right now (in particular shoring up GPU2 and SMP/SMP2), this may have to wait a bit.
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: Donor & Team lists issues

Post by VijayPande »

codysluder wrote:
VijayPande wrote: When a donor enters their full donor name (including email), we parse it on our side and make sure it goes to the right account, even if that isn't visible in the text pages. The stats are more complex than what's in the text pages, since we must hide emails there.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form)? Microsoft figured out how to do that with the old DOS 8.3 filenames.
The tricky part here is that the keys have to be consistent from update to update. Let's say andy@domainA.com has 100 points and andy@domainB.com has 50 in one update, and we list them as
andy@1 100
andy@2 50
based on ordering by highest to lowest. However, there's no guarantee that ordering will hold. It gets even more complex when andy@domainC.com comes in. I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
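The objection above can be reproduced in a few lines. This sketch assumes a hypothetical rank_keys helper that numbers the records sharing a public name by descending credit, exactly as in the andy@1/andy@2 listing; when andy@domainB.com overtakes andy@domainA.com between two updates, the same public key ends up pointing at a different donor.

```python
def rank_keys(credits: dict[str, int], display: str) -> dict[str, str]:
    """Hypothetical scheme: key = display name + '@' + rank by credit (descending)."""
    ordered = sorted(credits, key=lambda email: credits[email], reverse=True)
    return {email: f"{display}@{rank}" for rank, email in enumerate(ordered, start=1)}


update1 = {"andy@domainA.com": 100, "andy@domainB.com": 50}
update2 = {"andy@domainA.com": 100, "andy@domainB.com": 123}

print(rank_keys(update1, "andy"))
# {'andy@domainA.com': 'andy@1', 'andy@domainB.com': 'andy@2'}
print(rank_keys(update2, "andy"))
# {'andy@domainB.com': 'andy@1', 'andy@domainA.com': 'andy@2'}
```

Nothing in the published rows reveals that "andy@1" changed owners, which is why some stored state (or a stable ordering) is needed.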
pafka
Posts: 9
Joined: Mon Jan 21, 2008 10:37 am

Re: Donor & Team lists issues

Post by pafka »

VijayPande wrote:
codysluder wrote:
VijayPande wrote: When a donor enters their full donor name (including email), we parse it on our side and make sure it goes to the right account, even if that isn't visible in the text pages. The stats are more complex than what's in the text pages, since we must hide emails there.
True, but why can't you assign an index key such as andy@1 and andy@2 . . . so the text files provide enough information to distinguish between the various emails without divulging the actual email address (even in encrypted form)? Microsoft figured out how to do that with the old DOS 8.3 filenames.
The tricky part here is that the keys have to be consistent from update to update. Let's say andy@domainA.com has 100 points and andy@domainB.com has 50 in one update, and we list them as
andy@1 100
andy@2 50
based on ordering by highest to lowest. However, there's no guarantee that ordering will hold. It gets even more complex when andy@domainC.com comes in. I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
There is a solution without storing additional information.

Currently (if ordered by credit):
  • andy@1 100 3 0
    andy@2 50 1 0
could become in the next update:
  • andy@2 123 4 0
    andy@1 100 3 0
and seen in the donor list like:
  • andy 123 4 0
    andy 100 3 0
and the swap is impossible to detect.


But if the list is ordered by the moment (Unix time, datetime, timestamp, etc.) a user was first seen by the system (e.g. reported their first unit), we'd have:
  • andy@1 100 3 0 ( 2008-11-23 01:23:45 )
    andy@2 50 1 0 ( 2008-12-01 10:45:23 )
and in the next update the order would look like:
  • andy@1 100 3 0 ( 2008-11-23 01:23:45 )
    andy@2 123 4 0 ( 2008-12-01 10:45:23 )
    andy@N 12 1 0 ( 2008-12-02 10:45:23 )
and the donor list would look like:
  • andy 100 3 0
    andy 123 4 0
    andy 12 1 0
and one could use that order to auto-assign IDs to the users in a database, e.g.:
  • 21 andy 100 3 0
    22 andy 123 4 0
    23 andy 12 1 0
so a swap cannot occur, yet the information needed is there and the file's structure is preserved.


Those IDs are not supposed to come from the F@H system; they are mine to add and keep track of.
That would allow third-party stats sites to show all users' stats, and the users of those sites could find themselves and keep track of their personal stats.

That is all possible without changing the structure of the donor list.
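The third-party side of this proposal can be sketched as follows. It assumes what the proposal requires of F@H: each daily list is ordered by first-seen time, records are never removed, and a newer record with the same (name, team) always appears after the older ones. LocalIdAssigner and its methods are hypothetical; the k-th occurrence of a (name, team) pair in every update is mapped to the same local ID.

```python
from collections import defaultdict


class LocalIdAssigner:
    """Assign stable local IDs to rows that share the same (name, team)."""

    def __init__(self, first_id: int = 1) -> None:
        self.next_id = first_id
        self.ids = defaultdict(list)  # (name, team) -> local IDs in first-seen order

    def assign(self, rows):
        """rows: iterable of (name, credit, wus, team) tuples in file order."""
        seen = defaultdict(int)  # occurrences of each key within this update
        result = []
        for name, credit, wus, team in rows:
            key = (name, team)
            k = seen[key]
            seen[key] += 1
            if k == len(self.ids[key]):  # an occurrence we have never seen before
                self.ids[key].append(self.next_id)
                self.next_id += 1
            result.append((self.ids[key][k], name, credit, wus, team))
        return result


assigner = LocalIdAssigner(first_id=21)
day1 = [("andy", 100, 3, 0), ("andy", 50, 1, 0)]
day2 = [("andy", 100, 3, 0), ("andy", 123, 4, 0), ("andy", 12, 1, 0)]
assigner.assign(day1)  # gives IDs 21 and 22
for row in assigner.assign(day2):
    print(row)
# (21, 'andy', 100, 3, 0)
# (22, 'andy', 123, 4, 0)
# (23, 'andy', 12, 1, 0)
```

The scheme breaks silently if F@H ever reorders or drops rows, which is the main risk of relying on ordering alone.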

PS: Of course, there is a case where F@H would take care of some sort of IDs.
That would require an additional column and could be provided e.g. in another file.
But I have already given up demanding such a file, and that is really out of my scope right now.
I'll probably get back to that when (if ever) the password functionality becomes mandatory.
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: Donor & Team lists issues

Post by codysluder »

VijayPande wrote:I don't see any solution here that would work without storing some additional state info, since the list can (and will) change from update to update, so simply ordering won't work.
Well, one way to do it is to store the state information for your andy example as
domainA.com=1
domainB.com=2
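The stored-state approach above could look roughly like this on the server side. SuffixRegistry is a hypothetical name, and the table would have to be persisted between updates; the point is that an index, once handed out, never moves, however the credits reorder the list.

```python
class SuffixRegistry:
    """Hypothetical server-side state for the andy@1, andy@2 ... idea.

    For every public name it remembers which private part (here the email
    domain, as in the example above) received which index.
    """

    def __init__(self) -> None:
        # public name -> {private part: index}; must be saved between updates
        self.table: dict[str, dict[str, int]] = {}

    def key_for(self, public_name: str, private_part: str) -> str:
        indexes = self.table.setdefault(public_name, {})
        if private_part not in indexes:
            # Indexes are never reused or removed, so len + 1 is the next free one.
            indexes[private_part] = len(indexes) + 1
        return f"{public_name}@{indexes[private_part]}"


registry = SuffixRegistry()
print(registry.key_for("andy", "domainA.com"))  # andy@1
print(registry.key_for("andy", "domainB.com"))  # andy@2
print(registry.key_for("andy", "domainA.com"))  # andy@1 (stable on later updates)
print(registry.key_for("andy", "domainC.com"))  # andy@3
```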

There's another possibility: don't you already have a way to store that information? Each donor can have a passkey, whether they've ever used it or not. Suppose that andy@domainA.com has never used a passkey: create a dummy passkey for him. If andy@domainB.com has used their passkey, then you can use it. I'm sure there's a reasonably direct way to convert all these passkeys into a series of integers. I'd think that would be a lot better than whatever method you presently use for the stats, but even that would be a third choice.