Multiple client stalled due to "database locked"

Moderators: Site Moderators, FAHC Science Team

markfw
Posts: 142
Joined: Mon Feb 04, 2008 3:32 pm

Multiple client stalled due to "database locked"

Post by markfw »

My entire farm was offline for most of the day before I noticed that every client had stalled due to "database locked". After rebooting them (an all-day job) I was able to get them all working. But my question is: why did this happen, and will it happen again in the future? These were all Linux clients, and stopping and starting the client using

sudo service FAHClient stop
sudo service FAHClient start

did not clear the error; only a reboot did. Oh, and I did have one Windows client, and I also had to reboot it to get it to work.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple client stalled due to "database locked"

Post by bruce »

Without seeing your recent logs, I can only guess.

The most common cause of "database locked" comes from starting a second copy of FAHClient (concurrently). Ordinarily FAHClient runs as a service and two copies can't work at the same time.
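
One way to rule this out on a Linux host is to count the processes named FAHClient. The following is only a minimal sketch: it assumes the process appears under the name `FAHClient` (taken from the service name used earlier in this thread) and that `ps -eo comm` is available. The function names are made up for illustration.

```python
import subprocess


def count_named_processes(ps_output: str, name: str = "FAHClient") -> int:
    """Count lines of `ps -eo comm` style output that exactly equal `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def running_copies(name: str = "FAHClient") -> int:
    """Ask ps for every command name and count the matches.

    Assumption: a Linux host where `ps -eo comm` prints one command name
    per line. The process name itself is also an assumption.
    """
    result = subprocess.run(
        ["ps", "-eo", "comm"], capture_output=True, text=True, check=True
    )
    return count_named_processes(result.stdout, name)
```

Calling `running_copies()` on a host and getting a value above 1 would point to a second copy; a value of 1 would suggest looking elsewhere.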
markfw

Re: Multiple client stalled due to "database locked"

Post by markfw »

Well, I restarted all my hosts, so the logs are gone, but NO, these 14 different hosts running linux have all been running 24/7 for months, so only one instance was running. It just all of a sudden affected all my hosts, so I think something happened to the system.
bruce

Re: Multiple client stalled due to "database locked"

Post by bruce »

One instance of FAHClient per host, right? Each host has its own database, which is locked by the FAHClient running on that host. So which database was locked, and what happened on that particular host?
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Multiple client stalled due to "database locked"

Post by Joe_H »

markfw wrote: Well, I restarted all my hosts, so the logs are gone, ...
The client keeps copies of the last 16 log files by default, so unless you deleted them they should still be on your systems. The link in Bruce's sig includes directions on how to locate the log files depending on which OS you are running.
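
Once the log files are located, searching all of them for the error text is quicker than reading each one. A rough sketch, assuming the logs are plain-text `.txt` files somewhere under a directory you pass in; the function name and the file pattern are assumptions, not something the client documents here.

```python
from pathlib import Path


def find_phrase_in_logs(log_dir, phrase="database locked"):
    """Return (file name, line number, line text) for each matching line.

    Assumption: logs are text files ending in .txt below `log_dir`.
    Matching is case-insensitive.
    """
    needle = phrase.lower()
    hits = []
    for path in sorted(Path(log_dir).rglob("*.txt")):
        # errors="replace" keeps the scan going if a file has odd bytes
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

An empty list from every host would mean the retained logs no longer contain the message, which is worth stating when reporting the problem.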

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
markfw

Re: Multiple client stalled due to "database locked"

Post by markfw »

Well, I looked on 2 different hosts, and could not find "database locked". If I find it on some host I will reply back. Just know that a very experienced user (15 years, and number 21 overall) had this issue, and I think something happened with the host servers. All of my hosts that have been up for months could not all have had the same issue without a system problem.
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: Multiple client stalled due to "database locked"

Post by HaloJones »

It sounds like you hit the server that won't provide units properly and, for some reason, can't be fixed.
single 1070
rickoic
Posts: 322
Joined: Sat May 23, 2009 4:49 pm
Hardware configuration: eVga x299 DARK 2070 Super, eVGA 2080, eVga 1070, eVga 2080 Super
MSI x399 eVga 2080, eVga 1070, eVga 1070, GT970
Location: Mississippi near Memphis, Tn

Re: Multiple client stalled due to "database locked"

Post by rickoic »

I've been having the same problem and even after reboot the download is extremely slow. I have 4 different computers running with 2 gpus on 3 and 1 on the other (laptop).
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
bruce

Re: Multiple client stalled due to "database locked"

Post by bruce »

Which WUs are being downloaded from which work servers?

When was the last time you restarted your LAN (Router)?
rickoic

Re: Multiple client stalled due to "database locked"

Post by rickoic »

12:02:07:WU01:FS01:Connecting to 65.254.110.245:8080
12:02:07:WU01:FS01:Assigned to work server 155.247.166.220
12:02:07:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GP104 [GeForce GTX 1080] 8873 from 155.247.166.220
12:02:07:WU01:FS01:Connecting to 155.247.166.220:8080
12:02:08:WU01:FS01:Downloading 15.63MiB

And there it sat all night long.

Did a hard restart of LAN just 2-3 days ago.

Here's my log for this gpu after I rebooted.

14:33:13:WU00:FS01:Assigned to work server 128.252.203.10
14:33:14:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1080] 8873 from 128.252.203.10
14:33:14:WU00:FS01:Connecting to 128.252.203.10:8080
14:33:14:WU02:FS02:0x21: Found a checkpoint file
14:33:15:WU00:FS01:Downloading 69.98MiB
14:33:21:WU00:FS01:Download 24.74%
14:33:27:WU00:FS01:Download 50.19%
14:33:33:WU00:FS01:Download 79.93%
14:33:37:WU00:FS01:Download complete
14:33:38:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14230 run:418 clone:1 gen:86 core:0x21 unit:0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:38:WU00:FS01:Starting
14:33:38:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ricko\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 705 -lifeline 4752 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
14:33:38:WU00:FS01:Started FahCore on PID 9668
14:33:39:WU00:FS01:Core PID:9764
14:33:39:WU00:FS01:FahCore 0x21 started
14:33:42:WU00:FS01:0x21:*********************** Log Started 2019-10-01T14:33:42Z ***********************
14:33:42:WU00:FS01:0x21:Project: 14230 (Run 418, Clone 1, Gen 86)
14:33:42:WU00:FS01:0x21:Unit: 0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:42:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
14:33:42:WU00:FS01:0x21:Machine: 1
14:33:42:WU00:FS01:0x21:Reading tar file core.xml
14:33:42:WU00:FS01:0x21:Reading tar file integrator.xml
14:33:42:WU00:FS01:0x21:Reading tar file state.xml

Tks
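
The difference between the two excerpts above is that the first shows a "Downloading" line with no "Download complete" after it. A small sketch for spotting that pattern in a log, assuming every line has the `HH:MM:SS:WUxx:FSxx:message` layout seen in these excerpts; the function name is hypothetical, and a download that is simply still in progress when the log ends would be reported too.

```python
import re

# Layout assumed from the excerpts: time, work unit, folding slot, message
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")


def unfinished_downloads(lines):
    """Map (work unit, slot) to (start time, size) for downloads that
    never logged 'Download complete'."""
    pending = {}
    for raw in lines:
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        time, wu, fs, message = match.groups()
        if message.startswith("Downloading "):
            pending[(wu, fs)] = (time, message[len("Downloading "):])
        elif message == "Download complete":
            pending.pop((wu, fs), None)
    return pending
```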
bruce

Re: Multiple client stalled due to "database locked"

Post by bruce »

I see no messages in what you posted mentioning database locked.

I do see that your report is associated with work server 155.247.166.* which has been experiencing network congestion ... and people are actively working on that problem. (See other discussions.) Please do a better job of reporting your actual problem.
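
When reporting, it helps to state which work servers the client was assigned to. A minimal sketch that pulls those addresses out of log lines, assuming the "Assigned to work server" wording seen in the excerpts earlier in this thread; the function names are illustrative only.

```python
import re
from collections import Counter

# Wording assumed from the log excerpts posted in this thread
ASSIGNED = re.compile(r"Assigned to work server (\d{1,3}(?:\.\d{1,3}){3})")


def work_server_counts(lines):
    """Count how often each work server address appears in assignments."""
    counts = Counter()
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def in_range(address, prefix="155.247.166."):
    """True if the address falls in the range mentioned above."""
    return address.startswith(prefix)
```

Feeding it a full log and listing the addresses for which `in_range` is true would show whether the stalls line up with the congested servers.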