Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Mon May 18, 2020 11:41 am
by Hoovoloo
Just realised that there is a pair of nvlddmkm errors for each failure. The pair for one failure is below.

Code:

Log Name:      System
Source:        nvlddmkm
Date:          18/05/2020 12:17:20
Event ID:      13
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-CDEFV44
Description:
The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

\Device\Video3
Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC

The message resource is present but the message was not found in the message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="49322">13</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2020-05-18T11:17:20.126326100Z" />
    <EventRecordID>6991</EventRecordID>
    <Channel>System</Channel>
    <Computer>DESKTOP-CDEFV44</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC</Data>
    <Binary>0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>

Code:

Log Name:      System
Source:        nvlddmkm
Date:          18/05/2020 12:17:20
Event ID:      13
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-CDEFV44
Description:
The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

\Device\Video3
Graphics Exception: ESR 0x51d7b0=0xa0005 0x51d7b4=0x20 0x51d7a8=0x4c1eb72 0x51d7ac=0x174

The message resource is present but the message was not found in the message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="49322">13</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2020-05-18T11:17:20.126326100Z" />
    <EventRecordID>6992</EventRecordID>
    <Channel>System</Channel>
    <Computer>DESKTOP-CDEFV44</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Graphics Exception: ESR 0x51d7b0=0xa0005 0x51d7b4=0x20 0x51d7a8=0x4c1eb72 0x51d7ac=0x174</Data>
    <Binary>0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
They appear in the order that they are listed here.
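Both events carry the same TimeCreated value, so the EventRecordID (6991, then 6992) is what actually fixes their order. A rough Python sketch of reading that out of the Event XML follows. It assumes each event has been copied out of Event Viewer as its own XML string; `summarize_event`, `order_events` and `make_sample_event` are hypothetical helper names made up for this illustration, not part of any Windows or NVIDIA tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Event XML shown in Event Viewer.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed-down sample event for the demo (assumption: only the fields
# needed here are kept; real events have more System children).
SAMPLE_TEMPLATE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventRecordID>{record_id}</EventRecordID>
    <TimeCreated SystemTime="2020-05-18T11:17:20.126326100Z" />
  </System>
  <EventData>
    <Data>\\Device\\Video3</Data>
    <Data>{message}</Data>
  </EventData>
</Event>"""


def make_sample_event(record_id, message):
    """Build a sample Event XML string (hypothetical helper for the demo)."""
    return SAMPLE_TEMPLATE.format(record_id=record_id, message=message)


def summarize_event(xml_text):
    """Return (record_id, timestamp, list of Data strings) for one Event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    record_id = int(system.find("e:EventRecordID", NS).text)
    timestamp = system.find("e:TimeCreated", NS).get("SystemTime")
    data = [d.text for d in root.findall("e:EventData/e:Data", NS)]
    return record_id, timestamp, data


def order_events(xml_texts):
    """Sort event summaries by EventRecordID, i.e. the order they were logged."""
    return sorted((summarize_event(x) for x in xml_texts), key=lambda s: s[0])


if __name__ == "__main__":
    pair = [
        make_sample_event(6992, "Graphics Exception: ESR 0x51d7b0=0xa0005"),
        make_sample_event(
            6991,
            "Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC",
        ),
    ]
    for record_id, timestamp, data in order_events(pair):
        print(record_id, timestamp, data[-1])
```

Sorting on the record ID rather than the timestamp avoids ties when the driver logs both errors in the same instant.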

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Mon May 18, 2020 1:08 pm
by _r2w_ben

Code:

\Device\Video3
Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC
PC probably means Program Counter in this context. NVIDIA has a CUDA tool to help developers diagnose this type of issue but FAH's GPU core is built using OpenCL. Can someone recommend a good GPU memory test?

What is the manufacturer and model number of your 2070 Super? Does it have any odd characteristics like different amounts of L1/L2 cache depending on which shader core is used? If you look back in Event Viewer for other instances, are they also "GPC 3, TPC 2, SM 1"?
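To check the GPC/TPC/SM question without reading every entry by hand, the message strings could be tallied with something like the Python sketch below. It assumes the event messages have already been collected into a list of strings, and that they all follow the same wording as the one quoted above; `parse_warp_exception` and `count_locations` are hypothetical names for this example only.

```python
import re
from collections import Counter

# Matches messages of the form quoted in this thread, e.g.
# "Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC"
WARP_RE = re.compile(
    r"Graphics SM Warp Exception on \(GPC (\d+), TPC (\d+), SM (\d+)\): (.+)"
)


def parse_warp_exception(message):
    """Return (gpc, tpc, sm, reason), or None if the message is another kind."""
    match = WARP_RE.search(message)
    if match is None:
        return None
    gpc, tpc, sm = (int(g) for g in match.group(1, 2, 3))
    return gpc, tpc, sm, match.group(4).strip()


def count_locations(messages):
    """Count how often each (gpc, tpc, sm, reason) combination appears."""
    parsed = (parse_warp_exception(m) for m in messages)
    return Counter(p for p in parsed if p is not None)


if __name__ == "__main__":
    messages = [
        "Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC",
        "Graphics Exception: ESR 0x51d7b0=0xa0005 0x51d7b4=0x20",
    ]
    for location, count in count_locations(messages).items():
        print(location, count)
```

If every failure lands on the same combination, the counter will hold a single key; several different keys would suggest the fault is not tied to one part of the chip.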

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Mon May 18, 2020 2:04 pm
by Hoovoloo
This is the card: https://www.zotac.com/pk/product/graphi ... -mini#spec
Not sure about the characteristics, but the link is to the spec page.
As far as I can see, the error pairings are identical each time.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Mon May 18, 2020 5:52 pm
by _r2w_ben
Hoovoloo wrote: As far as I can see, the error pairings are identical each time.
In the details of the error, is the part I put in the code block the same each time?

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Mon May 18, 2020 6:46 pm
by Hoovoloo
Yes, it is in that error. In the other one it is a long string; I couldn't see any difference but might have missed something.
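Comparing the long ESR strings by eye is error-prone, so a small script can do it instead. The Python sketch below assumes the string always has the register=value layout seen in the "Graphics Exception: ESR ..." message earlier in the thread. It only compares the values and makes no attempt to interpret what the registers mean; `parse_esr` and `diff_esr` are hypothetical names for this example.

```python
import re

# One "address=value" pair, both written as hex, e.g. 0x51d7b0=0xa0005
ESR_PAIR_RE = re.compile(r"(0x[0-9a-fA-F]+)=(0x[0-9a-fA-F]+)")


def parse_esr(message):
    """Map each register address (as text) to its integer value."""
    return {reg: int(val, 16) for reg, val in ESR_PAIR_RE.findall(message)}


def diff_esr(message_a, message_b):
    """Return {register: (value_a, value_b)} for registers that differ.

    A register missing from one message shows up with None on that side.
    """
    a, b = parse_esr(message_a), parse_esr(message_b)
    return {
        reg: (a.get(reg), b.get(reg))
        for reg in sorted(a.keys() | b.keys())
        if a.get(reg) != b.get(reg)
    }


if __name__ == "__main__":
    first = (
        "Graphics Exception: ESR 0x51d7b0=0xa0005 0x51d7b4=0x20 "
        "0x51d7a8=0x4c1eb72 0x51d7ac=0x174"
    )
    # Paste the string from another failure here to compare it.
    second = first
    print(diff_esr(first, second) or "no differences")
```

An empty result means the two strings carry identical register values.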

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Fri May 22, 2020 9:53 am
by Hoovoloo
Well, not sure if this is progress or just frustrating. Did a clean reinstall of Windows and the OpenCL benchmark ran fine. Tried FAH and it failed as before. Oddly, if I delete the GPU slot, create a second CPU slot and then edit it to be a GPU slot, the first GPU job it tries runs OK but all subsequent ones fail. That has worked twice now. Go figure.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Fri May 22, 2020 6:59 pm
by PantherX
Please note that if the Slot is configured for CPU and has a WU, that WU can only run on the CPU. Changing it to a GPU Slot will appear to work, but the WU that it resumes can still only run on the CPU. Maybe you can post the log file so we can see what's happening in the case where you manually changed the Slot type?

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Fri May 22, 2020 9:02 pm
by Hoovoloo
PantherX, yes, the job that it starts before the slot is edited to GPU runs on the CPU as you say, but the next job it picks up runs on the GPU, and the load on the GPU in Task Manager confirms that is what is happening. Will have a look at the log and post in a few minutes if I can find that part. It should be near the start of the log when filtered for Slot 1.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 8:47 am
by Hoovoloo
Didn't work this time around, so nothing useful in the log. I am struggling to think where to go next with this. Seems really odd that a relatively high-spec machine can't get this to run, especially as it was running absolutely fine for quite a while before these issues.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 12:37 pm
by _r2w_ben

Code:

  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Graphics SM Warp Exception on (GPC 3, TPC 2, SM 1): Misaligned PC</Data>
    <Binary>0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000</Binary>
  </EventData>
You mentioned that the errors are logged in pairs. For the errors that have this section, is it always Warp Exception and Misaligned PC? Are the GPC, TPC, and SM numbers always the same?

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 12:53 pm
by Hoovoloo
Yes, they are, and as far as I can see the strings in the other error are also the same each time.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 1:19 pm
by _r2w_ben
Have you run MemtestG80 or MemtestCL? I would suggest running a full test of both of them since there might be a small portion of your GPU that is faulty.
https://simtk.org/projects/memtest
https://www.majorgeeks.com/files/detail ... stg80.html
https://www.majorgeeks.com/files/details/memtestcl.html

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 1:47 pm
by Hoovoloo
Both of them are throwing errors. The errors seem to vary between runs.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 3:03 pm
by Hoovoloo
Have raised it with the vendor's support team.

Re: BAD_WORK_UNIT (114 = 0x72)

Posted: Sat May 23, 2020 3:06 pm
by bruce
I would try underclocking.