Indiana Jones and the Great Circle - DLSS Frame Gen #9
There is https://github.com/nvpro-samples/vk_streamline, but it will likely refuse to work because legacy Vulkan Reflex is not supported in Wine/Proton. (I have a pile of hacks for that though, which I can't currently publish because of header licensing 😩) Fwiw of all the Vulkan games I'm aware of that come with DLSS Frame Generation (Portal RTX, Portal Prelude RTX, No Man's Sky and Indy, please let me know if you're aware of other ones), only Indy ever tried to actually call into nvcuda. Maybe this is specific to the nvngx_dlssg snippet version that's shipped with this game, or maybe the game does this on its own on purpose, but I'm not sure. There is, however, a concerning pattern where if the game is using Streamline (so, every one except Portal RTX) then Vulkan DLFG will refuse to work and die. Only Portal RTX, the sole game that was able to avoid Streamline due to dxvk-remix having a custom, blessed by Nvidia, DLFG integration, currently works if you pass |
Yeah, I compiled vk_streamline and it does actually fire up. Using Bottles for this test with regular Wine, but I should probably try GE-Proton or something with that hardware scheduling thing.. But it does actually start up. Enabling/disabling Reflex does not indicate much, probably because LatencyFleX or whatsitsname is not working.. dunno. Enabling DLSS does not work, and DLFG (DLSS-G it seems to be called in that log) shows this hardware scheduling thing. Looking at some logs it does indicate something along the lines of I'll do some spying on what nvcuda calls, if any, are made on Windows 🤔 |
Huh, nice find. Good to know there is actual Cuda interop there, that explains stuff… but we shouldn't be failing the check at https://github.com/NVIDIAGameWorks/Streamline/blob/f9fc648591a88d6accf859cd5c36010c25b6ab7b/source/platforms/sl.chi/vulkan.cpp#L2610 🤔 |
So.. fiddling a bit with this using GE-Proton-9.20, I at least get a different error for DLFG, "Error 6".. Running this in Bottles I had to add the registry entry for the "RealPath" thing for NGXCore.. I thought the "DLSS script" in Bottles would actually do this, but it did not. Anyway, a small snippet from the log:
I do believe that even though nvcuda.dll is loaded in this case (for the vk_streamline demo thing), the I was looking into this on some other demos using a libcuda relay I made, loading it with EDIT: Doh.. if I had half a brain, it would be a better time for me.. of course it won't work with DLFG on my stupid old Linux hack box.. an RTX 2070 ain't good enough. 😞 |
Yeah, the GPU needs to support On my system with a GeForce RTX 4080 Mobile, the vk_streamline sample when launched with Proton Experimental never calls into nvcuda. Instead, I see stuff like
which then hangs. Oh well. But at least I do pass the sample's native OFA check, so I'm not sure why I would fail it with Indy. |
Does it hang immediately when you enable Frame Gen? I installed the LatencyFleX binaries to my distro, used GE-Proton-20 binaries, created a fresh prefix with whatever was needed, and ran with Of course no Frame Gen due to the old card... just out of interest whether it crashes "no matter what" for you, so we can start blaming nvofapi64 🤣 |
And using my Linux libcuda.so relay, CUDA is used directly by the nvngx.dll files:
These "Nvngx_funcX" functions are part of one of those internal, hidden API things in nvcuda.
|
I can chime in here, I've seen different results. For me it never worked, regardless of whether DLSS-FG is set to off. For some people, the game seems to crash as soon as they enable FG, however I'm unable to verify this. Saancreed got some logs from me... Btw, maybe Nvidia needs to step in and help here also... |
I don't have the game, so I can't test that... but what I tested was this
The func12 is supposed to take 2 "context" addresses, so something is missing.. But then again, spoofing AD100 won't give me that optical flow Vulkan extension anyway, so that might be it. There is also a call made to Other calls made in this demo that I do not think would be needed could be things like:
None of these are documented in the open API.. And I have not checked if they are used in the Indiana Jones game, so in that sense it is a huge wall of text not really related to the game itself.. just figuring out what is needed for Frame Gen in the vk_streamline demo. |
It doesn't help that we are effectively trying to troubleshoot three different issues here:
We should probably make sure that the issue we are trying to resolve is not hidden behind a manifestation of another issue, so to speak. There's a nonzero chance that any attempt to debug the first issue will be harder because of the third one.
I currently have
LatencyFleX won't be able to help us here. It supports only the D3D flavor of Reflex, and only partially. The Vulkan one is a slightly different beast that goes
This could be the private API DLSS Frame Gen uses to support that VK-CUDA interop you found.
Yeah, I think that's the case here.
🙁 Okay, I can imagine this being a problem.
Correct.
Well, I also have no idea what that is, or how critical it would be for DLFG to work. I should probably recheck if Portal RTX tries to call it. |
Since I don't own any of the games, if you could attach a DXVK-NVAPI log from both games I could see if any of those "unknown" addresses are similar. Preferably with some indication of with/without FG usage. The reason I mentioned LatencyFleX in regards to this vk_streamline demo was not that I expected it to "work".. it just seemed a requirement for this particular demo to even start... Bottles has this as a toggle, but my "manual" wineprefix did not, and the demo did not even start up without having these. After adding the LatencyFleX binaries "distro wide" + the LFX=1 option, I could run the vk_streamline demo with DLSS, but it crashes if I enable FG. (Using GE-Proton-9.20) If you think it could help in any way I could give you access to my libcuda.so relay library, and you could perhaps get some more info from that (just regular C code compiled with a makefile)? I don't have Linux on my 4070 gaming rig, or else I could have done more testing there WITH the proper Vulkan extensions 😢 |
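For context, a logging relay like the one described is a thin shim: it exports the driver's entry points, logs each call, then forwards to the real library. A minimal sketch of the pattern (not the actual relay library; the log format and fallback error code here are illustrative):

```c
/* Minimal logging-relay pattern: export a symbol, log the call,
 * forward to the real driver library loaded with dlopen(). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef int CUresult_t;                 /* stand-in for CUresult */

static void *real_lib;                  /* handle to the real libcuda */

static void *resolve(const char *name)
{
    if (!real_lib)
        real_lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    return real_lib ? dlsym(real_lib, name) : NULL;
}

CUresult_t cuInit(unsigned int flags)
{
    static CUresult_t (*real)(unsigned int);
    if (!real)
        real = (CUresult_t (*)(unsigned int))resolve("cuInit");
    fprintf(stderr, "relay: cuInit(%u)\n", flags);
    return real ? real(flags) : 3;      /* 3 = CUDA_ERROR_NOT_INITIALIZED */
}
```

Built as a shared object (`gcc -shared -fPIC relay.c -o libcuda.so.1 -ldl`) and placed ahead of the real library in the loader path, this logs every forwarded call; the real relay presumably does the same for each cu* entry point.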
Here is a log from Portal RTX, with both Reflex and Frame Gen enabled and (as far as I can tell) working correctly: steam-2012840.log Here is Indy, with GPU reported as Ampere: steam-2677660-ampere.log Here is Indy, with GPU reported as Ada: steam-2677660-ada.log And here is vk_streamline, which hangs the moment I click on the Frame Generation toggle: I think at least the DLFG's VK-CUDA interop is bailing out in Indy because it could be loading
I can try, if I find some spare time. |
Okay, so I spent the last few days finishing my Vulkan Reflex implementation, but with that out of the way I got the logs from Indy with libcuda relay: steam-2677660-relay.log However, the behavior changed with this library preloaded. The game never loaded |
Short answer: Yes. Theory: Same with the libcuda.so.xxx.xxx (original driver-versioned) library. There is a .rodata field like this:
compare that with my relay:
Quite a bit more "data" in that segment for sure.. so no problem seeing that there might be a lot more to it than just relaying the cu** calls. I do not know if it is nvngx.dll that is responsible for setting up calls to nvofapi64.dll or not though, but it is highly likely that some data in that read-only field contains something that the relay library does not have... EDIT: A quick look with IDA kinda shows that this _RDATA segment in the Windows .dll is some four jump tables with 10+ entries each, pointing to some internal offsets. I have no clue what this is.. Could be anything 😢 I do feel this is leaning towards nvngx.dll functionality that does not currently work - i.e. NVIDIA territory. Implementing this .rdata segment is beyond me for sure. |
Looking at the logfile, it does somewhat seem to "work", but all of those Nvngx_funcXX calls are more or less speculation, a "working until it crashes" kind of thing. So, it may ALSO very well be that some of those calls need more parameters. Since I do not actually log the returned CUresult, they could just as well be failing. I suppose that could be added to more easily see if something more is up too 😄 |
I just want to say I appreciate all your efforts! ❤️ For me, the game always crashes and never loads nvofapi64 now. It used to load, now it simply just loads nvcuda and goes boom. I also tried to replicate getting into the game with FG off but I don't know how some people manage to do it (from what I read) |
I'll add some return checks then.. This is what happens when I enable FG when spoofing AD100 on my 2070 card:
Makes perfect sense, since it probably tries to use an AD100 kernel on my TU100 GPU...
Yeah.. that call needs to take 2 parameters.. but gets a nullptr and thus fails (probably due to the previous error). I'll start on that then, and we'll see if there is anything useful to be gathered. |
@Saancreed I pushed some error code checking. It won't solve anything by itself of course, but it could be interesting to see if one of those nvngx calls fails somewhere 👍 I should probably use that more in the code.. for nvcuda too I suppose, but the overhead of calling -> returning might be more than just returning outright and letting the app/game handle any errors, especially for those weirdo calls that get used to an insane degree 😏 (But for the libcuda relay it does not matter, because it's just for snooping purposes anyway and not something used regularly). |
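With return checks in place, it also helps to log the error name rather than a bare number. A tiny lookup covering only the handful of codes that came up in this thread (values per cudaError_enum in the CUDA driver API header; anything else just reports "unlisted"):

```c
#include <stdio.h>

/* Names for a few well-known CUDA driver API result codes.
 * Values match cudaError_enum in cuda.h. */
static const char *curesult_name(int r)
{
    switch (r) {
    case 0:   return "CUDA_SUCCESS";
    case 1:   return "CUDA_ERROR_INVALID_VALUE";
    case 2:   return "CUDA_ERROR_OUT_OF_MEMORY";   /* the "return code 2" seen in this thread */
    case 3:   return "CUDA_ERROR_NOT_INITIALIZED";
    case 201: return "CUDA_ERROR_INVALID_CONTEXT";
    default:  return "CUDA_ERROR_<unlisted>";
    }
}

/* Log helper: a relay can wrap each forwarded result with this. */
static int log_result(const char *fn, int r)
{
    if (r != 0)
        fprintf(stderr, "relay: %s failed: %d (%s)\n", fn, r, curesult_name(r));
    return r;
}
```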
Haven't seen any errors that caught my eye but maybe I missed something. Fwiw at some point the game just stops logging anything CUDA related and just waits for me to terminate it. The log is ~7.9 MiB whether I let it run for a minute or two. |
Ah, with native |
Almost, as that's the old version of nvofapi, new (5.0) header is here: https://github.com/jp7677/dxvk-nvapi/blob/v0.8.0/inc/nvofapi/nvOpticalFlowCuda.h |
I can't help wondering if the usage of CUDA here is some sort of fallback mechanism. Guess I'll have to set aside some partition on my 4070 rig to see if I can look into this a bit more. Somewhat hard to compare workings when it is not the same GPU, with the missing VK_NV_optical_flow extension, I guess. |
What do you mean by fallback? Natively the game loads nvcuda.dll from system32 and nvcuda64.dll from DriverStore in Windows, this is with an RTX 4070. |
By the way, FG makes the game run worse for me with RT on in Windows for some reason. With that being said, if you manage to fix this it might help other games... |
I just wonder why it seems to attempt to use Let's say the default mechanic is to use Some say the game is good, so it might not be a total waste of $ if I buy it AND set up a Linux partition for this testing... Christmas and all 🤣 |
That's unsurprising,
It very well might, but only at the level internal to Linux driver so we don't have to care about this. Would be nice to know if the game is using CUDA-based or VK-based optical flow on Windows, because Indy appears to be using the manual hooking method for Streamline, and this has some additional caveats with regard to DLFG in Vulkan: If CUDA is used directly instead of Vulkan even on Windows, it could be that the interop is there by design (or due to failure to satisfy native Vulkan OF requirements?) and just nobody cares because it works anyway.
Or maybe it doesn't work and that's the reason why 🙃 |
I created a quick relay for nvofapi64.dll on Windows without looking into the structs.. just for logging purposes, and the game does use So, at least we know that for now... Things could be failing on Windows too though, but it is starting to look a bit less likely perhaps. |
So, do you mean to say that it crashes no matter what settings you use if you use nvcuda from nvidia-libs? Even if DLSS and FG are NOT enabled? Because that does not make sense at all... |
Yeah, seems like it. |
Changing scenes could indeed use more VRAM, so when actually loading a save, it might just do that. Snippet from the logfile you posted above:
Snippet from the logfile where it works for me:
Snippet from the logfile where I allocate 8GB of VRAM BEFORE launching the game:
So.. I am fairly confident you ARE actually running out of VRAM of some kind. Starting the game at 1440p resolution with the "Low" gfx setting and RT on "medium" (the lowest), the game uses 10.6GB of VRAM for me. Then enabling DLSS and FG it uses slightly less - 10.3GB. However, re-launching the game (since I suppose some graphics settings need that), the game uses 11.2GB! Now, with a browser + the Steam launcher + a couple of CLI windows up, it uses for me a total of 12.05GB. If that happens for you = crash, since that is > 12GB. So, even with everything at "LOW", it WILL eat up > 11GB of VRAM.. and that is really in danger territory it seems. I can however imagine some effects or scenes using some +/- amount here, so if you are "lucky" and manage to load this with nothing in the background and hover around 11GB of VRAM used, I would still consider it somewhat "danger territory". I would say you can first start off without DLSS/FG, and see how far down you can go.. Keep an eye on It is not extremely far-fetched that there could be memory leaks in nvcuda, or some issues releasing memory, so I will look a bit into that to see if I can spot anything suspicious 😄 |
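For watching VRAM headroom while testing, polling the driver's own counters works reasonably well (this assumes the nvidia-smi utility from the driver package, and keep in mind it may not account for every kind of CUDA allocation):

```shell
# Poll VRAM usage every 2 seconds while the game runs
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 2
```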
Hmm.. I fear there is a bit more to this. Playing this for a "while", 20-30 minutes, even though I turned things down so I was hovering around 12-13GB max TOTAL (game +++), I tend to crash..
And there it froze... And what is return code 2, you say? So it seems it does run out of memory eventually, even though I was probably at around 90% vmem usage, so even though it's not OOM, it might be some starvation of sorts. Guess I have to do some more logging 🤔 |
Yeah, I can't really make much sense out of my memory usage because it seems to crash even if there is memory available. It could be a game compatibility or driver issue of some sort too. It would be nice if someone with a 12GB Ada card could test this too... just to see if they run into the same issue. |
I think I figured it out... it's __GL_13ebad=0x1 Update: |
Even if you rename nvcuda.dll in c:\windows\system32, it can still be loaded in Windows.. just so you are aware, Windows uses a slightly different DLL path resolution scheme than LD_LIBRARY_PATH in Linux, so to speak. All Nvidia "system libraries" including nvcuda.dll (which is named nvcuda64.dll and nvcuda32.dll for 64/32 bit), as well as nvapi64.dll and whatnot, are located in the "DriverStore" folder in Windows, and ARE loaded from there... but some apps tend to use a compatibility mode thing where it is sometimes loaded from c:\windows\system32, only to be unloaded and re-loaded from the DriverStore folder system. (Not gonna explain that one). Anyhow.. I am not certain what this GL_13ebad setting actually does as it is some internal setting, but it is clear after some testing that there is some sort of strange memory leak where 2MB chunks of VRAM are being eaten steadily every 3-4 seconds when running with Frame Gen. Trying to investigate this, but it is not overly easy. The amount of video memory in use by the GAME does not change much once you are in-game, but sys-mem is increasing, as well as some amount of "unknown" video memory. If you look at nvtop when this is happening, adding up the "app/game usage" fields against the ACTUAL video memory used, it does not seem to be the same amount, so... strange indeed. It is possibly some special CUDA thing, as if I allocate 4096MB of VRAM using cuMemAlloc or whatnot, it will use > 4096MB of video memory for some reason. This is probably a documented feature, some allocation buffer or some crap. Not sure. The CUDA functions triggered by nvOFExecute do not seem to do any memory allocation, but there is some context swapping back and forth, so maybe something is not freed as it should be.. I am a bit at a loss here atm. Maybe in some register, the "win32 handle" is tied to a specific CUDA context, and what we do is create a "linux fd handle" to do the function, and return the pointer to some cuda-kernel-blob thing.
Wine (Proton) possibly does not tie win32 <-> fd handles together, and if the game then is supposed to free the context tied to this "win32 handle", I do not know what will happen. Since this context is not being actively freed using a CUDA function, maybe it is trying to free some Vulkan thing with a pointer to the Linux fd, and it goes tits up from there.. I dunno. @Saancreed Do you have any theories? If you run the game with the highest settings you can, with DLSS/FG enabled, try to look at vmem usage in nvtop over a few minutes.. it rises steadily for me. The GAME's video memory seems stable, but the "overall" video memory is used up. Also when this is high enough (close to 100%), there will be some stuttering as well. Maybe that BIOS setting I can't remember the name of has some amount of pageable memory (3-400MB or whatnot), and it will eat and eat out of that until completely OOM and freeze up?
Okay, so as it was explained to me, the game (actually most if not all idTech and now also MOTOR games) have this bug where they request memory allocations for performance-critical resources to be done in system memory instead of video memory. This isn't a problem on Windows because WDDM just magically moves stuff on its own behind your back but there is no such OS-wide mechanism on Linux, where applications are just trusted to not do anything nonsensical. This variable enables hacky promotion of unmapped sysmem allocations to video memory so allocations end up in a better place… but I don't know if it's sophisticated enough to actually leave in system memory those allocations that aren't performance-critical, so I wouldn't be surprised if it led to higher VRAM requirements on Linux.
IIRC Vulkan spec documents a difference where at least importing one resource on one OS consumes the fd/handle and another case in which it doesn't. Maybe there's a similar difference in CUDA and we should be closing the handle/fd in nvcuda after calling the Linux-side function? I'm not sure if I got it right.
Not really, but I'd try force-enabling
Which sounds promising, but then there is…
So considering the allocation promotion hack… 🤷 |
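For anyone trying to reproduce this: the promotion hack discussed above is the same undocumented driver variable mentioned earlier in the thread, and it is typically applied as a Steam launch option (this assumes Steam's `%command%` placeholder; the variable itself is internal and undocumented, as noted):

```shell
__GL_13ebad=0x1 %command%
```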
Made some changes here bd24982 (testing branch) At least it works the same, and attempts to close the handles, although not closing the Opening this again, as it still crashes for me after a bit of time.. 15-30 minutes in-game, and it either claims it is out of memory even if I have 1-2GB of free VRAM according to nvtop and the game says it uses 12-13GB, or it just stops. Have done some testing with fsync/esync and without either, but it does not really seem to matter. Maybe it is a game bug that pops up with the usage of CUDA like this, considering this shady One "round" of
That ContextStorage_Get bit could very well be iffy also, as this is one of those hidden APIs, but it does not seem to cause issues elsewhere; then again, I suppose no other game using CUDA is THIS memory hungry. I mean, a game using PhysX that uses like 4-6GB of VRAM would probably take a lot longer to go OOM. |
I'm a bit further into the game and it crashes even without nvcuda after a while. While it might be possible to optimize nvcuda, I think it's basically a game engine/driver workaround issue like @Saancreed said. |
The "memory leakage" where the game seems to .. well.. somewhat allocate or map memory from host memory -> video memory, possibly due to this workaround, seems present without FG. However, it does seem to be a lot WORSE when using the framegen/nvcuda implementation. I am not sure if it is an nvcuda bug, in that the game is "better" at freeing/moving memory without it, or if using FG just amplifies the underlying problem. @Saancreed I have not found any worsening, or improving, using the |
We can only hope Nvidia comes up with a more stable fix for this game, at least we got the workaround for now... FWIW I checked the game profile with Nvidia Profile Inspector in Windows, there seem to be a few "quirks" added for this game in Windows, I can't tell what all of them do. |
So, I have moved stuff around a bit, and now the active development of nvcuda is on the Anyway, the last two commits to Another part is the way we "translate" win32 handles -> fd handles. This ended up not being too hard after @Saancreed came up with that idea, and I have done a bit of testing; however I do believe this is only half the story. Per the CUDA documentation, those different "types" of handles are treated completely oppositely when it comes to CUDA. Yay!
So, this I would think means what it says - CUDA claims "ownership" of the FD handle
but in the case of a win32 type of handle, the ownership is NOT transferred to CUDA! So, completely opposite. This probably also means that the game/app is supposed to handle its lifetime, and CUDA will just use it for what it needs internally. No problems there, as long as the app does cleanups and whatnot... The issue here I THINK could be in the way CUDA is supposed to "free" the resource.. and if or how wineserver figures that out, I do not know.. wineserver/winevulkan may do it correctly and completely out of sync with CUDA, and/or CUDA could internally attempt to free this and not be able to? I do not know... I would also think that one of the reasons win32 timeline semaphores work with winevulkan could be due to the Vulkan extension in use as well, but we do not have any such extension for CUDA. There is a note about the KMT handle in the winevulkan source also:
Something that kind of indicates to me that it is not trouble-free there either. I will keep stabbing in the dark with this, but my knowledge is frightfully limited, so for now I suppose the latest hack is the best I've got 🤣 |
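The ownership difference described above can be made concrete with a tiny stand-in model (this is NOT the real cuImportExternalMemory; the fake_import function below only mimics the documented rule that importing an OPAQUE_FD transfers ownership of the fd to CUDA, while importing an OPAQUE_WIN32 handle does not):

```c
#include <stdbool.h>

/* Simplified stand-ins for the CUDA external-memory handle types.
 * Real names: CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD / _OPAQUE_WIN32. */
enum handle_type { HANDLE_OPAQUE_FD, HANDLE_OPAQUE_WIN32 };

struct handle {
    enum handle_type type;
    bool owned_by_cuda;     /* who is responsible for closing it */
};

/* Fake import modelling the documented ownership semantics:
 * importing an fd transfers ownership to CUDA, so the app must NOT
 * close it again; importing a Win32 handle does not transfer
 * ownership, so the app must keep it alive and close it later. */
static void fake_import(struct handle *h)
{
    h->owned_by_cuda = (h->type == HANDLE_OPAQUE_FD);
}
```

In a Wine/Proton translation layer that turns Win32 handles into fds, this asymmetry is exactly where a double-close or a never-closed handle could sneak in.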
Yeah, this was one thing I was also not sure about. With the difference being that importing Win32 handles does not consume them, but importing FDs does, I expected to see some issues in a scenario where (but keep in mind, I don't exactly remember what Winevulkan's shared resources patchset does exactly, so some of this is complete guesswork):
What if 😩 |
Yeah, it's a pickle... I was hoping wineserver would be OK with duplicating like this. It did not fix the stuttering or memory leak, so it ended up being the same anyway. That's one of the reasons I believe it is not just one issue we are dealing with here either, although I am sure it is one of them. I have not tested extensively with this CUDA "fallback to sysmem" setting disabled in Windows, so I am going to do some more testing there, and see if I end up OOM there as well. Still do not think the problem would be as big there due to us needing this The CUDA sysmem fallback thing is something people have requested for a long time, as this is not only for gaming, but for other CUDA apps that easily run OOM on Linux but work just fine in Windows. Hopefully things could improve a bit if that was to come around 👍 Tracking memory allocations with nvidia-smi or other tools is impossible it seems, as it will NOT show things like shared memory or managed memory (cuMemAllocManaged), so you could end up OOM with only a few GB of video memory "used". I do suspect this __GL_13ebad quirk is using some hackery here that hides some allocations perhaps. Not to mention the ever-present culprit of fragmentation due to frequent CUDA allocations (imports?) that may have a MUCH slower rate of freeing than on Windows perhaps. As I understand this CUDA Frame Generation thing (until DLSS 4.0 comes, I guess): let's say Vulkan renders/generates 2 images. How many "saved frames for later use" are supposed to be in vmem? Dunno.. Is it "always" generating a new image? Dunno... Aggressively calling |
Fun fact: with DLSS v310 snippets borrowed from Cyberpunk 2077's latest update, Frame Gen appears to be working in Indy without anything CUDA-related having to be installed. (Something something Optical Flow Accelerators are no longer used, which likely means the same for CUDA interop if not the entire NVOFAPI library. At least from a quick look at Proton logs, (Using R570 driver currently requires Proton Experimental bleeding-edge, but I expect this to be working even with R565, although I didn't test that particular scenario.) |
So.. This is supposedly the new DLSS 4.0 then.. and as it says in the article - NVIDIA has moved away from OpticalFlow usage. I see that in the Great Circle "update 3" notes, it says that they will add support for Blackwell GPUs (50xx), which will hopefully also update the game's DLSS/sl.xx DLLs to this new version. Although I admit it is still a "typical NVIDIA" version scheme to use v310.xx for the DLSS4 binaries.. or whatever... 🤣 Does work, although I do get quite a few spikes. Gonna do some comparison with Windows. PS. In case people do not own Cyberpunk - the files can be downloaded as listed here in this reddit post: Also note that the streamline files that are needed are listed in that post under "Edit 5" |
At the very least we should get a showcase of upcoming Blackwell-exclusive
Maybe all this time we were supposed to be adding all the digits together 🙃 |
Hm, the game started fine and played fine with those files added to the streamline directory, but as soon as I enabled FG, the game went to a black screen and crashed. |
Why is your setup so cursed… Proton logs please? |
From jp7677/dxvk-nvapi#245 (comment) , from what I understood (but you should know better ;) ) Edit: ah, forget what I said, those newer endpoints are only relevant for D3D12. |
Heh, OK, I'll wait till that get sorted out. Feels like it's too much of a WIP right now. |
I did not immediately get this to work either, so I dunno if it is "just" a matter of copying all the DLLs from Cyberpunk or not.. even on Windows I ended up hard-rebooting after doing that! However, I downloaded the DLSS Swapper tool for Windows and used that.. then copied the "streamline" folder over to my Linux Steam version of Great Circle.. and that did work 👍 I attach a .zip of the streamline folder with the working files here: Then make sure to delete Then replace the "streamline" folder in the game directory with the one you download above. If that still does not work AT ALL.. I have no other ideas than either some special "cachy" tweaks, or some hardware issues perhaps.. I dunno. You still have to run the game with |
Jeez.. what the bloody f is a "datacenter driver"? So.. not only is the CUDA driver 570.85.10 "too old" - it's still not released as a standalone driver.. now one needs to get some "datacenter driver" to be up-to-date? lulz.. I mean.. sheesh... |
I wouldn't worry about it, the actual GeForce R570 is supposed to be released later today. |
I heavily suspect the R570 release will be the same driver anyway (570.86.15) but who knows.. :) |
Right.. so as this seems to work even better than the nvofapi workaround, I dropped this and we made some upstream dxvk-nvapi changes just in case. This means that IF you want to use Frame Generation with this game - until they hopefully update it - you need to copy the DLSS4 binaries, either by downloading them or taking them from another game. The DLSS Swapper software that "everybody" is using seems to indicate that this is very much done for a lot of games on Windows. So, I feel confident that this issue can be worked around using this method, and hopeful that it will be completely solved in the next game update, which their patch notes indicate should add support for RTX 50xx cards (e.g. an update to DLSS). Still the occasional stutter, and possible over-use of VRAM, but the stuttering seems a bit less with DLSS4, and crashing happened here and there anyway.. all in all a win. Thanks for helping out with this long issue. Glad it started, because I have actually enjoyed playing this game 👍 |
#8
Since this pull did not fix the issue, even though it may or may not be related to CUDA, I'll keep it open a while to see if there COULD be something there with CUDA context creation or something.
I do not own this game, but maybe someone could come up with a link to a Vulkan demo or something using DLFG? The NVIDIA "Donut" demo does use DLSS, but does not use Frame Gen to my knowledge.. and that does not work when running with the -vulkan option at all currently for me. It does not load nvcuda either, so probably not comparable other than maaaaaybe something with winevulkan in general.