When we copy into a buffer, it might contain data modified from the GPU
on the same pages. Because of this, we have to flush the contents before
writing new data.
An alternative approach would be to write the data in place, but games
can also write data in other ways, invalidating our contents.
Fixes geometry in Zombie Panic in Wonderland DX.
Setting __GL_SHADER_DISK_CACHE_PATH we can force the cache directory to
be in yuzu's user directory to stop commonly distributed malware from
deleting our driver shader cache. And by setting
__GL_SHADER_DISK_CACHE_SKIP_CLEANUP we can have an unbounded shader
cache size.
This has only been implemented on Windows, mostly because previous tests
didn't seem to work on Linux.
Disable the precompiled cache on Nvidia's driver. There's no need to
hide information the driver already has in its own cache.
Silence the new validation layer error about SPIR-V not allowing OpUndef
on a OpTypeVoid, even when the SPIR-V spec doesn't say anything against
it.
They will be inserted as an undefined int to avoid SPIRV-Cross and
validation errors, but only when a debugging tool is attached.
Allow users of the allocator to hint memory usage for downloads. This
removes the non-descriptive boolean passed for "host visible" or not
host visible memory commits, and uses an enum to hint device local,
upload and download usages.
Fix a bug where the memory allocator could leave gaps between commits.
To fix this the allocation algorithm was reworked, although it's still
short in number of lines of code.
Rework the allocation API to self-contained movable objects instead of
naively using an unique_ptr to do the job for us. Remove the VK prefix.
Stops us from merging code with unused functions in the future.
If something is invoked behind conditionally evaluated code in
a way that the language can't see it (e.g. preprocessor macros), the
potentially unused function should use [[maybe_unused]].
It keeps track of the modified CPU and GPU ranges on a CPU page
granularity, notifying the given rasterizer about state changes
in the tracking behavior of the buffer.
Use a small vector optimization to store buffers smaller than 256 KiB
locally instead of using free store memory allocations.
With timeline semaphores we can avoid creating objects. Instead of
creating an event, grab the current tick from the scheduler and flush
the current command buffer. When the fence has to be queried/waited, we
can do so against the master semaphore instead of spinning on an event.
If Vulkan supported NVN like events or fences, we could signal from the
command buffer and wait for that without splitting things in two
separate command buffers.
Intel and AMD proprietary drivers are incapable of rendering to texture
views of different formats than the original texture. Avoid creating
these at a cache level. This will consume more memory, emulating them
with copies.
This breaks accelerated decoders trying to imageStore into images with
sRGB. The decoders are currently disabled so this won't cause issues at
runtime.
The "VK" prefix predates the "Vulkan" namespace. It was carried around
the codebase for consistency. "VKDevice" currently is a bad alias with
"VkDevice" (only an upcase character of difference) that can cause
confusion. Rename all instances of it.
For listing the available physical devices we can use Vulkan 1.0.
Now that MoltenVK supports 1.1 we can require it for running games.
Add missing documentation.
VKDevice::IsSuitable was not being called. To address this issue, check
suitability before initialization and throw an exception if it fails.
By doing this, we can deduplicate some code on queue searches.
Previosuly we would first search if a present and graphics queue
existed, then on initialization we would search again to find the index.
Report device enumeration errors with exceptions to be consistent with
other initialization related function calls. Reduces the amount of code
to maintain.
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
Move more Vulkan code to report errors with exceptions and report them
through a log before notifying it with an error boolean for backwards
compatibility. In the future we can replace the rasterizer two-step
initialization to always use exceptions.
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
Without using VK_EXT_robustness2, we can't consider the 'enabled' (not
null) vertex buffers as dynamic state, as this leads to invalid Vulkan
state. Move this to static state that is always hashed and compared in
the pipeline key.
The bits for enabled vertex buffers are moved into the attribute state
bitfield. This is not 'correct' as it's not an attribute state, but that
struct has bits to spare, and it's used in an array of 32 elements (the
exact same number of vertex buffer bindings).
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.
This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning signalling an error and stopping the build from shipping.
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
The previous definition was:
#define NUM(field_name) (sizeof(Maxwell3D::Regs::field_name) / sizeof(u32))
In cases where `field_name` happens to refer to an array, Clang thinks
`sizeof(an array value) / sizeof(a type)` is an instance of the idiom
where `sizeof` is used to compute an array length. So it thinks the
type in the denominator ought to be the array element type, and warns if
it isn't, assuming this is a mistake.
In reality, `NUM` is not used to get array lengths at all, so there is no
mistake. Silence the warning by applying Clang's suggested workaround
of parenthesizing the denominator.
On Apple platforms, FALSE and TRUE are defined as macros by
<mach/boolean.h>, which is included by various system headers.
Note that there appear to be no actual users of the names to fix up.
Migrates the video core code closer to enabling variable shadowing
warnings as errors.
This primarily sorts out shadowing occurrences within the Vulkan code.
This was only necessary for use with the
avcodec_decode_video2/avcoded_decode_audio4 APIs which are also
deprecated.
Given we use avcodec_send_packet/avcodec_receive_frame, this isn't
necessary, this is even indicated directly within the FFmpeg API changes
document here on 2017-09-26:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/APIchanges#L410
This prevents our code from breaking whenever we update to a newer
version of FFmpeg in the future if they ever decide to fully remove this
API member.
Force early fragment tests when the 3D method is enabled.
The established pipeline cache takes care of recompiling if needed.
This is implemented only on Vulkan to avoid invalidating the shader
cache on OpenGL.
- Use .at() instead of raw indexing when dealing with untrusted indices.
- For the special case of WaitFence with syncpoint id UINT32_MAX,
instead of crashing, log an error and ignore. This is what I get when
running Super Mario Maker 2.
EmuWindow::PollEvents was called from the GPU thread (or the CPU thread
in sync-GPU mode) when swapping buffers. It had three implementations:
- In GRenderWindow, it didn't actually poll events, just set a flag and
emit a signal to indicate that a frame was displayed.
- In EmuWindow_SDL2_Hide, it did nothing.
- In EmuWindow_SDL2, it did call SDL_PollEvents, but this is wrong
because SDL_PollEvents is supposed to be called on the thread that set
up video - in this case, the main thread, which was sleeping in a
busyloop (regardless of whether sync-GPU was enabled). On macOS this
causes a crash.
To fix this:
- Rename EmuWindow::PollEvents to OnFrameDisplayed, and give it a
default implementation that does nothing.
- In EmuWindow_SDL2, do not override OnFrameDisplayed, but instead have
the main thread call SDL_WaitEvent in a loop.
This reduces the overhead of bounds checking on each element.
It won't reduce the cost of allocation because usually this vector's
capacity is usually large enough to hold whatever we push to it.
It's deprecated in the language to autogenerate these if the destructor
for a type is specified, so we can explicitly specify how we want these
to be generated.
The API of VP9 exposes a WasFrameHidden() function which accesses this
member. Given the constructor previously didn't initialize this member,
it's a potential vector for an uninitialized read.
Instead, we can initialize this to a deterministic value to prevent that
from occurring.
This implements texture cube arrays with shadow comparisons but doesn't
fix the asserts related to it.
Fixes out of bounds reads on swizzle constructors and makes them use
bounds checked ::at instead of the unsafe operator[].
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.
While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.