This brings the analysis done for condition registers
more in line with the analysis done for GPRs and FPRs.
This gets rid of the old wantsCR member, which wasn't actually
used anyway. In case someone wants it again in the future, they
can compute the bitwise inverse of crDiscardable.
Instead of materializing the quiet bit in a register and ORing the NaN
with it, we can perform an arithmetic operation on the NaN. This is a
cycle or two slower on some CPUs in cases where generating the quiet bit
pipelined well, but this is farcode that rarely runs, so instruction
fetch latency is the bigger concern. And for non-SIMD cases, we also
save a register.
By using MOVI2R+MOVI2R+CSEL in the zero case instead of doing bitwise
operations on the output of the other MOVI2R+MOVI2R+CSEL, we avoid using
BFI, an instruction that takes two cycles on most CPUs. The instruction
count is the same and the pipelining should be at least equally good.
This is a little trick I came up with that lets us restructure our float
classification code so we can exit earlier when the float is normal,
which is the case more often than not.
First we shift left by 1 to get rid of the sign bit, and then we count
the number of leading sign bits. If the result is less than 10 (for
doubles) or 7 (for floats), the float is normal. This is because, if the
float isn't normal, the exponent is either all zeroes or all ones.
Not sure if this was causing correctness issues – it depends on whether
the HLE code was actually reading the discarded registers – but it was
at least causing annoying assert messages in one piece of homebrew.
Nowadays, basically everything except for controller config is handled
by the new config system. Instead of enumerating the systems that are,
let's enumerate the systems that aren't.
I've intentionally not included Config::System::Session in the new list.
While it isn't intended to be saved, it is a setting that's fully
handled by the new config system. See
https://github.com/dolphin-emu/dolphin/pull/9804#discussion_r648949686.
This is used as a base pointer inside CustomPipelineAction, so this
should probably really have a virtual destructor to ensure derived
objects are torn down properly.
The point of farcode is to provide a separate location for code that
rarely runs, so that it doesn't pollute the icache. Taking a conditional
branch is something that happens very often, so the code for that
shouldn't be in farcode.
Bug: https://bugs.dolphin-emu.org/issues/13404
On macOS 13.6 / Intel HD 5000, Dolphin crashes with this message:
> -[MTLIGAccelDevice setShouldMaximizeConcurrentCompilation:]: unrecognized selector
This should be available on all macOS 13.3+ systems – but when using OCLP drivers,
some devices use an older version of Metal.framework, which doesn't expose the selector.
This concerns Intel Ivy Bridge, Haswell and Nvidia Kepler when using OCLP on macOS 13.3
or newer.
(See
34676702f4/docs/PATCHEXPLAIN.md (L354C1-L354C83))
As the behavior is an optional optimization anyway, perform a dynamic
detection to avoid crashing if the feature is not available.
Some state changes are meant to be near instantanoues, before switching to something else. By reporting ithe instant switch, the UI will flicker between states (pause/play button) and the debugger will unnecessarily update. Skipping the callback avoids these issues.
This was implemented to prevent UI flickering due to the state rapidly switching between pause/play. Recently, it has been causing issues with debugger windows, which update during frame advance.
This way, the number of loop iterations is equal to the number of set
bits in the bitset rather than the number of bits in the bitset,
which is a good improvement because usually very few bits are set.
This caused us to update the indirect texture information in shaders more often than we needed to, which probably doesn't matter in practice since it's only used in ubershaders and copyyscale and stride are generally only updated before EFB/XFB copies, which generally will have other changes afterwards.
As far as I can tell, it has nothing to do with the mipmap/half_scale functionality, but does change based on the width of the destination texture (and the destination texture is half the width if half_scale is set). The comment that was there (which dates back to the initial megacommit) seems to not have accounted for the width aspect; it was first used as an actual stride in bbbe898839 (the first commit that used it at all).
The move assignment operator for a class is implicitly deleted when the
class has a non-static reference data member, which is true of
WiiSocket's m_socket_manager member.
Explicitly declaring the operator as default generates a
-Wdefaulted-function-deleted warning on Clang.
Delete the move constructor as well for consistency.
Fix -WSwitch warning about unhandled enum value SDL_NUM_LOG_PRIORITIES.
log_level is initialized to LNOTICE right before the switch statement so
this doesn't cause any behavior changes.
Use RunOnCPUThread instead of RunAsCPUThread in BeginRecordingInput.
Most OpenGL functions require an OpenGL context to have been created on
that thread before calling the function; when that isn't the case they
return invalid results which can cause crashes when passed into other
functions.
Dolphin creates the OpenGL context in the EmuThread which then becomes
either the CPU-GPU thread or the Video thread for single and dual core
respectively. OpenGL functions must therefore be called from that
thread.
Movie::BeginRecordingInput is called from the Host thread and runs a
block of code which ultimately creates a savestate, which in turn embeds
the framebuffer which requires calling various OpenGL functions.
In single core the use of RunAsCPUThread leads to this all happening on
the Host thread, eventually leading to invalid OpenGL calls and a crash.
In Dual core the crash is avoided because VideoBackendBase::DoState uses
the AsyncRequests::DO_SAVE_STATE event which causes VideoCommon_DoState
and its subsequent OpenGL calls to safely run on the Video thread.
This commit uses RunOnCPUThread instead of RunAsCPUThread, which causes
the subsequent code to run on the CPU-GPU thread in single core which
has the valid OpenGL context and so doesn't crash.
This makes it so that if you just want to reload the current style (eg. on program start, or in response to a system event), you don't need to know the name of the currently selected user style. It's also more consistent with the way the 'userstyle/enabled' flag works.
Before dbf5dca, the dirty flag had no meaning for an immediate value,
so we made sure to always set the dirty flag when switching a register
from Immediate to Register. But after dbf5dca, that is no longer the
case. If an immediate is marked as not dirty, we can keep the register
marked as not dirty after materializing the value. This way we skip
having to write it back to ppcState later.
Without this change, non-dirty immediates don't actually get flushed.
This can be a problem if we for instance are flushing all registers in
order to execute an interpreter fallback. If that interpreter fallback
writes to a register that contained a non-dirty immediate, the JIT will
keep using the old value instead of loading the updated value.
This required a change in the denormal path where, instead of
subtracting 11 before shifting left, we shift left immediately and then
shift right by 11. This shouldn't affect performance.
Instead of combining X2 (the exponent) and X3 (the mantissa) using an
ORR instruction, we can read X1, which already contains both.
This requires us to reconstruct X1 in the denormal path, but that's
an acceptable price.
Dolphin's JITs have a minor terminology problem: The term "fastmem" can
refer to either the system of switching between a fast path and a slow
path using backpatching, or to the fast path itself. To hopefully make
things clearer, I'm adding some new terms, defining the old and new
terms as follows:
Fastmem: The system of switching from a fast path to a slow path by
backpatching when an invalid memory access occurs.
Fast access: A code path that accesses guest memory without calling C++
code.
Slow access: A code path that accesses guest memory by calling C++ code.
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
At the end of each frame automatically update the Current Value for
visible table rows in the selected and visible CheatSearchWidget (if
any). Also update all Current Values in all CheatSearchWidgets when the
State changes to Paused.
Only updating visible table rows serves to minimize the performance cost
of this feature. If the user scrolls to an un-updated cell it will
promptly be updated by either the next VIEndFieldEvent or the State
transitioning to Paused.
If dcache is enabled when the game starts, initializing the fastmem
arena is still useful in case the user changes the dcache setting.
And initializing it doesn't really cost anything.
Preparation for the next commit.
JitArm64 has been conflating these two flags. Most of the stuff that's
been guarded by fastmem_arena checks in fact requires fastmem.
When we have fastmem_arena without fastmem, it would be possible to do
things a bit more efficiently than what this commit does, but it's
non-trivial and therefore I will leave it out of this PR. With this
commit, we effectively have the same behavior as before this PR - plus
the added ability to toggle fastmem with a cache clear.
This is needed so that the checks added in the previous commit will be
reevaluated if the value of m_enable_dcache changes.
JitArm64 was already recompiling its asm routines on cache clear by
necessity. It doesn't have the same setup as Jit64 where the asm
routines are in a separate region, so clearing the JitArm64 cache
results in the asm routines being cleared too.
Some code paths in EmuCodeBlock.cpp that were checking fastmem_arena
should really also be checking m_enable_dcache.
Because JitArm64 centralizes more or less all memory access to the
EmitBackpatchRoutine function and because that function already
contained a check, JitArm64 works fine without the additional checks
added by this commit. Regardless, I added the checks to MMU.cpp instead
of EmuCodeBlock.cpp where applicable so they would be available to
JitArm64. Maybe one day JitArm64 will need them if its code gets
restructured.
The table only needs to be recreated when the displayed addresses might
change. If we're just refreshing the current values then update those
table cells and leave the rest of the table alone.
The code previously did this indirectly via `std::map<double, int>`, the key being the timestamp, which required a questionable workaround for the case where multiple states have the same timestamp. By having a particular combination of timestamps in the on-disk savestates, you could cause this workaround to infinitely loop, locking up Dolphin. This avoids this completely by refactoring the logic and just using `std::vector` instead.