-
Notifications
You must be signed in to change notification settings - Fork 15
RWMC #218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: hlsl_path_tracer
Are you sure you want to change the base?
RWMC #218
Conversation
| }; | ||
|
|
||
| template<class RandGen, class RayGen, class Intersector, class MaterialSystem, /* class PathGuider, */ class NextEventEstimator> | ||
| // TODO: maybe implement a concept to ensure that OutputTypeVec is a vector? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please concept needed!
| struct SPushConstants | ||
| #ifndef __HLSL_VERSION | ||
| #include "matrix4SIMD.h" | ||
| #endif | ||
|
|
||
| struct RenderPushConstants | ||
| { | ||
| #ifdef __HLSL_VERSION | ||
| float32_t4x4 invMVP; | ||
| #else | ||
| nbl::core::matrix4SIMD invMVP; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we use hlsl::float32_t4x4 in C++ and float32_t4x4 in HLSL only!
| #ifdef RWMC_ENABLED | ||
| [[vk::image_format("rgba16f")]] [[vk::binding(0, 1)]] RWTexture2DArray<float32_t4> cascade; | ||
| #endif | ||
| [[vk::image_format("rgba16f")]] [[vk::binding(0, 0)]] RWTexture2D<float32_t4> outImage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always use an array view (without RWMC you can just make the array 1 layer), and use the same binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're only writing the image (not loading) you can skip declaring the format, because we use the Unformatted Storage extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after taking another look at the code, i figured it's best to keep cascade and outImage in two separate descriptor sets. this way descriptor set 0 (the one with outImage only) can be reused across different shaders.
| using pathtracer_type = ext::PathTracer::Unidirectional<randgen_type, raygen_type, intersector_type, material_system_type, nee_type>; | ||
|
|
||
| #ifdef RWMC_ENABLED | ||
| using accumulator_type = rwmc::CascadeAccumulator<float32_t3, CascadeSize>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nitpick, CascadeCount
| struct RenderRWMCPushConstants | ||
| { | ||
| #ifdef __HLSL_VERSION | ||
| float32_t4x4 invMVP; | ||
| #else | ||
| nbl::core::matrix4SIMD invMVP; | ||
| #endif | ||
| int sampleCount; | ||
| int depth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make one push cosntant struct in terms of the other, preferably through composition, so
struct RenderRWMCPushConstants
{
RenderPushConstants base;
rwmc::SplattingParameters rwmc;
};| float start; | ||
| float base; | ||
| float kappa; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont use the same push constants for resolve and rendering & splatting
I really want you to pack up start and base into its own rwmc::SplattingParameters struct
| #ifdef RWMC_ENABLED | ||
| #include <nbl/builtin/hlsl/rwmc/CascadeAccumulator.hlsl> | ||
| #include "render_rwmc_common.hlsl" | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should be able to always include stuff like this without a penalty
| #ifdef PERSISTENT_WORKGROUPS | ||
| uint32_t virtualThreadIndex; | ||
| [loop] | ||
| for (uint32_t virtualThreadBase = glsl::gl_WorkGroupID().x * WorkgroupSize; virtualThreadBase < 1920 * 1080; virtualThreadBase += glsl::gl_NumWorkGroups().x * WorkgroupSize) | ||
| { | ||
| virtualThreadIndex = virtualThreadBase + glsl::gl_LocalInvocationIndex().x; | ||
| const int32_t2 coords = (int32_t2)math::Morton<uint32_t>::decode2d(virtualThreadIndex); | ||
| #else | ||
| const int32_t2 coords = getCoordinates(); | ||
| #endif | ||
|
|
||
| rwmc::ReweightingParameters reweightingParameters = rwmc::computeReweightingParameters(pc.base, pc.sampleCount, pc.minReliableLuma, pc.kappa, CascadeSize); | ||
| float32_t3 color = rwmc::reweight(reweightingParameters, cascade, coords); | ||
|
|
||
| outImage[coords] = float32_t4(color, 1.0f); | ||
|
|
||
| #ifdef PERSISTENT_WORKGROUPS | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve doesn't need to use persistent WG, you can use a Regular 2D dispatch
| #ifdef PERSISTENT_WORKGROUPS | ||
| uint32_t virtualThreadIndex; | ||
| [loop] | ||
| for (uint32_t virtualThreadBase = glsl::gl_WorkGroupID().x * WorkgroupSize; virtualThreadBase < 1920 * 1080; virtualThreadBase += glsl::gl_NumWorkGroups().x * WorkgroupSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't hardcode rendertarget resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can query it from the cascade image HLSL has intrinsics for that, you can add SPIR-V instrinscis for GLSL functions textureSize and imageSize
| const int32_t2 coords = getCoordinates(); | ||
| #endif | ||
|
|
||
| rwmc::ReweightingParameters reweightingParameters = rwmc::computeReweightingParameters(pc.base, pc.sampleCount, pc.minReliableLuma, pc.kappa, CascadeSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be done on CPU and the reweightingParamaters should be in your ResolvePushConstants push constants here.
| NBL_CONSTEXPR uint32_t MAX_DEPTH_LOG2 = 4; | ||
| NBL_CONSTEXPR uint32_t MAX_SAMPLES_LOG2 = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you dont need those here I think
| int32_t2 getCoordinates() | ||
| { | ||
| uint32_t width, height; | ||
| outImage.GetDimensions(width, height); | ||
| return int32_t2(glsl::gl_GlobalInvocationID().x % width, glsl::gl_GlobalInvocationID().x / width); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a 2D dispatch
| Accumulator accumulator; | ||
| accumulator.initialize(accumulatorInitData); | ||
| //scalar_type meanLumaSq = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take accumulator byt deferend from the outside, otherwise static polymorphism gets harder with stateful accumulators
Consider this scenario, I already have an accumulator, and I just want to add a fwe samples
|
|
||
| // Li | ||
| measure_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene) | ||
| output_storage_type getMeasure(uint32_t numSamples, uint32_t depth, NBL_CONST_REF_ARG(scene_type) scene, NBL_REF_ARG(typename Accumulator::initialization_data) accumulatorInitData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one suggestion to refactor, take sampleIndex (the i in the loop) and do the loop outside the path tracer (initialize the accumulator outside as well)
Also I know its not your code, but better rename depth to maxDepth
| return Li; | ||
| return accumulator.accumulation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take accumulator by reference and make the function void
| }; | ||
|
|
||
| std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet0Bindings = {}; | ||
| std::array<ICPUDescriptorSetLayout::SBinding, 1> descriptorSet1Bindings = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add a new set, reuse the binding for the output image for the cascades as I wrote about the layers == CascadeCount in other comments
| params.shader.entryPoint = "main"; | ||
| params.shader.entries = nullptr; | ||
| params.shader.requireFullSubgroups = true; | ||
| params.shader.requiredSubgroupSize = static_cast<IGPUShader::SSpecInfo::SUBGROUP_SIZE>(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is bad and breaks on other GPUs than NV and RDNA, I know its @keptsecret's "fault" but lets fix, make the subgroup size the log2 of the minimum the device reports (for best divergence handling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice code deduplication btw :P
| imgViewInfo.viewType = IGPUImageView::ET_2D; | ||
| } | ||
| else | ||
| { | ||
| imgViewInfo.subresourceRange.layerCount = CascadeSize; | ||
| imgViewInfo.viewType = IGPUImageView::ET_2D_ARRAY; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplify code, always make it a 2D Array (you can just have 1 layer, no problem)
| cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE); | ||
| // disregard surface/swapchain transformation for now | ||
| const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix(); | ||
| viewProjectionMatrix.getInverseTransform(pc.invMVP); | ||
| pc.sampleCount = spp; | ||
| pc.depth = depth; | ||
|
|
||
| // safe to proceed | ||
| // upload buffer data | ||
| cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame"); | ||
| cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT); | ||
|
|
||
| // TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image) | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS, | ||
| .srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS | ||
| } | ||
| }, | ||
| .image = m_outImgView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = 1u | ||
| }, | ||
| .oldLayout = IImage::LAYOUT::UNDEFINED, | ||
| .newLayout = IImage::LAYOUT::GENERAL | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers }); | ||
| } | ||
|
|
||
| // cube envmap handle | ||
| { | ||
| IGPUComputePipeline* pipeline; | ||
| if (usePersistentWorkGroups) | ||
| pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPersistentWGPipelines[PTPipeline].get() : m_PTGLSLPersistentWGPipelines[PTPipeline].get(); | ||
| else | ||
| pipeline = renderMode == E_RENDER_MODE::ERM_HLSL ? m_PTHLSLPipelines[PTPipeline].get() : m_PTGLSLPipelines[PTPipeline].get(); | ||
| cmdbuf->bindComputePipeline(pipeline); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get()); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get()); | ||
| cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderPushConstants), &pc); | ||
|
|
||
| // TODO: shouldn't it be computed only at initialization stage and on window resize? | ||
| const uint32_t dispatchSize = usePersistentWorkGroups ? | ||
| m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) : | ||
| 1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize; | ||
|
|
||
| cmdbuf->dispatch(dispatchSize, 1u, 1u); | ||
| } | ||
|
|
||
| // TRANSITION m_outImgView to READ (because of descriptorSets0 -> ComputeShader Writes into the image) | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS | ||
| } | ||
| }, | ||
| .image = m_outImgView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = 1u | ||
| }, | ||
| .oldLayout = IImage::LAYOUT::GENERAL, | ||
| .newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers }); | ||
| } | ||
|
|
||
|
|
||
| } | ||
|
|
||
| void beginCommandBufferAndDispatchPathracerPipelineUseRWMC(IGPUCommandBuffer* cmdbuf) | ||
| { | ||
| if (renderMode != E_RENDER_MODE::ERM_HLSL) | ||
| { | ||
| m_logger->log("Only HLSL render mode is supported.", ILogger::ELL_ERROR); | ||
| std::exit(-1); | ||
| } | ||
|
|
||
| cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE); | ||
| // disregard surface/swapchain transformation for now | ||
| const auto viewProjectionMatrix = m_camera.getConcatenatedMatrix(); | ||
| viewProjectionMatrix.getInverseTransform(rwmcPushConstants.invMVP); | ||
|
|
||
| rwmcPushConstants.start = rwmcCascadeStart; | ||
| rwmcPushConstants.depth = depth; | ||
| rwmcPushConstants.sampleCount = resolvePushConstants.sampleCount = spp; | ||
| rwmcPushConstants.base = resolvePushConstants.base = rwmcCascadeBase; | ||
| resolvePushConstants.minReliableLuma = rwmcMinReliableLuma; | ||
| rwmcPushConstants.kappa = resolvePushConstants.kappa = rwmcKappa; | ||
|
|
||
| // safe to proceed | ||
| // upload buffer data | ||
| cmdbuf->beginDebugMarker("ComputeShaderPathtracer IMGUI Frame"); | ||
| cmdbuf->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT); | ||
|
|
||
| // TRANSITION m_outImgView to GENERAL (because of descriptorSets0 -> ComputeShader Writes into the image) | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::ALL_TRANSFER_BITS, | ||
| .srcAccessMask = ACCESS_FLAGS::TRANSFER_WRITE_BIT, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS | ||
| } | ||
| }, | ||
| .image = m_outImgView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = 1u | ||
| }, | ||
| .oldLayout = IImage::LAYOUT::UNDEFINED, | ||
| .newLayout = IImage::LAYOUT::GENERAL | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers }); | ||
| } | ||
|
|
||
| // transit m_cascadeView layout to GENERAL, block until previous shader is done with reading from cascade | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .srcAccessMask = ACCESS_FLAGS::NONE, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::NONE | ||
| } | ||
| }, | ||
| .image = m_cascadeView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = CascadeSize | ||
| }, | ||
| .oldLayout = IImage::LAYOUT::UNDEFINED, | ||
| .newLayout = IImage::LAYOUT::GENERAL | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier }); | ||
| } | ||
|
|
||
| // TODO: shouldn't it be computed only at initialization stage and on window resize? | ||
| const uint32_t dispatchSize = usePersistentWorkGroups ? | ||
| m_physicalDevice->getLimits().computeOptimalPersistentWorkgroupDispatchSize(WindowDimensions.x * WindowDimensions.y, DefaultWorkGroupSize) : | ||
| 1 + (WindowDimensions.x * WindowDimensions.y - 1) / DefaultWorkGroupSize; | ||
|
|
||
| { | ||
| IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_PTHLSLPersistentWGPipelinesRWMC[PTPipeline].get() : m_PTHLSLPipelinesRWMC[PTPipeline].get(); | ||
|
|
||
| cmdbuf->bindComputePipeline(pipeline); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get()); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get()); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 2u, 1u, &m_descriptorSet2.get()); | ||
| cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(RenderRWMCPushConstants), &rwmcPushConstants); | ||
|
|
||
| cmdbuf->dispatch(dispatchSize, 1u, 1u); | ||
| } | ||
|
|
||
| // m_cascadeView synchronization - wait for previous compute shader to write into the cascade | ||
| // TODO: create this and every other barrier once outside of the loop? | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> cascadeBarrier[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS | ||
| } | ||
| }, | ||
| .image = m_cascadeView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = CascadeSize | ||
| } | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = cascadeBarrier }); | ||
| } | ||
|
|
||
| // reweighting | ||
| { | ||
| IGPUComputePipeline* pipeline = usePersistentWorkGroups ? m_resolvePersistentWGPipeline.get() : m_resolvePipeline.get(); | ||
|
|
||
| cmdbuf->bindComputePipeline(pipeline); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 0u, 1u, &m_descriptorSet0.get()); | ||
| cmdbuf->bindDescriptorSets(EPBP_COMPUTE, pipeline->getLayout(), 1u, 1u, &m_descriptorSet1.get()); | ||
| cmdbuf->pushConstants(pipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(ResolvePushConstants), &resolvePushConstants); | ||
|
|
||
| cmdbuf->dispatch(dispatchSize, 1u, 1u); | ||
| } | ||
|
|
||
| // TRANSITION m_outImgView to READ (because of descriptorSets0 -> ComputeShader Writes into the image) | ||
| { | ||
| const IGPUCommandBuffer::SImageMemoryBarrier<IGPUCommandBuffer::SOwnershipTransferBarrier> imgBarriers[] = { | ||
| { | ||
| .barrier = { | ||
| .dep = { | ||
| .srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT, | ||
| .srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS, | ||
| .dstStageMask = PIPELINE_STAGE_FLAGS::FRAGMENT_SHADER_BIT, | ||
| .dstAccessMask = ACCESS_FLAGS::SHADER_READ_BITS | ||
| } | ||
| }, | ||
| .image = m_outImgView->getCreationParameters().image.get(), | ||
| .subresourceRange = { | ||
| .aspectMask = IImage::EAF_COLOR_BIT, | ||
| .baseMipLevel = 0u, | ||
| .levelCount = 1u, | ||
| .baseArrayLayer = 0u, | ||
| .layerCount = 1u | ||
| }, | ||
| .oldLayout = IImage::LAYOUT::GENERAL, | ||
| .newLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL | ||
| } | ||
| }; | ||
| cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imgBarriers }); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deduplicate this code a bit, I think everything up to the resolve shader can stay as is, and the only diff between RWMC and no RWMC should be the push constants you set on the main PT shader
| float rwmcCascadeStart; | ||
| float rwmcCascadeBase; | ||
| float rwmcMinReliableLuma; | ||
| float rwmcKappa; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the reason you're not using shared HLSL/C++ RWMC structs is because the ImGUI parameters are different to what we actually carry around ?
Still ImGUI can take poitners to struct members, so if you can wrap it up in some HLSL struct which then gets translated intot eh actual optimized one with log2 members, that would be better
| RenderRWMCPushConstants rwmcPushConstants; | ||
| RenderPushConstants pc; | ||
| ResolvePushConstants resolvePushConstants; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you rebuild them every frame from imgui stuff + camera etc?
Maybe lets not store them around as global state and have them transient
No description provided.