
Minor improvements in Orochi 2.00 for Zen4 Raphael APU support, and other suggestions #83

Description

@oscarbg

Hi,
First, congrats on getting the Orochi 2.0 release out!
Some comments:

1) I tested your new Orochi 2.0 release on a Zen4 Raphael iGPU and got some errors:

1a) The first RadixSort test fails on Zen4 because number_of_blocks becomes 0: the APU has only 2 compute units, so rounding the block count down to a multiple of base gives 0.
Fix (ParallelPrimitives/RadixSort.cpp):

```
258c258,259
< 		number_of_blocks = ( number_of_blocks / base ) * base;
---
> 		number_of_blocks = ( number_of_blocks / base ) * base;
> 		if( number_of_blocks == 0 ) number_of_blocks = 4;
```
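A slightly more general form of the same fix, sketched under the assumption that `base` is the multiple the block count must be rounded to: instead of falling back to a fixed 4, clamp the rounded count to at least one multiple of `base`. The helper name `roundBlocksToBase` is hypothetical and not part of Orochi.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of Orochi): round the block count down to a
// multiple of `base`, but never below `base` itself. Without the clamp,
// ( n / base ) * base is 0 whenever n < base, as on a 2-CU APU.
int roundBlocksToBase( int numberOfBlocks, int base )
{
	assert( base > 0 && numberOfBlocks >= 0 );
	const int rounded = ( numberOfBlocks / base ) * base;
	return std::max( rounded, base );
}
```

For example, with base = 4 a computed count of 2 becomes 4 (the value hard-coded in the diff), while 10 still rounds down to 8.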

1b) The pciBusID check in VulkanComputeSimple is wrong for APUs, because Vulkan reports 0 and ROCm reports 0x6b, or vice versa. I fixed it in main.cpp with:

```
< 			if (physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
< 				physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
< 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID) {
< 				physicalDeviceIndex = i;
< 				break;
---
> 			if( physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
> 				// physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
> 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID )
> 			{
> 				if( physicalDevicePCIBusInfoProperties.pciBus > 16 ) // integrated!!
> 				{
> 					if( props.pciBusID == 0 )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
> 				else
> 				{
> 					if( physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
```
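The same matching rule can be written as a standalone predicate. This is a sketch, not the sample's code: `PciLocation` and `isSameGpu` are hypothetical names, and the `> 16` threshold is taken from the workaround above. Unlike the diff, it always accepts an exact bus match first, so a discrete GPU that sits on a bus number above 16 still matches, and it accepts the 0 / non-zero mismatch in either direction.

```cpp
#include <cstdint>

// Hypothetical plain struct standing in for the PCI fields compared in main.cpp:
// pciDomain/pciBus/pciDevice from Vulkan and pciDomainID/pciBusID/pciDeviceID
// from the HIP/CUDA device properties.
struct PciLocation
{
	std::uint32_t domain;
	std::uint32_t bus;
	std::uint32_t device;
};

// Returns true if the Vulkan device and the compute device appear to be the same GPU.
bool isSameGpu( const PciLocation& vk, const PciLocation& compute )
{
	if( vk.domain != compute.domain || vk.device != compute.device ) return false;
	// Normal case, including discrete GPUs on a high bus number.
	if( vk.bus == compute.bus ) return true;
	// APU fallback (heuristic from the issue, an assumption): one API reports
	// bus 0 while the other reports a bus number above 16.
	return ( vk.bus > 16 && compute.bus == 0 ) || ( compute.bus > 16 && vk.bus == 0 );
}
```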
2) Could you make the samples run correctly by default after a premake5 build?
I mean running the executables directly from the Orochi-2.00/dist/bin/Debug or Orochi-2.00/dist/bin/Release folders.
They seem to expect kernels and sample data such as textures under the "../" folder, so currently there are two options:

2a) copy or move the Debug or Release folder into the Orochi-2.00/ folder (out of dist/bin) so the samples find the required data, or

2b) recreate or copy the needed files from the Orochi-2.00 folder into Orochi-2.00/dist/bin, for example copying

Orochi-2.00/Test/Texture/texture_test_kernel.hpp
to
Orochi-2.00/dist/bin/Test/Texture/texture_test_kernel.hpp
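As an alternative to a fixed "../" prefix, the samples could search upwards from the working directory for their data. A minimal C++17 sketch, assuming kernels and textures keep their paths relative to the repository root; `findUpwards` is a hypothetical helper, not an Orochi API.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Orochi): look for `relative` under `start`,
// then under each parent directory, up to `maxLevels` levels above `start`.
// Returns the first existing match, or std::nullopt if none is found.
std::optional<fs::path> findUpwards( const fs::path& start, const fs::path& relative, int maxLevels = 4 )
{
	assert( maxLevels >= 0 );
	fs::path dir = fs::absolute( start );
	for( int level = 0; level <= maxLevels; ++level )
	{
		const fs::path candidate = dir / relative;
		if( fs::exists( candidate ) ) return candidate;
		const fs::path parent = dir.parent_path();
		if( parent == dir ) break; // reached the filesystem root
		dir = parent;
	}
	return std::nullopt;
}
```

Run from Orochi-2.00/dist/bin/Release, `findUpwards( fs::current_path(), "Test/Texture/texture_test_kernel.hpp" )` would find the file three levels up (Release, bin, dist, then Orochi-2.00) without copying anything.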

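3) Please add Linux support to the VulkanComputeSimple and RadixSort tests (Orochi-2.00/Test/VulkanComputeSimple and Orochi-2.00/Test/RadixSort).
In fact, the support is already there; only a few very simple fixes are needed, like the ones I made.
For the Vulkan premake5.lua, I commented out the MSVC warning flag and the Windows link libraries and added a C++14 flag. Guarded by premake filters, the same change keeps the Windows build working:

```lua
filter "system:windows"
    buildoptions { "/wd4244" }
    links { "Pop" }
    links { "kernel32", "user32", "gdi32", "winspool", "comdlg32", "advapi32", "shell32", "ole32", "oleaut32", "uuid", "odbc32", "odbccp32", "version" }
filter "system:linux"
    buildoptions { "--std=c++14" }
filter {}
```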

4) Finally, it would be nice if the WMMA sample got NVIDIA support; it seems to be AMD-only at the moment.
I started by adding `#ifdef __CUDA_ARCH__` code to the kernel:
```cpp
#ifdef __CUDA_ARCH__
#include <mma.h>
// https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
using namespace nvcuda;
#endif
...
#ifdef __CUDA_ARCH__
	// NVIDIA tensor-core fragments: 16x16x16 tile, half inputs, float accumulators
	wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag2;
	wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag2;
#endif
...
#ifdef __CUDA_ARCH__
	// c_frag2 = a_frag2 * b_frag2 + c_frag2
	nvcuda::wmma::mma_sync( c_frag2, a_frag2, b_frag2, c_frag2 );
#else
	// AMD WMMA builtin (wave32 variant)
	c_frag = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32( a_frag, b_frag, c_frag, false );
#endif
```
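To check the NVIDIA port numerically, a plain host-side reference for one tile may help. This is a hypothetical helper (not part of the WMMA sample) that uses float everywhere instead of half and assumes column-major storage for all three matrices, as in the wmma::col_major fragments above; it computes D = A * B + C, the operation mma_sync performs on a 16x16x16 tile.

```cpp
#include <array>
#include <cstddef>

// Tile size matching the 16x16x16 WMMA shape used in the sample.
constexpr std::size_t kTile = 16;
// Column-major tile: element (row, col) is stored at index row + col * kTile.
using Tile = std::array<float, kTile * kTile>;

// Hypothetical host reference: returns D = A * B + C for one tile.
Tile referenceMma( const Tile& a, const Tile& b, const Tile& c )
{
	Tile d{};
	for( std::size_t col = 0; col < kTile; ++col )
	{
		for( std::size_t row = 0; row < kTile; ++row )
		{
			float acc = c[row + col * kTile];
			for( std::size_t k = 0; k < kTile; ++k )
				acc += a[row + k * kTile] * b[k + col * kTile];
			d[row + col * kTile] = acc;
		}
	}
	return d;
}
```

Comparing the GPU result against this reference, with a tolerance appropriate for half-precision inputs, would catch layout mismatches between the AMD and NVIDIA paths.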
