Author Topic: ARM float32 -> float16  (Read 2188 times)

ChrisR

  • Full Member
  • ***
  • Posts: 247
ARM float32 -> float16
« on: January 17, 2021, 03:49:07 pm »
Hello all,
  Can anyone help me convert this tiny C program, which calls the Neon "vcvt_f32_f16" intrinsic, to FPC assembly?

  My OpenGL code uses the GL_INT_2_10_10_10_REV datatype for normals, which is not supported on the M1. I can use GL_FLOAT (float32), but that is wasteful, so I would like to use GL_HALF_FLOAT (float16). Included is a simple C program which calls the Neon SIMD instruction. My Pascal uses scalar code, but the results appear different. Perhaps this reflects that Arm supports two half-precision (16-bit) floating-point scalar data types:
 The IEEE 754-2008 __fp16 data type, defined in the Arm C Language Extensions.
 The _Float16 data type, defined in the C11 extension ISO/IEC TS 18661-3:2015

(the attached file "f16.inc" should be renamed "f16.c", extension changed to allow upload).


$gcc f16.c -o f16; ./f16
f32: 32.012 64.094 12.096 3.14159274
f16: 0x00000020 0x00000040 0x0000000c 0x00000003
f32: 32 64.125 12.0938 3.14062500

$fpc f16.pas; ./f16     
Free Pascal Compiler version 3.3.1 [2021/01/01] for aarch64
Copyright (c) 1993-2020 by Florian Klaempfl and others
Target OS: Darwin for AArch64
Compiling f16.pas
Assembling f16
Linking f16
200 lines compiled, 0.5 sec
f32: 32.0120010375977 64.0940017700195 12.0959997177124 3.14159274101257
f16: 0x5000 0x5402 0x4A0C 0x4248
f32: 32 64.125 12.09375 3.140625

Laksen

  • Hero Member
  • *****
  • Posts: 724
    • J-Software
Re: ARM float32 -> float16
« Reply #1 on: January 17, 2021, 05:16:19 pm »
Something like this might be possible.. It's very ugly but it bypasses the current lack of advanced simd/fp16 handling in the aarch64 backend

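Code:
// Scalar conversions. The opcodes are emitted as raw words because the aarch64 backend lacks the mnemonics.
function f32tof16Neon(v: single): word;assembler; nostackframe;
asm
    .long 0x1e23c000 // fcvt h0, s0 (convert the single in s0 to a half in h0)
    .long 0x1ee60000 // fmov w0, h0 (return the 16 raw bits in w0; needs the FP16 extension)
end;

function f16tof32Neon(v: word): single;assembler; nostackframe;
asm
    .long 0x1ee70000 // fmov h0, w0 (move the 16 raw bits from w0 into h0)
    .long 0x1ee24000 // fcvt s0, h0 (widen the half in h0 to a single in s0)
end;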

procedure f32tof16Neon(var output: float16x4; constref f32x4: float32x4 );
begin
    output.x := f32tof16Neon(f32x4.x);
    output.y := f32tof16Neon(f32x4.y);
    output.z := f32tof16Neon(f32x4.z);
    output.w := f32tof16Neon(f32x4.w);
end;

procedure f16tof32Neon(var output: float32x4; constref f16x4: float16x4 );
begin
    output.x := f16tof32Neon(f16x4.x);
    output.y := f16tof32Neon(f16x4.y);
    output.z := f16tof32Neon(f16x4.z);
    output.w := f16tof32Neon(f16x4.w);
end;
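
To cross-check what these encoded instructions return on a machine without Arm hardware, a portable software conversion can help. The following is a minimal C sketch and not code from this thread: the names `f32_to_f16_soft` and `f16_encode_check` are hypothetical, and it assumes an IEEE 754 binary32 `float` and round-to-nearest-even, which is the default rounding mode `fcvt` runs under. The self-check uses the four inputs and hex results posted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: software float32 -> float16, round-to-nearest-even. */
static uint16_t f32_to_f16_soft(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);           /* copy out the raw encoding */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exp32 = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x007FFFFFu;

    if (exp32 == 0xFFu)                           /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

    int32_t exp16 = (int32_t)exp32 - 127 + 15;    /* rebias 8-bit -> 5-bit exponent */
    if (exp16 >= 31)                              /* too large: becomes Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (exp16 <= 0) {                             /* result is subnormal or zero */
        if (exp16 < -10)
            return sign;                          /* below half the smallest subnormal */
        mant |= 0x00800000u;                      /* restore the implicit leading 1 */
        uint32_t shift = (uint32_t)(14 - exp16);  /* 14..24 */
        uint32_t half = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t mid = 1u << (shift - 1);
        if (rem > mid || (rem == mid && (half & 1u)))
            half++;                               /* ties go to the even result */
        return (uint16_t)(sign | half);
    }

    uint32_t half = ((uint32_t)exp16 << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;                /* the 13 bits being dropped */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                                   /* a carry into the exponent is correct */
    return (uint16_t)(sign | half);
}

/* Reproduces the f16 line printed by the Pascal program in this thread. */
static void f16_encode_check(void)
{
    assert(f32_to_f16_soft(32.012f) == 0x5000);
    assert(f32_to_f16_soft(64.094f) == 0x5402);
    assert(f32_to_f16_soft(12.096f) == 0x4A0C);
    assert(f32_to_f16_soft(3.14159274f) == 0x4248);
}
```

Calling `f16_encode_check()` once at start-up is enough to confirm the helper agrees with the values above.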

ChrisR

Re: ARM float32 -> float16
« Reply #2 on: January 17, 2021, 05:41:43 pm »
1. Laksen: Thanks for your swift reply.
2. I have attached a copy of Laksen's elegant solution (f16.pas), which uses scalar instructions rather than Neon SIMD.
3. The text display of the hex codes still does not match my C code, though maybe I am not typecasting correctly in C. I will see how your solution looks with OpenGL.
4. I would be interested if anyone has a SIMD solution, as my input arrays are DTI tractography maps with thousands of values.

$fpc f16.pas; ./f16
Free Pascal Compiler version 3.3.1 [2021/01/01] for aarch64
Copyright (c) 1993-2020 by Florian Klaempfl and others
Target OS: Darwin for AArch64
Compiling f16.pas
Assembling f16
Linking f16
59 lines compiled, 0.2 sec
f32: 32.0120010375977 64.0940017700195 12.0959997177124 3.14159274101257
f16: 0x5000 0x5402 0x4A0C 0x4248
f32: 32 64.125 12.09375 3.140625
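
The last line of that output (the values after the round trip) can be reproduced by decoding the four hex words by hand. Below is a minimal C sketch of the reverse direction, again not code from this thread: `f16_to_f32_soft` and `f16_decode_check` are hypothetical names, and the sketch assumes the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: software float16 -> float32. Every half value is
   exactly representable as a single, so no rounding is needed. */
static float f16_to_f32_soft(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp16 = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x03FFu;
    uint32_t bits;

    if (exp16 == 0x1Fu) {                  /* Inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp16 != 0) {               /* normal: rebias 15 -> 127 */
        bits = sign | ((exp16 + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                /* signed zero */
        bits = sign;
    } else {                               /* subnormal: shift until the leading 1 appears */
        uint32_t e = 113;                  /* biased float32 exponent of 2^-14 */
        while ((mant & 0x0400u) == 0) {
            mant <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mant & 0x03FFu) << 13);
    }

    float out;
    memcpy(&out, &bits, sizeof out);       /* reinterpret the assembled encoding */
    return out;
}

/* Decodes the f16 words printed in this thread back to the printed f32 values. */
static void f16_decode_check(void)
{
    assert(f16_to_f32_soft(0x5000) == 32.0f);
    assert(f16_to_f32_soft(0x5402) == 64.125f);
    assert(f16_to_f32_soft(0x4A0C) == 12.09375f);
    assert(f16_to_f32_soft(0x4248) == 3.140625f);
}
```

For example, 0x4A0C has exponent field 18 and fraction 524, so the value is 2^(18-15) * (1 + 524/1024) = 12.09375.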

ChrisR

Re: ARM float32 -> float16
« Reply #3 on: January 17, 2021, 06:02:09 pm »
Laksen
  Your solution works perfectly with OpenGL. Thanks a lot! The conversion is only done once when the mesh is loaded to the GPU, so I can live with the scalar solution (though I would like to see if anyone can provide the NEON simd solution). Thanks again for such a rapid and elegant response!

Laksen

Re: ARM float32 -> float16
« Reply #4 on: January 18, 2021, 04:44:01 pm »
The hex display doesn't match because your C version casts the fp16 value to an integer. That converts the numeric value, discarding the fraction, instead of exposing the stored bits (0x00000020 is 32).

Try this:
Code:
printf("f16: 0x%04x 0x%04x 0x%04x 0x%04x\n", *(uint16_t*)(&f16x4[0]), *(uint16_t*)(&f16x4[1]),*(uint16_t*)(&f16x4[2]),*(uint16_t*)(&f16x4[3]));
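
The difference between converting a value and reinterpreting its bits can be shown without any Arm-specific types. This is a small C sketch using `float` as a stand-in for the fp16 case, so it is an analogy rather than the original program; the function names are hypothetical. It uses `memcpy` instead of a pointer cast, which is one common way to read the raw encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: the number itself becomes an integer, fraction discarded. */
static uint32_t as_integer_value(float f)
{
    return (uint32_t)f;
}

/* Bit reinterpretation: the stored IEEE 754 encoding is copied out unchanged. */
static uint32_t as_raw_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static void cast_vs_bits_check(void)
{
    assert(as_integer_value(32.012f) == 0x20);     /* same 0x20 as in the C output */
    assert(as_integer_value(3.14159274f) == 0x03); /* same 0x03 as in the C output */
    assert(as_raw_bits(32.0f) == 0x42000000u);     /* the float32 encoding of 32 */
}
```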

Maybe this will work for the SIMD version:
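Code:
procedure f32tof16Neon2(var output: float16x4; constref f32x4: float32x4 );assembler; nostackframe;
asm
ldr q0, [x1]     // load four singles (128 bits) from f32x4, whose address is in x1
.long 0x0e216800 // fcvtn   v0.4h, v0.4s (narrow four singles to four halves)
str d0, [x0]     // store the four halves (64 bits) to output, whose address is in x0
end;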

It assumes that both pointers are properly aligned, but that should be easy to ensure.
« Last Edit: January 18, 2021, 04:56:02 pm by Laksen »
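
One way to check the alignment caveat from the calling side is sketched below in C. The post does not say which alignment counts as proper; the sketch assumes natural alignment, meaning 16 bytes for the 128-bit load and 8 bytes for the 64-bit store. `is_aligned` and `alloc_f32x4_array` are hypothetical helpers, and `aligned_alloc` is the C11 standard library call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 when ptr is a multiple of `alignment` bytes.
   `alignment` must be a power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate `count` groups of four floats on a
   16-byte boundary (assumed requirement, see the note above). */
static float *alloc_f32x4_array(size_t count)
{
    assert(count > 0);
    /* count * 16 bytes is a multiple of the alignment, as aligned_alloc requires */
    return aligned_alloc(16, count * 4 * sizeof(float));
}
```

A caller could assert `is_aligned(src, 16)` and `is_aligned(dst, 8)` before handing the pointers to the conversion routine, and release the buffer with `free` afterwards.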

ChrisR

Re: ARM float32 -> float16
« Reply #5 on: January 18, 2021, 05:40:23 pm »
Thanks for the help with C. I can confirm your Neon SIMD code works. Both the scalar and SIMD versions are exceptionally fast. My arrays and element data are aligned, but it is a good caveat to mention. Thanks again for your help. The upcoming release of Surfice will use your code and avoid the major performance penalty of GL_INT_2_10_10_10_REV.

You have solved my issue.

 
