
Topic: Proper way to perform texture format conversion with hardware acceleration

Mr.Madguy

I'm currently working on a retro game engine that stays backward compatible with old graphics libraries. The biggest problem right now is that not all texture formats are supported by old libraries. Of course I can always do a simple software conversion, but isn't there a better way? Something like using SSE? Plain software conversion would slow down loading too much, and then some more complex workaround would be needed, like caching converted textures on disk. That's a viable solution, but I want to avoid it.
30.04.2021 - DynamicData 4.0 is released and migration to it is completed.
It's more Lazarus-friendly, but still requires full Delphi 2009 support to be ported to Lazarus.
It's time to finally do it, because Delphi 2009 is 12 years old.

marcov (Administrator, FPC developer)

I assume this is usually solved at packaging time, i.e. if you link against an old graphics library, you convert the textures beforehand and ship the converted versions in the installer?

SSE has processor requirements too, and especially with retro gaming, people often run on old iron. Know your audience... SSE2 is available on everything that is 64-bit capable, and SSE3 on nearly everything that is 64-bit capable (the very first "Hammer" Athlon 64s excluded).

Older CPUs are stricter about alignment, though, so you may need to align your buffers to 16 bytes (movdqa faults on unaligned addresses). I align image buffers to 64 bytes, just in case.
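A minimal sketch of such an aligned allocation in Free Pascal, assuming a plain GetMem/FreeMem heap: over-allocate by Alignment - 1 bytes and round the pointer up to the next multiple. TAlignedBuffer, AllocAligned and FreeAligned are hypothetical names, not RTL routines.

```pascal
{$mode objfpc}{$H+}

type
  // Hypothetical helper record: keeps the raw pointer (needed for FreeMem)
  // next to the aligned pointer that the SIMD code actually uses.
  TAlignedBuffer = record
    Raw: Pointer;   // what GetMem returned; only this may be passed to FreeMem
    Data: Pointer;  // first address inside the block that is a multiple of Alignment
  end;

// Allocates Size usable bytes starting at an address that is a multiple of
// Alignment. Alignment must be a power of two (16 for movdqa, 64 for a cache line).
function AllocAligned(Size, Alignment: PtrUInt): TAlignedBuffer;
begin
  // Over-allocate so there is always room to round the pointer up.
  GetMem(Result.Raw, Size + Alignment - 1);
  // Round up: add Alignment-1, then clear the low bits.
  Result.Data := Pointer((PtrUInt(Result.Raw) + Alignment - 1) and not (Alignment - 1));
end;

procedure FreeAligned(var Buf: TAlignedBuffer);
begin
  FreeMem(Buf.Raw);
  Buf.Raw := nil;
  Buf.Data := nil;
end;
```

E.g. AllocAligned(Width * Height * 4, 64) gives a buffer that satisfies the 16-byte movdqa requirement and is also cache-line aligned. Keep Raw around, because only that pointer may be handed back to FreeMem.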

I have some SSE2/SSE3 code, but it mostly comes from the camera format conversion side of things (24-bit to 32-bit and vice versa). Those routines are often limited, but alignment is usually no problem there, because sensor dimensions are usually divisible by 16, so aligning the buffer itself is enough.

For OpenGL, you can do the conversions in shaders.

« Last Edit: May 01, 2021, 03:03:54 pm by marcov »

Mr.Madguy

The overall idea is that neither I nor the users of my engine should have to bother providing textures in different formats, plus mipmaps, in the data files just to support some rare old hardware.

My motto is "Slow, but works in any case". Of course I could just use the "lowest common denominator" 32-bit RGBA format, which is supported by everything, and not bother at all. I won't support anything below Win2k anyway, so I don't think my engine will ever meet old hardware that supports only 16-bit formats. But I still want to make sure it works under any conditions. So I need at least some basic texture format conversion features, like RGBA to BGRA, 32bit to 16bit and software mipmap generation. Yes, that's easily done in software, but I guess drivers use some hardware acceleration internally. I can't use shaders or anything like that, because shader support would most likely mean a more or less modern video card, and then I wouldn't need conversion in the first place. But detecting some SSE extensions and using them would be nice.

Maybe there are some open image conversion libraries where I can look for clues?
« Last Edit: May 01, 2021, 03:33:48 pm by Mr.Madguy »

marcov

Quote
RGBA to BGRA,

I checked, and contrary to what I thought, I don't have this. Apparently I solved it in the shader. It is fairly simple, though, since only the byte order changes.

I quickly whipped up something based on older code I had. It is untested:

Usual caveats apply:

  • width divisible by 8 pixels (and at least 8)
  • 16-byte aligned buffers
  • assumes the format (dimensions and top-down/bottom-up) of src and dest is the same
  • uses a parameter (rpendall) so it can handle both top-down and bottom-up images

```pascal
{$asmmode intel}

const
  // pshufb mask: within each 4-byte pixel take bytes 2,1,0,3, i.e. swap R and B.
  shufrgb :  array[0..15] of byte = (2,1,0,3,6,5,4,7,10,9,8,11,14,13,12,15);

procedure swaprgba(src,dest:pbyte;countinner,countouter,rpendall:longint);assembler;
// Register comments assume the Win64 calling convention (x86-64 only).
// src         rcx   source pointer (16-byte aligned)
// dest        rdx   dest pointer (16-byte aligned)
// countinner  r8d   width in pixels. Must be a multiple of 8 (and at least 8)
// countouter  r9d   number of lines. Must be at least 1
// rpendall    stack number of bytes to go from the end of a line to the start of the next line:
//                   roughly scanline[1]-scanline[0]-4*width, can be 0 (negative for bottom-up)

// local vars:
// rax: counter for inner loop.
// r10: loads rpendall
// xmm1:xmm0:  32 bytes worth of pixels (8 pixels)
// xmm2: loads shufrgb

asm
        // load shuffle mask to register. movdqu, because a typed const
        // is not guaranteed to be 16-byte aligned.
        movdqu xmm2,[rip+shufrgb]
        // sign-extend the 32-bit value (there is no movzxd, and sign extension
        // keeps a negative rpendall working). The chance we do more than
        // 2 billion iterations is low.
        movsxd  r10,rpendall
        shr r8d,3       // 32 bytes = 8 RGBA pixels: divide pixels per line by 8 (also clears upper half of r8)
         //.pad 8, align loop target ?
  @outer:
        mov    rax,r8
  .align 16
  @loopstart:
        movdqa xmm0,[rcx]
        movdqa xmm1,[rcx+16]
        pshufb xmm0,xmm2   // pshufb needs SSSE3 (not plain SSE3). Shuffle bottleneck, but what can you do in such a short procedure?
        pshufb xmm1,xmm2
        movdqa [rdx],xmm0
        movdqa [rdx+16],xmm1
        add rcx,32
        add rdx,32
        dec  rax
        jne @loopstart
        add rcx,r10
        add rdx,r10
        dec  r9d        // countouter is 32-bit; the upper half of r9 is undefined
        jne @outer
   @end:
end;
```
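On CPUs without SSSE3 (which pshufb requires), a plain Pascal loop can serve as the slow-but-works fallback. A minimal sketch, assuming tightly packed 4-byte pixels; SwapRBScalar is a hypothetical name. Unlike the asm version it has no width or alignment restrictions, and since it reads all four bytes of a pixel before writing, it also works in place.

```pascal
{$mode objfpc}{$H+}
{$pointermath on}

// Swaps bytes 0 and 2 of every 4-byte pixel: RGBA becomes BGRA and vice versa.
// Byte-wise, so it does not depend on CPU endianness. Src and Dest may be equal.
procedure SwapRBScalar(Src, Dest: PByte; PixelCount: SizeInt);
var
  i: SizeInt;
  r, g, b, a: Byte;
begin
  for i := 0 to PixelCount - 1 do
  begin
    // Read the whole pixel first so in-place conversion is safe.
    r := Src[0]; g := Src[1]; b := Src[2]; a := Src[3];
    Dest[0] := b; Dest[1] := g; Dest[2] := r; Dest[3] := a;
    Inc(Src, 4);
    Inc(Dest, 4);
  end;
end;
```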

Quote
32bit to 16bit

I never used 16-bit color (I do use 16-bit monochrome), so I have no routines there. 16-bit color is basically RGB with about 5 bits per channel (RGB565 gives green 6 bits), so it is mostly a couple of shifts and adds. The example, though, does multiplications and divisions by powers of 2 minus 1.
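A minimal sketch of both variants, assuming the common RGB565 layout (5 bits red in the top bits, 6 bits green, 5 bits blue) and RGBA byte order in memory. ToRGB565Shift, ToRGB565Round and RGBAToRGB565 are hypothetical names; alpha is simply dropped.

```pascal
{$mode objfpc}{$H+}
{$pointermath on}

// Truncating variant: keep the top 5/6/5 bits of each 8-bit channel.
function ToRGB565Shift(r, g, b: Byte): Word;
begin
  Result := Word(((r shr 3) shl 11) or ((g shr 2) shl 5) or (b shr 3));
end;

// Rounding variant: scale each channel by (2^n - 1) / 255 with rounding,
// i.e. the multiply/divide by "power of 2 minus 1" approach.
function ToRGB565Round(r, g, b: Byte): Word;
var
  r5, g6, b5: LongWord;
begin
  r5 := (LongWord(r) * 31 + 127) div 255;
  g6 := (LongWord(g) * 63 + 127) div 255;
  b5 := (LongWord(b) * 31 + 127) div 255;
  Result := Word((r5 shl 11) or (g6 shl 5) or b5);
end;

// Converts PixelCount RGBA pixels (4 bytes each) to RGB565 words.
procedure RGBAToRGB565(Src: PByte; Dest: PWord; PixelCount: SizeInt);
var
  i: SizeInt;
begin
  for i := 0 to PixelCount - 1 do
  begin
    Dest^ := ToRGB565Shift(Src[0], Src[1], Src[2]);
    Inc(Src, 4);
    Inc(Dest);
  end;
end;
```

The difference shows up in dark values: an 8-bit red of 7 becomes 0 with the shift version and 1 with the rounded one.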

Quote
and software mipmap generation.

No experience there.
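A minimal sketch of software mipmap generation with a 2x2 box filter, assuming tightly packed 4-byte pixels and the usual OpenGL size rule (each level is max(1, size div 2), so for odd sizes the last row or column is ignored). DownsampleBox and MipLevelCount are hypothetical names; a box filter is the simplest choice, not the best-looking one.

```pascal
{$mode objfpc}{$H+}
{$pointermath on}

// Builds the next mipmap level of a W x H image (4 bytes per pixel) by
// averaging each 2x2 block per channel, with rounding.
// Dest must hold Max(1, W div 2) * Max(1, H div 2) pixels.
procedure DownsampleBox(Src: PByte; W, H: Integer; Dest: PByte);
var
  dw, dh, x, y, c, x0, x1, y0, y1, sum: Integer;
begin
  dw := W div 2; if dw < 1 then dw := 1;
  dh := H div 2; if dh < 1 then dh := 1;
  for y := 0 to dh - 1 do
  begin
    y0 := 2 * y;
    y1 := y0 + 1; if y1 >= H then y1 := H - 1;   // clamp when H = 1
    for x := 0 to dw - 1 do
    begin
      x0 := 2 * x;
      x1 := x0 + 1; if x1 >= W then x1 := W - 1; // clamp when W = 1
      for c := 0 to 3 do
      begin
        sum := Src[(y0 * W + x0) * 4 + c] + Src[(y0 * W + x1) * 4 + c]
             + Src[(y1 * W + x0) * 4 + c] + Src[(y1 * W + x1) * 4 + c];
        Dest[(y * dw + x) * 4 + c] := Byte((sum + 2) div 4);
      end;
    end;
  end;
end;

// Number of levels down to 1x1, including the base level.
function MipLevelCount(W, H: Integer): Integer;
begin
  Result := 1;
  while (W > 1) or (H > 1) do
  begin
    if W > 1 then W := W div 2;
    if H > 1 then H := H div 2;
    Inc(Result);
  end;
end;
```

To build a full chain, call DownsampleBox MipLevelCount(W, H) - 1 times, each time feeding the previous level's output.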

Quote
But detecting some SSE extensions and using them would be nice.

AFAIK the unit "CPU" has some SSE etc. detection functions.
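In case the cpu unit of a given FPC version does not expose the exact flag needed, the check can also be done directly with CPUID. A minimal x86-64-only sketch: leaf 1 reports SSE2 in EDX bit 26, SSE3 in ECX bit 0 and SSSE3 (needed for pshufb) in ECX bit 9. CpuidLeaf1, HasSSE2, HasSSE3 and HasSSSE3 are hypothetical names.

```pascal
{$mode objfpc}{$H+}
{$asmmode intel}

const
  EcxSSE3  = QWord(1) shl 0;          // CPUID.1:ECX bit 0
  EcxSSSE3 = QWord(1) shl 9;          // CPUID.1:ECX bit 9
  EdxSSE2  = QWord(1) shl (32 + 26);  // CPUID.1:EDX bit 26, stored in the high half

// x86-64 only. Returns CPUID leaf 1 as one QWord: EDX in the high half,
// ECX in the low half. The result is returned in rax.
function CpuidLeaf1: QWord; assembler;
asm
  push rbx            // CPUID overwrites RBX, which is callee-saved
  mov  eax, 1
  xor  ecx, ecx
  cpuid               // in 64-bit mode this also clears the upper halves
  shl  rdx, 32        // EDX feature bits -> upper 32 bits
  mov  eax, ecx       // ECX feature bits -> lower 32 bits (zero-extends rax)
  or   rax, rdx
  pop  rbx
end;

function HasSSE2: Boolean;
begin
  Result := (CpuidLeaf1 and EdxSSE2) <> 0;
end;

function HasSSE3: Boolean;
begin
  Result := (CpuidLeaf1 and EcxSSE3) <> 0;
end;

function HasSSSE3: Boolean;
begin
  Result := (CpuidLeaf1 and EcxSSSE3) <> 0;
end;
```

Since these bits don't change while the program runs, you would normally query them once at startup and pick the conversion routine (SSSE3 or plain Pascal) at that point.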

Quote
Maybe there are some open image conversion libraries where I can look for clues?

Many are similar to fcl-image: they are built on functions that convert one pixel, with a generic loop over them. For speed you need dedicated routines (pixel processing and looping integrated).
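A small sketch of the difference, assuming 4-byte RGBA pixels and, for the dedicated loop, a little-endian (x86) CPU. TPixelProc, SwapRBPixel, ConvertGeneric and ConvertDedicated are hypothetical names, not fcl-image API. Both produce the same bytes; the generic one pays an indirect call per pixel, while the dedicated one keeps the pixel operation inside the loop, where it can be tuned or hand-vectorized.

```pascal
{$mode objfpc}{$H+}
{$pointermath on}

type
  // Per-pixel converter signature in the style of generic image libraries.
  TPixelProc = procedure(Src, Dest: PByte);

// Swaps R and B of one pixel; safe when Src = Dest.
procedure SwapRBPixel(Src, Dest: PByte);
var
  r: Byte;
begin
  r := Src[0];
  Dest[0] := Src[2];
  Dest[1] := Src[1];
  Dest[2] := r;
  Dest[3] := Src[3];
end;

// Generic approach: one indirect call per pixel.
procedure ConvertGeneric(Src, Dest: PByte; PixelCount: SizeInt; Proc: TPixelProc);
var
  i: SizeInt;
begin
  for i := 0 to PixelCount - 1 do
    Proc(Src + i * 4, Dest + i * 4);
end;

// Dedicated approach: the pixel operation is part of the loop body.
procedure ConvertDedicated(Src, Dest: PByte; PixelCount: SizeInt);
var
  i: SizeInt;
  p: LongWord;
begin
  for i := 0 to PixelCount - 1 do
  begin
    p := PLongWord(Src)[i];
    // On little-endian, byte 0 (R) is bits 0-7 and byte 2 (B) is bits 16-23:
    // keep bytes 1 and 3, exchange bytes 0 and 2.
    PLongWord(Dest)[i] := (p and $FF00FF00) or ((p and $FF) shl 16) or ((p shr 16) and $FF);
  end;
end;
```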

For C++ there is more advanced stuff that mixes templates and SSE intrinsics. Still not as good as hand-coded assembler in all cases, but *much* better.

E.g. the Simd library, which you could try to compile into a DLL (I did that once for some more advanced color conversion routines).

 
