Proper way to perform texture format conversion with hardware acceleration

Mr.Madguy


« on: May 01, 2021, 12:23:06 pm »

I'm currently working on a retro game engine that is backward compatible with old graphics libraries. The biggest problem so far is that old libraries may not support every texture format. Of course, I can always fall back to a simple software conversion, but isn't there a better way to do it, something like using SSE? Plain software conversion would slow down loading too much, and then I would need some more complex workaround, such as caching converted textures on disk. That is a viable solution, but I want to avoid it.


marcov


« Reply #1 on: May 01, 2021, 03:01:42 pm »

Quote from: Mr.Madguy on May 01, 2021, 12:23:06 pm

I'm currently working on a retro game engine that is backward compatible with old graphics libraries. The biggest problem so far is that old libraries may not support every texture format. Of course, I can always fall back to a simple software conversion, but isn't there a better way to do it, something like using SSE? Plain software conversion would slow down loading too much, and then I would need some more complex workaround, such as caching converted textures on disk. That is a viable solution, but I want to avoid it.

I assume this is usually solved in packaging, i.e. if you link to an old graphics library, you convert the textures up front and put the converted versions in the installer?

SSE has some processor requirements too, and especially in retro gaming people often run on old iron. Know your audience... SSE2 is available on everything that is 64-bit capable, and SSE3 on nearly everything that is 64-bit capable (the very first "Hammer" Athlon 64s excluded).

Older CPUs are stricter about alignment though, so that might require aligning your buffers to 16 bytes. (I align image buffers to 64-bit just in case.)

I have some SSE2/3 code, but it mostly comes from the camera format conversion side of things (so 24-bit to 32-bit and vice versa), and the routines are often limited. Alignment is usually no problem there because the sensor dimensions are usually divisible by 16, so aligning the buffer itself is enough.

For OpenGL, you can do conversions using shaders.

« Last Edit: May 01, 2021, 03:03:54 pm by marcov »


Mr.Madguy


« Reply #2 on: May 01, 2021, 03:25:13 pm »

The overall idea is that neither I nor the users of my engine should have to bother with providing textures in different formats, plus mipmaps, in the data files just to support some rare old hardware.

My motto is "slow, but works in any case". Of course, I could just use the "lowest common denominator", 32-bit RGBA, which is supported by everything, and not bother at all. I won't support anything below Win2k anyway, so I don't think my engine will ever encounter old hardware with 16-bit support only. But I still want to make sure that it works under any conditions, so I need at least some basic texture format conversion features: RGBA to BGRA, 32-bit to 16-bit, and software mipmap generation. Yes, that is easily done in software, but I guess drivers use some hardware acceleration internally. I can't use shaders or anything like that, because shader support most likely means a more or less modern video card, in which case I wouldn't need conversion in the first place. But detecting some SSE extensions and using them would be nice.

Maybe there are some open image conversion libraries where I can look for clues?

« Last Edit: May 01, 2021, 03:33:48 pm by Mr.Madguy »


marcov


« Reply #3 on: May 01, 2021, 04:57:38 pm »

Quote from: Mr.Madguy on May 01, 2021, 03:25:13 pm

RGBA to BGRA,

I checked, and contrary to what I thought, I don't have this. Apparently I solved it in the shader. It is fairly simple though, since it is only a change of channel order.

I quickly whipped up something based on older code, untested. The usual caveats apply:

- The width must be divisible by 8 pixels.
- The buffers must be aligned.
- It assumes that the format (dimensions and top-down/bottom-up) of src and dest is the same.
- It uses a parameter (rpendall) to be able to handle both top-down and bottom-up images.

Code: Pascal

const
  shufrgb :  array[0..15] of byte = (2,1,0,3,6,5,4,7,10,9,8,11,14,13,12,15);

procedure swaprgba(src,dest:pbyte;countinner,countouter,rpendall:integer);assembler;
// src         rcx   source pointer
// dest        rdx   dest pointer
// countinner  r8d   width in pixels. Must be multiple of 8
// countouter  r9d   number of lines.
// rpendall    stack number of bytes to go from end of line to start of the next line
//                   roughly scanline[1]-scanline[0]-width (in bytes), can be 0
//
// note: pshufb is an SSSE3 instruction, so SSE2/SSE3 support alone is not enough.

// local vars:
// rax: counter for inner loop.
// r10: loads rpendall
// xmm1:xmm0:  32 bytes worth of pixels ( 8 pixels)
// xmm2: load shufrgb

asm
        // load shufflemask to register. movdqu, because the constant
        // is not guaranteed to be 16-byte aligned.
        movdqu xmm2,[rip+shufrgb]
        // sign extend the 32-bit value because the assembler won't eat movzxd
        // the chance we do more than 2 billion iterations is low.
        movsxd  r10,rpendall
        shr r8d,3       // 32 bytes = 8 RGBA pixels. Divide pixels per line by 8
                        // (the 32-bit operation also clears the upper half of r8)
         //.pad 8, align loop target ?
  @outer:
        mov    rax,r8
  .align 16
  @loopstart:
        movdqa xmm0,[rcx]
        movdqa xmm1,[rcx+16]
        pshufb xmm0,xmm2   // shuffle bottleneck, but what can you do in such short procedure?
        pshufb xmm1,xmm2   //
        movdqa [rdx],xmm0
        movdqa [rdx+16],xmm1
        add rcx,32
        add rdx,32
        dec  rax
        jne @loopstart
        add rcx,r10
        add rdx,r10
        dec  r9d           // countouter is a 32-bit parameter
        jne @outer
   @end:
end;
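
Since the assembler routine above is untested, a plain Pascal version with the same parameter meaning may be handy, both as a fallback for CPUs without the required instructions and as a reference to compare results against. This is only a minimal sketch: the name SwapRGBAPlain is made up for illustration, and it assumes 4 bytes per pixel with rpendall counted in bytes.

```pascal
program swaprgba_ref;
{$mode objfpc}

// Portable fallback (hypothetical name). widthpx = pixels per line,
// lines = number of lines, rpendall = bytes to skip from the end of one
// line to the start of the next (0 for tightly packed, may be negative).
procedure SwapRGBAPlain(src, dest: PByte; widthpx, lines, rpendall: Integer);
var
  x, y: Integer;
  b0, b2: Byte;
begin
  for y := 1 to lines do
  begin
    for x := 1 to widthpx do
    begin
      // read both bytes first, so that src = dest (in place) also works
      b0 := src[0];
      b2 := src[2];
      dest[0] := b2;       // first and third byte trade places
      dest[1] := src[1];   // second byte (green) is copied unchanged
      dest[2] := b0;
      dest[3] := src[3];   // fourth byte (alpha) is copied unchanged
      Inc(src, 4);
      Inc(dest, 4);
    end;
    Inc(src, rpendall);
    Inc(dest, rpendall);
  end;
end;

const
  srcpix: array[0..7] of Byte = (1, 2, 3, 4, 5, 6, 7, 8);
var
  dstpix: array[0..7] of Byte;
  i: Integer;
begin
  SwapRGBAPlain(@srcpix[0], @dstpix[0], 2, 1, 0);
  for i := 0 to 7 do
    Write(dstpix[i], ' ');   // prints 3 2 1 4 7 6 5 8
  WriteLn;
end.
```

Unlike the assembler version, this one has no width or alignment restrictions, so it can also handle the leftover pixels of lines whose width is not a multiple of 8.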
 

Quote

32bit to 16bit

I have never used 16-bit color (I do use 16-bit monochrome), so I have no routines for that. 16-bit is basically a kind of 5-bit-per-channel RGB, so the conversion comes down to a couple of shifts and adds, though the example does muls and divs by a power of two minus one.
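
The "couple of shifts" variant could look roughly like the sketch below. The layouts are assumptions, since the thread does not name them: the source is taken to be R, G, B, A bytes in memory, and the target is the common 5-6-5 layout (red in bits 15..11, green in bits 10..5, blue in bits 4..0). A 5-5-5 target would need different shift counts. PackRGB565 and ConvertRGBA8888ToRGB565 are hypothetical names.

```pascal
program rgba_to_565;
{$mode objfpc}

// Keeps the top 5/6/5 bits of each channel and packs them into one word.
function PackRGB565(r, g, b: Byte): Word;
begin
  Result := Word(((r shr 3) shl 11) or ((g shr 2) shl 5) or (b shr 3));
end;

// Converts pixelcount tightly packed 32-bit pixels to 16-bit pixels.
procedure ConvertRGBA8888ToRGB565(src: PByte; dest: PWord; pixelcount: Integer);
var
  i: Integer;
begin
  for i := 1 to pixelcount do
  begin
    dest^ := PackRGB565(src[0], src[1], src[2]);  // alpha (src[3]) is dropped
    Inc(src, 4);   // next 4-byte source pixel
    Inc(dest);     // next 2-byte destination pixel
  end;
end;

const
  srcpix: array[0..7] of Byte = (255, 255, 255, 255,   // opaque white
                                 255,   0,   0, 255);  // opaque red
var
  dstpix: array[0..1] of Word;
begin
  ConvertRGBA8888ToRGB565(@srcpix[0], @dstpix[0], 2);
  WriteLn(dstpix[0]);   // prints 65535 ($FFFF)
  WriteLn(dstpix[1]);   // prints 63488 ($F800)
end.
```

Plain truncation like this is the cheapest option; it simply discards the low bits of each channel instead of rounding.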

Quote

and software mipmap generation.

No experience there.
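
For reference, the simplest software approach to mipmaps is probably a 2x2 box filter: every pixel of the next level is the per-channel average of a 2x2 block of the current level. The sketch below is not taken from any library; it assumes a tightly packed image with 4 bytes per pixel and even width and height, and HalveImage is a hypothetical name.

```pascal
program mip_box;
{$mode objfpc}

// Writes a (srcw div 2) x (srch div 2) image to dest by averaging
// each 2x2 block of src per channel. srcw and srch must be even.
procedure HalveImage(src, dest: PByte; srcw, srch: Integer);
var
  x, y, c: Integer;
  p0, p1: PByte;   // top and bottom row of the current 2x2 block
begin
  for y := 0 to (srch div 2) - 1 do
  begin
    p0 := src + (2 * y) * srcw * 4;
    p1 := p0 + srcw * 4;
    for x := 0 to (srcw div 2) - 1 do
    begin
      for c := 0 to 3 do
        // sum of four samples, +2 so the division by 4 rounds to nearest
        dest[c] := (p0[c] + p0[c + 4] + p1[c] + p1[c + 4] + 2) shr 2;
      Inc(p0, 8);    // two source pixels to the right
      Inc(p1, 8);
      Inc(dest, 4);  // one destination pixel to the right
    end;
  end;
end;

const
  img: array[0..15] of Byte = (
      0,  0, 0, 255,   100,  0, 0, 255,
    200,  0, 0, 255,   100, 40, 0, 255);
var
  mip: array[0..3] of Byte;
begin
  HalveImage(@img[0], @mip[0], 2, 2);
  WriteLn(mip[0], ' ', mip[1], ' ', mip[2], ' ', mip[3]);  // prints 100 10 0 255
end.
```

Calling it repeatedly on its own output produces the whole chain for power-of-two textures; odd sizes would need extra handling that is left out here.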

Quote

But detecting some SSE extensions and using them would be nice.

AFAIK the unit "CPU" has some detection routines for SSE etc.

Quote

Maybe there are some open image conversion libraries where I can look for clues?

Many are similar to fcl-image: they are based on functions that convert a single pixel, with a generic loop over them. For speed you need dedicated routines (with pixel processing and looping integrated).

For C++ there is some more advanced stuff that mixes templates with SSE-based intrinsics. It is still not on par with hand-coded assembler in all cases, but *much* better.

E.g. the simd library, which you could try to compile to a DLL (I did that once for some more advanced color conversion routines).
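
The difference between the two styles can be sketched as follows. This is an illustration only and not how fcl-image is actually written: all names are hypothetical, and pixels are assumed to be 32-bit values on a little-endian CPU, so swapping byte 0 and byte 2 of the value swaps the first and third channel in memory.

```pascal
program loop_styles;
{$mode objfpc}

type
  TPixelFunc = function(p: LongWord): LongWord;

// Converts one pixel: swaps byte 0 and byte 2 (RGBA <-> BGRA).
function SwapRB(p: LongWord): LongWord;
begin
  Result := (p and $FF00FF00) or ((p shr 16) and $FF) or ((p and $FF) shl 16);
end;

// Generic style: one loop for all conversions, one indirect call per pixel.
procedure ConvertGeneric(src, dest: PLongWord; count: Integer; f: TPixelFunc);
var
  i: Integer;
begin
  for i := 0 to count - 1 do
    dest[i] := f(src[i]);
end;

// Dedicated style: the pixel math is written directly into the loop,
// so there is no call overhead and the compiler sees the whole loop body.
procedure ConvertSwapRB(src, dest: PLongWord; count: Integer);
var
  i: Integer;
  p: LongWord;
begin
  for i := 0 to count - 1 do
  begin
    p := src[i];
    dest[i] := (p and $FF00FF00) or ((p shr 16) and $FF) or ((p and $FF) shl 16);
  end;
end;

const
  srcpix: array[0..1] of LongWord = ($44332211, $FF102030);
var
  a, b: array[0..1] of LongWord;
begin
  ConvertGeneric(@srcpix[0], @a[0], 2, @SwapRB);
  ConvertSwapRB(@srcpix[0], @b[0], 2);
  WriteLn((a[0] = b[0]) and (a[1] = b[1]));  // prints TRUE
  WriteLn(a[0] = $44112233);                 // prints TRUE
end.
```

Both produce the same result; the dedicated loop is the form that can then be replaced by SSE code one whole line at a time.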

