Author Topic: Rules of fast image processing  (Read 5108 times)

PeterBalch

  • New Member
  • *
  • Posts: 14
Rules of fast image processing
« on: June 17, 2019, 05:27:26 pm »
Hi

I'm just starting working with OpenCV on Raspberry Pi. I want to write my own image processing code rather than just re-using the stuff that's already in OpenCV.

I've been doing Delphi for decades on Windows and you soon learn that Images, Bitmaps, Canvases and the like are very, very slow. You should move the Bitmap data into, say, a string; do stuff fast on the raw bytes; then move it back to an Image to display it.

What are the corresponding rules in Lazarus/FreePascal/OpenCV?

I just wrote a simple Lazarus/OpenCV program that, in the OnIdle loop, grabbed an image from a pi camera, copied it into a bitmap then Assigned the bitmap to an Image to display it. I set the Image to Stretch=true so the image was scaled to fit on the form.

Very slow. Oddly, the frame rate was OK (53mS) but there was a latency of 500mS. That's weird. How can the latency be longer than the frame period? It must be pipelined. Where? And why do I have to copy the image into a bitmap then into the image? Why copy it twice, not just once?

Anyway, that's doing the re-sizing in Lazarus using Canvas.StretchBlt (I presume that's what Lazarus does).

So I then resized it in OpenCV using cvResize and copied the result to an Image with Stretch=false. The result: no latency and a faster framerate (33mS).

Similarly, cvShowImage (with CV_WINDOW_NORMAL to force OpenCV to do the StretchBlt) is faster than Canvas.StretchBlt (33mS).  Using cvShowImage with CV_WINDOW_AUTOSIZE (so it's a big picture) is as fast (33mS).

Clearly, OpenCV is faster than Lazarus TCanvas but is it even faster to do maths on the raw data in the TIplImage record? The raw data seems to be available in TIplImage.ImageData.

What are the tricks of the trade that long-time OpenCV programmers have learned?

Peter


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11376
  • FPC developer.
Re: Rules of fast image processing
« Reply #1 on: June 17, 2019, 07:59:04 pm »
Quote
I'm just starting working with OpenCV on Raspberry Pi. I want to write my own image processing code rather than just re-using the stuff that's already in OpenCV.

That's good. OpenCV's routines are often quite general, and thus not very performant.

Quote
I've been doing Delphi for decades on Windows and you soon learn that Images, Bitmaps, Canvases and the like are very, very slow.

Canvas level pixel access is slow, but there is nothing wrong with scanline[] use.

Quote
You should move the Bitmap data into, say, a string; do stuff fast on the raw bytes; then move it back to an Image to display it.

TBitmap already allows that via scanline.  If you save the difference between e.g. scanline[1] and scanline[0] you can go vertical by adding that difference.
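As a rough illustration of that row-difference trick, here is a minimal sketch of stride-based addressing on a flat byte buffer. It is written in Python only for brevity, and every name in it (`STRIDE`, `pixel_offset`, `set_pixel`) is made up for the example rather than taken from Delphi, Lazarus or OpenCV. The 4-byte row padding is an assumption that holds for many bitmap formats but should be checked for the format actually in use.

```python
# Sketch: a bitmap held as one flat byte buffer with a fixed row stride.
# All names are illustrative; none of this is a Lazarus or Delphi API.

WIDTH, HEIGHT = 5, 3
BYTES_PER_PIXEL = 3  # e.g. a 24-bit format

# Assumption: rows are padded to a multiple of 4 bytes, so the stride can be
# larger than WIDTH * BYTES_PER_PIXEL (here 15 bytes of pixels, 16-byte rows).
STRIDE = (WIDTH * BYTES_PER_PIXEL + 3) // 4 * 4

buf = bytearray(STRIDE * HEIGHT)


def pixel_offset(x, y):
    """Byte offset of pixel (x, y): step down y rows, then across x pixels."""
    return y * STRIDE + x * BYTES_PER_PIXEL


def set_pixel(x, y, b, g, r):
    """Write one pixel as three consecutive bytes."""
    o = pixel_offset(x, y)
    buf[o:o + 3] = bytes((b, g, r))


set_pixel(1, 2, 10, 20, 30)

# The equivalent of "scanline[1] minus scanline[0]": the step between rows.
row_step = pixel_offset(0, 1) - pixel_offset(0, 0)
print(row_step)
```

With real scanline pointers the measured difference can be negative when rows are stored bottom-up, which is why it is worth measuring rather than computing from the width. OpenCV's IplImage carries the same quantity in its widthStep field.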

Quote
What are the corresponding rules in Lazarus/FreePascal/OpenCV?

Afaik the lazarus tbitmap has some binary possibilities. The rules are the same. Find some function to get a pointer to e.g. a line, and then simply use pointer math.

Quote
I just wrote a simple Lazarus/OpenCV program that, in the OnIdle loop, grabbed an image from a pi camera, copied it into a bitmap then Assigned the bitmap to an Image to display it. I set the Image to Stretch=true so the image was scaled to fit on the form.

I don't know the pi camera (I use industrial cameras), but try to feed the camera your own buffers. Saves a copy.


Quote
Very slow. Oddly, the frame rate was OK (53mS) but there was a latency of 500mS. That's weird. How can the latency be longer than the frame period? It must be pipelined. Where?

Best bet is the camera driver/api. The rpi stuff is often written for convenience, not performance. Also make sure you have a camera actually capable of delivering photos (rather than video streams). Avoids compression and processing.

Sometimes these things are configurable in the driver.

Quote
So I then resized it in OpenCV using cvResize and copied the result to an Image with Stretch=false. The result: no latency and a faster framerate (33mS).

Avoid the resize. It is visual only, and for that better use some hardware scale (e.g. using opengl).

Apropos display, don't overdo it, more than 5fps is usually pointless, too quick to see fine details.

Quote
What are the tricks of the trade that long-time OpenCV programmers have learned?

To avoid it, and make your own?

Some of my older (Windows) opengl display code is here : https://forum.lazarus.freepascal.org/index.php/topic,30556.msg194627.html#msg194627
« Last Edit: June 18, 2019, 09:41:49 am by marcov »

jamie

  • Hero Member
  • *****
  • Posts: 6088
Re: Rules of fast image processing
« Reply #2 on: June 17, 2019, 11:30:53 pm »
use the RAWIMAGE of the bitmap, it is a direct access to the image bits.

I do live video capture using the AVI interface on windows and send the buffer directly to a
RAWIMAGE block of a Tbitmap… It works flawlessly except for image orientation issues.

PeterBalch

  • New Member
  • *
  • Posts: 14
Re: Rules of fast image processing
« Reply #3 on: June 18, 2019, 12:56:40 pm »
Quote
Canvas level pixel access is slow, but there is nothing wrong with scanline[] use.

Well, in theory yes there is. Microsoft gave all sorts of warnings about scanlines not being supported and not being guaranteed to work in future releases. The format of bitmaps might change. You have to use pf24bit which isn't fully supported, etc.

OTOH, I've been using exactly the same "unsupported" code since XP and it still works on 64-bit Windows 10.

Quote
I don't know the pi camera (I use industrial cameras), but try to feed the camera your own buffers. Saves a copy.

Thanks. I'll look into that. I've no idea how to get direct access to the camera.

I've read elsewhere that the OpenCV camera API is slower than doing it some other way but I've yet to find out what the "other way" is.

Quote
Quote
the frame rate was OK (53mS) but there was a latency of 500mS.
Best bet is the camera driver/api.

I don't see that. If that were the case then the frame rate and latency would be the same whether I used Bitmaps or OpenCV. It must be something that happens after the images have got inside my exe.

It's the latency that's so puzzling. Frames trickle through the system each 53mS but appear on the screen 500mS later. That means there must be ten frames queued up somewhere.

Which implies pipelining. (If some process took 500mS per frame then the frame rate would be 2 fps.)
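The arithmetic behind that estimate can be written out as a small sketch (Python, using only the two figures quoted above; the variable names are illustrative). It applies Little's law: the number of frames in flight equals the arrival rate multiplied by the time each frame spends in the system.

```python
# Values quoted in the post above; everything else is plain arithmetic.
frame_period_ms = 53   # one new frame arrives every 53 ms
latency_ms = 500       # each frame appears on screen about 500 ms late

# Little's law: frames in flight = arrival rate * time spent in the system.
frames_in_flight = latency_ms / frame_period_ms
frames_per_second = 1000 / frame_period_ms

print(f"{frames_in_flight:.1f} frames in flight at {frames_per_second:.1f} fps")
```

The result is a little over nine frames, which matches the "ten frames" estimate. The calculation says only how many frames are in flight; it cannot tell whether they sit in one queue or are spread over several stages.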

A pipeline with 10 stages each holding one frame? After all, why would a stage hold more than one frame in a queue? Where are the ten stages?

The ARM has 4 cores so that's max 4 stages to the pipeline.

And why does the pipelining only happen with TBitmap, not pIplImage?

Quote
Afaik the lazarus tbitmap has some binary possibilities.

Thanks. How?

Should I use TRawImage.data that's inside Bitmap.RawImage or TLazIntfImage or TIplImage.ImageData?

Quote
Avoid the resize. It is visual only, and for that better use some hardware scale (e.g. using opengl).

Are you talking about cvResize or making a big image display small on a form (with CV_WINDOW_NORMAL)? Is cvResize slow?

Your warning that stuff is often written for convenience, not performance is useful. I've just been reading how OpenCV has got slower with later versions. Also that Mat is very slow and RawImage is a wrapper for Mat.

Quote
use the RAWIMAGE of the bitmap, it is a direct access to the image bits.

So much to learn!

Peter


SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Rules of fast image processing
« Reply #4 on: June 18, 2019, 01:45:06 pm »
Video takes lots of bandwidth and memory, so each step in the process tries to improve that by compressing the data. Which is exactly what you don't want: it takes time (latency) and all that processing is "lossy": it throws away as much data as possible.

So, see if you can get the raw image data from the camera and dump it into a buffer somewhere. If that's not possible (because the USB connection doesn't have the bandwidth, for example), see if you can get the camera to deliver a high-quality jpeg stream.

Edit: actually, most of the things you would want to do with OpenCV are also done by the video compression algorithms (look for the differences between the frames), but to extract that you have to analyze the video stream, which is quite hard to do.
« Last Edit: June 18, 2019, 01:50:53 pm by SymbolicFrank »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Rules of fast image processing
« Reply #5 on: June 18, 2019, 02:34:37 pm »
Is OnIdle a suitable place for image processing?
You have 4 cores, shouldn't you be using threads?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11376
  • FPC developer.
Re: Rules of fast image processing
« Reply #6 on: June 18, 2019, 03:35:05 pm »
Quote
Canvas level pixel access is slow, but there is nothing wrong with scanline[] use.

Well, in theory yes there is. Microsoft gave all sorts of warnings about scanlines not being supported and not being guaranteed to work in future releases.

Yeah. Moreover, bitmaps might have resources attached to them which you might not want.

The latter is the reason I moved from tbitmap and tmetafile to own formats (see url already listed).

But are you really gearing up for very long term support? Next year a new RPI might come out based on a new SoC, and the old piface might not even be supported anymore.

Quote
Quote
I don't know the pi camera (I use industrial cameras), but try to feed the camera your own buffers. Saves a copy.

Thanks. I'll look into that. I've no idea how to get direct access to the camera.

How do you get your images? Some API I assume. Start reading docs ;_)

Quote
It's the latency that's so puzzling. Frames trickle through the system each 53mS but appear on the screen 500mS later. That means there must be ten frames queued up somewhere.

Which implies pipelining. (If some process took 500mS per frame then the frame rate would be 2 fps.)

Lazarus doesn't contain image pipelines for GUI, so it must be in the added parts somewhere. The only thing I can imagine is something using a Windows-messages-like system and then not polling it often enough. But on a quad core rpi that shouldn't be that big a deal?

Quote
Quote
Afaik the lazarus tbitmap has some binary possibilities.

Thanks. How?

I've looked at tlazintfimage and that had something. But since then I moved to an own image format, and the whole imageprocessing etc is separate from LCL types.

Quote
Are you talking about cvResize or making a big image display small on a form (with CV_WINDOW_NORMAL)? Is cvResize slow?

Any resizing without the GPU is slow. In theory GTK or QT or whatever you use could do it using some optimized way if you pass it whole bitmaps. But I would doublecheck :-)

Quote
Your warning that stuff is often written for convenience, not performance is useful. I've just been reading how OpenCV has got slower with later versions. Also that Mat is very slow and RawImage is a wrapper for Mat.

Convenience, and too many usecases. 

Some random tips:

  • Try to keep the main components (camera - processing - display) apart, don't use integrated suites that pretend to do it all. This makes it easier to isolate problems (like your queuing problem)
  • Use a thread for camera acquisition. Usually you can just block on the next image, and maybe do copy or rotates.
  • Try also to use a thread for processing, leaving the mainthread for gui
  • As said consider doing hardware accelerated display in time, for hardware pan,zoom, scale and rotate
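A minimal sketch of that camera / processing / display split, assuming nothing about the real camera API: it uses Python's standard `queue` and `threading` modules, fake 16-byte "frames", and made-up function names (`acquire`, `process`, `run_pipeline`). The bounded queues are one possible design choice; they cap how many frames can pile up between stages, which also caps the latency.

```python
import queue
import threading

STOP = object()  # sentinel: tells the next stage there are no more frames


def acquire(out_q, n_frames):
    """Stand-in for the camera thread; a real one would block on the camera."""
    for i in range(n_frames):
        frame = bytes([i]) * 16      # fake 16-byte "image"
        out_q.put(frame)             # blocks while the queue is full
    out_q.put(STOP)


def process(in_q, out_q):
    """Stand-in for the processing thread: inverts every byte."""
    while True:
        frame = in_q.get()
        if frame is STOP:
            out_q.put(STOP)
            return
        out_q.put(bytes(255 - b for b in frame))


def run_pipeline(n_frames=5):
    raw_q = queue.Queue(maxsize=2)   # small bounds keep the backlog short
    done_q = queue.Queue(maxsize=2)
    workers = [
        threading.Thread(target=acquire, args=(raw_q, n_frames)),
        threading.Thread(target=process, args=(raw_q, done_q)),
    ]
    for w in workers:
        w.start()
    shown = []
    while True:                      # the main thread only "displays"
        frame = done_q.get()
        if frame is STOP:
            break
        shown.append(frame[0])
    for w in workers:
        w.join()
    return shown


print(run_pipeline())
```

In a real GUI program the main thread would not block on the queue like this; it would poll it without waiting, for example from a timer, so that the form stays responsive. In CPython the threads show the structure rather than a speed-up for pure-Python maths.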


PeterBalch

  • New Member
  • *
  • Posts: 14
Re: Rules of fast image processing
« Reply #7 on: June 18, 2019, 03:45:04 pm »
Quote
So, see if you can get the raw image data from the camera and dump it into a buffer somewhere.

I think that's what I'm doing but I don't really know.

I'm using cvQueryFrame which is in OpenCV. I think it's taking a single photo rather than grabbing a passing frame from a video stream. It's a Pi Camera rather than a webcam. (I presume a Pi Camera is "better" than a webcam but I have no data to back that up. I'm starting from zero knowledge. It's got a ribbon cable - which in my imagination is better than a USB cable.)

Quote
Is OnIdle a suitable place for image processing?
You have 4 cores, shouldn't you be using threads?

That sounds like good advice. If one is using cores as a pipeline then you have to balance the load in both threads. If the part of the algorithm in one thread takes 90mS and the other thread takes 5mS then you haven't achieved much. Or are you suggesting that, say, alternate frames go to alternate cores?
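The load-balancing point can be put in numbers with a small idealised sketch (Python; the function names are made up, and it ignores synchronisation overhead and assumes each stage gets a core to itself). In a pipeline, the time between finished frames is set by the slowest stage.

```python
def period_one_thread(stage_ms):
    """Time per frame when a single thread runs every stage in turn."""
    return sum(stage_ms)


def period_pipelined(stage_ms):
    """Time per frame when each stage has its own thread: slowest stage wins."""
    return max(stage_ms)


unbalanced = [90, 5]      # the 90 ms / 5 ms split from the post above
balanced = [47.5, 47.5]   # the same total work, split evenly

for stages in (unbalanced, balanced):
    print(stages, period_one_thread(stages), period_pipelined(stages))
```

For the 90/5 split the pipeline delivers a frame every 90 ms instead of every 95 ms, so almost nothing is gained; an even split would nearly halve the period. In both cases each frame still takes at least the sum of the stage times to get through.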

And aren't the cores already in use by the Pi? If I were writing a Pi OS I'd be tempted to use one or two of the cores to act as a GPU and maybe just leave one for user apps.

I've no idea how threads match onto cores. Does Pi Lazarus allow processor affinity? It's no use creating lots of threads if the OS assigns them all to one core. Googling hasn't told me so far.

Do I create a "run once" thread for each frame or do I have, e.g. three "permanent" threads on three cores each grabbing and processing frames as fast as it can?

As each thread finishes processing a frame, it has to synchronise with the main program to ensure that the processed frames are used in the correct order - preferably evenly spaced.
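One common way to handle that ordering problem is to tag each frame with a sequence number when it is grabbed and hold finished frames in a small reorder buffer. The sketch below is a hypothetical illustration in Python (the class name `Reorder` and its method are invented for the example); it deals only with ordering, not with even spacing in time.

```python
import heapq


class Reorder:
    """Releases frames strictly in sequence order, whatever order they arrive in."""

    def __init__(self):
        self._heap = []   # min-heap of (sequence number, frame)
        self._next = 0    # sequence number that must be released next

    def push(self, seq, frame):
        """Add a finished frame; return the frames that are now ready, in order."""
        heapq.heappush(self._heap, (seq, frame))
        ready = []
        while self._heap and self._heap[0][0] == self._next:
            ready.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return ready


r = Reorder()
print(r.push(1, "frame 1"))   # frame 0 has not arrived yet, nothing released
print(r.push(0, "frame 0"))   # releases frame 0, then frame 1
print(r.push(2, "frame 2"))
```

Sequence numbers are assumed to be unique and to start at 0. Calls to `push` would need to come from a single thread (for instance the main thread) or be protected by a lock.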

I can see that multi-threading is attractive but it seems like a whole new can of worms.

Are there any examples around?

Peter

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Rules of fast image processing
« Reply #8 on: June 18, 2019, 03:57:34 pm »
For the camera, you need an API, something like: https://www.uco.es/investiga/grupos/ava/node/40. There's an even better one that uses Python, but probably not in Free Pascal.

If you haven't got much experience with threads, use an endless loop, and call "Application.ProcessMessages" regularly.

PeterBalch

  • New Member
  • *
  • Posts: 14
Re: Rules of fast image processing
« Reply #9 on: June 18, 2019, 04:21:23 pm »
Quote
But are you really gearing up for very long term support?
The next year a new RPI might come out

That's a very good question. Right now I'm a noob on Pi and Lazarus and I feel I ought to just learn. So it will be a "personal" project - achievable but hard enough to force me to learn stuff.

Long term support? I tend to think of a Pi or an Arduino as an embedded board that runs just one program for years then goes in the bin. It's not like a Windows program where clients phone me up twenty years later. If a new Pi comes out, you ignore it and go on making a product around the old Pi (That's what I like about PICs - thirty year old chips are still available.)

Quote
How do you get your images? Some API I assume. Start reading docs ;_)

It's a Pi Cam and I'm getting the images via OpenCV. I've no idea whether that's good or bad. "Start reading docs"! Hah! Docs for OpenCV?

Quote
Afaik the lazarus tbitmap has some binary possibilities.
I've looked at tlazintfimage and that had something.

I had a quick look at your code. It seemed rather Windows specific. Maybe I should look more closely.

I've just started writing code to process Bitmap.RawImage.Data (which seems similar to tlazintfimage). I'll see how it goes.

Quote
Any resizing without the GPU is slow.

Sure but it's useful to know tips like Bitmap.Canvas.StretchBlt is much slower than cvResize.

Quote
to keep the main components (camera - processing - display) apart,
a thread for camera acquisition.

Interesting. So you're suggesting a thread for each stage - so it's a pipeline. Rather than several parallel threads each doing all the processing for a single frame.

Quote
leaving the mainthread for gui

Makes sense. There are all those rules in Delphi about what's "thread safe" - most gui isn't.


engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Rules of fast image processing
« Reply #10 on: June 18, 2019, 05:58:03 pm »
Which Pi are you using?

avra

  • Hero Member
  • *****
  • Posts: 2514

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Rules of fast image processing
« Reply #12 on: June 18, 2019, 11:33:34 pm »
Quote
"Start reading docs"! Hah! Docs for OpenCV?

Well, I don't know much about it but a search finds quite a lot of tutorials and explanations, like for example OpenCV's own docs, a tutorial at Tutorials Point, a whole website dedicated to OpenCV Learning (Python/C++), and lots more.

Whether they are useful or not ... that's quite another thing :)

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Rules of fast image processing
« Reply #13 on: June 19, 2019, 01:45:26 pm »
Quote
And aren't the cores already in use by the Pi? If I were writing a Pi OS I'd be tempted to use one or two of the cores to act as a GPU and maybe just leave one for user apps.

You have some reading to do. I hope you enjoy starting with this. Meanwhile I would like to correct your idea about the GPU:
Quote
The CPU doesn’t talk to the camera directly. In fact, none of the camera processing occurs on the CPU (running Linux) at all. Instead, it is done on the Pi’s GPU (VideoCore IV) which is running its own real-time OS (VCOS).

PeterBalch

  • New Member
  • *
  • Posts: 14
Re: Rules of fast image processing
« Reply #14 on: June 19, 2019, 03:58:45 pm »
Quote
Which Pi are you using?

Pi 3B+ connected to a proper monitor. I've also got a Pi 2 connected to a 7" screen. They both have cameras attached.

Quote
You might be interested in some of these libraries:
https://github.com/graphics32/graphics32

Maybe that's a better route. I think that's what I'm asking for advice on - which route?

The message I'm getting seems to be that I have to use Bitmap.Canvas at some point because that's the native Lazarus format. But Bitmap.Canvas operations are slow.

To do stuff faster, I should either:

1. write my own code to operate on Bitmap.RawImage.Data.

2. convert the Bitmap into OpenCV pIplImage then use OpenCV to do things with the pIplImage and also write my own code to directly work with pIplImage.ImageData.

3. convert the Bitmap into graphics32 format - whatever that is - and write my own code to directly work with graphics32 raw data

But I have to choose one of them. I shouldn't mix them. I shouldn't, for instance, do some processing with OpenCV then convert to Bitmap to run my own code then go back to OpenCV to do more processing there.

Once I choose a format, I should stick to it.

I've worked in Computer Vision and AI and OpenCV seems to be the place people go to for all the old favourites.

So I think I should choose OpenCV. Is that sensible?

Quote
https://asphyre.net/products/pxl
http://packages.lazarus-ide.org/pl_OpenGLES.zip

Interesting.

They both look good for rendering and games - which may be of interest in the future. Cross-platform is nice in theory but I don't need it.

Can I mix-and-match with, say, OpenCV?

PXL has a lot of very useful non-graphics stuff. Thanks.

Quote
https://wiki.freepascal.org/BGRABitmap

Same question: Can I mix-and-match?

I feel like I have to choose a path and stick to it. Maybe I'm wrong.

And aren't Bitmaps already 4-bytes per pixel? What's the 4th byte if it's not Alpha?

Bitmap.RawImage.Description.BitsPerPixel = 32

So what does BGRABitmap do that TBitmap doesn't? It has effects that aren't available with a standard canvas. I don't see why a new data type is needed. Does the Alpha channel of TBitmap not work?

Quote
https://wiki.freepascal.org/Fast_direct_pixel_access

That's a useful summary. It would have saved me some time as I spent yesterday working it all out for myself.

I was writing code to do a 3x3 convolution using TBitmap.RawImage.Data with pointer arithmetic; or treating TBitmap.RawImage.Data as a big 'array of byte' and using indexing; or using OpenCV cvSmooth; or using cvFilter2D; or using pIplImage.ImageData.
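For readers who want to see the "big array of byte with indexing" variant spelled out, here is a minimal sketch in Python. It is not the code from this post: it assumes an 8-bit greyscale image with no row padding (stride equal to width), uses a plain 3x3 mean kernel, and the function name `box3x3` is made up.

```python
def box3x3(src, width, height):
    """3x3 mean filter on an 8-bit greyscale image stored row by row.

    Assumes no row padding. Border pixels are copied unchanged to keep
    the sketch short.
    """
    dst = bytearray(src)
    for y in range(1, height - 1):
        row = y * width
        for x in range(1, width - 1):
            centre = row + x
            total = 0
            # Offsets to the row above, the same row and the row below.
            for dy in (-width, 0, width):
                i = centre + dy
                total += src[i - 1] + src[i] + src[i + 1]
            dst[centre] = total // 9
    return dst


image = bytearray([0, 0, 0,
                   0, 90, 0,
                   0, 0, 0])
print(list(box3x3(image, 3, 3)))
```

The inner loop touches the nine neighbours through fixed offsets from the centre index, which is the same access pattern as stepping a pointer by the row stride. For a 32-bit-per-pixel bitmap each channel would need the same treatment with a step of four bytes between pixels.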

The results aren't all in yet but so far, the good news is that they all run at similar speed. Lazarus is a few percent slower than OpenCV. Perhaps the gcc compiler produces better code than FreePascal or perhaps my code could be tighter.

As I say, I feel like I have to choose a path and stick to it.

Peter

 
