Recent

Author Topic: Array access time depends on size?  (Read 12783 times)

idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Array access time depends on size?
« on: July 25, 2010, 05:23:10 am »
Hi all,

Is it possible that the average access time to elements in a dynamic array (of integers) depends on the size of the array?

The indexes that I access are more or less random - not a sequential "pass" - and the array size varies between 100 to 2000 elements.

I can't find another explanation for the effects I get in my program, but still, one would assume a dynamic array would be just a big block of memory, where random access is done with simple pointer arithmetic... right?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12201
  • FPC developer.
Re: Array access time depends on size?
« Reply #1 on: July 25, 2010, 08:16:02 am »


I can't find another explanation for the effects I get in my program, but still, one would assume a dynamic array would be just a big block of memory, where random access is done with simple pointer arithmetic... right?

There is a small effect, with small sizes (1,2,4) lookup can be done by a single instruction, and even higher values that are powers of two can be optimized with a shift instead of a mul.

I expect this to be hardly measurable though, it sounds more that you pass it around in a way that makes copies of the array. (dyn array to open array?)


idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Re: Array access time depends on size?
« Reply #2 on: July 25, 2010, 09:23:14 am »
it sounds more that you pass it around in a way that makes copies of the array. (dyn array to open array?)

Not at all. Basically, the relevant part of the program is a class that contains a N-by-N matrix of numbers, represented by a dymanic array of dynamic arrays of integer. I don't change, copy or cast anything - just access cells in this matrix. This is done about 500 times each round, times 10^6 rounds  (:D I'm measuring performance for some programming riddle, long story).

The number of cell accesses does not change with N, so you'd expect that a change of N from 500 to 2000 won't make a difference in performance, but it does - about 1.5 seconds added for the entire cycle.

Edited: I forgot to mention that I have two Random(int) commands for each round, for ranges 0 to N... could they possibly have that much influence?

Edited again: No, I tested Random() separately and there's no effect.
« Last Edit: July 25, 2010, 10:21:50 am by idog »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11164
  • Debugger - SynEdit - and more
    • wiki
Re: Array access time depends on size?
« Reply #3 on: July 25, 2010, 11:53:47 am »
1.5 seconds increase, on which overall time?

if your test runs several minutes, then 1.5 secs has no meaning
if your test runss 2 or 3.5 secs then : strange.

if it really is the array size, and not any other factor that was overlooked

Does it slowly increase? or does the time make a jump at a certain size of the array?
If your array exceeds the size of a cache line, then maybe the cpu needs to load more from memory or a lower cache ... (I am no expert on this, just an idea)

Alternative you can compile with -al and see if anything is in the assembler code

idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Re: Array access time depends on size?
« Reply #4 on: July 25, 2010, 12:07:24 pm »
maybe the cpu needs to load more from memory or a lower cache ... (I am no expert on this, just an idea)

Alternative you can compile with -al and see if anything is in the assembler code

I thought about the Cache too, especially when I limited the access range for the big arrays - then they behaved like small arrays...

Here's a stripped down sample code I made. In my (ancient) computer it takes about 10 seconds for the first, and 15 for the second:

Code: [Select]
program Project1;

{$mode objfpc}{$H+}

Uses
  SysUtils;

type
  TIntArray = array of Integer;

var
  a, b : array of TIntArray;
  j, i : Integer;

begin

  SetLength(a, 100);
  for j := 0 to 99 do SetLength(a[j], 100);
  SetLength(b, 1000);
  for j := 0 to 999 do SetLength(b[j], 1000);

  randomize;

  Writeln('a: ' + TimeToStr(Time));
  for j := 1 to 100000000 do Inc(i, a[Random(100)][Random(100)]);
  Writeln('b: ' + TimeToStr(Time));
  for j := 1 to 100000000 do Inc(i, b[Random(1000)][Random(1000)]);
  Writeln('Done: ' + TimeToStr(Time));

  Readln;

end.

Last time I touched Assembly was 15 years ago... I wonder if I can make anything of it now  :(

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11164
  • Debugger - SynEdit - and more
    • wiki
Re: Array access time depends on size?
« Reply #5 on: July 25, 2010, 12:22:40 pm »
10 kb of data vs 1 MB

btw have you checked for page faults, maybe you app is swapped to disk?


what cpu do you have? e.g some P2 used to have (just googled)
# L1 cache: 16 + 16 KB (Data + Instructions)

You run a 100 million time, so you touch the same values again
« Last Edit: July 25, 2010, 12:28:45 pm by Martin_fr »

idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Re: Array access time depends on size?
« Reply #6 on: July 25, 2010, 12:48:02 pm »
10 kb of data vs 1 MB

It's an extreme example, the effect in the original program was for 250K vs. 4Mb - but I guess it could still be cache threshold issue.

btw have you checked for page faults, maybe you app is swapped to disk?

Not that I'm aware - there's plenty of free RAM.

what cpu do you have? e.g some P2 used to have (just googled)
# L1 cache: 16 + 16 KB (Data + Instructions)

Intel T5600 - L2 2Mb Cache.

You run a 100 million time, so you touch the same values again

I'm more and more inclined towards this explanation. Never occured to me to take Cache into account when looking for efficient algorithms  :)

eny

  • Hero Member
  • *****
  • Posts: 1646
Re: Array access time depends on size?
« Reply #7 on: July 25, 2010, 01:30:24 pm »
FYI: on my 3.1G dual core they both take 6 seconds.
All posts based on: Win10 (Win64); Lazarus 3_4  (x64) 25-05-2024 (unless specified otherwise...)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11164
  • Debugger - SynEdit - and more
    • wiki
Re: Array access time depends on size?
« Reply #8 on: July 25, 2010, 01:42:12 pm »
anyway, what to you try to solve?

idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Re: Array access time depends on size?
« Reply #9 on: July 25, 2010, 04:42:18 pm »
FYI: on my 3.1G dual core they both take 6 seconds.

Hmmm. Six exactly, or is the difference simply too small for this time format? Can you run the loop for, say, 400000000 instead of 100000000 and verify it's 24 seconds for both?

anyway, what to you try to solve?

It's a progamming challenge - you have a matrix of N-by-N pixels, 90% of them black and 10% white, scattered randomly. You also know in advance the value of K - a number significantly smaller than N. You may scan this matrix once beforehand to create a data structure of your choice, and your task is to find the fastest and most memory-efficient method to locate, over and over again, all the white pixels within randomly placed "frames" of K-by-K pixels. That is, your algorithm gets some [x, y] coordinate and must return a list of the white pixels in the range [x, y] to [x+K, y+K].

The requirements are a little vague, but that opens up a lot of possibilities. Not as trivial as it may look at first  :)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11164
  • Debugger - SynEdit - and more
    • wiki
Re: Array access time depends on size?
« Reply #10 on: July 25, 2010, 05:35:45 pm »
Then creating a full array as the lookup-data source, is very inefficient.

since, when you are given your K*K area, you must do a full scan of it. That is a very slow way to do it.

Btw, the speed of the algorithm isn't normally measured in seconds, but in the "big O" notation (http://en.wikipedia.org/wiki/Big_O_notation)

Scanning an area of k*k pixels, in your manner is O(k^2) => It doesn't depend on N.

There are specific data structures you can use to locate/index 2 dimensional data: For example you can store each white picel in an R-Tree (http://en.wikipedia.org/wiki/R-tree) then they can be retrieved at relative low cost.

Of course you have the cost of:
- look up (finding one or more pixels)
- insertion (insert each pixel during creation)


A maybe simpler algorithm is using sorted lists, of the following format:

- list 1: sorted list of all lines containing at least one white pixel

Doing a binary search on this you can O(Log linecount) find quickly all lines of interest.

- list 2 (one for each line): sorted x position of each white pixels.

Again using binary search to find the 1st pixel in the K*K block is quick

Since only 10% of the pixels are white, the lists should be small (smaller than the array you do at current)

If you want to trade memory against speed, you can do several, overlapping (overlapping at leas the maximum possible K amount of pixels) lists for lines.
Then you can choose one of the lsits, and it will have less lines, because it doesn't contain lines, that have white pixels outside the given x bounds.
(Again it takes more memory and more time to build / The higher memory isn't affecting the cache for the search, since in the search only one list is used)
if max k = 10 then you have
list 1: lines with white in x from 0 to 50
list 2: lines with white in x from 40 to 90
list 2: lines with white in x from 80 to 130
...

if you need to search the area from y=? and  60-70 => you sear4ch the 2nd list.




idog

  • Full Member
  • ***
  • Posts: 121
    • www.idogendel.com (Hebrew)
Re: Array access time depends on size?
« Reply #11 on: July 25, 2010, 05:58:53 pm »
Btw, the speed of the algorithm isn't normally measured in seconds, but in the "big O" notation

This problem was presented in a "real-life-oriented" context, where - as everyone knows - the O-notation is just a tiny part of the picture...  :)

Actually, my weekly programming column (in Hebrew) discussed this problem last week, and I wrote the actual code for this week's follow-up. I presented several different solutions, and my readers added a couple more. If you want extreme efficiency, where every millisecond and byte matter, there are very interesting approaches to try... such as this Cache thing!  ;D

But I'm not sure this is the correct place to write them all  :)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12201
  • FPC developer.
Re: Array access time depends on size?
« Reply #12 on: July 28, 2010, 09:48:09 am »

The number of cell accesses does not change with N, so you'd expect that a change of N from 500 to 2000 won't make a difference in performance, but it does - about 1.5 seconds added for the entire cycle.

As many did before you, you found a fine way to measure L2 cache size   >:D >:D >:D

 

TinyPortal © 2005-2018