
Author Topic: How many lines is too many lines ?  (Read 6062 times)

440bx

  • Hero Member
  • *****
  • Posts: 6356
How many lines is too many lines ?
« on: March 06, 2026, 12:14:55 pm »
Hello,

There is a recent post about the compiler giving an error when attempting to compile a 50K line function/procedure.

It seems there is a consensus that a 50K line function is "wrong" or "bad programming" or "improper" (one way or another, simply undesirable.)

The interesting part is that claims that a function/procedure is too long are made relatively often but, what is extremely rare, actually non-existent so far, is a solid set of reasons that justifies the claim.

Here are my questions:

1. How many lines does it take for a function/procedure to be considered too long ?
2. Why is it too long ?  or IOW, why should a function that exceeds that number of lines be summarily declared "too long" ?  What made it too long ? just the number of lines ?
3. Can a function/procedure be too short ? and whatever the answer is, how is it justified ?

Enlighten me please... thank you :)
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: How many lines is too many lines ?
« Reply #1 on: March 06, 2026, 12:24:44 pm »
In general it is a lack of overview.  In companies where code must occasionally be reviewed or fixed by other people, unwieldy procedures are a problem.

Quite often even the person who wrote them doesn't have a complete overview, because he wrote and extended the procedure over a long period of time. Breaking it up is not just about getting the line count down, but also about introducing abstractions that enable people to understand the procedure. Quite often this is the reason why they push for stretching the compiler limits: to avoid a deep dive into their own code.

I assume this large procedure in assembler is some case statements over mnemonics that just grew as the instruction set did. In that sense it would be different than the usual cases that are typically auto generated.
« Last Edit: March 06, 2026, 01:22:39 pm by marcov »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: How many lines is too many lines ?
« Reply #2 on: March 06, 2026, 12:52:54 pm »
"case" statements are probably an interesting example.

Look at (upstream or LazEdit, not sure about the one in the RTL)
Code: Pascal
function TRegExpr.MatchPrim(prog: PRegExprChar): Boolean;

It has about 1000 lines. That is still far from the 50k. But 1000 is way over the size that most other procedures should IMHO have.

It is a big case that acts as an interpreter for a large set of instructions forming the reg-ex. So as the engine learns more specialized commands (optimized sub-versions of common cases), the case grows, and each case block has some code.

Well, yes, you could move the code in each case block into a subroutine (and hand over any local vars as "var param").
You could even combine some commands, and have a subroutine with a nested case doing the final choice....

But those subroutines should then be inlined, which, apart from inlining bugs in the compiler, limits the number of nested inlines of other functions. So that isn't good either.

And not inlining them would grow the stack faster, limiting (potentially by half) the number of recursions the regex engine can run in its search. But being able to run deep recursions is really, really important.

So then the engine would need a rewrite to be able to use the heap, but that needs delicate memory management, because doing normal allocs could majorly slow down the engine. And for regex, speed matters too.
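
As a rough sketch of the "move a case block into a subroutine" option described above: the names here (TOp, DoDouble, Run, Sample) are made up for illustration and are not taken from TRegExpr, and the real engine's branches are of course far larger.

```pascal
program CaseDispatchSketch;
{$mode objfpc}

type
  // Hypothetical opcodes; not the ones used by TRegExpr.
  TOp = (opIncr, opDouble, opReset);

// Body of one case branch moved into its own routine. The state of
// the interpreter loop is handed over as a "var" parameter.
// "inline" is only a request; the compiler may ignore it.
procedure DoDouble(var Acc: Integer); inline;
begin
  Acc := Acc * 2;
end;

function Run(const Ops: array of TOp): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Ops) do
    case Ops[i] of
      opIncr:   Inc(Result);        // short branch kept in place
      opDouble: DoDouble(Result);   // branch body lives in a subroutine
      opReset:  Result := 0;        // short branch kept in place
    end;
end;

const
  Sample: array[0..3] of TOp = (opIncr, opIncr, opDouble, opIncr);

begin
  WriteLn(Run(Sample));
end.
```

Whether the call really gets inlined, and what that costs in stack usage per recursion level, depends on the compiler version and settings, so this only shows the shape of the refactoring, not its performance.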

440bx

  • Hero Member
  • *****
  • Posts: 6356
Re: How many lines is too many lines ?
« Reply #3 on: March 06, 2026, 01:23:08 pm »
Quote
It has about 1000 lines. That is still far from the 50k. But 1000 is way over the size that most other procedures should IMHO have.
But, I get the impression you were justifying its being 1000 lines by enumerating the downsides of different implementations.  Did I get the wrong impression ?

I'm curious, why is it that you claim that 1000 lines is over the size most procedures should have ?   Why is it that a 1000 line procedure is, in your humble opinion, too long ?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: How many lines is too many lines ?
« Reply #4 on: March 06, 2026, 01:26:20 pm »
Quote
But those subroutines should then be inlined, which, apart from inlining bugs in the compiler, limits the number of nested inlines of other functions. So that isn't good either.

If some cases are much more likely than others, it can improve performance to move the code executed in the less likely cases to a non-inline procedure, because that increases locality for the hot-path cases.
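
A minimal sketch of that hot/cold split, assuming an interpreter-style loop like the one discussed earlier. All names (TOp, HandleRare, Execute, Sample) are hypothetical, and whether this actually helps has to be measured for the code in question.

```pascal
program HotColdSketch;
{$mode objfpc}

type
  // Hypothetical opcodes: two frequent ones and one rare one.
  TOp = (opLiteral, opAny, opRare);

// Rarely executed branch: deliberately NOT marked inline, so its
// (imagined long) body stays out of the hot loop's code.
procedure HandleRare(var Count: Integer);
begin
  // a long, seldom-needed block would go here
  Inc(Count, 100);
end;

function Execute(const Ops: array of TOp): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Ops) do
    case Ops[i] of
      opLiteral: Inc(Result);         // hot path, kept in place
      opAny:     Inc(Result, 2);      // hot path, kept in place
      opRare:    HandleRare(Result);  // cold path, called out of line
    end;
end;

const
  Sample: array[0..3] of TOp = (opLiteral, opAny, opLiteral, opRare);

begin
  WriteLn(Execute(Sample));
end.
```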


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: How many lines is too many lines ?
« Reply #5 on: March 06, 2026, 01:28:49 pm »
Quote
I'm curious, why is it that you claim that 1000 lines is over the size most procedures should have ?   Why is it that a 1000 line procedure is, in your humble opinion, too long ?

440bx: these are just rules of thumb used in organisations to help other people understand the code more quickly. There are no absolutes, and exceptions are sometimes made. E.g. a large case statement is a form of structuring and is easier to understand than heaps of normal code.

But people often think stowing everything in one procedure is always faster, which is not always the case.

440bx

  • Hero Member
  • *****
  • Posts: 6356
Re: How many lines is too many lines ?
« Reply #6 on: March 06, 2026, 01:53:34 pm »
Quote
440bx: these are just rules of thumb used in organisations to help other people understand the code more quickly. There are no absolutes, and exceptions are sometimes made. E.g. a large case statement is a form of structuring and is easier to understand than heaps of normal code.

But people often think stowing everything in one procedure is always faster, which is not always the case.
That sounds reasonable but, the thing I've consistently noticed are claims of a function or procedure being too long but without any solid foundation given to the claim. 

I have written functions/procedures that most programmers (likely 99%+) would consider "too long" and speed was _never_, not even once, one of the reasons it was "long".   What I'm saying is that, I don't get the impression that execution speed is a common reason to have long functions/procedures.

So far, I get the impression that a "long" function/procedure might be acceptable as a result of a case statement.

Is there another case that would justify a 1000+ lines function/procedure ?


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: How many lines is too many lines ?
« Reply #7 on: March 06, 2026, 02:16:31 pm »
Quote
That sounds reasonable but, the thing I've consistently noticed are claims of a function or procedure being too long but without any solid foundation given to the claim.

Line count is an arbitrary metric anyway. Empty lines, comments etc etc. It is all just a rule of thumb.

Quote
I have written functions/procedures that most programmers (likely 99%+) would consider "too long" and speed was _never_, not even once, one of the reasons it was "long".   What I'm saying is that, I don't get the impression that execution speed is a common reason to have long functions/procedures.

Well, it is a common argument if people are told the virtual register limit won't be increased. And keep in mind that it is chiefly for the benefit of others, and more a threshold for yourself to reconsider if what you are doing is smart long term.

Quote
So far, I get the impression that a "long" function/procedure might be acceptable as a result of a case statement.

It also depends on your colleagues' tastes.

Quote
Is there another case that would justify a 1000+ lines function/procedure ?

Yes. A permission from Niklaus Wirth, written in blood or a bughunting check from Donald Knuth.

440bx

  • Hero Member
  • *****
  • Posts: 6356
Re: How many lines is too many lines ?
« Reply #8 on: March 06, 2026, 02:36:08 pm »
Quote
Line count is an arbitrary metric anyway. Empty lines, comments etc etc. It is all just rule of thumb.
Just to make sure I am understanding what you're saying correctly... suppose a function that is 5000 lines of _code_ (excluding blank lines, comments and, any other non-executable lines) and, is _not_ a "case" statement, would _you_ consider that function to be "too long" ?

If your answer(s) depends on some conditions, please state the conditions after the answer(s).

My objective is to find out if it is possible to enumerate a set of clear and explicit conditions that would fully justify an answer whatever that answer may be.


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: How many lines is too many lines ?
« Reply #9 on: March 06, 2026, 02:47:49 pm »
Quote
Line count is an arbitrary metric anyway. Empty lines, comments etc etc. It is all just rule of thumb.
Just to make sure I am understanding what you're saying correctly... suppose a function that is 5000 lines of _code_ (excluding blank lines, comments and, any other non-executable lines) and, is _not_ a "case" statement, would _you_ consider that function to be "too long" ?

Typically yes. Way too long even.

Quote
If your answer(s) depends on some conditions, please state the conditions after the answer(s).

I would take a quick look to see whether I could understand the procedure quickly. If it has some internal structuring (like the case statement), that could work despite the length.

Quote
My objective is to find out if it is possible to enumerate a set of clear and explicit conditions that would fully justify an answer whatever that answer may be.

There isn't. It is not like a law book, where you can skirt conditions. It is more your own gut feeling what your co-workers, or anybody else that will have to work with your code, will put up with.

In my experience, older programmers usually learned this by experience and tend to allow for more exceptions. Younger people learned it from books and just blindly apply the rules without much reasoning. If you see the GoF book, then run!


MarkMLl

  • Hero Member
  • *****
  • Posts: 8563
Re: How many lines is too many lines ?
« Reply #10 on: March 06, 2026, 02:57:21 pm »
Quote
"case" statements are probably an interesting example.

Particularly since the code in one particular section might want to break, continue, exit the function with a result, or mess around with exceptions. Let's leave goto out of this, but on occasion I find it useful inside a case (e.g. handling 3, 2 or 1 parameters from a command line).

As regards the size: let's whimsically assume a handful of punched cards marked with the traditional diagonal swipe (in case they're dropped) is as thick as the height of the card (3.25", they were, after all, an American idea) using 7 thou stock: 3.25 / 0.007 comes to roughly 464 cards. That's outrageously arbitrary, as is the 80 character line length assumed as a desirable limit for many years.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: How many lines is too many lines ?
« Reply #11 on: March 06, 2026, 03:09:48 pm »
Quote
I'm curious, why is it that you claim that 1000 lines is over the size most procedures should have ?   Why is it that a 1000 line procedure is, in your humble opinion, too long ?

I was wondering, how to read your original question: I.e. technical vs philosophical limits.

Note, that I am not drawing the line at 1000 lines. Actually I don't draw a line at all. Hence I said "most".

IMHO, "one procedure" = "one purpose" (aim, task, ...). Of course, the question is how to select the "purpose" boundaries. That applies to the outer boundary as well as the inner ones (i.e. nested tasks that are tasks of their own, even if the outer one requires them).

And there it simply is my observation that procedures seldom get that long. My experience simply is that if they do, then they embed too many sub-tasks that would be better off in their own procedures, even if such a procedure is only called from the one place. A well named procedure call is much easier to read than embedded code, and the bigger picture will be easier to understand.

But that does not limit the overall size.

440bx

  • Hero Member
  • *****
  • Posts: 6356
Re: How many lines is too many lines ?
« Reply #12 on: March 06, 2026, 03:43:50 pm »
Quote
I was wondering, how to read your original question: I.e. technical vs philosophical limits.
Consider it philosophical but, I believe the philosophy should rest on solid technical reasons.  Just in case, "a solid technical reason" is _not_ a limit imposed by a compiler's implementation.  IOW, that a compiler cannot handle a function/procedure that has more than X lines is _not_ a valid reason to limit the number of lines in a function or procedure (it's an excellent reason to improve the compiler.)

Quote
IMHO, "one procedure" = "one purpose" (aim, task, ...). Of course, the question is how to select the "purpose" boundaries. And that is the outer, and inner (i.e. nested, but other tasks, even if required by the outer).
I find that to be quite logical and sensible. 

Quote
And there it simply is my observation that procedures seldom get that long. My experience simply is that if they do, then they embed too many sub-tasks that would be better off in their own procedures, even if such a procedure is only called from the one place. A well named procedure call is much easier to read than embedded code, and the bigger picture will be easier to understand.
Lots of interesting points there that lead to some questions:

1. What's easier ?  a) to read 1000 lines of code in a single block or b) read 20 out of line functions/procedures that implement that one "macro" function ?

2. Isn't the fact that the programmer's attention has to jump 20 times (or at least to 20 different places) to the "sub-functions" a form of spaghetti code ?

3. Isn't the fact that the procedures are called from only one place highly misleading ? after all, functions/procedures can easily be interpreted as being code that is shared among other functions/procedures, the fact that they aren't shared is misleading and makes program maintenance harder (the programmer has to establish that the function/procedure is executed from only 1 place in order to ensure that any changes to it don't affect other code, this task would not be necessary if the code was not in a separate function/procedure. Note: nesting does _not_ solve that problem.)

4. doesn't the fact that the programmer has to remember whether or not each of those functions or procedures cause side effects make maintenance and understanding more difficult ?

5. if the programmer forgets some detail while reading the "macro"/"composite" function, doesn't the fact that he/she now has to jump back to some function/procedure to remember the detail, in addition to having to (hopefully) remember which function/procedure has the detail in question, make the maintenance and understanding of the whole harder ?  Isn't this one of the typical problems associated with spaghetti code ?


Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: How many lines is too many lines ?
« Reply #13 on: March 06, 2026, 04:57:47 pm »
Upfront, about any examples: Yes, bad examples can be found in many places. But bad examples of all in one proc code can also be found. You can do bad stuff with any coding paradigm. That isn't a flaw of the paradigm.

And, in case you say some are more likely to be gotten wrong. Well, some may require more study how to do them. Doesn't make the paradigm less good. It may even be better. (just not for someone who does their first hello world with it).


Quote
Lots of interesting points there that lead to some questions:

1. What's easier ?  a) to read 1000 lines of code in a single block or b) read 20 out of line functions/procedures that implement that one "macro" function ?

2. Isn't the fact that the programmer's attention has to jump 20 times (or at least to 20 different places) to the "sub-functions" a form of spaghetti code ?
The first reads a bit like a "trick question"...

But I actually already answered it:
Quote
A well named procedure call is much easier to read than embedded code,

If you find code that says
Code: Pascal
MemberList.SortByKey(skLastName);
Do you really need to jump there and read that code?

If I program an Engine, I can embed the code to measure the position, to open the fuel valves, to ignite the spark-plugs... Or I can write
Code: Pascal
If GetPosition(...)= CONST_TRIGGER_VALVE then OpenFuelValve
else
If GetPosition(...)= CONST_TRIGGER_SPARK then Ignite;

Sure, if (which is likely not that often) I need to read how to do those subtasks, then I need to jump around. But on the plus side, if I just need to work on the big picture, that is so much easier to read and work with.

And, well, if there is a bug somewhere, and I don't know in which routine? (Ignoring test cases, and test-ability improvements by having some code extracted), I can on the first run step over each function and check the result, until I find the routine that returns wrong.

So, having stuff in functions is much easier to handle. Even for someone who doesn't know the code, as they can decide which blocks to read and which not, simply by the information that the function name itself gives them.

And I don't see this in any way fulfilling the criteria for Spaghetti code. If anything it adds structure, hence reduces the spaghetti factor. It is also likely to reduce nested conditions, loops and sets clear bounds for conditional code flow, which also reduces the spaghetti factor.

Quote
3. Isn't the fact that the procedures are called from only one place highly misleading ? after all, functions/procedures can easily be interpreted as being code that is shared among other functions/procedures,
They can; a lot of things can be seen in one specific way, and in others.
That you can see something one way doesn't mean that is the essence of it.

It's a named block of code (with certain other details about it...).

That it can be re-used is one of its qualities, and one that is very often used. But not exclusive.

Encapsulation can also be a reason. You can have local vars in that procedure (including static writeable const) that can not be seen from anywhere else.
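
To illustrate the "static writeable const" point, here is a small sketch. NextId and CallCount are made-up names; the `{$J+}` directive (writeable typed constants) is spelled out explicitly rather than relying on the compiler default.

```pascal
program LocalStateSketch;
{$mode objfpc}
{$J+}  // allow assignments to typed constants

// Hypothetical helper: CallCount keeps its value between calls,
// but it is visible only inside NextId.
function NextId: Integer;
const
  CallCount: Integer = 0;
begin
  Inc(CallCount);
  Result := CallCount;
end;

begin
  WriteLn(NextId);  // prints 1
  WriteLn(NextId);  // prints 2
end.
```

Had the same logic been embedded in a larger procedure, the counter would have to live in a scope where all the surrounding code can see and modify it.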

And just naming a block of code, and expressing that it does a task that is distinct enough to qualify for being separated, is a good reason too. It gives very valuable info to the reader.

Quote
the fact that they aren't shared is misleading and makes program maintenance harder (the programmer has to establish that the function/procedure is executed from only 1 place in order to ensure that any changes to it don't affect other code, this task would not be necessary if the code was not in a separate function/procedure. Note: nesting does _not_ solve that problem.)
What problem? I don't see it misleading.

Also there has to be differentiated between
1) it is only called from one place (just because nothing else needs to call it)
2) it must be called from no other place

I never said that such code would be of the 2nd kind.
And in case of 1, it's just what it is; if the reader thinks it's called from other places too, then there is no harm at all. None, nada, zero.

But talking about such code (which is new, which is not my original suggestion)

There is danger too, if such code is inlined. It can be copy/pasted, among other things. Adding a comment to protect it is hard, because you need a marker where that code block ends, and that marker must not accidentally move if surrounding code is edited.

A highly scoped nested proc, with the same comment, and a name that makes it very clear not to re-use it elsewhere => that can be a better approach. But that is very much case by case....


Quote
4. doesn't the fact that the programmer has to remember whether or not each of those functions or procedures cause side effects make maintenance and understanding more difficult ?
Again, a new implication, that wasn't in my original description

But fair enough, such code may have side effects, and I would still outline it.

In the above example "OpenFuelValve" would have side effects, and it would be very clear what they are. And if the code does exactly what the name says, then there is nothing misleading, nothing to remember,....

In fact "OpenFuelValve" is much better than 20 lines of code, containing (somewhere in those 20 lines) API calls such as
Code: Pascal
vh := GetValveHandle;
SendCommand(vh, CmdChange, GetCurrentOffset + value_open);
And all that code would be surrounded by other code doing other stuff, leaving it completely unclear what it does, until you read the "fine print".
Sure you could have a big comment in front of it. But where does the code end, that is covered by the comment? A function call is much clearer.


Quote
5. if the programmer forgets some detail while reading the "macro"/"composite" function, doesn't the fact that he/she now has to jump back to some function/procedure to remember the detail, in addition to having to (hopefully) remember which function/procedure has the detail in question, make the maintenance and understanding of the whole harder ?  Isn't this one of the typical problems associated with spaghetti code ?
How does having to read copious amounts of extra lines every time you need the bigger picture solve that? Keeping the code inlined just increases the likelihood of overlooking parts of it.

Also again, they should be appropriately named. They should just do one thing and that is in the name and hence no way to forget it.



Warfley

  • Hero Member
  • *****
  • Posts: 2050
Re: How many lines is too many lines ?
« Reply #14 on: March 06, 2026, 06:48:11 pm »
Quote
1. How many lines does it take for a function/procedure to be considered too long ?
If it does more than one thing and it does make sense to logically compartmentalize them. E.g. if you have a function that should load a file, parse the contents and initialize a GUI based on those contents, it should be at least two functions (file handling, and UI handling), maybe even three (file reading, deserialization, UI handling).

Those can be arbitrarily long. E.g. if you have a UI with 100 components, you may need to write hundreds of lines just for that, but of course it doesn't make sense to break up after an arbitrary number of components just because you reached a limit. Similarly, if you (de-)serialize a datastructure with dozens of fields, it may not make sense to break up the function into first deserializing the first half and then the second.

Quote
2. Why is it too long ?  or IOW, why is it objectionable to have a function exceed that number of lines simply be summarily declared as "too long" ?  What made it too long ? just the number of lines ?
A good indicator if a function is too long is actually if you have "headline comments", e.g. you have a code block:
Code: Pascal
// read file
...
// parse file
...
// update ui
...
If you have dedicated sections in your functions that you can give headlines to, then just take those code pieces and put them in a function with your headline as name. This is why I really love nested functions, because sometimes you just want a "named block" and not some code fragment visible to the rest of the file.
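
A minimal sketch of that "headline comment becomes a nested function" idea. The names (LoadAndShow, ReadFile, ParseFile, UpdateUi) and the fake file contents are placeholders, not code from a real project.

```pascal
program HeadlineSketch;
{$mode objfpc}

// Each former headline comment becomes the name of a nested
// procedure that is invisible outside LoadAndShow.
procedure LoadAndShow(const FileName: string);
var
  Raw: string;
  Fields: Integer;

  procedure ReadFile;
  begin
    // placeholder: a real version would read FileName from disk
    Raw := 'a;b;c';
  end;

  procedure ParseFile;
  var
    i: Integer;
  begin
    // count the ';'-separated fields in Raw
    Fields := 1;
    for i := 1 to Length(Raw) do
      if Raw[i] = ';' then
        Inc(Fields);
  end;

  procedure UpdateUi;
  begin
    // placeholder for real UI code
    WriteLn(FileName, ': ', Fields, ' fields');
  end;

begin
  // the body now reads like the list of headlines
  ReadFile;
  ParseFile;
  UpdateUi;
end;

begin
  LoadAndShow('demo.txt');
end.
```

The nested procedures share the locals of LoadAndShow, so no parameters need to be passed around, and nothing else in the unit can call them.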

The second condition when you may want to break up a function is if you do the same thing multiple times. Depending on the complexity, my rule of thumb is: if I do something once, no need for a function. If I do it twice and it's not too complex, I also just do it inline. If I need the same code 3 times and it is non-trivial code (i.e. >10 lines), I put it in a function.

But of course this also heavily depends on how "error prone" the function is. E.g. if it's a mathematical formula, where it's very easy to mess up some operations, I often put it into a dedicated function from the beginning, because it's something I may need to look up or reason about on a more mathematical logical level.

Quote
3. Can a function/procedure be too short ? and whatever the answer is, how is it justified ? (IOW, how is that answer justified ?)
Everything below 10 lines of code, if it's not a placeholder, no-op or mathematical formula, is generally too short for me to care.

 
