
Author Topic: Forum slow  (Read 7174 times)

Zvoni

  • Hero Member
  • *****
  • Posts: 2800
Re: Forum slow
« Reply #30 on: April 25, 2024, 11:58:20 am »
OTOH ignoring robots.txt is not a nice way to operate.

Quote from: Zvoni
Redirect the IP of that search engine to a page showing a "middlefinger"?
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Marc

  • Administrator
  • Hero Member
  • *
  • Posts: 2622
Re: Forum slow
« Reply #31 on: April 25, 2024, 12:06:55 pm »
OTOH ignoring robots.txt is not a nice way to operate.

Quote from: Zvoni
Redirect the IP of that search engine to a page showing a "middlefinger"?

On Apache level: return 403 for ClaudeBot
On fail2ban level: block IP for bad-bots

even on a 403 they kept on going
//--
{$I stdsig.inc}
//-I still can't read someones mind
//-Bugs reported here will be forgotten. Use the bug tracker

Thaddy

  • Hero Member
  • *****
  • Posts: 16430
  • Censorship about opinions does not belong here.
Re: Forum slow
« Reply #32 on: April 25, 2024, 12:52:55 pm »
I am fine with the fail2ban solution; I merely remarked on the bot's purpose.
Thanks Marc!
There is nothing wrong with being blunt. At a minimum it is also honest.

440bx

  • Hero Member
  • *****
  • Posts: 4915
Re: Forum slow
« Reply #33 on: April 25, 2024, 02:53:42 pm »
even on a 403 they kept on going
Just a thought... for bots that keep going, is it possible to cause the server to reply only every five (5) seconds to requests from their IP ?

That would take a load off the server :)
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Marc

  • Administrator
  • Hero Member
  • *
  • Posts: 2622
Re: Forum slow
« Reply #34 on: April 25, 2024, 03:21:23 pm »
even on a 403 they kept on going
Just a thought... for bots that keep going, is it possible to cause the server to reply only every five (5) seconds to requests from their IP ?

That would take a load off the server :)

Not if they do it from 467 different IP addresses
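
To put numbers on that objection, here is a minimal Python sketch of a per-IP throttle. It is only an illustration under stated assumptions: the class and function names are hypothetical, the state is kept in memory, and nothing here reflects how the forum server is actually configured.

```python
# Hypothetical per-IP throttle: answer each IP at most once every MIN_INTERVAL seconds.
MIN_INTERVAL = 5.0  # the five-second figure suggested above


class PerIpThrottle:
    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self.last_served = {}  # ip -> time of the last answered request

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` would be answered."""
        last = self.last_served.get(ip)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_served[ip] = now
        return True


def served_in_window(ips, requests_per_ip, window_seconds, throttle):
    """Simulate every IP sending evenly spaced requests; count the answered ones."""
    served = 0
    for ip in ips:
        for i in range(requests_per_ip):
            now = i * window_seconds / requests_per_ip
            if throttle.allow(ip, now):
                served += 1
    return served
```

With one request per second for a minute, a single IP gets 12 answers instead of 60. Spread the same behaviour over 467 addresses and the server still answers 467 × 12 = 5604 requests in that minute, which is why a per-IP delay alone does not take the load off.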

Thaddy

  • Hero Member
  • *****
  • Posts: 16430
  • Censorship about opinions does not belong here.
Re: Forum slow
« Reply #35 on: April 25, 2024, 04:11:12 pm »
That would take a load off the server :)
Marc knows what he is doing.
fail2ban (or similar) works.
He also knows what to do with my second suggestion, which limits the bot directly in the server software, at least in the server software the forum uses.

Your suggestion comes close to 2) but is, do not misunderstand me, a bit naive.
The server configuration already does something like that with the same effect.

The reason I made my remarks is that fail2ban can be a bit crude, and it may be beneficial to the community if a crawler/bot is crawling with the intention to improve their AI brand. But in this case it interfered with the workings of the forum.
In this case there is also an ethical question at play, since the bot ignores well established standards on how to behave.

Better leave it to Marc: he is a pro.
What you suggested is - part of - an option that is already available to him and that he actually uses. So it is not a bad idea, but it is usually already implemented.
It takes time, though, to mitigate such issues. fail2ban can be configured like a sledgehammer or like a fly killer. But note my remarks: if the ethical question is considered and the opinion is that the indexing by an AI company bot is not harmful, and Marc has the time, he can configure the server to let the bot do its work but under his control.
« Last Edit: April 25, 2024, 04:22:24 pm by Thaddy »

440bx

  • Hero Member
  • *****
  • Posts: 4915
Re: Forum slow
« Reply #36 on: April 25, 2024, 04:19:43 pm »
Better leave it to Marc: he is a pro.
I have no doubt he knows what he's doing.

I just made a suggestion; it is entirely up to him to decide whether it is possible, practical and/or worthwhile, and by his response it doesn't look like it's practical.

Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1297
Re: Forum slow
« Reply #37 on: April 25, 2024, 05:08:47 pm »
Is there a way to ban bots altogether? It sounds like the entire forum is being harvested for the benefit of companies that want to profit from other people’s work.
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

Thaddy

  • Hero Member
  • *****
  • Posts: 16430
  • Censorship about opinions does not belong here.
Re: Forum slow
« Reply #38 on: April 25, 2024, 05:16:05 pm »
Yes, there is a way, and Marc used it this morning: server tooling like fail2ban and, with some - a lot - more work, the server software itself, like nginx and Apache.
In this case, you must understand that a bot is not always harmful and can even benefit the community. But there is indeed an ethical question, that is correct.
It was not my intention to open a hornet's nest. I should have made the conversation with Marc private.
BTW my suggestions are used by myself on some of my supposedly vulnerable servers.
And it is clear our sysop uses them too.

People that experience such issues with their own servers can send a private request.
The topic is too complicated to do it right if you lack knowledge on how to manage servers. This is beyond the scope of this forum, so I hope the reactions stay at a minimum.
« Last Edit: April 25, 2024, 05:27:56 pm by Thaddy »

Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1297
Re: Forum slow
« Reply #39 on: April 25, 2024, 05:33:04 pm »
How are bots identified besides being able to do things at a speed that a human can’t?

Thaddy

  • Hero Member
  • *****
  • Posts: 16430
  • Censorship about opinions does not belong here.
Re: Forum slow
« Reply #40 on: April 25, 2024, 05:51:41 pm »
In this case just by identifying the traffic, i.e. by reading the logs. The logs are round-robin, and the work can partially be automated.
But as I intended to make clear: this is not the place for a server management course.
Do not spoil that Joanna, I am in a friendly mood today...
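
As a rough idea of what "reading logs" can look like when automated, here is a minimal Python sketch. It assumes the web server writes the common "combined" access log format; the sample lines, the `ExampleBot/1.0` agent string and the function name are made up for illustration and are not taken from the forum's real logs.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def requests_per_agent(lines):
    """Count requests per User-Agent string; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [25/Apr/2024:11:58:20 +0200] "GET /index.php HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [25/Apr/2024:11:58:21 +0200] "GET /index.php HTTP/1.1" 403 199 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [25/Apr/2024:11:58:22 +0200] "GET / HTTP/1.1" 200 812 "-" "Mozilla/5.0 Firefox/125.0"',
]
```

Calling `requests_per_agent(SAMPLE).most_common(1)` puts the busiest agent first. Grouping by the `ip` field instead of `agent` is the same loop with a different key.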
« Last Edit: April 25, 2024, 05:55:05 pm by Thaddy »

Kays

  • Hero Member
  • *****
  • Posts: 614
  • Whasup!?
    • KaiBurghardt.de
Re: Forum slow
« Reply #41 on: April 26, 2024, 12:22:56 am »
I don’t understand. What has fail2ban to do with web crawlers? Fail2ban already has in its name that there was/were failed attempt(s) regarding a network resource, e. g. you entered a wrong password five times, thus for the next fifteen minutes any requests from your IP address are immediately rejected. What kind of failed requests does a web crawler make, though, that fail2ban could be useful for? A web crawler follows all kinds of HTML <a href="…">…</a> links; it does not invent random URLs, thus causing loads of 404 resource not found errors (= failures), does it?
Yours Sincerely
Kai Burghardt

TRon

  • Hero Member
  • *****
  • Posts: 3844
Re: Forum slow
« Reply #42 on: April 26, 2024, 01:57:54 am »
When looking at some of the IP blocks, I noticed they all were owned by Amazon. Some googling resulted in ClaudeBot, made by anthropic.com.
I've blocked this bot from accessing this forum
That certainly helped a lot. Thank you for having looked into it and solving it MW. Much appreciated.

I don’t understand. What has fail2ban to do with web crawlers?
...
fail2ban can do more than just 'watching' login attempts. For a quick example see here.
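
To make that concrete for the setup Marc described (Apache answers the bot with 403, then the IP gets blocked), here is a minimal Python sketch of the general idea: count 403 responses per IP within a time window and ban once a threshold is reached. This is not fail2ban's code or configuration. The class is hypothetical, its parameters are only loosely named after fail2ban's `maxretry`, `findtime` and `bantime` options, and the defaults are assumptions (five failures and fifteen minutes echo the example above).

```python
from collections import defaultdict, deque


class FailureBanner:
    """Hypothetical stand-in for a log watcher that treats HTTP 403 as a failure."""

    def __init__(self, max_retry=5, find_time=600, ban_time=900):
        self.max_retry = max_retry    # failures needed to trigger a ban
        self.find_time = find_time    # seconds in which those failures must occur
        self.ban_time = ban_time      # seconds the ban lasts
        self.failures = defaultdict(deque)  # ip -> timestamps of recent 403s
        self.banned_until = {}              # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record(self, ip, status, now):
        """Feed one log event; return True if this event triggers a new ban."""
        if status != 403 or self.is_banned(ip, now):
            return False
        recent = self.failures[ip]
        recent.append(now)
        # Forget failures that fell out of the observation window.
        while recent and now - recent[0] > self.find_time:
            recent.popleft()
        if len(recent) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            recent.clear()
            return True
        return False
```

So the "failure" does not have to be a wrong password: any log line that matches a filter can count, including a 403 that the web server itself hands out to a known user agent.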
I do not have to remember anything anymore thanks to total-recall.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8141
Re: Forum slow
« Reply #43 on: April 26, 2024, 08:03:08 am »
Is there a way to ban bots altogether? It sounds like the entire forum is being harvested for the benefit of companies that want to profit from other people’s work.

That raises the important question of whether this forum- and the overall community of Pascal users- loses more from its content being abused or from cutting itself off from a potential source of publicity.

Pascal- or Delphi, or whatever you want to call it- needs all the help it can get.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

MarkMLl

  • Hero Member
  • *****
  • Posts: 8141
Re: Forum slow
« Reply #44 on: April 26, 2024, 08:19:05 am »
How are bots identified besides being able to do things at a speed that a human can’t?

If something tries to connect via HTTP (i.e. the case we're looking at), there are lots of ways based on headers and so on. See e.g. https://code-maze.com/http-series-part-3/ which I stumbled upon while trying to answer the next para adequately.
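
As a small illustration of the header-based approach, here is a Python sketch that looks only at the User-Agent header. The marker list, the function name and the rule "no User-Agent means bot" are assumptions made for this example; a client can send any header it likes, so this only catches crawlers that identify themselves.

```python
# Substrings that self-identifying crawlers commonly put in their User-Agent.
# "bot" also covers names such as ClaudeBot. The list is an assumption, not a standard.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(headers):
    """Guess from the request headers whether the client is an automated crawler."""
    agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":  # header names are case-insensitive
            agent = value.lower()
            break
    if not agent:
        # Assumption: ordinary browsers always send a User-Agent header.
        return True
    return any(marker in agent for marker in BOT_MARKERS)
```

A request with a header such as `Mozilla/5.0 (compatible; ClaudeBot/1.0)` (an invented example string) would be flagged, while a typical desktop browser string would not.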

For non-HTTP, there's still a surprising number of ways based on the use of known IP sender addresses, spoofed sender addresses, peculiarities in higher-level protocol (e.g. ssh) use, peculiarities in TCP (e.g. how it reacts to a forced error), or IP. See https://lcamtuf.coredump.cx/p0f3/ which I've used when doing some router work: you've no idea how much utter crap hits an exposed router connected to the raw Internet.

If you have a couple of minutes also see https://lcamtuf.coredump.cx/oldtcp/tcpseq.html and https://lcamtuf.coredump.cx/newtcp/ which, by coincidence, are from the same researcher. I don't know the extent to which they're still valid, but they're a very good example of how "rolling your own" network implementation can be unwise.

MarkMLl

 
