Quick Diag.
The robots.txt in the root is blocking all search engines from indexing.
user-agent: *
disallow: /index.php?*type=rss
crawl-delay: 30
user-agent: YandexBot
disallow: /index.php?*type=rss
crawl-delay: 30
user-agent: Amazonbot
disallow: /
Not sure why it has a Crawl-delay set for the named bots, though; if a bot simply obeys the * group it already gets the same rules, so the extra delay has no effect.
Same for the groups after it: they just repeat what the * group already disallows for every agent, so they are redundant.
I would allow Google and a few others and disallow the rest (Slurp is Yahoo):
User-agent: *
Disallow: /
User-agent: Applebot
Allow: /
Crawl-delay: 30
User-agent: baiduspider
Allow: /
Crawl-delay: 30
User-agent: Bingbot
Allow: /
Crawl-delay: 30
User-agent: DuckDuckBot
Allow: /
Crawl-delay: 30
User-agent: Facebot
Allow: /
Crawl-delay: 30
User-agent: Googlebot
Allow: /
Crawl-delay: 30
User-agent: msnbot
Allow: /
Crawl-delay: 30
User-agent: Naverbot
Allow: /
Crawl-delay: 30
User-agent: seznambot
Allow: /
Crawl-delay: 30
User-agent: Slurp
Allow: /
Crawl-delay: 30
User-agent: teoma
Allow: /
Crawl-delay: 30
User-agent: Twitterbot
Allow: /
Crawl-delay: 30
User-agent: Yandex
Allow: /
Crawl-delay: 30
User-agent: Yeti
Allow: /
Crawl-delay: 30
On pages like login / register / etc., add a 'noindex' robots meta tag to the page's <head> section; this tells search engines not to index these pages or follow their links. It is not immediate, as the engines have to re-crawl your entire site, and some can take a few complete passes before they obey the change, so be patient.
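A minimal sketch of the tag - where exactly it goes depends on your forum software's template files:
<!-- in the <head> of the login/register templates -->
<meta name="robots" content="noindex, nofollow">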
Note that even with a delay of 30, the number of pages on the forum means crawling could still stress the server; you could try increasing the delay to 100 or 300. The index will not be as up to date, but it should help forum speed.
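For example, the Bingbot group above would simply become the following (keep in mind Crawl-delay is a non-standard directive; Bing respects it, but Google ignores it entirely):
User-agent: Bingbot
Allow: /
Crawl-delay: 300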
# Block specific pages from being crawled, if these are static .htm/.html files
User-agent: *
Disallow: /contactus.htm
# Block directories that you don't want crawled, e.g.
User-agent: *
Disallow: /cgi-bin/
Disallow: /forum-user-attachments/
Disallow: /tmp/
Hope the above makes sense.
Note that robots.txt only works for well-behaved spiders; not all of them obey everything, and some just completely ignore it.
If using Apache, I would suggest updating the .htaccess file in the root to block these bots/spiders by adding a rewrite rule. Obviously check your logs to see which bot it is and create a rewrite rule for it.
e.g. a condition like this, which on its own does nothing and needs a matching RewriteRule (see the sketch below):
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
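A minimal sketch, assuming mod_rewrite is enabled; BlackWidow comes from the line above, and the second bot name is just a placeholder for whatever turns up in your logs:
# .htaccess in the web root
RewriteEngine On
# match bad-bot user agents case-insensitively; [OR] chains the conditions
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SomeBadBot [NC]
# anything that matched gets a 403 Forbidden
RewriteRule .* - [F,L]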
I would not block IPs or ranges; legit bots stick to their IPs, bad bots do not, and generally they go through proxies, so they can change their IP at will.
Addition: If you're using a sitemap.xml and it's set to update every 24 hrs, increase that to 72 hrs.
I had an issue a while back where a log file became so large that updates to it used so much resource the site stalled. The file was over 30 GB even though it was deleted every 7 days; this was not anticipated, as normally it would be 400 MB at most, but it had grown by around 9000% due to hacking attempts being logged. This was helped by using fail2ban, reporting offenders to AbuseIPDB, and using AbuseIPDB as a blacklist check. It took a bit of configuring and fine tuning; I think I ended up with 10 failed login attempts causing a block, and 3 assorted hack attempts causing a block.
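For reference, a minimal fail2ban jail along those lines - a sketch only, as the jail name, filter and log path are placeholders that depend on your forum software and where it logs failed logins:
# /etc/fail2ban/jail.local (sketch; adjust names and paths to your setup)
[forum-login]
enabled  = true
port     = http,https
filter   = forum-login        # needs a matching filter in filter.d/ that parses your log format
logpath  = /var/log/forum/login.log
maxretry = 10                 # 10 failed logins within findtime -> ban
findtime = 600
bantime  = 86400
The AbuseIPDB reporting/checking side is configured separately; fail2ban ships an abuseipdb action you can add to the jail once you have an API key.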
Key to the bots in the allow list:
Applebot - Apple
baiduspider - Baidu, China's leading search engine
Bingbot - Bing
DuckDuckBot - DuckDuckGo
Facebot - Facebook
Googlebot - Google
msnbot - MSN
Naverbot - Naver (South Korea)
seznambot - Seznam (Czech Republic)
Slurp - Yahoo
teoma - Ask search engine
Twitterbot - Twitter
Yandex - Russian search engine
Yeti - Naver (South Korea)
Added DuckDuckBot to the allowed list.