My post count is probably inflated by something like 20% over the last few days, because the site often posts my reply twice when the load is high. Is there something I can do to avoid this? Is there something I should be doing to notify site administrators about the duplicates? I could click on the "Report this posting" icon if you like, rather than go back and edit down the duplicate to a no-op.
Also, should there be a term to distinguish these accidental duplicates from those resulting from deliberate cross-posting (posting the same message separately to multiple threads)?
Deleting a duplicate leaves an unattractive "This message has been deleted by era" trail; is there any way to avoid that? (I went ahead and deleted the duplicates I could find quickly.)
Forgot to mention: we are checking whether some rogue spider that we cannot identify is causing part of the high load. So far, we have found nothing out of the ordinary in the log files.
I spent many hours hunting down hungry, rogue spiders in the log files. I found many spiders that were not obeying robots.txt, so I blocked their networks completely.
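A quick first pass at spotting a rogue spider is to rank clients by how many requests they make. The sketch below assumes Apache's standard "combined" log format; the function name `top_clients` and the sample lines (documentation-range IP addresses and a made-up `ExampleBot` user agent) are hypothetical, not taken from this site's logs.

```python
import re
from collections import Counter

# Assumes Apache's "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_clients(lines, n=10):
    """Return the n most frequent (host, user agent) pairs with their hit counts."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        counts[(match.group("host"), match.group("agent"))] += 1
    return counts.most_common(n)


# Hypothetical sample lines, not real log data.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2007:13:55:36 -0700] "GET /forum/ HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [10/Oct/2007:13:55:37 -0700] "GET /forum/t1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Oct/2007:13:56:01 -0700] "GET /forum/t1 HTTP/1.1" 200 512 '
    '"http://www.google.com/search?q=x" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (host, agent), hits in top_clients(SAMPLE):
        print(hits, host, agent)
```

Grouping by host and user agent together makes it easier to tell one busy crawler apart from many ordinary visitors behind the same address.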
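Spiders that ignore robots.txt can be flagged by checking each logged request against the site's rules. This sketch uses Python's standard `urllib.robotparser`; the robots.txt content, the `robots_violations` helper, and the sample requests are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

# Hypothetical (user agent, path) pairs, e.g. extracted from an access log.
HITS = [
    ("ExampleBot/1.0", "/forum/t1"),
    ("ExampleBot/1.0", "/private/x"),
    ("OtherBot/2.0", "/search?q=spiders"),
]


def robots_violations(robots_txt, requests):
    """Return the (user agent, path) requests that robots.txt disallows for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    for agent, path in robots_violations(ROBOTS_TXT, HITS):
        print(agent, "ignored robots.txt for", path)
```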
Maybe we will see some improvement before the upgrade, but I am not confident we will see much, since the bot traffic is very small compared to the Google search referral traffic.
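To compare bot traffic with search-referral traffic, one rough approach is to bucket each request by user agent and referrer and compute each bucket's share. The marker substrings, bucket names, and sample records below are hypothetical; user-agent matching is only a heuristic, since some crawlers disguise themselves as browsers.

```python
from collections import Counter

# Hypothetical substrings that commonly appear in crawler user agents.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")


def classify(referer, agent):
    """Bucket one request as 'bot', 'google_referral', or 'other' (rough heuristic)."""
    agent = agent.lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return "bot"
    if "google." in referer.lower():
        return "google_referral"
    return "other"


def traffic_shares(records):
    """Return each bucket's fraction of all (referer, user agent) records."""
    counts = Counter(classify(referer, agent) for referer, agent in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: hits / total for bucket, hits in counts.items()}


# Hypothetical records, not real traffic figures.
RECORDS = [
    ("-", "ExampleBot/1.0"),
    ("http://www.google.com/search?q=a", "Mozilla/5.0"),
    ("http://www.google.com/search?q=b", "Mozilla/5.0"),
    ("-", "Mozilla/5.0"),
]
```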
Also, I spent many hours trying various MySQL and Apache configurations and tuning parameters, but none of it produced any tangible improvement, and in some cases adding or allocating more resources just made things worse. For example, we were getting some errors because Apache needed more than 250 concurrent servers. I tried raising this to 350, and the load average went to over 350 (!!) during peak time, crushing the server.
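A common rule of thumb for Apache's prefork MPM suggests why raising the limit can backfire: each child process holds its own memory, so `MaxClients` (renamed `MaxRequestWorkers` in Apache 2.4) should not exceed the RAM left after MySQL and the OS, divided by the average child size. If more children are allowed than fit in RAM, the server may start swapping, which would be consistent with the load spike described above. This sketch assumes the prefork MPM; the function names and all memory figures are hypothetical, not measurements from this server.

```python
def max_clients(total_ram_mb, reserved_mb, avg_child_mb):
    """Rough prefork MaxClients ceiling: memory left for Apache / memory per child."""
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0 or avg_child_mb <= 0:
        raise ValueError("need positive free memory and child size")
    return int(available_mb // avg_child_mb)


def memory_needed_mb(children, avg_child_mb):
    """Memory the Apache children would use if all of them were busy at once."""
    return children * avg_child_mb


if __name__ == "__main__":
    # Hypothetical figures: 4 GB of RAM, 1 GB kept for MySQL and the OS,
    # 12 MB resident per Apache child.
    print(max_clients(4096, 1024, 12))   # 256
    print(memory_needed_mb(350, 12))     # 4200, more than the 3072 MB available
```

Under these assumed numbers, 350 busy children would need more memory than is available, so raising the limit past the ceiling trades refused connections for swapping.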
Obviously, once we upgrade, traffic will increase greatly again. This is great, and it is thanks to all of your super posts and great expertise, helping others help themselves.
Thank you for your patience during this phase of our growing pains!
As a side note, I am amazed at the number of spiders that roam the internet these days, from all over the world. It seems everyone wants to be the next Google. I even noticed that two of the spiders were operating out of the new Amazon Elastic Compute Cloud (EC2).
Now I have a new security topic to write about, something like:
The Attack of the Cyberspiders from the Clouds !!!