Worm attack update:

Discussion in 'APUG System Announcements' started by Sean, Feb 3, 2005.

  1. Sean

    Sean Admin Staff Member

    Messages:
    8,990
    Joined:
    Aug 29, 2002
    Location:
    New Zealand
    Shooter:
    Multi Format
    The forum may be up and down until this thing blows over. I'll post updates when I get new info. Thanks guys
     
  2. rbarker

    rbarker Member

    Messages:
    2,222
    Joined:
    Oct 31, 2004
    Location:
    Rio Rancho,
    Shooter:
    Multi Format
    No problem, Sean. We know you're setting hooks out for those worms, so they can "go fish" somewhere else.
     
  3. oriecat

    oriecat Member

    Messages:
    243
    Joined:
    Nov 4, 2004
    Location:
    Portland, OR
    Shooter:
    35mm
    Thanks Sean! :smile:
     
  4. kwmullet

    kwmullet Member

    Messages:
    889
    Joined:
    Jan 3, 2004
    Location:
    Denton, TX,
    Shooter:
    Multi Format
  5. Sean

    Sean Admin Staff Member

    Messages:
    8,990
    Joined:
    Aug 29, 2002
    Location:
    New Zealand
    Shooter:
    Multi Format
    I think we may be OK now, guys. It turns out we were not attacked by a worm, but by a large number of search engine spiders (they hit us and pull information). For now I am blocking all of them. When things settle down I may re-activate a few of them to maintain our indexes out on the net. I had no idea there were hundreds of these bots hitting us like that; it was like a flood. I may keep the gallery offline a little longer. Thanks
     
  6. SchwinnParamount

    SchwinnParamount Member

    Messages:
    1,187
    Joined:
    Nov 29, 2004
    Location:
    Tacoma, WA
    Shooter:
    4x5 Format
    Any chance this was a deliberate attack by someone with an axe to grind? Don't mind me, I'm just being paranoid
     
  7. Sean

    Sean Admin Staff Member

    Messages:
    8,990
    Joined:
    Aug 29, 2002
    Location:
    New Zealand
    Shooter:
    Multi Format
    Doesn't look that way. It looks like MSNBOT (Microsoft's new search engine) hit us the hardest.
     
  8. Bob F.

    Bob F. Member

    Messages:
    3,984
    Joined:
    Oct 4, 2004
    Location:
    London
    Shooter:
    Multi Format
    MSNBOT seems a very hungry creature - you might want to use robots.txt to stop the sod crawling too deeply... This thread makes interesting reading (http://www.webmasterworld.com/forum97/73.htm). <EDIT: Hmmm - that link does not work from here - it does from Google... - do a google search on: MSNBOT voracious appetite... >


    Cheers, Bob.
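
    As a rough illustration of the robots.txt suggestion, here is a minimal sketch that checks a sample rule set with Python's standard urllib.robotparser. The paths (/gallery/, /forums/search.php) and the delay value are assumptions made up for the example, not the site's real layout. Crawl-delay is a non-standard extension that only some crawlers honour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: paths and the delay are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 120
Disallow: /gallery/
Disallow: /forums/search.php

User-agent: *
Disallow: /gallery/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, path in [("msnbot", "/gallery/photo1"),
                        ("msnbot", "/forums/showthread.php"),
                        ("SomeOtherBot", "/forums/search.php")]:
        print(agent, path, parser.can_fetch(agent, path))
```

    A well-behaved spider runs the same kind of check before each request; one that ignores the file has to be blocked by other means.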
     
  9. edz

    edz Member

    Messages:
    685
    Joined:
    Dec 4, 2002
    Location:
    Munich, Germ
    Shooter:
    Multi Format
    The problem is less the spiders but the site design.

    You CAN control this via a few mechanisms, among them the robots.txt control file, HTTP headers and embedded metadata. Spiders that don't follow robots.txt or hit too fast you just block and be done.

    Looking at your pages I notice that neither Last-Modified nor Expires (settable in HTTP and also via HTTP-EQUIV in metadata) has been set. This is a bad thing because, beyond spiders, it also increases the bandwidth demand from those using large proxy communities. It is typical for these PHP hacks, and mildly amusing, since the earliest web forum systems (HyperNews) of a decade ago did not make these errors.

    Since, for all practical purposes, messages in the forum are like email messages, viz. unchanging, I'd set their expiration dates at a very distant point in the future, set the modification date to the publication date of the message, and set Cache-Control to public.

    If this does not have "enough" impact there are quite a few other approaches, including disallowing all spidering (robots.txt) and pushing the spiders over to highly cached summary pages, OAI (Open Archives Initiative) and RSS feeds.
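
    The header advice above could be sketched roughly as follows. This is an illustration only, written in Python rather than the forum's PHP. The function name cache_headers, the one-year lifetime and the added max-age directive are assumptions for the example, not anything the forum software provides.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(posted_at: datetime, now: datetime,
                  lifetime: timedelta = timedelta(days=365)) -> dict:
    """Build caching headers for a post that is treated as static.

    posted_at and now must be timezone-aware UTC datetimes.
    The one-year default lifetime is an arbitrary "distant future".
    """
    return {
        # Modification date = publication date of the message.
        "Last-Modified": format_datetime(posted_at, usegmt=True),
        # Expiration far in the future so caches may keep the page.
        "Expires": format_datetime(now + lifetime, usegmt=True),
        # "public" lets shared proxies cache the response as well.
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
    }


if __name__ == "__main__":
    posted = datetime(2005, 2, 3, 9, 30, tzinfo=timezone.utc)
    for name, value in cache_headers(posted, datetime.now(timezone.utc)).items():
        print(f"{name}: {value}")
```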
     
  10. Sean

    Sean Admin Staff Member

    Messages:
    8,990
    Joined:
    Aug 29, 2002
    Location:
    New Zealand
    Shooter:
    Multi Format
    That's what I've done to get us out of the woods for now. Might contact you in the future edz, thanks
     
  11. Aggie

    Aggie Member

    Messages:
    4,925
    Joined:
    Jan 1, 2003
    Location:
    So. Utah
    Shooter:
    Multi Format
    Sean, Tim is telling me this, so bear with me if it sounds a bit blonde.

    Posts are not emails. Posts are data base entries. This allows them to be searched and indexed. Expiration is generally against what your data base is supposed to do. You don't want data to just expire and disappear into thin space. Least of all in a community like this. There is a way to lighten the load without expiring the data, which means hacking the forum PHP so that the default view shows only posts from a certain recent time period, usually a month. Generally it will also let you choose how far back you want to view posts. But by setting the default to a more recent time period you lighten the load: when the data base is searched for posts, it returns only those in the specified period instead of every single one.

    I hope you understand that, cause it just went over my head. To think I gave birth to a hacker knowledgeable geek.
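
    The default time window described above might look something like this sketch, which uses Python's built-in sqlite3 in place of the forum's real database. The posts table, its columns and the recent_posts helper are hypothetical names invented for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def recent_posts(conn: sqlite3.Connection, now: datetime, days: int = 30):
    """Return (id, title) for posts newer than `days`, newest first.

    Older posts stay in the table; they are only left out of the
    default listing, so nothing expires or is deleted.
    """
    cutoff = (now - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE posted_at >= ? "
        "ORDER BY posted_at DESC",
        (cutoff,),
    )
    return rows.fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, posted_at TEXT)"
)
conn.executemany(
    "INSERT INTO posts (title, posted_at) VALUES (?, ?)",
    [("Old thread", "2004-06-01T12:00:00"),
     ("Worm attack update", "2005-02-03T08:00:00")],
)

if __name__ == "__main__":
    print(recent_posts(conn, datetime(2005, 2, 10)))            # default window
    print(recent_posts(conn, datetime(2005, 2, 10), days=365))  # user widens it
```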
     
  12. edz

    edz Member

    Messages:
    685
    Joined:
    Dec 4, 2002
    Location:
    Munich, Germ
    Shooter:
    Multi Format
    I don't quite understand what chromatics have to do with anything?

    You are confusing context with storage models. Posts are like emails--- and too can be viewed as entries in a data base--- in that they are static and don't change--- that we have mechanisms to edit a message are aside the issues. The typical RDBMS system is designed to handle volatile and dynamic information. The volatile aspect of our forums is the topic. It tracks the development, like a mail folder, of a subject but each of the elements, the contributions, are static. In a relational database one needs to design for volatility such as seats as in the question "How many seats are left on the flight to San Francisco?". Data banks should more properly be called data markets. A forum, by contrast, is a collection of static and unchanging objects.

    The model for these forums is little else other than a threaded mailing list or what we've come to call Usenet News lifted over to a web interface (at first mail->web but later also web->web). As a little aside I created the Mail->Web genre a good dozen years ago: see the w3c.org web museum.....

    No. Indexing and search is of information and have nothing to do with it. Relational database systems, in fact, are poorly suited to the task. Again you are confusing things, this time applications with organization and storage models.

    You don't understand what "expiration" in HTTP means. If its not set then the data can expire immediately since its not been defined. If I don't know when data expires then I also can't assume that it will never expire or that the data expires once I get a copy (as the case might be in a ticket reservation system). If data is volatile then this is want you may want to get someone to keep asking for data. This is where the modification date/time enter the picture. One then asks if the data has been changed since the last time one asked and got it.. But if it too is undefined then one will probably need to assume that the data might have been changed (most browsers allow one to set this on a per-session etc. basis but its in the hands of the client and not server). There are also some features for a hash of context to try to distinguish between changes but I think I'm getting too deep into the fine details of designing spiders and search engines (which I do) and less sites.

    That's why one needs to set an expiration date at a distant point in the future and set the modification date. It's up to the site administration to try to control how clients (and web spiders are in this capacity nothing other than clients with a code of behaviour) behave.
     
  13. Andy K

    Andy K Member

    Messages:
    9,422
    Joined:
    Jul 3, 2004
    Location:
    Sunny Southe
    Shooter:
    Multi Format
    Why am I not surprised that a Microsoft product caused the problem? :rolleyes:
     
  14. KenM

    KenM Member

    Messages:
    800
    Joined:
    Apr 22, 2003
    Location:
    Calgary, Alb
    Shooter:
    4x5 Format
    Lots of good stuff edz, but keep in mind that thread contents can change after a period of time. People go back and re-edit posts - Aggie knows about this. Setting a very long expiration on the response would cause these changes to not be re-indexed in the short term.
     
  15. kwmullet

    kwmullet Member

    Messages:
    889
    Joined:
    Jan 3, 2004
    Location:
    Denton, TX,
    Shooter:
    Multi Format
    Also, Sean, if your service provider has a good set of Cisco skills, they can throttle the bandwidth available to traffic from certain address blocks and/or domains, so if there's a commonality to the sources of the spiders, you could let them wander and do all they want, just within the bandwidth of a dialup connection or two. Doing so at the border router would also benefit the rest of their sites as well.

    -KwM-
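
    To show the idea behind that kind of throttling, here is a small token-bucket sketch in Python. It is not router configuration and not how any particular Cisco feature is implemented; the class name and the figure of roughly 7000 bytes per second for a 56k dialup line are assumptions for illustration.

```python
class TokenBucket:
    """Allow traffic up to a steady byte rate with a limited burst."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call, in seconds

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True if nbytes may be sent at time `now` (seconds)."""
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


if __name__ == "__main__":
    # Roughly one dialup line's worth of bandwidth for a block of spiders.
    bucket = TokenBucket(rate_bytes_per_s=7000, burst_bytes=14000)
    for t, size in [(0.0, 14000), (0.0, 1), (1.0, 7000), (1.5, 7000)]:
        print(t, size, bucket.allow(size, now=t))
```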
     
  16. Valthonis

    Valthonis Member

    Messages:
    12
    Joined:
    Oct 26, 2003
    Location:
    California
    Originally Posted by Aggie
    Sean, Tim is telling me this, so bear with me if it sounds a bit blonde.
    I don't quite understand what chromatics have to do with anything?
    --------
    I dont quite understand why this concept is foreign to you. Must be chromaticly challenged.
    ---------
    Quote:
    Posts are not emails. Posts are data base entries.

    You are confusing context with storage models. Posts are like emails--- and too can be viewed as entries in a data base--- in that they are static and don't change--- that we have mechanisms to edit a message are aside the issues. The typical RDBMS system is designed to handle volatile and dynamic information. The volatile aspect of our forums is the topic. It tracks the development, like a mail folder, of a subject but each of the elements, the contributions, are static. In a relational database one needs to design for volatility such as seats as in the question "How many seats are left on the flight to San Francisco?". Data banks should more properly be called data markets. A forum, by contrast, is a collection of static and unchanging objects.
    --------
    I have to break this one down into its component flawed arguements to best show why this is a wrong interpretation.

    Quote: You are confusing context with storage models.
    --------- You are confusing context with method. Just because you view the forums LIKE an email system does not an email system make. You arent logging on to a pop3 server to transmit a properly formated message to then be sent through the magic smoke in the wires over the interweb to another pop3 email server to then be acessed by the end user. What you are doing is acessing an interactive php script that generates a form that is formated and then sent to the same server. It is indexed and added to a database. When a user requests the forum listing hes not logging on to his email server to get his local copy of an email. No. He is makeing a call to the database which then sends back a properly formated page with the requested data on it. There is no CC or BCC option.
    --------
    Quote: Posts are like emails--- and too can be viewed as entries in a data base--- in that they are static and don't change--- that we have mechanisms to edit a message are aside the issues.
    -------- Firstly learn the language. A runon sentance with syntax errors abounding. Posts are not emails but you could make an arguement about certain similartities. The biggest diffrence is that emails are distributed to multiple email servers so that there are multiple independant copies of the email. Your not supposed to edit the contents of the email once sent. Forum posts are semi-static. They can be edited only by the same user or a user with greater admin acess. They are specificly designed to be editable. The mechanisms to edit a message are Not besides the point.


    Quote: The typical RDBMS system is designed to handle volatile and dynamic information.
    --------- Firstly a definition of RDBMS for those too lazy to use google.
    *Short for relational database management system and pronounced as separate letters, a type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.*
    Databases in an environment such as apug Should handle volatile and dynamic information. The entries are specificly designed to be editable and thus dynamic. Entries can be removed or moved or even stored in a diffrent place and thus is volatile. However other than describeing something like APUG's forum database i dont see what this sentance is supposed to imply.

    Quote: The volatile aspect of our forums is the topic. It tracks the development, like a mail folder, of a subject but each of the elements, the contributions, are static.
    -------- The volatile aspect is the numerus catagories and sub catagories. The contents. The entire database is volatile. Again you can construe that the forums are LIKE a mail folder but image and perception does not a mail folder make. The elements are dynamic. They change based upon user input. They can be changed at any time by user intervention. Theres nothing mystical nor hard about this concept.

    Quote: In a relational database one needs to design for volatility such as seats as in the question "How many seats are left on the flight to San Francisco?".
    ------- Are you implying that in the forum database specificly designed and marketed to thousands of users that there must be a limited number of users able to acess the forum at any given time?

    Quote: Data banks should more properly be called data markets. A forum, by contrast, is a collection of static and unchanging objects.
    ------- Data banks should more properly be called Data Banks. The b in banks should be capitalized. Its important you know. A forum, by contrast, is an ever expanding collection of dynamic entries that can be changed at any time by an end user working from a remote connection to a centralized database.


    Quote: The model for these forums is little else other than a threaded mailing list or what we've come to call Usenet News lifted over to a web interface (at first mail->web but later also web->web). As a little aside I created the Mail->Web genre a good dozen years ago: see the w3c.org web museum.....
    -------- The model for these forums is little else other than phpBB with several phpHacks and a skin that matches the design. This is the defacto standard for web forums but the administrator could have used a diffrent system as his model. To have the arrogance to presume exactly and definately the model is an insult to him. Plus Usenet is being phased out by major isp providers. The system is DEAD. Oh and its nice that you did something relavent in the past dozen years.. :rolleyes:

    ---------
    Quote:
    This allows them to be searched and indexed.

    No. Indexing and search is of information and have nothing to do with it. Relational database systems, in fact, are poorly suited to the task. Again you are confusing things, this time applications with organization and storage models.
    ------- Use english. Were not useing ebonics here. Indexing and search is of information? wtf? Relational database systems allow the user to do a search for RELATED information. Not just specific entries. Just as all these posts are RELATED to the origional Post(entry) they are related to the FORUM(catagory) which is related to the FORUMS(Forum object on the webserver). There are numerus other relationships that are too tedious to point out to your narrow field of view. Mabey you should stop useing antiquated old email listing systems (usenet) and join the modern world. Im confusing things again am i? Your confuseing stupid for english. Before you post please write it down and ask someone to grammar check you because the first and last sentance of this block DONT MEAN ANYTHING. Use English.
    ----------
    Quote:
    Expiration is generally against what your data base is supposed to do.

    You don't understand what "expiration" in HTTP means. If its not set then the data can expire immediately since its not been defined. If I don't know when data expires then I also can't assume that it will never expire or that the data expires once I get a copy (as the case might be in a ticket reservation system). If data is volatile then this is want you may want to get someone to keep asking for data. This is where the modification date/time enter the picture. One then asks if the data has been changed since the last time one asked and got it.. But if it too is undefined then one will probably need to assume that the data might have been changed (most browsers allow one to set this on a per-session etc. basis but its in the hands of the client and not server). There are also some features for a hash of context to try to distinguish between changes but I think I'm getting too deep into the fine details of designing spiders and search engines (which I do) and less sites.
    ---------- Expiration means the data expires at a certain point. IE it is no longer relavent. IE it should not be viewed. IE you are confuseing things. IF Expiration is NOT SET then the data is not supposed to expire. If its not set you cant assume its supposed to expire at all. All you know is that the expiration is not set. The forums dont operate on a token key system. There is no reservation. If the data is volatile then you may want the data to be volatile. The context of the data and how it is acessed determines if you want people to keep asking for it. Because most of this site has dynamic generation of data probably through php scripts then by its very design you want users to continualy request the newest data. The browser should always assume in this case that the data HAS changed. A spider or robot just follow links and there is code to stop them from trying to index past a certain point. The robot.txt and setting iptables are the most common ways.
    ---------
    Quote: You don't want data to just expire and disappear into thin space. Least of all in a community like this.

    That's why one needs to set an expiration date at a distant point in the future and set the modification date. It's up to the site administration to try to control how clients (and web spiders are in this capacity nothing other than clients with a code of behaviour) behave.
    -------- That is why you DONT set the expiration data Period. End of Story. You do not want your forum data to expire. You already have built into the database systems to limit the data returned by the database on client request. That data should not expire unless the forum administrator wishes it to. Your method requires continualy reseting the expiration date when anything changes. A simpler step is to just not expire the data.
     
  17. Tom Duffy

    Tom Duffy Member

    Messages:
    963
    Joined:
    Nov 13, 2002
    Location:
    New Jersey
    Certainly more than I needed to know... Maybe we should move this to the lounge...
     
  18. SchwinnParamount

    SchwinnParamount Member

    Messages:
    1,187
    Joined:
    Nov 29, 2004
    Location:
    Tacoma, WA
    Shooter:
    4x5 Format
    Valthonis,

    There is no reason for you to make personal attacks against edz. He may be wrong but doesn't deserve rudeness. As an aside, before attacking his grammar you should check your own. Use a spell checker too. You have spelling errors too numerous to mention.

    I am also an RDBMS admin/software developer but can spell and write the Queen's English. We should not let technical expertise excuse us from the requirement to write well.
     
  19. Andy K

    Andy K Member

    Messages:
    9,422
    Joined:
    Jul 3, 2004
    Location:
    Sunny Southe
    Shooter:
    Multi Format
    Bloody hell Schwinn, you actually bothered reading all that? lol!
     
  20. edz

    edz Member

    Messages:
    685
    Joined:
    Dec 4, 2002
    Location:
    Munich, Germ
    Shooter:
    Multi Format
    Skipping much of the dribble, I see a lack of understanding.

    POP3 is just a little protocol designed to pass around some e-mail messages that conform to a certain standard. One should never confuse a transfer protocol with the content of the messages being transferred. One should never confuse the syntactics of the message, its form, with its context. One should never confuse the grammar with the model. One should not confuse metaphor with concrete instance. A story is a story and pixies don't in real life fly.

    So? And they are ill-suited for many uses. RDBMSs are typically not terribly good at searching for something in anything.

    No. One really needs to distinguish between the need/desire to be able to handle some changes and the need/desire to handle volatility. It's like the difference in evaporation between a glass of oil and a glass of acetone. In designing RDBMSs (and I've designed a few, including work some decades ago that went as far as embedding an RDBMS in a disk controller) one has a set of goals and constraints. Once one gets rid of the demand for volatility and "instants" in contrast to snapshots, one can start to use other approaches, models and algorithms. I have some customers that have been using my software (and implementations of some of my algorithms) to search through many millions and millions of human genome records. RDBMS systems like Oracle, even running on large sexy big Sun servers, just can't handle things. On a cheap PC from the supermarket we can search through many GBs of data in milliseconds. We can do this also searching through complete structure (yes, we can search anywhere in an XML/SGML tree, including even among unnamed siblings). Nearly all of the RDBMSs tend to be very poor at polymorphism. A developer of a system using an RDBMS needs to define his data model with great care. You don't want to feed an RDBMS with just anything, just anyhow.
    Look over at photo.net, a very well developed RDBMS-backed forum system. Why do you think (beyond the VC-deathtrap that ArsDigita fell into) they don't offer search (despite even starting off using Illustra and then Oracle)?


    I don't understand the relevance of popularity, the crux of your "de facto standard", as in Microsoft Windows to totalitarian and corrupt governments to ... I don't want to touch upon why PHP is currently popular nor the current trends of "development".

    PHP is, however, not a standard but a popular piece of software. Most of the forums that have been developed using PHP seem to assume a similar data model, but there is little in PHP to dictate the model, nor is there any indication that these models have been chosen by careful design to scale. Because they don't scale well.

    One can model the messages and threads and trees of discussions in an RDBMS but it's not terribly efficient for search, discovery or retrieval. RDBMSs tend to be very good at storage but have a high overhead for indexes. The systems work fine as long as they are small, but as they grow and get "large" (the semantics for "large" change in response to developments in computers and storage devices) they become unusable. There are means and ways to try to address these deficits.