Search engines have robots that come to your site and grab everything there is to grab. But because competition is so fierce, there is no way to get into the search engines unless you pay for ads or hire an SEO (Search Engine Optimization) consultant, right?

Wrong! Even if you pay big money, if your site is not properly seen by the robots that search engines use for indexing, chances are many of your pages will never make it.

In this article I will discuss the importance of structuring your website properly and of using old-fashioned hyperlinks rather than modern Flash menus, scripts and extensions. I will also point you to a very simple, free tool that lets you see your site much the way most indexing robots do. But first, let's define some of the concepts.

What is a www robot?

A robot is a computer program that automatically reads web pages and follows every link it finds.

The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer, and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the experiment's results proved to be an awesome tool and effectively became the first search engine. Most of the online stuff we can't live without today was born as a side effect of some scientific experiment.

What is a search engine?

Generically, a search engine is a program that searches through a database. In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What is a bot? What is a spider? What is a crawler?

Bot is just a shorter, cooler (for some) version of the word robot. Spiders and crawlers are robots too; the names just sound more interesting in the press and within metro-geek circles. For consistency, I will use the term robot throughout this article when referring to spiders, crawlers and bots.

Are there other... things that crawl out there?

Oh yeah, but these things are way beyond the scope of this article. Well, for the conspiracy theory aficionados, let's see... we have worms (self-replicating programs), webants or ants (distributed cooperating robots), autonomous agents, intelligent agents and many other bots and beasties.

How do robots work?

As with all other things technical, I believe the only way you will use a technology to its full potential and to your best advantage is when you understand how it works. By "how it works" I don't mean intricate technical details, but fundamental processes: big-picture stuff.

Generally, robots are nothing but stripped-down versions of web browsers, programmed to automatically browse and record information about web pages. There are some very specialized robots out there: some look only for blogs, others index nothing but images. Many (such as Google's Googlebot) see pages much the way one of the first popular browsers, Lynx, does. Lynx is a pure text browser, which makes it extremely robust and fast on today's internet. Basically, if you can program, you can take Lynx, modify it and make a robot.

So how do these things actually work? They get a list of websites and literally start "browsing" them. They come to your site, start reading the pages and follow every link, storing information such as page titles, the actual text of each page, etc.

Based on the above, what would happen if, instead of your beloved Internet Explorer, Firefox, Opera or whatever browser you are attached to, you went digging on the internet and downloaded a version of the venerable Lynx browser? I'll tell you what would happen, and some will probably accuse me of giving away one of the secrets the SEO corporate community does not want you to know: you will be able to see your site very close to the way a robot sees it. You will be able to look for errors in your pages and track down navigation errors that might block a robot from seeing portions of your site.

In plain English, let's say you built a great-looking site. There is an index page, the first page one sees when entering your site. On that page you have the most incredible Flash navigation system, with a huge button pointing to your products and services and the rest of the site. If Lynx goes to your index page and does not see a standard link, it will not be able to see the rest of your site. Chances are extremely high that a lot of indexing robots will not see your site either.

You will then understand why your very large site, with one of the most intricate and functional Flash-based navigation systems on the planet, never ranks high in the search engines, even after all your efforts of manually submitting it everywhere. It's simply because you forgot to add basic hyperlinks. When you submit a site, even manually, all that really happens is that you tell the search engine "Hey, Mr. Search Engine, whenever you can find some time, please send your trusty robot to my site."

Folks, robots usually can't use a navigation menu made in Flash, JavaScript, etc., and will not be able to get to your pages. It's as simple as that.

How do I get Lynx?

Lynx started life as a UNIX application, written at the University of Kansas as part of its campus-wide information system. It then became a Gopher client (Gopher was a pre-web system for finding documents on the internet), then a web browser. Lynx has an official page. However, if you are not a Linux geek used to playing with binary distribution files and compiling your own apps (don't worry about what I just said), you might want to find a version that someone else has already made usable for your computer. For example, if you are a PC user running Windows, you might want to look for links to "Win32 compiled versions". At the time of writing, there was at least one such site (called a distribution site) where you could download a version that installs onto Windows machines in a way that will be familiar to non-geeks.

After you install the browser, you might want to read the documentation. To get you going and to alleviate your beginner frustrations, I'll tell you that you must press the G key (as in "go"), then type the complete URL of the site you want to browse (starting with "http://") and hit Enter. Use the arrow keys to navigate.

Bottom line: use Lynx to verify that every page of your site is accessible, and let the robots do all the work for you. You'll save yourself a lot of aggravation, and maybe some money you would otherwise waste on advertising a non-indexable site.

---

Andrei co-owns Bsleek, a company that specializes in web design, hosting, promotional items, printing, tradeshow displays, logos, CD presentations, SEO and more. Andrei has amassed extensive technical knowledge and experience through his career as the CIO of a major travel management company and through his past careers in military research, data acquisition and airspace engineering. He also consults for Trinity Investigations, a New York-based PI firm.
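The robot behavior described under "How do robots work?" can be sketched in a few dozen lines of Python. A robot starts from a list of sites, reads each page, follows every standard link and stores titles and text. This is a minimal illustration under stated assumptions, not how Googlebot or any other real robot is implemented. The crawl function, the fetch callable and the small in-memory example site are all hypothetical. Fetching is passed in as a parameter so the sketch runs without a network connection. Like Lynx, the sketch only sees ordinary <a href> hyperlinks. A link that exists only inside a Flash object, or that JavaScript writes out at run time, is never found.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collects the <title>, the visible text and the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text_parts: List[str] = []
        self.links: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script>/<style>; their contents are not page text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # scripts are never executed, so links they would generate stay invisible
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def crawl(start_urls: List[str], fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, dict]:
    """Hypothetical breadth-first robot: read each page, store title and text, follow every <a href>."""
    seen = set(start_urls)
    queue = deque(start_urls)
    index: Dict[str, dict] = {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        index[url] = {"title": parser.title.strip(), "text": " ".join(parser.text_parts)}
        for href in parser.links:
            # Resolve relative links against the current page and drop "#fragment" parts
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical example site (an assumption for this sketch), keyed by URL
SITE = {
    "http://example.com/": (
        "<html><head><title>Home</title></head><body>"
        '<object data="menu.swf"></object>'  # Flash menu: contains no <a href>, so the robot finds nothing
        "<script>document.write('<a href=\"/hidden\">Hidden</a>')</script>"  # script-generated link: never seen
        '<a href="/products">Products</a>'  # plain hyperlink: the robot follows this one
        "</body></html>"
    ),
    "http://example.com/products": "<html><head><title>Products</title></head><body>Our products</body></html>",
    "http://example.com/hidden": "<html><head><title>Hidden</title></head><body>Never found</body></html>",
}

if __name__ == "__main__":
    pages = crawl(["http://example.com/"], SITE.get)
    for url, info in pages.items():
        print(url, "->", info["title"])
```

Running it prints only the home page and the products page. The "hidden" page can be reached only through the script-generated link, so it never makes it into the index, which is exactly the situation the article warns about. To try it on a real site, you could pass a function that downloads pages (for example with urllib.request.urlopen) and returns None on errors. The max_pages limit keeps such a crawl bounded.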