SquiggleBot

SquiggleBot is the web crawler for the newest entry into the game of building the ultimate search engine. We have adopted a simplified philosophy. Rather than hiring 200 computer PhDs and spending $100 million on hardware and salaries, only to eventually lose the search market to Google, Yahoo and MSN, we plan to stay very, very small and keep our server banks limited to a handful of processors and a few terabytes of data. Yes, I said very small!

We do not intend to index every page of every website. Instead, we will select good-quality content and avoid the spam, affiliate and adult sites that we estimate make up 90% of the 8 billion pages indexed by Google. If that is right, we could provide clean and viable results with only 800 million pages (the remaining 10%) in our database.

The trick will be to set up very specific sites to deliver relevant data. It is nice to be able to go to Google and search for anything. But when you are looking for a fishing lodge in the back woods of a small Canadian village, its small, low-ranking website must compete with millions of others and becomes almost impossible to find. If the search is dedicated only to fishing-related websites and covers only a few million sites, the results could be much more relevant. If we can prequalify and categorize the search as fishing related, we can limit it to only the websites that fit the user's needs.

Of course, it is all great in theory. But can we actually do it? To get our feet wet, we plan to first subdivide all the results into very specific search bases using our BumbleBee Network. These are websites like http://buzztrader.com, which will offer automotive searches over an index of only websites dedicated to the automotive industry. Other content-specific websites will include http://craftersbuzz.com, http://bumblebeemusic.com, http://fishingbee.com and many similar websites that were previously developed for classifieds and content.
Since our existing websites maintain heavy traffic, we should be able to learn quickly and slowly build a complete index with all of the content combined. The biggest obstacle will be filtering out the spam and keeping the results clean. As SquiggleBot's crawler indexes more and more sites, we learn what is junk and what is worth keeping. After we have qualified domains, the next step will be to descend further into the websites and include internal pages.

We have built smart technology into our search. Rather than indexing everything, our crawler can seek out more and more sites that are relevant to the most popular searches. This allows us to keep the databases lean while returning ample and relevant data to the end user. It should also make searches faster as more people search. Hey, it's just a theory at this point. But we like it and we are going with it! And as if things were not tough enough, we have to do it on a shoestring budget while working from home between paying jobs.

More information about the project and its progress can be found at the links at the bottom of each page. We will share our input and ideas with everyone in the hope that someone will whip this thing and come up with a great new search engine. Not that Google is not great, but there is room for smaller searches with unique results. As the internet expands, it becomes harder and harder to find what you want. This problem is more prevalent among novice computer users and is driving them away from using the web as a tool. Small companies like ours could fill that need without having to provide comprehensive searches of 8 billion pages. There will always be a need for companies like Google, but without smaller sites supporting special needs, the web will have less value to the individual and less appeal to small business.

ABOUT THE BOT

If you do not want the SquiggleBot crawler accessing your website, simply add the lines below to your robots.txt file.

User-agent: SquiggleBot
Disallow: /

Our bot will then ignore your entire website. If you are being crawled by a bot claiming to be SquiggleBot in spite of the robots exclusion, it's not our bot.

Our bot is less invasive than most and will not suck up all your images and eat your bandwidth. It will also not descend into your directories unless the content is relevant to the crawl and your site has previously been indexed and qualified as acceptable content based on your main page. The bot will not crawl more than 20 pages from a single domain in the same day. If you have a big site, it could take a long time for all the pages to be crawled, which avoids massive bandwidth bills for both of us. We are also sensitive to the fact that many sites shut down when they reach their bandwidth limits, so we limit the access rate to avoid causing such a scenario. Sorry, we cannot stop all the malicious bots or data miners; we can only control our own bot.

You should start seeing searches on our main websites by November or December 2006. This website is here to explain what we are doing, show the progress in development, and share ideas about building a better search engine. Below are links to our pages with more information about the project and where you might find the results of our work. We are always open to suggestions and input that will assist in the build.

[ Home ] [ Help ] [ More About The Bot ] [ The Cyber Web Inc ] [ Search Engine News ] [ Building The Search ] [ Project Progress ] [ Page Ranking ]
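For webmasters who want to check the robots exclusion described under ABOUT THE BOT, here is a minimal Python sketch. It is not SquiggleBot's actual code. It assumes the exclusion takes the standard form of a `User-agent: SquiggleBot` line followed by `Disallow: /`, and it uses `example.com` as a placeholder domain. The `DailyBudget` class is a hypothetical illustration of the 20-pages-per-domain-per-day cap, and it treats the URL's hostname as the "domain", which is also an assumption.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "SquiggleBot"
DAILY_PAGE_LIMIT = 20  # from the text: at most 20 pages per domain per day

# Assumed robots.txt content for a site that opts out entirely.
ROBOTS_TXT = """\
User-agent: SquiggleBot
Disallow: /
"""


def is_allowed(robots_txt, url, agent=USER_AGENT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class DailyBudget:
    """Hypothetical per-domain counter that resets when the date changes."""

    def __init__(self, limit=DAILY_PAGE_LIMIT):
        self.limit = limit
        self._counts = defaultdict(int)  # (hostname, day) -> pages fetched

    def try_acquire(self, url, today=None):
        """Count one fetch for the URL's host; return False once the cap is hit."""
        day = today or date.today()
        key = (urlsplit(url).hostname, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "http://example.com/page.html"))
    print(is_allowed(ROBOTS_TXT, "http://example.com/page.html", agent="OtherBot"))
```

A crawler built along these lines would call `is_allowed` first and `try_acquire` second, and skip the page if either returns False. Counting per hostname and per calendar day is only one reading of "the same day"; a rolling 24-hour window would be an equally plausible interpretation.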