SquiggleBot Page Ranking

I keep hearing how search engines rank pages by considering parameters such as links from outside pages, text fonts and sizes, heading placement, and keywords in the heading tags. So my question is, what relevance does any of this have? How can we rank a page highly because someone has huge letters that say "Used Cars For Sale" and then has content about software to manage your auto repairs? The fact is, the site builders know the game better than the search engines. Search engines must deal with a sea of spam that makes e-mail abuse look like a walk in the park.

Therefore it's time to go back to basics: actually rank a page based on the page itself. It seems like thinking outside the search box, but it could work. Imagine getting results based on what is actually on the page. It is a nice thought to use outside sources to generate relevancy, but I had a hosting site that came back from Google as relevant to "Reggae Music" because at one time I had clients in that industry who linked back to me. Needless to say, hosting is not at all related to music.

The concept of relating sites based on outside content is flawed, since it opens people to attack from competitors who dilute their relevancy, or to harm from people who are just trying to help by linking back from sites that carry search engine flaws, which then count against your site. It is so exploited that there are link brokers selling link space on websites to raise your page rank. So you can sell a link or buy a link to trick the search engine into thinking your site is good.

The concept of giving sites a higher ranking based on headings just plain sucks and is widely abused, resulting in spam pages with huge text that are highly ranked and lowly viewed. Webmasters place divisions off the page with huge text to fool the search crawler while remaining invisible to the viewer. These practices are making searches less relevant and more spammable.
We need to get back to classifying page and website content and feeding back results that are actually on the pages. It may be a bit extreme to actually base results on the pages, but the results would be awesome. The trick will be to filter out blatant spam and exclude those domains from the index. That process could be bigger than the search itself. Also, many people will be mad about not being included. However, the good sites will remain and the insane amount of spam would stop. People could actually make money building quality websites rather than targeting search engines with automated attacks in the form of billions of doorway pages and spam pages with ads.

Our plan will be to rank pages based on the overall website text, with little attention to "BIG Text" or text placement, but rather an overview of the content. For example, this website is about building a search engine. But if you are searching for "Building a Search Engine" you will never find it, because other sites do not link to us using the keywords in your search terms, and we lack huge page titles with more keywords in bold oversize type. On the other hand, if the site could be classified as good content, unique in nature and on topic (search engine stuff), then the site could be returned for that query, giving the user good articles to read about building a search engine. But if you rank the page based on how many times we link the words "Search Engine" or how big the link is, readers will be denied good quality articles that may only use the keywords one or two times in the page.

It seems like the Google concept was awesome when the internet was just getting started. But now they have helped mold the form of spam pages by setting the standard for being included. If spammers actually have to write good articles and provide unique content, they will shrivel up and die. They are not the working types, so they would just look for an easier outlet.
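As a rough illustration of ranking on the page text itself, here is a minimal Python sketch. It is not SquiggleBot's actual code; the names TextOnly, page_words, content_score and the cap parameter are all made up for this example. It assumes two things: the tag a word sits in is ignored entirely, so a heading counts the same as body text, and each query word only counts up to a small cap, so a page that uses the keywords once or twice is not beaten by one that repeats them fifty times.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextOnly(HTMLParser):
    """Collects page text and ignores which tag it came from (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_words(html):
    """Return the lowercase words of a page, with all markup thrown away."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())


def content_score(html, query, cap=2):
    """Score between 0 and 1: how well the page text covers the query words.

    Assumption for this sketch: each query word counts at most `cap` times,
    so heavy repetition earns nothing extra.
    """
    counts = Counter(page_words(html))
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return 0.0
    return sum(min(counts[t], cap) for t in terms) / (cap * len(terms))


if __name__ == "__main__":
    modest = "<p>A search engine crawls pages. Our search engine ranks them.</p>"
    stuffed = "<h1>" + "search engine " * 50 + "</h1>"
    print(content_score(modest, "search engine"))
    print(content_score(stuffed, "search engine"))
```

With these assumptions the modest article and the keyword-stuffed heading get the same score, which is the point: big type and repetition buy nothing. A real classifier would still need to judge whether the content is unique and on topic, which this sketch does not attempt.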
SquiggleBot could reel in the internet and give it back to the idealists who wanted good content at the click of a mouse. It would do so by prefiltering pages and classifying websites rather than relying on elaborate algorithms that defy thought.

Another key that a page is spam is the outside advertising. OK, everyone is trying to make money displaying some type of ads. But if a website is selling web hosting and it has Google ads for 4 other hosting companies selling their product, it is abundantly clear that the website owner is less concerned with selling their own product than with making money from advertising. We would have to exclude this site, since it is likely an advertising front or not a very good web host. On the other hand, if a site is a classified site and has Google ads, then the ads are in line with the site, so it should be included. Other spam keys include cloned sites that use the same keyword but represent different cities, each with a time and weather table from that city, or sites that have more ads than content.

It also seems like the big 3 count against sites that use heavy graphics, but those sites tend to be more prepared and developed than the spammers. So we will not penalize someone for graphics but rather reward them for the unique work. There is never an easy solution to ranking pages, but the current systems definitely need some work.
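Two of the spam keys described above, ads for competitors and more ads than content, could be checked along the following lines. This is only a sketch under stated assumptions: the PageSummary record and every field in it are hypothetical, and a real crawler would first have to work out what a site sells and what its ads are for, which is the hard part.

```python
from dataclasses import dataclass, field


@dataclass
class PageSummary:
    """Hypothetical summary of a crawled site; every field is an assumption."""
    sells: str              # what the site itself sells, e.g. "web hosting"
    is_classifieds: bool    # classified sites are expected to carry outside ads
    content_blocks: int     # number of blocks of real content
    ad_blocks: int          # number of ad blocks
    ad_topics: list = field(default_factory=list)  # what each ad is selling


def spam_signals(page):
    """Return the spam keys that a page trips, as a list of short labels."""
    signals = []

    # A site that advertises its own competitors is likely an advertising front,
    # unless it is a classified site, where outside ads fit the site.
    competing = [topic for topic in page.ad_topics if topic == page.sells]
    if competing and not page.is_classifieds:
        signals.append("ads for competitors")

    # More ads than content.
    if page.ad_blocks > page.content_blocks:
        signals.append("more ads than content")

    return signals


if __name__ == "__main__":
    host = PageSummary("web hosting", False, 3, 4, ["web hosting"] * 4)
    classifieds = PageSummary("classified ads", True, 10, 2, ["used cars"])
    print(spam_signals(host))
    print(spam_signals(classifieds))
```

The cloned city sites are left out here, since spotting them means comparing pages against each other rather than looking at one page alone.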