Archives

Archive for June, 2006

Google Algorithm and Axiomatic Set Theory

Warning. Skip this post if you hate mathematics.

Can we learn about Google’s algorithm by applying inverse mapping techniques? In the absence of tangible evidence, Google’s algorithm has to be treated as a black box. The black box can be either a proprietary software package or stand-alone code (mostly the latter). An inverse mapping is a function that recovers the input of another function from its output (its image). A recursive function such as the Fibonacci sequence is an example: given a Fibonacci number greater than 1, you can work out its position in the sequence. In our case, however, the black box is not invertible. The initial condition, X=FUNCTION(N,Y=1), and the kernel are insufficient to infer the corresponding values of N. Therefore, knowing different coordinate pairs (X,Y) is insufficient to deduce N.
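To make the inverse-mapping idea concrete, here is a small Python sketch. `fib_index` inverts the Fibonacci function for values greater than 1, while `black_box` is a hypothetical toy stand-in for X=FUNCTION(N,Y=1), not anything known about Google’s code: several different N produce the same X, so X alone cannot tell you which N was used.

```python
# Sketch: an invertible mapping (Fibonacci) versus a non-invertible one.
# black_box() is a hypothetical toy, not Google's algorithm.

def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_index(x: int) -> int:
    """Invert fib: recover n from fib(n), for Fibonacci numbers x >= 2."""
    if x < 2:
        raise ValueError("fib(1) == fib(2) == 1, so only x >= 2 has a unique index")
    n, prev, cur = 3, 1, 2  # cur == fib(n), prev == fib(n - 1)
    while cur < x:
        prev, cur = cur, prev + cur
        n += 1
    if cur != x:
        raise ValueError(f"{x} is not a Fibonacci number")
    return n


def black_box(n: int, y: int = 1) -> int:
    """Toy X = FUNCTION(N, Y=1): squaring modulo 7 folds many N onto one X."""
    return (n * n + y) % 7


def preimages(x: int, candidates) -> list:
    """Every N among the candidates that the black box maps to x."""
    return [n for n in candidates if black_box(n) == x]


print(fib_index(fib(10)))                  # 10: the input is recoverable
print(preimages(black_box(3), range(20)))  # [3, 4, 10, 11, 17, 18]: N is not
```

Knowing the pair (X, Y) for the toy black box narrows N down to a whole family of candidates, which is the situation the argument above describes.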

Okay, so what do I want to prove now?

Using axiomatic set theory and inverse mapping, I want to prove:

    1. Google, and any other mainframe-based search engine, cannot spider and index the entire net.
    2. You have no way of knowing exactly how they work.
    3. They (Google, MSN, Yahoo) don’t know themselves exactly how it works.

The theory in one leg

Since its inception, there have been some mathematicians who have objected to using set theory as a foundation for mathematics, claiming that it is just a game which includes elements of fantasy. Though the axioms of set theory are fairly straightforward, they form the basis for several branches of mathematics including Algebra, Topology, Category Theory, etc.

Most of the work on set theory, however, is done in Zermelo-Fraenkel (ZF) set theory. Pictured: Ernst Zermelo (1871-1953, left) and Adolf Fraenkel (1891-1965, right).

Zermelo Fraenkel Set Theory

Its axioms are as follows (source: Aron Wall):

The diagram above shows a top set (red) containing blue sets, with each blue set containing green/purple sets, which are all assumed to be different. The Axiom of Choice says you can always find a set containing exactly one set from each of the blue sets, even with infinitely many blue sets or green sets. The purple sets in the diagram form such a choice set. Here are some other ways to state the Axiom of Choice:

  • For any two cardinalities, either one is bigger or they are equal. For this to be violated, you need a choice situation like the one above with no choice set; then the number of B’s will be incomparable with the number of x’s. One would think there would be more x’s, but that requires there to be an x for every B. You would have to be able to choose a specific one for each B, which is exactly the Axiom of Choice.
  • Every set is well-orderable. You can prove that any infinite set can be divided into an infinite number of well-orderable pieces, each smaller than the infinite set. But you need the Axiom of Choice to do this “all the way down” and get a well-ordered set.
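The choice-set idea can be tried out in Python, but only for a finite family; the infinite cases the Axiom of Choice is really about cannot be run on a computer. In this sketch the blue sets are frozensets of numbers standing in for the green/purple sets (an assumption for illustration), and the hypothetical `pick` parameter is the choice rule. When the members come with an order, “take the least one” is an explicit rule, which hints at why the well-ordering formulation is equivalent to the Axiom of Choice.

```python
# Sketch: a choice set for a FINITE family of pairwise disjoint, nonempty sets.
# For finite families no Axiom of Choice is needed.

def choice_set(family, pick=min):
    """Return a set with exactly one member taken from each set in `family`.

    `pick` is the rule that selects the member; the default `min` works
    only because the members here are ordered (a well ordering).
    """
    chosen = set()
    for blue in family:
        if not blue:
            raise ValueError("every set in the family must be nonempty")
        chosen.add(pick(blue))
    return chosen


blue_sets = [frozenset({1, 5}), frozenset({2, 7}), frozenset({3})]
purple = choice_set(blue_sets)
print(sorted(purple))  # [1, 2, 3]
# Each blue set shares exactly one member with the choice set.
print(all(len(blue & purple) == 1 for blue in blue_sets))  # True
```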

Conclusion

Today’s search engines are huge bodies of code running on huge mainframes (OK, call them “datacenters”) with countless patches applied. They use recursive functions that produce chaotic results which even their engineers cannot tame and put into a relevant order. Developing a new search engine is a must. Meanwhile, if you’re running an SEO business, just stick to white hat techniques and earn points for the parts of the formula we do know.


Spell checkers and Morphological Analysis

Current search engines use morphological analyzers in their code, and these are a very interesting subject to discuss. In this typo example, Google “understands” that Geoge Bush is equal to George Bush (see number 2 below). This is exactly what morphological analysis is all about. It also found a page containing the typo itself. In a few days, search engines will show you this page as well :)

Google Spell Checker


Today’s spell checkers are very powerful and include morphological analyzers themselves, but they leave it to the user to decide how to continue when a morphological problem occurs. Paid search engines (Google AdWords, Yahoo Overture, etc.), however, have to make automatic, on-the-spot decisions, so they must rely on embedded morphological analyzers.
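The Geoge/George match can be reproduced with a much simpler technique than full morphological analysis: edit-distance spelling suggestions. The Python sketch below is not how Google does it; the vocabulary is a hypothetical toy list and the `max_distance` threshold of 2 is an arbitrary assumption.

```python
# Sketch: edit-distance spelling suggestions (a simpler technique than full
# morphological analysis). VOCABULARY is a hypothetical toy word list.

VOCABULARY = ["Bush", "George", "Gorge"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word: str, vocabulary=VOCABULARY, max_distance: int = 2):
    """Closest vocabulary word, or None if nothing is within max_distance."""
    best = min(vocabulary, key=lambda v: edit_distance(word.lower(), v.lower()))
    if edit_distance(word.lower(), best.lower()) <= max_distance:
        return best
    return None


def correct(query: str, vocabulary=VOCABULARY) -> str:
    """Replace each word with its suggestion, keeping words with no suggestion."""
    return " ".join(suggest(w, vocabulary) or w for w in query.split())


print(correct("Geoge Bush"))  # George Bush
```

A spell checker would show the suggestion and let the user decide; an ad system as described above would have to apply it automatically.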


Google Health – A Vertical Search?

Apparently Google now has vertical search, at least in the health arena, where you can search for anything health-related. The results page gives users the option to narrow down or filter the results. For example, I searched for Tylenol and it let me choose from various options, including uses, side effects, symptoms, interactions and alternative medicine. Clicking on “From medical establishment/authorities” gives even more options.

Google health screenshot


Clicking on a refine link reveals new Google command syntax, or “operators”. Interestingly enough, the health operator (the “more” operator) is not listed in Google’s operator lists :)

Tylenol more:drug_side_effects
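A query like the one above mixes a plain search term with a name:value operator. The Python sketch below splits the two apart; it is a hypothetical parser for illustration, not Google’s query grammar, and it assumes operator names are lowercase letters and underscores.

```python
# Sketch: splitting a query into plain search terms and name:value operators.
# Hypothetical parser for illustration, not Google's query grammar.
import re

OPERATOR = re.compile(r"^([a-z_]+):(\S+)$")  # assumes lowercase operator names


def parse_query(query: str):
    """Return (terms, operators), keeping operators in the order they appear."""
    terms, operators = [], []
    for token in query.split():
        match = OPERATOR.match(token)
        if match:
            operators.append((match.group(1), match.group(2)))
        else:
            terms.append(token)
    return terms, operators


print(parse_query("Tylenol more:drug_side_effects"))
# (['Tylenol'], [('more', 'drug_side_effects')])
```

Operators are kept as an ordered list of pairs rather than a dictionary, so a query that repeats an operator does not silently lose one of them.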

My opinion:

Basically, Google Health is what I expected: an enhanced way to search for health-related material, but not really a vertical search engine. I like the service because it reminds me of a great product called kosmix.com, a vertical search engine designed to help users find health-related content. As long as Google remains a huge mainframe with many spidering problems, it can’t really approach the vertical search market.

Liked this post?

Send me a comment if you’d like more posts like this. I’d like to post more “Google Uncover” topics in the future, in the areas of:

    1. Google command syntax and its operators
    2. Vertical search
    3. Google spiderbot uncovered


Google AdWords Ad Scheduling is live

Let’s start by reading this message from Google’s Help Center (https://adwords.google.com/support/bin/answer.py?answer=33227):

What is ad scheduling?

Ad scheduling lets you control the days and times your AdWords campaigns appear. Your AdWords ads normally are available to run 24 hours each day. Ad scheduling allows you to set your campaigns to appear only during certain hours or days of each week. For example, you might set your ads to run only on Tuesdays, or from 3:00 until 6:00 pm daily. With ad scheduling, a campaign can run all day, every day, or as little as 15 minutes per week. A campaign can also run and pause several times each day.
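One way to picture such a schedule is as a list of weekly on/off windows. The Python sketch below is a hypothetical data model, not the AdWords API: the `Window` type and its fields are invented for illustration, times are assumed to be in the account’s time zone, windows are assumed not to overlap, and the 15-minute alignment is inferred from the “15 minutes per week” minimum quoted above.

```python
# Sketch of an ad schedule as a list of weekly on/off windows.
# The Window type and its fields are hypothetical; this is not the AdWords API.
from dataclasses import dataclass
from datetime import datetime

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class Window:
    weekday: int    # 0 = Monday ... 6 = Sunday, as returned by datetime.weekday()
    start_min: int  # minutes after midnight, inclusive
    end_min: int    # minutes after midnight, exclusive (1440 = end of day)

    def __post_init__(self):
        if not 0 <= self.weekday <= 6:
            raise ValueError("weekday must be between 0 and 6")
        if not 0 <= self.start_min < self.end_min <= MINUTES_PER_DAY:
            raise ValueError("need 0 <= start_min < end_min <= 1440")
        if self.start_min % 15 or self.end_min % 15:
            raise ValueError("windows must align to 15-minute steps")


def is_running(schedule, when: datetime) -> bool:
    """True if `when` (in the account's time zone) falls inside any window."""
    minute = when.hour * 60 + when.minute
    return any(w.weekday == when.weekday() and w.start_min <= minute < w.end_min
               for w in schedule)


def weekly_minutes(schedule) -> int:
    """Total scheduled minutes per week, assuming windows do not overlap."""
    return sum(w.end_min - w.start_min for w in schedule)


# "From 3:00 until 6:00 pm daily"
three_to_six = [Window(day, 15 * 60, 18 * 60) for day in range(7)]
# "Only on Tuesdays"
tuesdays = [Window(1, 0, MINUTES_PER_DAY)]

print(is_running(three_to_six, datetime(2006, 6, 13, 16, 30)))  # True
print(is_running(tuesdays, datetime(2006, 6, 14, 12, 0)))       # False (a Wednesday)
print(weekly_minutes([Window(0, 0, 15)]))                       # 15, the minimum
```

Running and pausing several times a day is just a matter of adding more windows for the same weekday.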

What to do?

First, you have to set your time zone; you can find a link on the AdWords welcome page.

Setting your ads’ awake/sleep times is as easy as 1-2-3:

How do you pause your ads?

  1. Click “Edit Campaign Settings”, then the “Ad Scheduling” option.
  2. Choose a daily (default), weekdays, or weekends cycle.
  3. Select the times your ads are awake or asleep; you can add as many periods as required.
  4. Save your work. The Google clock is ticking and checks your schedule from now on!

Is it for me?

Think twice if you’re fighting click fraud or want to be there when your potential customers are. Learn your market niche well to determine your clients’ web behaviour, and act accordingly. Some marketers, like online jewelers, found that “brown bag” lunchtime is their hottest time, so they pause their ads during the late-night hours (one minute after midnight to 7 am) to keep their budget in shape for the noon hours. Moving companies found that evening is when family decisions happen, so they want to save their budget for that time. In the old days, before ad scheduling existed, many movers had spent their budget by noon.


Search3w Home Page


Searchability and Usability

Searchability is the art of getting traffic to your website.
Usability is the art of keeping visitors around and turning them into buyers.

Google by Dilbert


About

I’m trying to build a small platform for posting SEO-related things that happen to me on a daily basis. It’s hard to find an email address on web blogs; well, you’ve just found one! acroterion@gmail.com


Dynamic versus static web pages optimization

The choice between static and dynamic when building software for the web is a critical one, and one that I think deserves in-depth discussion.

In a dynamic site, pages are assembled “on the fly” as and when they are requested. Sites powered by server-side languages such as PHP, JSP and ASP work this way, and these technologies actively encourage dynamic content creation. Generating pages dynamically allows for all sorts of clever applications, from e-commerce and random quote generators to full-on web applications such as Hotmail. See this dynamic web page generation diagram:

Dynamic web page generation
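In code, dynamic generation means the page does not exist until someone asks for it. The Python sketch below uses a minimal WSGI application (the standard Python web server interface) that builds a small random-quote page on every request; the `QUOTES` list stands in for a database, and both it and the page layout are hypothetical.

```python
# Sketch: a dynamic page, assembled "on the fly" for every request.
# QUOTES stands in for a database; both it and the page layout are hypothetical.
import random
from html import escape

QUOTES = [
    "Searchability is the art of getting traffic to your website.",
    "Usability is the art of keeping visitors around.",
]


def app(environ, start_response):
    """Minimal WSGI application: nothing exists on disk until it is requested."""
    path = environ.get("PATH_INFO", "/")
    quote = random.choice(QUOTES)  # a different page can come back on each request
    body = (f"<html><body><h1>{escape(path)}</h1>"
            f"<p>{escape(quote)}</p></body></html>").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

# To serve it locally (blocks forever):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Every request pays the cost of running `app`, which is exactly the work a static site moves to publish time.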

In a static publishing system, HTML pages are pre-generated by the publishing software and stored as flat files on the web server, ready to be served. This approach is less flexible than dynamic generation in many ways and is often ignored as an option as a result, but in fact the vast majority of content sites consist of primarily static pages and could be powered by static content generation without any loss of functionality to the end user. See this static web page generation diagram:

static web page (HTML) generation
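By contrast, a static publishing step renders every page once and writes it to disk, so the web server only has to hand out files. The Python sketch below assumes a hypothetical `pages` dictionary mapping a slug to a (title, text) pair and writes one `slug.html` file per page.

```python
# Sketch: static publishing. Pages are rendered once, at publish time, and
# written as flat .html files that any web server can serve as-is.
# The page dictionary and file layout are hypothetical.
import tempfile
from html import escape
from pathlib import Path


def render(title: str, text: str) -> str:
    return (f"<html><head><title>{escape(title)}</title></head>"
            f"<body><h1>{escape(title)}</h1><p>{escape(text)}</p></body></html>")


def publish(pages: dict, out_dir: Path) -> list:
    """Write one HTML file per page (slug -> (title, text)); return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, text) in pages.items():
        path = out_dir / f"{slug}.html"
        path.write_text(render(title, text), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for path in publish({"about": ("About", "SEO notes")}, Path(tmp)):
        print(path.name)  # about.html
```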

The most widespread example of a static publishing system is D2S, which rebuilds static files for a site each time a page is added or modified – although it can be configured to serve content dynamically instead.
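Rebuilding “each time a page is added or modified” does not have to mean rewriting the whole site. The Python sketch below rewrites only the pages whose content changed, detected with SHA-256 hashes kept in a manifest; the manifest is a hypothetical mechanism for illustration, not how D2S actually tracks changes, and deleted pages are not handled.

```python
# Sketch: rebuild only the static files whose source changed.
# The manifest of hashes is a hypothetical mechanism, not D2S's.
import hashlib
import tempfile
from pathlib import Path


def rebuild_changed(pages: dict, out_dir: Path, manifest: dict) -> list:
    """Write pages whose content hash differs from the manifest.

    pages:    slug -> rendered HTML
    manifest: slug -> SHA-256 of the HTML at the last build (updated in place)
    Deleted pages are not handled in this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rebuilt = []
    for slug, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if manifest.get(slug) != digest:
            (out_dir / f"{slug}.html").write_text(html, encoding="utf-8")
            manifest[slug] = digest
            rebuilt.append(slug)
    return rebuilt


with tempfile.TemporaryDirectory() as tmp:
    manifest = {}
    site = {"index": "<p>Home</p>", "about": "<p>About</p>"}
    print(rebuild_changed(site, Path(tmp), manifest))  # ['index', 'about']
    site["about"] = "<p>About us</p>"                  # one page modified
    print(rebuild_changed(site, Path(tmp), manifest))  # ['about']
```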

Benefits of static publishing

At first glance, the benefits of dynamic publishing are obvious. What is frequently ignored are the benefits of static publishing, at least for content-driven sites which don’t have any heavy need for dynamic features. The most obvious benefit is performance; serving static files is what web servers such as Apache are optimised to do, and they can do it fast.

The reliability advantage

A big part of it is that static publishing takes the pressure off going live. You can be sure before going live that the published website is correct. The actual CMS may explode in flames, but the site will be fine. Going live with a web application is always a stressful process, and anything that reduces that stress is a great benefit. As time goes on, static publishing is also a big stress reduction for the system administrator, since a simple Apache configuration is a lot more reliable under different loads and configurations than any dynamic site will be.

Performance issues

A static site will increase the performance of any website or online application. Static pages have a ‘circular’ effect on speed: static pages take less time to load; less load time allows better performance under stress; and better performance reduces server stress and gives the user faster downloads. Note, though, that accessibility should always have a higher priority than performance.

Static over dynamic – Conclusion

Not everything needs to be dynamically created. If there are pieces of information that have quite a long dynamic cycle, embed them statically, but perhaps allow for new items to be re-embedded easily, through a pseudo-dynamic process.
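The “pseudo-dynamic” idea can be sketched as a fragment that is rendered once, embedded as static HTML, and rebuilt only when it gets older than its refresh interval or when a new item explicitly invalidates it. The Python class below is a minimal sketch; `StaticFragment`, `max_age` and `invalidate` are hypothetical names, and the clock is injectable only so the behaviour can be tested.

```python
# Sketch: a "pseudo-dynamic" fragment. The HTML is built once and embedded
# statically, then rebuilt only after max_age seconds or on explicit
# invalidation. Class and parameter names are hypothetical.
import time
from typing import Callable, Optional


class StaticFragment:
    def __init__(self, render: Callable[[], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render      # builds the fragment's HTML
        self._max_age = max_age    # seconds before the fragment is rebuilt
        self._clock = clock        # injectable so the behaviour can be tested
        self._html: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        """Return the embedded HTML, re-rendering it only when it is stale."""
        now = self._clock()
        if self._html is None or now - self._built_at >= self._max_age:
            self._html = self._render()
            self._built_at = now
        return self._html

    def invalidate(self) -> None:
        """Force a re-embed on the next get(), e.g. when a new item is added."""
        self._html = None
```

A list of archive links, for example, could use `max_age=24 * 3600` and call `invalidate()` whenever a new post is published.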


