Things worth thinking about
Today's posting is presented as a simple web poster.
Click here to view.
01:52 PM
A Call for Responsible Page Ranking (and Web Development)
Why don’t Google, Yahoo, MSN (or wherever you get your site searches from) start providing us with a “Noise Score” for each listed site?
Won’t anyone admit how truly annoying it is to search for information on a given subject and have the top search results return what I consider to be total rubbish?
Here is an example (don’t worry, I won’t name names):
I did a very rough estimate of the ‘textual surface area’ of “real content” (i.e. the textual content directly related to why you are visiting a particular page) versus the ‘total page size’ of the web page.
This is what I came up with for the web page that prompted this posting. Prepare yourself, it’s not a pretty result…
Out of a total of 2,348,032 pixels (1,024 X 2,293) for the entire web page, only 325,260 pixels (695 X 468) deal with actual, meaningful (?) content. That works out to only 0.139 (about 14%) of the page being actual, potentially informative content.
The “Noise Ranking” (or “Noise Score”) for this particular page would therefore be 8.6, that is, (1 - 0.139) X 10. Round that up for a simple score of 9 out of 10 - meaning roughly 9/10 of the web page is “Noise” or, as I prefer to call it, “Garbage”.
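For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation described above. The function name noise_score and the idea of measuring everything in pixel areas are just my rough method written out, not any established metric.

```python
def noise_score(page_w, page_h, content_w, content_h):
    """Return (content_fraction, score_out_of_10) from pixel dimensions."""
    page_area = page_w * page_h            # total 'surface area' of the page
    content_area = content_w * content_h   # area holding the real content
    content_fraction = content_area / page_area
    # Everything that is not content counts as noise, scaled to 0..10.
    score = (1 - content_fraction) * 10
    return content_fraction, score


fraction, score = noise_score(1024, 2293, 695, 468)
print(f"content fraction: {fraction:.3f}")   # 0.139
print(f"noise score: {score:.1f} out of 10")  # 8.6
```

A real search engine would need a much smarter way of deciding what counts as content; this only reproduces the back-of-the-envelope numbers.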
Now, let me temper this with just a little bit of fairness.
Yes, parts of the web page deal with site navigation, headers and footers - fair enough, but c’mon world, I came to your web site for meaningful information - NOT to click on your AdSense advertisements or rate your article (you wouldn’t like my rating anyway!). Just let me read your content in as efficient a manner as possible. The particular article that served as the basis of this rant took 7 link clicks to read in full (it was split across 7 separate and equally offensive “noisy” pages) - with the usual result - useless information and a wasted trip!
As an alternative to “Noise Rankings”, how about if these ‘High Noise Rank’ web sites provided us with a link that can be clicked on to redeem the part of our lives that was wasted by their site?
11:47 AM
Data Musings
Last summer (2009) I watched a brief lecture by Hans Rosling, regarding his pioneering work in bursting the myths about the developing world using a proprietary system, GapMinder*, which he wrote to help him analyze huge amounts of statistical data.
I have been thinking about this topic, more or less, ever since then. Not necessarily about the evidence of the socio-economic disparities (I see too much of that every day), but rather about data in general.
The internet is now an enormous 'society' in its own right — a sort of 'sub-world' that shadows the lives of us all…
Why not encourage the development of a universal standard method for accessing all the data in all the data warehouses of the world (as long as privacy rights are not trampled upon)?
I realize that most of you will scream something about ‘big brother’, but my retort would be, “and you don’t think big brother is already alive and kicking behind the shadows?”
So, why continue fighting ‘big brother’? Why not join him and use him as much as he is using us?
Imagine the progress the world could potentially make if we shared information instead of clutching it close to our chests ’til our dying day!
I'm not asking for any one database system to become the predominant market leader. They all have their self-envisioned strengths and weaknesses that make them better or worse than the next competitor, and they can continue being the Oracle, Sybase, MS-SQL, MySQL that they want to be. Just let's find a common method that allows us all to access the ‘public’ portions of the world’s data in any way that is meaningful for us.
What if we all had access to any data that we were curious about or profoundly interested in and we could have a system like GapMinder that we could use to help us understand how the data does or doesn't relate to the answers we might be seeking?
*The software behind GapMinder (Trendalyzer) was acquired by Google in 2007, and the GapMinder tools are now available for free. Visit the GapMinder site and have a play for yourself… it’s really quite interesting!
The Hans Rosling lecture
Does the Internet Foster Bad Programming?
I originate from the very early days of personal programming. When computer books were inexpensive and computers, when they could be found, were outrageously expensive because so few people were interested in computing. These were the days before Microsoft ran rampant throughout the world. The days of CP/M (Control Program for Microcomputers).
I was fortunate to be part of this 'front wave' of technological advances. It taught me a lot. These were the days when, if you wanted to sort some data, you had to not only know the strengths and weaknesses of shell sorts, bubble sorts and various other sorting methodologies but you also had to know how to write your chosen sorting algorithm yourself.
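For readers who never had to do this by hand, here is roughly what writing your own sort looked like, sketched in Python rather than the assembly or BASIC of the day. It is a plain shell sort using the simple halving gap sequence (an assumption on my part; many other gap sequences exist).

```python
def shell_sort(items):
    """Sort a list in place with shell sort (gaps n/2, n/4, ..., 1)."""
    gap = len(items) // 2
    while gap > 0:
        # Gapped insertion sort: after this pass, every slice
        # items[k::gap] is in order.
        for i in range(gap, len(items)):
            current = items[i]
            j = i
            while j >= gap and items[j - gap] > current:
                items[j] = items[j - gap]  # shift the larger element along
                j -= gap
            items[j] = current
        gap //= 2
    return items


data = [23, 4, 42, 8, 15, 16]
shell_sort(data)
print(data)  # [4, 8, 15, 16, 23, 42]
```

With the final gap of 1 this degenerates into an ordinary insertion sort, which is why the earlier passes matter: they move far-away elements cheaply before the last pass runs.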
These were the days when you had to understand the underlying functionality and architecture of the silicon chips that made up the computer you were using. When machine cycles would make or break your program. When the hot fad of the day was to write your assembly code in the tightest, smallest way possible to squeeze one more microsecond out of your execution time and require one less nybble of memory consumption.
These were the days when TSR (terminate and stay resident) hooks were the coolest thing since sliced bread. When computing magazines taught their readers new methodologies and algorithms instead of reviewing the latest computer game on the market.
As the world of computing became more popularized and technology advanced to allow for that popularity, the computing pundits began their lambasting of languages such as BASIC (Beginner's All-purpose Symbolic Instruction Code), which did not require the use of 'spaghetti code' for a program to be operational but certainly did not restrict the user, in any way, from writing as sloppily as they wanted.
The object-oriented world seems to have reduced this problem of spaghetti code quite considerably, but a determined programmer can still write sloppy, inefficient code using object-oriented methods just as easily as the BASIC programmer could.
The difference, it seems, is that nowadays the byte-per-buck ratio is significantly larger than it was, and programmers (even those from the leading software houses) seem to have a laissez-faire attitude toward the minimum hardware requirements needed to run their programs. The installation footprint is 578 megabytes? So? The minimum core memory requirement is 4 gigabytes? So?
Yes, what computers are capable of doing now is totally amazing, and yes, in order to support some of these amazing features you do require more resource consumption. But… let's get out of this throw-away mentality. Let's return to the days of self-pride. Let's not rely on the internet browser to automagically heal our un-closed code blocks. Let's not rely on everyone having fast internet connections (much of the world is still not on megabit-per-second connections). Let's stop consuming our users' disk space as if our large footprint is our birthright.
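On the un-closed blocks point, checking your own markup is not hard. Below is a minimal sketch using Python's standard html.parser module. The class name UnclosedTagFinder and the helper find_unclosed are made up for this illustration, and the check is deliberately stricter than HTML itself, which allows some end tags (such as </p> and </li>) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Track open tags on a stack and record anything left dangling."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # human-readable complaints

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop back to the matching tag, reporting everything skipped over.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> never closed")


def find_unclosed(html):
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    # Whatever is still on the stack at the end was never closed.
    leftovers = [f"<{t}> never closed" for t in reversed(finder.open_tags)]
    return finder.problems + leftovers


print(find_unclosed("<div><p>Hello</div>"))  # ['<p> never closed']
print(find_unclosed("<p>ok<br></p>"))        # []
```

A proper validator does far more than this, but even a crude check like this one catches the sloppiness before the browser has to paper over it.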
Let us demonstrate some common human respect for our users and offer them the absolute best of what we are capable of delivering. It is through striving to present our best face that we ourselves grow and develop. Let yesterday's best be tomorrow's worst!
09:48 AM
DB Thoughts
I was thinking this morning (yes, you know I start that sort of thing quite early)…
MongoDB, CouchDB, Okeanos, etc. actually make a bit of sense for use with RoR (Ruby On Rails) and OO-PHP (object oriented PHP).
When designing a data store, one typically designs the store so that it reflects the structure of the data objects being stored.
Why not go one step further and use a data store that speaks in the same terms as the language that is going to use it?
The normal data object representation in RoR and OO-PHP is a hierarchical structure of key -> value pairs (even if the value part of the pair is yet another array of key -> value pairs).
So, why not store the language's data in a form that already matches a structure the language knows how to deal with, so that no further interpretation of the structure is required (and hence, in theory, a speed improvement)?
I think the reason the normal row-column data structure became so persistent and prolific is that things like Lotus 1-2-3 and Excel became so commonplace in the business empires that needed to store their data.
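To make the idea concrete, here is a small sketch in Python (standing in for Ruby or PHP, whose hashes and arrays behave much the same way). The post data and the helper names to_rows and from_rows are invented for illustration, and JSON stands in for whatever format a document store actually uses internally.

```python
import json

# A hypothetical blog post as the application sees it:
# a hierarchy of key -> value pairs.
post = {
    "title": "DB Thoughts",
    "author": {"name": "example author", "site": "example.org"},
    "tags": ["databases", "web"],
}

# Document-store style: the structure is saved and restored as-is.
stored = json.dumps(post)
restored = json.loads(stored)


# Row-column style: the same data has to be split into flat tables...
def to_rows(post_id, doc):
    post_row = (post_id, doc["title"],
                doc["author"]["name"], doc["author"]["site"])
    tag_rows = [(post_id, tag) for tag in doc["tags"]]
    return post_row, tag_rows


# ...and reassembled into the hierarchy every time it is read back.
def from_rows(post_row, tag_rows):
    _, title, name, site = post_row
    return {
        "title": title,
        "author": {"name": name, "site": site},
        "tags": [tag for _, tag in tag_rows],
    }


post_row, tag_rows = to_rows(1, post)
print(restored == post)                       # True
print(from_rows(post_row, tag_rows) == post)  # True
```

Both routes get the same object back; the difference is the extra mapping code in the row-column route. Whether skipping that step yields a real speed improvement is not something this sketch measures.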
There is a simple correlation between a table of rows and columns in a database and a spreadsheet's representation of rows and columns.
It was what people were used to seeing and working with and it was quite simple to pull in data from a database into a spreadsheet.
Now that people are using the web and the web uses PHP and RoR (notice the top billing), it only makes sense to store your data in a more directly usable way!
So there!
05:54 PM