
Re: long URLs WAS(Re: [ProgSoc] International Solaris 10 University



Christian Kent wrote:
Same goes for accountancy, engineering, or most things with a bachelor's degree. Though what other professions get stressed about their complexity (and the necessity of it all) as much as IT? Maybe it's still new?

Not just 'new'. It's Big. The IT field is big. Really, really big. I mean, you just won't believe how vastly, hugely, mind-bogglingly big it is.


It's IT that solves half of the accounting, legal, and engineering problems. IT *includes* those fields. We need to understand them, model them, etc.

You've never savoured the success of a good database design?

Of course I have. Which is why I know that the more you normalise your database the more 'complex' it becomes. You begin to need more inner joins, you need more tables, etc. So, once again, Good does not necessarily imply Brief. I think people can waste too much energy trying to avoid long URLs, especially when they actually need long URLs.
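To make the join point concrete, here is a minimal sketch using Python's sqlite3 module. The schema is made up for illustration (flat_orders, city, customer and orders are hypothetical tables, not from any real system). It stores the same two orders once in a single wide table and once in a normalised form, and shows that the normalised form needs two inner joins to answer the same question.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database holding the same data in two shapes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Denormalised: one wide table, customer details repeated per order.
        CREATE TABLE flat_orders (
            order_id INTEGER PRIMARY KEY,
            customer_name TEXT,
            customer_city TEXT,
            item TEXT
        );
        INSERT INTO flat_orders VALUES
            (1, 'Alice', 'Sydney', 'keyboard'),
            (2, 'Alice', 'Sydney', 'mouse');

        -- Normalised: each fact is stored once, at the cost of more tables.
        CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT,
            city_id INTEGER REFERENCES city(city_id)
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            item TEXT
        );
        INSERT INTO city VALUES (1, 'Sydney');
        INSERT INTO customer VALUES (1, 'Alice', 1);
        INSERT INTO orders VALUES (1, 1, 'keyboard'), (2, 1, 'mouse');
    """)
    return db

# The flat table answers the question with no joins at all.
FLAT_QUERY = """
    SELECT customer_name, customer_city, item
    FROM flat_orders
    ORDER BY order_id
"""

# The normalised schema needs two inner joins for the same answer.
NORMALISED_QUERY = """
    SELECT customer.name, city.name, orders.item
    FROM orders
    INNER JOIN customer ON customer.customer_id = orders.customer_id
    INNER JOIN city ON city.city_id = customer.city_id
    ORDER BY orders.order_id
"""

db = build_demo_db()
flat_rows = db.execute(FLAT_QUERY).fetchall()
normalised_rows = db.execute(NORMALISED_QUERY).fetchall()
```

Both queries return the same rows. The normalised query is longer, but Alice's city now lives in exactly one place, which is the trade being described: more structure, more joins, less redundancy.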


It's funny that you've brought up this database thing though, because on the subject of 'knowledge' and 'scale' I'd like to refer you to the prefaces of Database Systems 3rd Edition - Connolly/Begg 2002, and An Introduction to Database Systems 6th Edition - Date 1994, each of which has been the recommended textbook at UTS for the entry-level database subject (DD, or whatever they call it these days).

In the preface to their book Connolly/Begg have the following to say with regard to knowledge, and simplicity:

The history of database research over the past 30 years is one of exceptional productivity that has led to the database system becoming arguably the most important development in the field of software engineering. The database is now the underlying framework of the information system, and has fundamentally changed the way many organisations operate. In particular, the developments in this technology over the last few years have produced systems that are more powerful and more intuitive to use. This has resulted in database systems becoming increasingly available to a wider variety of users. Unfortunately, the apparent simplicity of these systems has led to users creating databases and applications without the necessary knowledge to produce an effective and efficient system. And so the 'software crisis' or, as it is sometimes referred to, the 'software depression' continues.

Here's what Date had to say back in 94, with regard to the sheer scale of *one aspect* of Information Technology:

The field of database technology is suffering from an information explosion - ironically enough, since information processing is what database technology is supposed to be all about. Here, for example, is a partial list of professional publications in the field that appear in the United States on a regular basis:

- ACM Transactions on Database Systems (TODS), published quarterly -- around 700 pages per year
- ACM SIGMOD Record, published quarterly -- around 250 pages per year
- Proceedings of the Annual ACM SIGMOD International Conference on Management of Data -- around 450 pages per year
- Proceedings of the Annual ACM SIGACT-SIGMOD Symposium on Principles of Database Systems -- around 350 pages per year
- Proceedings of the Annual International Conference on Very Large Data Bases (VLDB) -- around 650 pages per year
- The VLDB Journal, published quarterly -- around 450 pages per year


Add to the foregoing the various more specialized conferences on distributed database systems, or CAD/CAM databases, or expert database systems, or client/server systems, or object-oriented systems (etc.) -- say eight or ten conferences a year, with proceedings typically running at around 300-350 pages... add too the huge number of technical reports from universities and industrial research laboratories... add the occasional papers that appear in the publications of related disciplines, such as office automation, artificial intelligence, and programming languages... add the trade journals such as Data Base Newsletter, Database Review, InfoDB, Database Programming & Design, etc., etc., which together represent many thousands of pages per year... add the trade shows such as Database World and DB/Expo, each with its own voluminous set of proceedings... add the vendor reference manuals and other documents describing various commercial products, each with a new release every 18 months or so... add the numerous textbooks now available that have the word "database" somewhere in their title... and it becomes apparent that there are (conservatively) somewhere in excess of 100,000 pages of new material published *every year*. It is thus clearly impossible to keep abreast of everything that is happening in the database field.

Database systems are just one aspect of IT. Getting a database schema 'right' is a very challenging thing to do, and it requires an in-depth understanding of both database systems and the problem domain.

URIs are extraordinarily complex the more you delve into them. On the face of it, they're just a string that identifies a 'resource'. I'm pretty regularly awed by how awesome URI technology is. I think it's probably one of the most important and powerful specifications we have at our disposal. We can describe *everything in the known universe* with a URI. That's pretty cool.

URIs have a scheme for indicating a protocol, location description facilities, user identification facilities, a facility for describing a hierarchical path to navigate through a tree at a location, and a facility for passing named parameters and named parameter arrays (i.e. you can repeat the same parameter name in the URI's query string and that parameter will then typically be interpreted as a one-dimensional array of values). As you can have an arbitrary number of named parameters you effectively have both a 'hierarchical' selector (i.e. the 'path') and a 'multi-dimensional' selector (i.e. the query string). Making the determination of what is in the hierarchy, and what is in the query string, is not an easy thing to do. It's a very complicated design decision, and pretending it's trivial doesn't make it so. Once you have made a decision about what you describe in a hierarchy, you need to describe which way the resource described by the path can vary. It's not trivial at all. There are *heaps* of things to consider. For example, search engines won't necessarily index URLs on your site that vary only in the query string. HTML forms provide a facility for composing a URL for an HTTP GET which varies query string parameter values based on a user's selection. And so it goes, getting even more complicated... regional preferences, language, browser capabilities, layout, formatting, content, content encoding, style, syndication, re-branding, pagination, sorting, versioning, document formats (HTML, XHTML, PDF, plain TXT, RSS, SWF, PNG, GIF), legacy URL support, etc. etc.
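The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. This is only a sketch: the URL is invented for the example, and treating a repeated parameter name as a list of values is a convention that query-string parsers such as parse_qs implement, rather than something the generic URI syntax itself mandates.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, made up purely for this example.
url = ("https://alice@www.example.com:8080/catalogue/books/databases"
       "?sort=price&tag=sql&tag=design&page=2")

parts = urlsplit(url)

scheme = parts.scheme        # protocol indicator
user = parts.username        # user identification
host = parts.hostname        # location
port = parts.port            # location (port number, as an int)

# The 'hierarchical' selector: the path, split into its tree levels.
path_segments = [segment for segment in parts.path.split("/") if segment]

# The 'multi-dimensional' selector: the query string. parse_qs maps each
# name to a list, so a repeated name ('tag') becomes a multi-valued list.
params = parse_qs(parts.query)
```

Here path_segments holds the three levels of the hierarchy, and params["tag"] holds both values of the repeated parameter. Whether 'databases' belongs in the path or in something like ?subject=databases is exactly the kind of design decision in question.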

If you're doing 'simple' things, then you won't have any of those problems.

Like I said before, a RoR CRUD system doesn't score any points, and long URLs are a reality for web-based applications which provide non-trivial querying services. A query is an HTTP GET operation; apart from a few HTTP headers, the majority of the stateless query context is deferred to the URL. Thus a complicated query requires a complicated URL, and a complicated URL will be long. Like I said, long URLs are a reality. (Unless, like I also said, you provide a 'tinyurl'-like service for all your URLs, but then you obfuscate them, a problem I also specified a potential solution for. But that's a non-trivial problem to solve too. (For example, do you always wrap a URL, or only if the user asks, or do you do it automatically and 301 (or 302?) back to the short URL, etc?))
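A rough sketch of both halves of that argument, in Python. Everything named here is an assumption for illustration: the BASE endpoint, the filter names, the /s/ prefix and the ShortUrlTable class are all hypothetical, and the table only covers the short-to-long redirect direction. It leaves the policy questions (when to wrap, 301 versus 302) to the caller.

```python
import hashlib
from urllib.parse import urlencode

BASE = "http://www.example.com/search"  # hypothetical query endpoint

def build_query_url(filters):
    """Put all of the query state into the URL, as a stateless GET requires.

    doseq=True expands list values into repeated parameters.
    """
    return BASE + "?" + urlencode(filters, doseq=True)

class ShortUrlTable:
    """In-memory stand-in for a 'tinyurl'-like service (illustration only)."""

    def __init__(self):
        self._by_key = {}

    def shorten(self, long_url):
        # Derive a short key from a hash of the URL. Note that the key is
        # opaque, which is the obfuscation problem mentioned above.
        key = hashlib.sha1(long_url.encode("utf-8")).hexdigest()[:8]
        self._by_key[key] = long_url
        return "/s/" + key

    def resolve(self, short_path, permanent=True):
        """Return (status, location) for a redirect, or (404, None)."""
        key = short_path.rsplit("/", 1)[-1]
        long_url = self._by_key.get(key)
        if long_url is None:
            return 404, None
        return (301 if permanent else 302), long_url

# A modestly complicated query: every filter ends up in the URL.
filters = {
    "author": "Date",
    "topic": ["normalisation", "joins"],
    "sort": "year",
    "order": "desc",
    "page": 3,
    "lang": "en-AU",
    "format": "xhtml",
}
long_url = build_query_url(filters)

table = ShortUrlTable()
short_path = table.shorten(long_url)
status, location = table.resolve(short_path)
```

Seven filters already produce a URL of well over a hundred characters, and the short form says nothing about what it points to. The permanent flag is where the 301-or-302 question shows up; the sketch doesn't answer it.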





