Re: long URLs WAS(Re: [ProgSoc] International Solaris 10 University
Christian Kent wrote:
> Same goes for accountancy, engineering, or most things with a bachelor's
> degree. Though what other professions get stressed about their
> complexity (and the necessity of it all) as much as IT? Maybe it's
> still new?
Not just 'new'. It's Big. The IT field is big. Really, really big. I
mean, you just won't believe how vastly, hugely, mind-bogglingly big it is.
It's IT that solves half of the accounting, legal, and engineering
problems. IT *includes* those fields. We need to understand them, model
them, etc.
> You've never savoured the success of a good database design?
Of course I have. Which is why I know that the more you normalise your
database the more 'complex' it becomes. You begin to need more inner
joins, you need more tables, etc. So, once again, Good does not
necessarily imply Brief. I think people can waste too much energy trying
to avoid long URLs, especially when they actually need long URLs.
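
To make the normalisation point concrete, here's a rough sketch using
Python's built-in sqlite3 module. The tables, columns and data are all
made up for illustration; the only point is that the same answer takes
zero joins against the flat table and three inner joins against the
normalised one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalised: one wide table, no joins needed (hypothetical schema).
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        product_name  TEXT
    );
    INSERT INTO orders_flat VALUES (1, 'Alice', 'Sydney', 'Widget');

    -- Normalised: each fact is stored once, at the cost of more tables.
    CREATE TABLE city     (city_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                           city_id INTEGER REFERENCES city(city_id));
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id),
                           product_id  INTEGER REFERENCES product(product_id));
    INSERT INTO city     VALUES (1, 'Sydney');
    INSERT INTO customer VALUES (1, 'Alice', 1);
    INSERT INTO product  VALUES (1, 'Widget');
    INSERT INTO orders   VALUES (1, 1, 1);
""")

flat_query = """
    SELECT customer_name, customer_city, product_name
    FROM orders_flat
    WHERE order_id = 1
"""

# The same question against the normalised schema needs three inner joins.
normalised_query = """
    SELECT customer.name, city.name, product.name
    FROM orders
    INNER JOIN customer ON customer.customer_id = orders.customer_id
    INNER JOIN city     ON city.city_id = customer.city_id
    INNER JOIN product  ON product.product_id = orders.product_id
    WHERE orders.order_id = 1
"""

flat_row = conn.execute(flat_query).fetchone()
normalised_row = conn.execute(normalised_query).fetchone()
```

Both queries return the same row. The normalised version is longer, but
it's the one where renaming a city means updating a single record.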
It's funny that you've brought up this database thing though, because on
the subject of 'knowledge' and 'scale' I'd like to refer you to the
preface from Database Systems 3rd Edition - Connolly/Begg 2002, and An
Introduction to Database Systems 6th Edition - Date 1994, each of which
has been the recommended text book at UTS for the entry level database
subject (DD, or whatever they call it these days).
In the preface to their book Connolly/Begg have the following to say
with regard to knowledge, and simplicity:
The history of database research over the past 30 years is one of
exceptional productivity that has led to the database system becoming
arguably the most important development in the field of software
engineering. The database is now the underlying framework of the
information system, and has fundamentally changed the way many
organisations operate. In particular, the developments in this
technology over the last few years have produced systems that are more
powerful and more intuitive to use. This has resulted in database
systems becoming increasingly available to a wider variety of users.
Unfortunately, the apparent simplicity of these systems has led to users
creating databases and applications without the necessary knowledge to
produce an effective and efficient system. And so the 'software crisis'
or, as it is sometimes referred to, the 'software depression' continues.
Here's what Date had to say back in 94, with regard to the sheer scale
of *one aspect* of Information Technology:
The field of database technology is suffering from an information
explosion - ironically enough, since information processing is what
database technology is supposed to be all about. Here, for example, is a
partial list of professional publications in the field that appear in
the United States on a regular basis:
- ACM Transactions on Database Systems (TODS), published quarterly --
around 700 pages per year
- ACM SIGMOD Record, published quarterly -- around 250 pages per year
- Proceedings of the Annual ACM SIGMOD International Conference on
Management of Data -- around 450 pages per year
- Proceedings of the Annual ACM SIGACT-SIGMOD Symposium on Principles
of Database Systems -- around 350 pages per year
- Proceedings of the Annual International Conference on Very Large Data
Bases (VLDB) -- around 650 pages per year
- The VLDB Journal, published quarterly -- around 450 pages per year
Add to the foregoing the various more specialized conferences on
distributed database systems, or CAD/CAM databases, or expert database
systems, or client/server systems, or object-oriented systems (etc.) --
say eight or ten conferences a year, with proceedings typically running
at around 300-350 pages... add too the huge number of technical reports
from universities and industrial research laboratories... add the
occasional papers that appear in the publications of related
disciplines, such as office automation, artificial intelligence, and
programming languages... add the trade journals such as Data Base
Newsletter, Database Review, InfoDB, Database Programming & Design,
etc., etc., which together represent many thousands of pages per year...
add the trade shows such as Database World and DB/Expo, each with its
own voluminous set of proceedings... add the vendor reference manuals
and other documents describing various commercial products, each with a
new release every 18 months or so... add the numerous textbooks now
available that have the word "database" somewhere in their title... and
it becomes apparent that there are (conservatively) somewhere in excess
of 100,000 pages of new material published *every year*. It is thus
clearly impossible to keep abreast of everything that is happening in
the database field.
Database systems are just one aspect of IT. Getting a database schema
'right' is a very challenging thing to do, and it requires an in-depth
understanding of both database systems and the problem domain.
URIs turn out to be extraordinarily complex once you delve into them. On
the face of it, they're just a string that identifies a 'resource'. I'm
pretty regularly awed by what URI technology can do. I think it's
probably one of the most important and powerful specifications we have
at our disposal. We can describe *everything in the known universe* with
a URI. That's pretty cool.
URIs have a scheme for indicating a protocol, location description
facilities, user identification facilities, a facility for describing a
hierarchical path to navigate through a tree at a location, and a
facility for passing named parameters and named parameter arrays (i.e.
you can repeat the same parameter name in the URI's query string and
that parameter will then be interpreted as a one-dimensional array of
values). As you can have an arbitrary number of named parameters you
effectively have both a 'hierarchical' selector (i.e. the 'path') and a
'multi-dimensional' selector (i.e. the query string). Making the
determination of what is in the hierarchy, and what is in the query
string, is not an easy thing to do. It's a very complicated design
decision, and pretending it's trivial doesn't make it so. Once you have
decided what belongs in the hierarchy, you still need to
describe the ways in which the resource identified by the path can vary. It's not
trivial at all. There are *heaps* of things to consider. For example,
search engines won't necessarily index URLs on your site that vary only
in the query string. HTML forms provide a facility for composing a URL
for an HTTP GET which varies query string parameter values based on a
user's selection. And so it goes, getting even more complicated...
regional preferences, language, browser capabilities, layout,
formatting, content, content encoding, style, syndication, re-branding,
pagination, sorting, versioning, document formats (HTML, XHTML, PDF,
plain TXT, RSS, SWF, PNG, GIF), legacy URL support, etc. etc.
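
As a rough illustration of the parts listed above, here's a sketch using
Python's standard urllib.parse. The URL itself (host, user, path and
parameter names) is invented for the example. Note that collecting a
repeated parameter into a list is what parse_qs happens to do; other
frameworks have their own conventions for repeated names.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: every component here is made up for illustration.
url = ("https://alice@example.org:8443/catalogue/books/databases"
       "?lang=en&sort=title&tag=sql&tag=design")

parts = urlsplit(url)

scheme = parts.scheme    # protocol indicator
user = parts.username    # user identification
host = parts.hostname    # location
port = parts.port        # location (port, parsed as an int)

# The 'hierarchical' selector: path segments, in order.
path_segments = [segment for segment in parts.path.split("/") if segment]

# The 'multi-dimensional' selector: named parameters, where a repeated
# name ('tag' here) yields a list with more than one value.
params = parse_qs(parts.query)
```

Here path_segments is ['catalogue', 'books', 'databases'] and
params['tag'] is ['sql', 'design']. Whether 'databases' belongs in the
path or should have been another query parameter is exactly the design
decision I'm talking about.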
If you're doing 'simple' things, then you won't have any of those problems.
Like I said before, a RoR CRUD system doesn't score any points, and long
URLs are a reality for web based applications which provide non-trivial
querying services. A query is an HTTP GET operation and, apart from a few
HTTP headers, most of the stateless query context is carried in the URL.
So a complicated query requires a complicated URL, and a complicated URL
will be long. Like I said, long URLs are a reality.
(Unless, like I also said, you provide a 'tinyurl'-like service for all
your URLs, but then you obfuscate them, a problem I also specified a
potential solution for. But that's a non-trivial problem to solve too.
For example, do you always wrap a URL, or only if the user asks, or do
you do it automatically and 301 (or 302?) back to the short URL, etc?)
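
Here's a minimal sketch of both halves of that, under a pile of
assumptions: the base URL, the /s/ prefix, the parameter names and the
Shortener class are all hypothetical, the mapping lives in memory, and
token collisions are ignored. It only covers the short-to-long direction
and leaves the 301-versus-302 choice to the caller, since I don't think
that question has an obvious answer.

```python
import hashlib
from urllib.parse import urlencode

BASE = "https://example.org"  # hypothetical site


def build_query_url(path, params):
    """Put the whole (stateless) query context into the URL."""
    # doseq=True expands list values into repeated parameter names.
    return f"{BASE}{path}?{urlencode(params, doseq=True)}"


class Shortener:
    """Toy in-memory 'tinyurl'-like mapping; not a real service."""

    def __init__(self):
        self._by_token = {}

    def shorten(self, long_url):
        # First 8 hex digits of a SHA-256 digest; collisions are ignored here.
        token = hashlib.sha256(long_url.encode("utf-8")).hexdigest()[:8]
        self._by_token[token] = long_url
        return f"{BASE}/s/{token}"

    def resolve(self, short_url, permanent=True):
        """Return (status, target): 301 or 302 on a hit, 404 on a miss."""
        token = short_url.rsplit("/", 1)[-1]
        long_url = self._by_token.get(token)
        if long_url is None:
            return 404, None
        return (301 if permanent else 302), long_url


long_url = build_query_url("/search", {
    "q": "database design",
    "tag": ["sql", "normalisation"],
    "sort": "title",
    "page": 2,
    "lang": "en-AU",
})

shortener = Shortener()
short_url = shortener.shorten(long_url)
status, target = shortener.resolve(short_url)
```

The short URL is shorter, but it no longer tells you anything about the
query, which is the obfuscation problem I mentioned.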