Betamax 2.0: the future of mashups?
Complexity gets complicated
Simplicity in software is, I believe, more than just a noble aim; it is essential for successful software projects. However, simplicity should not be assumed just because one particular technology or methodology is being used.
Mashups, discussed recently by Reg Dev reader Aubry Thonon, are a case in point. One element of the hype around mashups is that they are simple to build: all you have to do is link together a few APIs and you are done.
Life is never that simple, though. Building successful, effective, reliable and long-running mashup applications is not a trivial task; indeed, it creates its own architectural, organizational and implementation problems.
The current, popular definition of a mashup is a web-based application that combines data from two or more sources into a single integrated solution. The most widely cited example is the combination of geospatial data from, say, Google Maps, with information on local businesses taken from an online directory or another data source. The end result allows the user to search for, say, local bakers and be presented with a map showing their locations and a brief summary of the sort of products they supply.
A mashup, then, is merely a new kind of integrated solution, albeit one that uses the web and can be built by anyone with the knowledge and access to a browser! Simple, huh?
No. Mashup developers will encounter integration issues well known to the software and database worlds, for which there are still no off-the-shelf solutions.
In fact, things are going to be a little more complicated in the mashup world. For example, unlike in traditional integration, the suppliers of the source data that's mashed up are often not involved in the project, and may never have designed their data to be used in that way. This will create problems, as systems do not automatically collaborate with each other.
Here, then, is my list of some of the most fundamental issues:
Complexity of architecture: combining multiple technologies, development styles and integration points in a single application does not a simple solution make. Indeed, while it is certainly possible to achieve a functioning system, the end result may well resemble a spaghetti of code using a palette of interfaces and frameworks, rather than a cleanly engineered solution that's simple to use or to maintain.
Data integration: one fundamental assumption in any mashup is that data will be integrated from two or more applications. However, what one system refers to as a supplier another may call a vendor. What one system considers a number another may assume is a string. Integrating data from diverse sources is therefore a very difficult job. For example:
- Semantic meaning: what does the data mean? If a common reference model is available then translations between one data source and another can be made. If, though, the data sources in a mashup are logically and physically disparate, it is unlikely that such a reference model exists. As mashup developers are typically remote from the data source suppliers, they must analyse the data based on what they receive and not on how it is produced.
- Data formats: the number of data sources being integrated is the only limit on the number of data formats being used.
- Data quality: the quality of the data supplied may not be consistent. While one source may provide correct information, another may contain erroneous data or data of varying quality. That's where data cleansing would normally come in.
- Data pollution: as the data is provided by external sources, it is possible that once those sources realize their data is being reused in a mashup, they may intentionally corrupt or alter it for their own ends.
- Source feed persistence: a mashup is inherently reliant on the data sources for its basic functionality, so if a data source is terminated or significantly altered this may render the mashup useless or stop it functioning altogether.
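To make the supplier/vendor and number/string mismatches above concrete, here is a minimal Python sketch of mapping records from two sources onto one shared model. Everything in it is an assumption made for illustration: the field names in FIELD_ALIASES, the Business record, the 0 to 5 rating range and the normalise() helper are hypothetical, not taken from any real directory or mapping API.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical aliases: each source uses its own word for the same concept.
FIELD_ALIASES = {
    "supplier": "supplier",
    "vendor": "supplier",
    "postcode": "postcode",
    "zip": "postcode",
    "rating": "rating",
    "stars": "rating",
}


@dataclass
class Business:
    """The shared reference model the mashup agrees on (an assumption)."""
    supplier: str
    postcode: str
    rating: Optional[float]  # None when the source value is unusable


def to_rating(value: Any) -> Optional[float]:
    """Coerce a number or numeric string to a float in 0..5, else None."""
    try:
        rating = float(value)
    except (TypeError, ValueError):
        return None
    # Basic data cleansing: reject values outside the assumed range.
    return rating if 0.0 <= rating <= 5.0 else None


def normalise(record: dict) -> Business:
    """Map one source record onto the shared model, ignoring unknown fields."""
    mapped = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.lower())
        if canonical is not None:
            mapped[canonical] = value
    return Business(
        supplier=str(mapped.get("supplier", "")).strip(),
        postcode=str(mapped.get("postcode", "")).strip().upper(),
        rating=to_rating(mapped.get("rating")),
    )


# Two sources describing the same business in different vocabularies and types.
from_directory = normalise({"vendor": " Acme Bakery ", "zip": "ab1 2cd", "stars": "4.5"})
from_listing = normalise({"Supplier": "Acme Bakery", "postcode": "AB1 2CD", "rating": 4.5})
```

Even this toy version has to guess at what each field means, which is the semantic problem in miniature: the alias table is only as good as the developer's reading of data they did not produce.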
Integration with browser functionality: if a mashup uses a browser then, to reach the broadest range of users, it must take into account the variations between different web browsers. And, as we all know, despite a general level of agreement on standards, some browsers are more compliant than others.
Don't get me wrong. Mashups are exciting and dynamic new systems that, since the early days of Google Maps, have rightly generated a lot of excitement.
However, they are not inherently simple or trivial to develop. As a result, commercial organizations have sprung up, generously offering to help solve these integration issues. Suddenly, it's looking like the old software consulting and integration business all over again, only with a greater degree of openness and respect for standards.
Mashup developers, and those proposing mashups, need to be careful what they promise. They also need to give more thought to some traditional integration issues, such as data cleansing, and newer issues like persistence of the source feed.
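As one illustration of planning for a source feed that changes or disappears, a mashup can keep the last good copy of the data and serve it, flagged as stale, when the source fails. The sketch below is only that, a sketch: the CachedFeed class, its max_age_seconds parameter and the flaky() stand-in for a real feed are all hypothetical names invented for this example.

```python
import time
from typing import Callable, Optional, Tuple


class CachedFeed:
    """Wrap a feed fetcher and fall back to the last good result on failure."""

    def __init__(self, fetch: Callable[[], list], max_age_seconds: float = 3600.0):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._cached: Optional[Tuple[float, list]] = None

    def get(self) -> Tuple[list, bool]:
        """Return (items, is_stale); re-raise if no usable copy exists."""
        try:
            items = self._fetch()
        except Exception:
            if self._cached is None:
                raise  # never had a good copy, so nothing to fall back on
            saved_at, items = self._cached
            if time.monotonic() - saved_at > self._max_age:
                raise  # the saved copy is too old to be worth showing
            return items, True
        self._cached = (time.monotonic(), items)
        return items, False


# A stand-in source that works once and then goes away.
calls = {"count": 0}


def flaky() -> list:
    calls["count"] += 1
    if calls["count"] > 1:
        raise ConnectionError("feed withdrawn")
    return ["Acme Bakery"]


feed = CachedFeed(flaky)
first = feed.get()   # fresh data from the source
second = feed.get()  # source has failed; cached copy returned, marked stale
```

A cache like this only buys time. If the supplier withdraws the feed for good, the mashup still needs another source.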
If exponents of mashups overpromise and underdeliver, then users of the web will become disillusioned with mashups and avoid both using and building them. Mashups will then become the Betamax of the web generation: a great idea that lost a mass market.®