Data integration: one fundamental assumption in any mashup is that data will be integrated from two or more applications. However, what one system may refer to as a supplier another may call a vendor. What one system considers a number another may assume is a string. Thus integrating data from diverse sources is a very difficult job. For example:
- Semantic meaning: what does the data mean? If a common reference model is available then translations between one data source and another can be made. If, though, the data sources in a mashup are logically and physically disparate, such a reference model is unlikely to exist. As mashup developers are typically remote from the data source suppliers, they must analyse the data based on what they receive and not on how it is produced.
- Data formats: every additional data source may bring its own format, so the number of formats in use is limited only by the number of sources being integrated.
- Data quality: the quality of the data supplied may not be consistent. While one source may provide correct information, another may contain erroneous data or data of varying quality. That's where data cleansing would normally come in.
- Data pollution: as the data is provided by external sources, it is possible that once those sources realize their data is being reused in a mashup, they may intentionally corrupt or alter it for their own ends.
- Source feed persistence: a mashup is inherently reliant on the data sources for its basic functionality, so if a data source is terminated or significantly altered this may render the mashup useless or stop it functioning altogether.
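To make the supplier/vendor and number/string mismatches concrete, here is a minimal Python sketch of the translation step a mashup ends up writing when no shared reference model exists. The field names, the `FIELD_ALIASES` table and the `normalise` function are hypothetical, invented purely for illustration; a real mashup would have to work out its own mapping by inspecting what each source actually sends.

```python
from typing import Any

# Hypothetical alias table: each source's name for the same concept,
# mapped onto one canonical field name chosen by the mashup developer.
FIELD_ALIASES = {
    "supplier": "supplier",
    "vendor": "supplier",
    "quantity": "quantity",
    "qty": "quantity",
}


def normalise(record: dict[str, Any]) -> dict[str, Any]:
    """Map one source record onto the canonical field names and types."""
    out: dict[str, Any] = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical is None:
            continue  # field we have no mapping for: ignore it
        if canonical == "quantity":
            # One source sends a number, another a string: coerce both to int.
            out[canonical] = int(str(value).strip())
        else:
            # Basic cleansing: treat everything else as trimmed text.
            out[canonical] = str(value).strip()
    return out


# Two made-up sources describing the same thing in different ways.
source_a = {"supplier": "Acme Ltd", "quantity": 12}
source_b = {"Vendor": " Acme Ltd ", "qty": "12"}

# After normalisation the two records are directly comparable.
same = normalise(source_a) == normalise(source_b)
```

Even this toy version has to guess at meaning: nothing in the data itself confirms that one source's "vendor" really is the other's "supplier", which is exactly the semantic problem described above.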
Integration with browser functionality: if a mashup uses a browser then, to reach the broadest number of users, it must take into account the variations between different web browsers. And, as we all know, despite a general level of agreement on standards, some browsers are more compliant than others.
Don't get me wrong. Mashups are exciting and dynamic new systems that, since the early days of Google Maps, have rightly generated a lot of excitement.
However, they are not inherently simple or trivial to develop. As such, commercial organizations have sprung up to generously help solve these integration issues. Suddenly, it's looking like the old software consulting and integration business all over again, only using a greater modicum of openness and respect for standards.
Mashup developers, and those proposing mashups, need to be careful what they promise. They also need to give more thought to some traditional integration issues, such as data cleansing, and newer issues like persistence of the source feed.
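As a rough illustration of what planning for source feed persistence can look like, the Python sketch below keeps the last good copy of a feed and falls back to it when the source fails or changes shape. The `FeedWithFallback` class and the `flaky_fetch` function are hypothetical names made up for this example, and a production mashup would also want to persist the cached copy and tell users that the data is stale.

```python
from typing import Callable, Optional


class FeedWithFallback:
    """Wrap a feed-fetching function and remember its last good result."""

    def __init__(self, fetch: Callable[[], list]) -> None:
        self._fetch = fetch
        self._last_good: Optional[list] = None

    def read(self) -> tuple[list, bool]:
        """Return (records, is_fresh); fall back to the cached copy on failure."""
        try:
            records = self._fetch()
            if not isinstance(records, list):
                # The source changed its output shape: treat as a failure.
                raise ValueError("unexpected feed shape")
        except Exception:
            if self._last_good is None:
                raise  # nothing cached yet, so the mashup cannot function
            return self._last_good, False
        self._last_good = records
        return records, True


# Hypothetical source that works once and is then withdrawn.
calls = {"count": 0}


def flaky_fetch() -> list:
    calls["count"] += 1
    if calls["count"] > 1:
        raise ConnectionError("feed terminated")
    return [{"id": 1}]


feed = FeedWithFallback(flaky_fetch)
first = feed.read()   # fresh data from the source
second = feed.read()  # source has gone: stale copy, flagged as not fresh
```

A cached copy only buys time, of course. If the source is gone for good, the mashup is still living on borrowed data.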
If exponents of mashups over-promise and under-deliver, then users of the web will become disillusioned with mashups and avoid using and building them. Mashups will then become the Betamax of the web generation: a great idea that lost a mass market.®
If there's no muck, is there any brass?
Another consideration with mash-ups is that they must inherently be free to use. People won't pay for something that's easy to put together - as Joel Spolsky wrote in his recent article "Where there's muck, there's brass" (http://www.joelonsoftware.com/items/2007/12/06.html).
One might argue that being free to use didn't slow Google down, but the technology behind Google search wasn't easy to make. If someone can make something of value with a mash-up, then anyone else can do the same thing.
So far, I haven't seen a mash-up demo that answers the question "who's going to pay for this?"
Where's the money?
This is always the problem with this kind of "rad, kewl" technology. Usually you can't charge for the mashup code itself, and it's only useful because it works by leeching information from other sites - quite possibly in contravention of their usage licences.
Mashups are a great demo. But not products.
Watch what we do with gonumber.com...
...we hope to alleviate some of these issues in due course. Not yet, but we're working on it. For now, it's just a humble directory with a few bells and whistles to appeal to the small biz - restaurants & bars in particular. Hopefully, we'll get it right with regard to mashups - the backend is based on some robust semantic web concepts, including RDF. Watch this space! (Sorry, not keen on commercial plugs, but this excellent Reg article caught my eye!)