correlate

Semantic Web Kills Startups…well it could

I came across an article by Dan Zambonini, "Is Web 2.0 killing the Semantic Web?" The article points out a fundamental difference between Web 2.0 and the Semantic Web: the power of people versus the power of automation, respectively. It also comments that the more Web 2.0 makes things look easy, the more the Semantic Web seems complex and, furthermore, impossibly unattainable. The interesting thing is that this is precisely the opposite of what I’ve been thinking lately. So what have I been thinking?

The emergence of the Semantic Web could prove a major disruption, and potentially a category killer, for many of the innovative Web 2.0-style start-ups that have emerged to build a bridge to a Semantic Web that many believe will never arrive. New players that enable data extraction, tight data integration and shortcut mash-up platforms could all potentially go straight to the dead pool. Think of players like Dapper, Fetch, Connotate, JackBe, Kapow and others.

Now, some believe the Semantic Web will never happen: that publishers will never find a standard, embrace it and publish in formats where machine-based integration can take place. I am beginning to question this view simply because the pace of work in the semantic arena seems to be increasing each day, with new pushes like SPARQL, RDF and XBRL. Others will dismiss the idea that the Semantic Web could act as a disruption at all. And some of the new players in the space will make the case that the existence of a true Semantic Web will make them stronger and more valuable players in the ecosystem. Any of these could be correct.

But right now, my sense is that if the Semantic Web takes off, people will begin to publish their data and content sets in standard formats. And with that, the need for a number of intermediary players that scrape and extract content and then provide the platform layer to manipulate the information will no longer be there. Machines will be deployed to process information from a variety of data sources depending on their goals. The data will be there to use, manipulate, combine, mash and collate for new applications. We already see this emerging today; we are simply short on data availability. But the time will come; it has to.

We used to think that XML would never happen; people liked their hand-coded HTML. Now we have RSS and Atom because people see the value in distributing their information for consumption. And how many true RSS intermediary players are there now? Only a few. Could this space follow a similar path? Time will tell, but I’m leaning in that direction.


  • Stefan: welcome. completely agree on your points. data is not just data, and discussing it as a simple, clear and single type is both wrong and simplifies the problem being solved to an unrealistic level. your comments about niche and special content are absolutely correct. I will even extend that to archived and previously published content: many players will not go back to their systems to re-play their data even if new semantic-based publishing systems are available. and your addition of time horizon to the conversation is a valid one as well. clearly this is not going to happen overnight; the semantic web has been getting discussed for how many years now?

    -Lou
  • Hi Lou
    I am another one from Kapow Technologies :)
    There is a different way to look at this. Data is not just data; it serves different purposes, has audiences of different sizes, etc. Also, some data is more suited for standard semantic structures, while other data is too niche for that.
    What I am trying to say is that even if (or when) common data formats are agreed upon and data starts to get posted in these formats, there will still be a vast amount of information that simply is too special, too niche, to fit in.
    The web is becoming an increasingly important source of intelligence, and it is often the niche data which makes the difference when using a number of data sources to draw a conclusion, for example in a mashup.
    The bottom line is that there will be a growing (and not decreasing) need to access web data (or, as we say, web intelligence), and I believe the growth of this need will for a long time (years) outpace the rate at which we see data posted in standard semantic formats.
    Stefan
  • Kash: Thanks for the reply and for joining the conversation. I couldn't agree more that technologies like Kapow are being deployed to expose semantics and, if I may say, in quite an impressive way. I think the stuff Kapow has built and demonstrated to the market is extremely powerful. And I do not question the traction you and similar firms are getting because, put simply, you are creating value. There is untapped value in the information on the web and it is extremely challenging to get to it. In fact, that is part of the reason I wrote this post. The fact is that some really viable initiatives will meet some real market challenges if the true promise of the semantic web takes off and people begin publishing their sites to semantic web standards such as RDF. That being said, you are welcome to question my assumptions; that is what makes the conversation. So let me comment back...

    It all depends on the value creation and whether that value can be dis-intermediated. And my belief is that the companies that rely on screen-scraping and data-gathering approaches, because the web is currently not published using these standards, will have problems. Your last point acknowledges that a large share of your business is to help those NOT publishing with a semantic implementation to implement a semantic layer where none existed. That is precisely the business I see disappearing, because soon many will wake up to this fact and start publishing to the standards and thus not need that additional layer from external services. In fact, it will be part of publishing system frameworks. Do we honestly believe that a company such as WordPress will not see an outside developer publish a plug-in to help with this, or do it itself in a later release, if it makes sense?

    My belief and assumption is that it will be in companies' best interests to publish to these standards, and I do not believe it violates any economic premise. Frankly, the economics are simple. People will publish their sites in ways where others can utilize that data if that benefits the publisher. And I think that will be the case. I go back to RSS. I remember the emergence of RSS quite clearly. It first met the objection that publishers would not publish feeds of their content: content was king, and if you wanted to read the content, you needed to come to their site to do it. That went on for some time, but that battle is pretty much over. Almost everyone has an RSS feed and many have gone as far as to move to full-text RSS feeds, putting the content where their users need it. The premise extends here. There will be those who say, "If you want my information or data, I'm not going to put it in a standard where you can freely grab it; you want it, you need to come to my site to consume it." My premise is that this isn't going to hold, just like it didn't with RSS.

    You bring up some very good points with regard to the API world. That is probably a whole new realm of conversation and perhaps there is a post in my future to discuss my thoughts on it. At a high level, I envision a shift happening there as well, because players need to have market power to demand that you use an API. If they do not, other players will enter that can create value without this restriction, and others will have to respond in kind. If the web has proven one thing, it is that open systems keep gaining momentum. Facebook, a company with seemingly growing market power, is facing this every day.

    In summary, I think there are some real, viable businesses in this area as long as they are not predicated on the assumption that sites will not publish to semantic standards on their own. My sense is they will. But is there business in helping them do so? Yes, from the extraction and publishing side. Where there is going to be an emerging business is in data portability, data access and data protection. Who can access the data, how and under what terms? All interesting questions and business opportunities.
  • Lou, I read your article with great interest. First let me state for the record that I am a Kapow employee. I completely agree with you in terms of the movement we are seeing on the semantic web front. I also agree with the point you make about the semantic web being a disruptive technology that will create waves across the web. However, I disagree with some of your conclusions because I believe some of your assumptions are flawed. Technologies like Kapow are actively being used to expose semantics in large volumes of web-accessible data. These baseline technologies have been engineered to help with restructuring the largely unstructured web.

    In addition, your assumption that what today are large unstructured repositories will become available as well-structured data feeds is, in my opinion, incorrect because it ignores economics. For example, on the surface, when you look at companies like Yelp, Facebook and Kayak who are exposing their data as APIs, it would be natural to reach the same conclusions that you state. However, looking deeper, you will note that their APIs do not implement access mechanisms to large portions of their databases. They also implement control mechanisms that force the API user to conform to the access policies they implement. This basically makes many of these APIs very difficult to use for large-scale applications and for rapid access in a machine-to-machine fashion. In addition, these APIs don’t implement a semantic layer for dynamic machine-to-machine discovery and use. Using technology like Kapow, one can not only enhance the API with RDF and other extensions but also use the site itself, extracting data and restructuring it as RDF or other semantic mechanisms.

    Lastly, the large volume of existing applications and content, which continues to expand at an accelerated rate, continues to move forward with little or no semantic implementation. This actually enhances the Kapow value proposition, from the standpoint that Kapow can be used to restructure content and implement a semantic layer where none existed. As an aside, we have been asked by various intelligence organizations dealing with semantics to enhance our products with semantics because they see Kapow as a huge advantage in moving data into semantic repositories.