<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.jasonkolb.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>JasonKolb.com</title>
	
	<link>http://www.jasonkolb.com</link>
	<description>The life of a technology entrepreneur.</description>
	<lastBuildDate>Wed, 20 Jul 2011 03:33:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.jasonkolb.com/Jasonkolbcom" /><feedburner:info uri="jasonkolbcom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><media:copyright>Copyright 2007 Swoosh LLC</media:copyright><media:keywords>technology,future,onlin,identity,enterprise,business</media:keywords><media:category scheme="http://www.itunes.com/dtds/podcast-1.0.dtd">Technology/Tech News</media:category><itunes:owner><itunes:email>jason+PODCAST@jasonkolb.com</itunes:email><itunes:name>Jason Kolb</itunes:name></itunes:owner><itunes:author>Jason Kolb</itunes:author><itunes:explicit>no</itunes:explicit><itunes:keywords>technology,future,onlin,identity,enterprise,business</itunes:keywords><itunes:subtitle>A podcast about current and future developments in technology and business.</itunes:subtitle><itunes:summary>A podcast about current and future developments in technology and business.</itunes:summary><itunes:category text="Technology"><itunes:category text="Tech News" /></itunes:category><geo:lat>41.721886</geo:lat><geo:long>-88.329729</geo:long><feedburner:emailServiceId>Jasonkolbcom</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>The Future Evolution of Software</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/Eh0eNSB8eKU/future-evolution-of-software.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/07/future-evolution-of-software.html#comments</comments>
		<pubDate>Wed, 20 Jul 2011 03:31:38 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Next-Generation Data]]></category>
		<category><![CDATA[Predictions]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1462</guid>
		<description><![CDATA[I&#8217;ve been thinking lately about where software has gone in the last few years, and where it might go in the next. I tried to identify some major trends and transitions, and I wrote them down as a matter of habit. If you&#8217;re interested, here they are. Completely done Stand-alone to social: Applications are no [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking lately about where software has gone in the last few years, and where it might go in the next. I tried to identify some major trends and transitions, and I wrote them down as a matter of habit. If you&#8217;re interested, here they are.</p>
<h3>Completely done</h3>
<blockquote><li><strong>Stand-alone to social</strong>: Applications are no longer solitary experiences (think Word or Excel). Software no longer serves just a single purpose; it is also a collaboration space around that function.</li>
</blockquote>
<h3>Pretty much done</h3>
<blockquote><li><strong>Desktop centered to mobile centered</strong>, with tablets as something of a grey area there. I think it&#8217;s a given now that most software targets mobile platforms first and foremost. In fact, I think I&#8217;d jump through hoops to figure out a mobile interface now before resorting to one that requires a fat client. Even a lot of heavy HTML5 mobile sites are too slow, but the devices are catching up.</li>
<li><strong>Premise-based to cloud-based</strong>: Most software has at least a cloud component now. Most mobile apps I install on my phone have their guts hosted in the cloud, even if the UI runs natively on the phone itself. Enterprises still lag here because of security concerns, but I think they will eventually cave. I actually predict a sort of hybrid enterprise, simply because there are so many legacy apps that can&#8217;t be easily converted.</li>
</blockquote>
<h3>Still In Progress</h3>
<blockquote><li><strong>Tailored information</strong>: Software experiences are being tailored to the user&#8217;s tastes and profiles. Pandora does this extremely well, for example. Information feeds are routinely customized using user-provided information such as social graphs, user profiles, pick lists, etc. </li>
<li><strong>Real-time updates</strong>: As large-scale, distributed information systems become accessible to developers (see Twitter), real-time information is becoming a core feature of more and more software. The popularity of native mobile apps makes interface robustness less of an issue, but in any case <a href="http://news.ycombinator.com/item?id=2779340">Websockets are almost ready for prime-time</a>.</li>
<li><strong>Massive amounts of data</strong>: As the number of devices connected to the Net balloons along with the number of users, the amount of data produced every day grows exponentially. That data is slowly being harnessed, and the platforms needed to crunch it&#8211;such as cloud clusters and Big Data engines&#8211;are finally maturing.</li>
</blockquote>
<h3>What&#8217;s Next</h3>
<blockquote><li><strong>All-purpose to specific</strong>: The all-purpose development tools of yesterday are slowly being replaced by more complex solutions that provide finer-grained control. For example, relational databases are a kind of all-purpose data storage and lookup tool. NoSQL requires much more attention to detail and planning ahead, as you must finely tune the schema to your application. Relational databases do not require this; the underlying engine can be adapted to whatever it is you need to do. <a href="http://mbostock.github.com/protovis/">Data visualization libraries</a> are another area where very specialized tools provide greatly improved functionality over the previous all-purpose charting and graphing tools.</li>
<li><strong>Abstract to tactile</strong>: User interaction is shifting away from abstracted devices such as keyboards and mice and toward analogies based on the real world. Gestures, body movement and interactive environments are all artifacts of this. Touch-based hardware is the first wave of this and has been incredibly popular, but I can&#8217;t wait until I have my mobile Kinect camera allowing me to NOT touch my tiny little phone screen.</li>
<li><strong>Mining the wisdom of crowds</strong>: The big data part will finally allow the wisdom of crowds to be harnessed. Or, should I say, the sentiment of crowds&#8211;I have my doubts about the theory of crowds being correct, technically. But the sentiment of crowds is definitely useful&#8211;supposedly people are making bank mining Twitter for stock predictions etc. And&#8230; people do tend to think alike, so it&#8217;s relatively easy to predict how people will react to things using all of this data (no matter how much of a special unique individual I like to think I am, there is a cluster of lots of people who are extremely similar).</li>
<li><strong>Manual user input transitioning to automated information discovery</strong>: Today&#8217;s software requires a lot of user input to divine things like preferences and tastes; this will shift to automated discovery using data provided by public and private information feeds (like Twitter feeds or email histories). Someday, Pandora will be able to create stations for you simply by looking at your feeds.</li>
</blockquote>
<p>And here&#8217;s a bonus I&#8217;ve been thinking about lately:</p>
<blockquote><li><strong>Legal implications of data and data pedigree</strong>: As governments and corporations increasingly use data to automate information discovery which is used to fuel policy and law enforcement decisions, data accuracy and sourcing will be extremely critical. Wrong information could and probably will result in <a href="http://www.newsnet5.com/dpp/money/consumer/troubleshooter/cleveland-man-unable-to-renew-drivers-license-due-to-a-case-of-mistaken-identity">mistaken identities</a>, <a href="http://gizmodo.com/5822580/facial-recognition-screws-with-the-wrong-man">bureaucratic nightmares</a>, erroneous arrests, and could very well be a matter of life and death (thinking of drone attacks on suspected terrorist hideouts, for example). Data pedigree and history will become very important. I suspect there will be regulations around this requiring some sort of data verification mechanism or some other nonsense.</li>
</blockquote>
<p>Am I wrong about anything? Have I missed anything?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/07/future-evolution-of-software.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/07/future-evolution-of-software.html</feedburner:origLink></item>
		<item>
		<title>The Fundamental Shift in BI &amp; Analytics</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/8lCWDEo65JM/the-fundamental-shift-in-bi-analytics.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/07/the-fundamental-shift-in-bi-analytics.html#comments</comments>
		<pubDate>Tue, 12 Jul 2011 23:06:39 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Next-Generation Data]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[bi]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[data mining]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1455</guid>
		<description><![CDATA[I&#8217;m probably biased because I&#8217;m working on this problem, but I think there is a fundamental change coming in business intelligence and analytics. And I think I&#8217;ve only heard one company talk about it: Google. I try to keep an eye on new BI companies and products, and there are some really great new ones [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m probably biased because I&#8217;m working on this problem, but I think there is a fundamental change coming in business intelligence and analytics. And I think I&#8217;ve only heard one company talk about it: Google.</p>
<p>I try to keep an eye on new BI companies and products, and there are some really great new ones out there: <a href="http://www.qlikview.com">QlikView</a> and <a href="http://www.palantir.com">Palantir</a> come to mind. Technology platforms and the like aside, I feel like there&#8217;s one underlying assumption that&#8217;s reflected in these projects that I would consider legacy at this point. Or at least old-school, one that&#8217;s going to have to change to keep up.</p>
<p>That underlying assumption that&#8217;s going to change is that people look for data, instead of data finding people.</p>
<p>Google&#8217;s Eric Schmidt once talked about this, saying: </p>
<blockquote><p>&#8220;I actually think most people don&#8217;t want Google to answer their questions. They want Google to tell them what they should be doing next.&#8221;</p></blockquote>
<p>This is an easy thing to gloss over, but if you think about how this applies to BI &#038; analytics, <u>nobody</u> is doing this right now. You still have to use a tool to find things that are interesting to you, instead of the tool becoming invisible and delivering interesting things to you.</p>
<p>Pandora is the closest thing I&#8217;ve seen to this&#8211;I&#8217;ve read their patent and it&#8217;s very specific to music. Somebody is going to get this right for broader analytics. Soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/07/the-fundamental-shift-in-bi-analytics.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/07/the-fundamental-shift-in-bi-analytics.html</feedburner:origLink></item>
		<item>
		<title>Data as a Renewable Commodity and Profit Center</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/pdmUomAEe_U/data-as-a-renewable-commodity-and-profit-center.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/06/data-as-a-renewable-commodity-and-profit-center.html#comments</comments>
		<pubDate>Mon, 06 Jun 2011 20:12:17 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Next-Generation Data]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[as a]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[commodity]]></category>
		<category><![CDATA[create value]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data set]]></category>
		<category><![CDATA[data source]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[information technology management]]></category>
		<category><![CDATA[management]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[profit]]></category>
		<category><![CDATA[profit center]]></category>
		<category><![CDATA[public data]]></category>
		<category><![CDATA[riper]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1431</guid>
		<description><![CDATA[One of the reasons I love data is because there&#8217;s so much potential for mining real value from it, especially when you combine it with other, new data sources. In fact it acts a lot like a traditional commodity such as copper or wool in that someone produces it, and then someone else buys the [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.jasonkolb.com/wp-content/uploads/2011/06/digital-money-263x300.png" alt="" title="digital-money" width="263" height="300" class="alignright size-medium wp-image-1436" />One of the reasons I love data is that there&#8217;s so much potential for mining real value from it, especially when you combine it with other, new data sources. In fact it acts a lot like a traditional commodity such as copper or wool in that someone produces it, and then someone else buys the raw material and makes something new from it. It differs from traditional commodities, however, in that it doesn&#8217;t get used up at all when it&#8217;s used to create something new&#8211;this makes it particularly interesting from an economic point of view.</p>
<p>In addition, anyone can make it, it doesn&#8217;t get used up, and the industry of using data to create new and valuable things is still so young and ripe for profit-making. In fact I think it&#8217;s one of the areas that America needs to focus on if its economy is to recover because for the most part it&#8217;s still virgin territory and it&#8217;s going to create a lot of economic value. What I really don&#8217;t want to see is foreign companies being the first to capitalize on the data as that would suck most of the value out of our economy, just what we don&#8217;t need right now.</p>
<p>Data as a commodity is also interesting to data producers, which is potentially just about every business with an IT department. It is a new revenue stream that most businesses don&#8217;t even realize they should be considering. Any business that keeps a data warehouse needs to start thinking about if and how it wants to monetize that data. Topics like data anonymization and regulatory restraints need to be thought out well in advance (meaning, right now) so that the wheels can get moving. I can tell you that from my experience next to nobody is taking advantage of their data, seeing it as the backend behind their reports at best and a storage expense at worst, instead of a potential revenue center.</p>
<p>In order to monetize data, however, or to mine it for value, you need some type of exchange where buyers and sellers can trade it. One of the trends I&#8217;ve been keeping an eye on is the development of these marketplaces, and there are now several marketplaces offering data as a paid commodity, most coming online within the past year:</p>
<blockquote>
<li><a href="http://datamarket.azure.com/">Microsoft Azure DataMarket</a> prices its data by transactions per month. The main advantage it has right now is a normalized OData schema, which provides a baked-in integration layer and is a huge value-add for developers. However, it doesn&#8217;t seem to have a very mature API for content publishers to put data into the marketplace, which is unfortunate.</li>
<li><a href="http://www.infochimps.com">InfoChimps</a> seems to have the most interesting data sets at this point (depending on your use, obviously) like data mined from the Twitter firehose such as trust and authority ranking, and some really oddball stuff like <a href="http://blog.programmableweb.com/2011/04/22/ufo-sightings-brought-to-you-by-infochimps/">UFO sighting reports</a>. However, it seems to require a different API for each data set, which doesn&#8217;t allow you to easily integrate multiple data sets.</li>
<li><a href="http://www.factual.com">Factual</a> has a nice API which allows developers to correct data (although I&#8217;m not sure if the corrections are shared across developers). However, I wasn&#8217;t able to find any API at all for data providers to put data into the marketplace, and again there is no common schema for data, meaning that all integration is pushed out to the consumer.</li>
</blockquote>
<p>None of these APIs appear to support streaming for real-time applications, which is unfortunate, but I&#8217;m hoping that changes as the space matures. The data publisher side really needs some love. Aside from the Microsoft offering, they don&#8217;t seem to provide any type of help with integration either, which is definitely going to make it harder on consumers of the data as it puts the onus of correctly integrating the data sets on them.</p>
<p>I haven&#8217;t been able to find any really good examples of game-changing data in these marketplaces yet&#8211;it mostly seems to be cleansed/normalized/mined versions of large public data sets, which is unfortunate (some of the Twitter data sets on InfoChimps being the exception as far as I can tell). It&#8217;s going to be interesting to watch this space to see who ends up with the most data sources, the most data publishers, the easiest data integration, and thus the biggest competitive advantage. It would behoove each of these guys to focus on that, I believe. The hardest problem there is going to be lowering the barrier to participation for all types of businesses and creating a streamlined process for adding and vetting data sources.</p>
<p>I believe this is going to be a huge, huge opportunity and that data marketplaces&#8211;especially if they offer out-of-the-box integration&#8211;are going to make money hand over fist. It&#8217;s also going to be an area ripe for mergers and acquisitions. It&#8217;s still so early here that it&#8217;s like the Wild West; we&#8217;ll see if we get the gold rush again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/06/data-as-a-renewable-commodity-and-profit-center.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/06/data-as-a-renewable-commodity-and-profit-center.html</feedburner:origLink></item>
		<item>
		<title>Data Relationships 2.0</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/LskxGjwJBHs/data-relationships-2-0.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/06/data-relationships-2-0.html#comments</comments>
		<pubDate>Wed, 01 Jun 2011 23:12:55 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Next-Generation Data]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[concept]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data feed]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[data pipe]]></category>
		<category><![CDATA[data space]]></category>
		<category><![CDATA[extract]]></category>
		<category><![CDATA[flow]]></category>
		<category><![CDATA[inter process communication]]></category>
		<category><![CDATA[load]]></category>
		<category><![CDATA[pipe]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[piping products]]></category>
		<category><![CDATA[programming paradigms]]></category>
		<category><![CDATA[relationship]]></category>
		<category><![CDATA[relationships]]></category>
		<category><![CDATA[reporting]]></category>
		<category><![CDATA[similarity]]></category>
		<category><![CDATA[sorry]]></category>
		<category><![CDATA[transform]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1421</guid>
		<description><![CDATA[One of the most under-appreciated tools of the last decade is Yahoo Pipes. If you&#8217;ve never seen it, it allows you to weld together different data feeds like RSS and do some cleansing operations on them, spitting them out as a unified data feed on the back-end. It also makes it so easy that a [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most under-appreciated tools of the last decade is <a href="http://pipes.yahoo.com">Yahoo Pipes</a>. If you&#8217;ve never seen it, it allows you to weld together different data feeds like RSS and do some cleansing operations on them, spitting them out as a unified data feed on the back-end. It also makes it so easy that a non-programmer can do it. I will be really sad if they ever shut it down, but I kind of anticipate it given how little support it receives.</p>
<p>Anyway, the reason I&#8217;m thinking about Yahoo Pipes is that it&#8217;s kind of a unique and easy way to think about data compared to traditional approaches. Instead of thinking about data as something you go out and get, it treats data as something that trickles in and is then categorized, put into the right place, and so on when it arrives. This always struck me as a very elegant way to treat data, rather than as the chunks or flat files that are batched, filtered, treated with rules, and so on, and then sent to the next batch process.</p>
<p>But most of all I like the pipe concept&#8211;that data can flow between two points. It&#8217;s a very user-friendly idea and it works well in practice (aside from the tragic bugginess of the Pipes product itself). It&#8217;s an idea I&#8217;ve been exploring a lot lately, and I think two-way pipes are interesting as a concept for &#8220;Relationships 2.0&#8221; (I&#8217;m sorry for the 2.0, please forgive me).</p>
<p><img src="http://www.jasonkolb.com/wp-content/uploads/2011/06/relationship-wormhole-300x225.jpg" alt="A diagram of a wormhole that I&#039;m calling a relationship" title="relationship-wormhole" width="300" height="225" class="alignright size-medium wp-image-1423" />I also like the idea of a pipe as a wormhole, a way for data to be in two places at the same time. This is just a location in data-space that, when you walk through it, you&#8217;re in a completely different place but looking at the same stuff. An odd concept, but think of all of the old-school relationships that covers in one idea. One thing&#8211;this pipe, wormhole, whatever&#8211;and you get from the stuff you&#8217;re looking at here to all of this related stuff over there. In fact, there&#8217;s no difference, you&#8217;re seeing it all at once.</p>
<p>All of this boils down to thinking about relationships between locations in data-space rather than between entities. If an entity is in one <strong>place</strong> then it is also in another place. If an entity moves into one point in data-space (and by data-space I mean considering each piece of data&#8211;a field in a traditional database&#8211;as its own dimension, very much like OLAP does) then it pops into existence in the other point as well. Whatever you&#8217;re looking at becomes dynamic because wormholes can pop open at any time and a bunch of new information can tumble through.</p>
<p>So say I&#8217;m looking at something of interest&#8211;a property the bank mispriced, to give you a real-world example. If I want to see more &#8220;like that&#8221;, I&#8217;m going to end up looking at other properties that are similar in some way. Say, for example, it is owned by a particular bank, Countrywide Financial for grins. If that bank is taken over by the FDIC and is acquired by Bank of America, I can do one of two things to make my data right: I can create relationships from all of the properties that were owned by Countrywide to Bank of America, or I can create a pipe&#8211;a wormhole&#8211;from where Countrywide used to be to where Bank of America is. And then if the merger is squashed for some reason I can undo the whole thing by removing the relationship.</p>
<p>If I&#8217;m looking at a mispriced asset and I&#8217;d like to find more assets similar to it, this is huge. By creating this one thing, a two-way pipe, wormhole, whatever, a whole new world of data is exposed to me. The amount of data that is visible from the same place (whether I&#8217;m looking at a report, a dashboard, an app, whatever) has grown instantly. (What&#8217;s cool, too, is that you can expand or contract the wormhole, requiring more or less similarity to flow through it. A geeky way to say that you can easily adjust how similar the stuff is that you&#8217;re looking at.)</p>
<p>This sounds like a pretty abstract thing, but it has me kind of excited because this type of merging and un-merging is what data integration is all about. It&#8217;s a huge, hairy problem, as you know if you&#8217;ve ever been involved in an ETL project, and this seems to solve a lot of that. Now, trying to implement this has not been a walk in the park either, but it&#8217;s definitely a nice programming challenge.</p>
<p><a href="http://www.jasonkolb.com/weblog/2011/05/data-wormholes-and-relationships.html">Some earlier thoughts about data wormholes located here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/06/data-relationships-2-0.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/06/data-relationships-2-0.html</feedburner:origLink></item>
		<item>
		<title>A Different Way of Thinking About Data</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/6xlYitjFs7I/a-different-way-of-thinking-about-data.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/05/a-different-way-of-thinking-about-data.html#comments</comments>
		<pubDate>Tue, 24 May 2011 14:16:56 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1411</guid>
		<description><![CDATA[I think that at some point in the near future the tools we use to consume data are going to change completely from what we&#8217;ve grown accustomed to. They simply won&#8217;t be able to sift through the sheer volume of data that is being generated (which is only increasing). Eric Schmidt recently said [...]]]></description>
			<content:encoded><![CDATA[<p>I think that at some point in the near future the tools we use to consume data are going to change completely from what we&#8217;ve grown accustomed to. They simply won&#8217;t be able to sift through the sheer volume of data that is being generated (which is only increasing). Eric Schmidt <a href="http://techcrunch.com/2010/08/04/schmidt-data/">recently said</a> that <strong>every two days we now create as much information as we did from the dawn of civilization up until 2003.</strong> Nobody&#8217;s ability to consume data has improved since 2003.</p>
<p>After thinking about this problem for a long time, I think the very concepts we use to reason about data on the back end will change significantly. Instead of thinking about things like keys and indexes, we&#8217;ll be talking about things like similarity and focus.</p>
<p><strong>Entities</strong>, as defined by &#8220;some data&#8221;, are effectively clusters of data that behave and are treated as one. They can be formed and ripped apart in real time as data becomes available or is invalidated by better data. Because of this, entities are effectively transient. However, instead of disappearing completely, their data is simply sucked through a wormhole to the data location of the entity they&#8217;re merging into. This allows the data to flow back to the original location in the future if needed.<sup class='footnote'><a href='#fn-1411-1' id='fnref-1411-1'>1</a></sup></p>
<p><img class="alignright size-full wp-image-1415" title="150px-Escher_Cube" src="http://www.jasonkolb.com/wp-content/uploads/2011/05/150px-Escher_Cube.png" alt="" width="150" height="154" /><strong>Data Locations</strong> are spots in n-dimensional data space where entities reside. Entities can be in the same location in some dimensions and in other locations in others, just the same as you might be standing in the same exact place Napoleon stood at one point, just at a much different point in the time dimension. Or, take two points on a cube which are at the exact same place in the 2D x,y dimensions but at very different places when viewed in three dimensions.</p>
<p><strong>Similarity</strong>, then, determines how close two pieces of data are to each other, based on the &#8220;important&#8221; data dimensions (as defined by <a href="http://www.jasonkolb.com/weblog/2011/05/the-joys-of-cardinality.html">cardinality</a>). Above a certain similarity threshold they are considered to belong to the same entity; below it they are considered two distinct entities. This effectively measures how close the two entities are in &#8220;enough dimensions to matter&#8221;. What counts as enough will be different in each instance, depending on the data available.</p>
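<p>As a rough illustration of the threshold idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the <code>similarity</code> and <code>same_entity</code> functions, the field names, and the weights (which stand in for &#8220;importance by cardinality&#8221;) are hypothetical, and the measure is simply the weighted share of dimensions on which two records agree.</p>

```python
def similarity(a, b, weights):
    """Weighted share of dimensions on which two records agree (0.0 to 1.0)."""
    total = sum(weights.values())
    agreeing = sum(w for dim, w in weights.items()
                   if dim in a and a.get(dim) == b.get(dim))
    return agreeing / total


def same_entity(a, b, weights, threshold):
    """Treat two records as one entity when similarity reaches the threshold."""
    return similarity(a, b, weights) >= threshold


# Hypothetical weights: dimensions that vary more count for more.
weights = {"name": 5.0, "city": 3.0, "state": 1.0, "eye_color": 1.0}

rec1 = {"name": "J. Smith", "city": "Aurora", "state": "IL", "eye_color": "brown"}
rec2 = {"name": "J. Smith", "city": "Aurora", "state": "IL"}
rec3 = {"name": "A. Jones", "city": "Aurora", "state": "IL", "eye_color": "brown"}

print(similarity(rec1, rec2, weights))   # 0.9
print(similarity(rec1, rec3, weights))   # 0.5
```

<p>With a threshold of 0.8, the first two records would be treated as one entity and the third as a separate one. Lowering the threshold to 0.5 would merge all three.</p>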
<p><strong>Focus</strong>, then, is the act of looking at more or less data by loosening or tightening the filter parameters. If you&#8217;re looking at a point in data space, decreasing the required similarity expands the result set. Increasing it tightens your focus on a spot, reducing the amount of data returned.</p>
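<p>Focus can be sketched in a few lines of Python. This assumes the similarity of each item to the spot being looked at has already been computed; the <code>focus</code> function, the item names and the scores are all made up for illustration.</p>

```python
def focus(scored, min_similarity):
    """Return the names whose similarity to the current spot meets the bar."""
    return sorted(name for name, score in scored.items() if score >= min_similarity)


# Hypothetical similarity of each item to the point being looked at.
scored = {"A": 0.95, "B": 0.80, "C": 0.55, "D": 0.20}

tight = focus(scored, 0.9)   # ['A']
loose = focus(scored, 0.5)   # ['A', 'B', 'C']
print(tight, loose)
```

<p>The only input that changes between the two calls is the similarity bar, which is the whole point: one dial widens or narrows the view.</p>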
<p>This changes the inputs to all sorts of things. Data integration points, data mining tasks, alerting, all kinds of things are touched by this idea.</p>
<p>I know I&#8217;ve been veering into the deeply geeky here, but I find this stuff fascinating, sorry! I&#8217;ll leave you with one more Schmidt quote from the same talk:</p>
<blockquote><p>“I spend most of my time assuming the world is not ready for the technology revolution that will be happening to them soon”</p></blockquote>
<div class='footnotes'>
<div class='footnotedivider'></div>
<ol>
<li id='fn-1411-1'>This has some interesting implications for graph databases. If there are no solid entities as such, you&#8217;re indexing two different locations in n-dimensional space, not entities, which is not the way those databases operate at the moment. <span class='footnotereverse'><a href='#fnref-1411-1'>&#8617;</a></span></li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/05/a-different-way-of-thinking-about-data.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/05/a-different-way-of-thinking-about-data.html</feedburner:origLink></item>
		<item>
		<title>Data Wormholes and Relationships</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/dYJoXS-0fiM/data-wormholes-and-relationships.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/05/data-wormholes-and-relationships.html#comments</comments>
		<pubDate>Fri, 20 May 2011 02:31:40 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Next-Generation Data]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[coordinates]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[entities]]></category>
		<category><![CDATA[relationships]]></category>
		<category><![CDATA[wormholes]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1404</guid>
		<description><![CDATA[This is kind of an obscure topic, but I&#8217;ve been trying to figure out what the best way to count things is, and how to best identify boundaries and relationships between pools of data which may or may not be the same thing. That is, given a bunch of data about something, the boundaries between [...]]]></description>
			<content:encoded><![CDATA[<p>This is kind of an obscure topic, but I&#8217;ve been trying to figure out what the best way to count things is, and how to best identify boundaries and relationships between pools of data which may or may not be the same thing.</p>
<p>That is, <a href="http://www.jasonkolb.com/wp-content/uploads/2011/05/wormhole-diagram.gif"><img src="http://www.jasonkolb.com/wp-content/uploads/2011/05/wormhole-diagram.gif" alt="A wormhole from the physics world" title="wormhole-diagram" width="240" height="180" class="alignright size-full wp-image-1405" /></a> given a bunch of data about something, the boundaries between one thing and another are pretty fuzzy. They might be different things, or they might actually be the same thing that I know different sets of things about. This is essentially the crux of data integration, but approached from a slightly different angle.</p>
<p>This might sound kind of abstract and pointless, but I&#8217;m starting to see that it&#8217;s not. And the reason for that is relationships. Creating relationships between two different entities&#8211;saying that they are the &#8220;same as&#8221; each other, for instance&#8211;is not the same as having two different places in space where data can collect and accumulate and saying that those two places are actually the same (what I&#8217;m calling a &#8220;data wormhole&#8221;). In one model the entities are connected; in the other, two areas in n-dimensional data space are connected via a wormhole, and data is transparently in both places at once. It&#8217;s like a graph database without entities. It&#8217;s useful because you can just store bits of data and they are returned as one or more entities depending on the precision you&#8217;re looking for or how similar you want them to be, and the context is built up around them automatically.</p>
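<p>One way to picture the difference is a toy Python sketch in which facts are stored at coordinates rather than on entities, and a wormhole simply declares two coordinates to be the same place. This is an illustration under assumptions, not the actual implementation: the <code>DataSpace</code> class and its method names are hypothetical, and the linking is done with a basic union-find structure.</p>

```python
class DataSpace:
    """Bits of data collect at locations; a wormhole makes two locations one."""

    def __init__(self):
        self.parent = {}   # location -> representative location
        self.data = {}     # location -> set of facts stored there

    def _find(self, loc):
        # Follow links until reaching the representative of this location.
        self.parent.setdefault(loc, loc)
        while self.parent[loc] != loc:
            loc = self.parent[loc]
        return loc

    def store(self, loc, fact):
        self._find(loc)  # register the location
        self.data.setdefault(loc, set()).add(fact)

    def open_wormhole(self, loc_a, loc_b):
        # Join the two groups of locations; no facts are copied or moved.
        self.parent[self._find(loc_a)] = self._find(loc_b)

    def read(self, loc):
        # Gather facts from every location connected to this one.
        root = self._find(loc)
        facts = set()
        for other in self.parent:
            if self._find(other) == root:
                facts |= self.data.get(other, set())
        return facts


space = DataSpace()
space.store((1, 4), "ticker=XYZ")
space.store((7, 2), "price=10.50")
space.open_wormhole((1, 4), (7, 2))
print(sorted(space.read((1, 4))))
```

<p>After the wormhole is opened, reading from either coordinate returns both facts, even though each fact still lives only where it was originally stored.</p>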
<p>The problem is that this totally destroys the relational database paradigms because all of a sudden there are no keys at all, only n-dimensional coordinates. This in turn destroys the traditional concept of a query so you&#8217;re required to completely reinvent that as well&#8211;instead of saying &#8220;I want all entities related to X&#8221; you&#8217;re saying &#8220;I want everything within N similarity of Y&#8221;, which almost hurts my brain. You also have to index <em>everything</em>, which is not feasible in a traditional relational environment.</p>
<p>I&#8217;m pretty convinced the power this approach brings to the table outweighs the time it takes to rebuild the concept of a query though, and I can&#8217;t think of any other way to accomplish it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/05/data-wormholes-and-relationships.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/05/data-wormholes-and-relationships.html</feedburner:origLink></item>
		<item>
		<title>Personalization vs. Sharing</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/vRM7KrxX8ak/personalization-vs-sharing.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/05/personalization-vs-sharing.html#comments</comments>
		<pubDate>Mon, 16 May 2011 05:28:38 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1398</guid>
		<description><![CDATA[I came across this TED talk about &#8220;online filter bubbles&#8221;, which is pretty interesting if you have the time: The gist of it is that personalization algorithms tend to give us more of what we click on the fastest, which can quickly create a myopic view of the world. The presenter&#8217;s solution is for the [...]]]></description>
			<content:encoded><![CDATA[<p>I came across this TED talk about &#8220;online filter bubbles&#8221;, which is pretty interesting if you have the time:</p>
<p><iframe width="560" height="349" src="http://www.youtube.com/embed/B8ofWFx525s" frameborder="0" allowfullscreen></iframe></p>
<p>The gist of it is that personalization algorithms tend to give us more of what we click on the fastest, which can quickly create a myopic view of the world. The presenter&#8217;s solution is for the systems using personalization (Google, Yahoo News, etc) to integrate &#8220;socially responsible&#8221; filtering mechanisms to make sure that everyone gets their daily dose of &#8220;information vegetables&#8221;.</p>
<p>This is backwards.</p>
<p>Personalization is only one half of the spectrum of information consumption. The other half is sharing. If everyone you socialize with online shares what&#8217;s interesting to them, assuming you don&#8217;t surround yourself with clones, you will see content that you otherwise wouldn&#8217;t. Conversely, if you share content that you find interesting, that content will make its way into your social circle regardless of personalization.</p>
<p>Systems like Twitter are the antidote for this problem, not some sort of official &#8220;information food pyramid&#8221;. If you&#8217;re not getting important information it&#8217;s because you&#8217;ve put yourself in an echo chamber.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/05/personalization-vs-sharing.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/05/personalization-vs-sharing.html</feedburner:origLink></item>
		<item>
		<title>The Joys of Cardinality</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/leaqbM2I5-E/the-joys-of-cardinality.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/05/the-joys-of-cardinality.html#comments</comments>
		<pubDate>Sat, 14 May 2011 16:53:29 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[cardinal number]]></category>
		<category><![CDATA[cardinality]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data management]]></category>
		<category><![CDATA[data stream]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[dimensional data]]></category>
		<category><![CDATA[geographic information system]]></category>
		<category><![CDATA[logic]]></category>
		<category><![CDATA[massive]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[obstacle]]></category>
		<category><![CDATA[ordinal number]]></category>
		<category><![CDATA[original data]]></category>
		<category><![CDATA[real number]]></category>
		<category><![CDATA[real time data]]></category>
		<category><![CDATA[similarity indexing]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[stream]]></category>
		<category><![CDATA[stream mining]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.com/?p=1379</guid>
		<description><![CDATA[When you start getting into multi-dimensional data analysis, you quickly run into something called the &#8220;Curse of Cardinality&#8221; (so yes, the post title is a geeky play on words). The gist of it is that the more possible values there are for something, the quicker the index size grows, which means at some point [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.jasonkolb.com/wp-content/uploads/2011/05/PROSTAGLANDIN.png"><img class="alignright size-medium wp-image-1383" title="Data can get pretty complex" src="http://www.jasonkolb.com/wp-content/uploads/2011/05/PROSTAGLANDIN-300x224.png" alt="" width="300" height="224" /></a>When you start getting into multi-dimensional data analysis, you quickly run into something called the &#8220;Curse of Cardinality&#8221; (so yes, the post title is a geeky play on words). The gist of it is that the more possible values there are for something, the quicker the index size grows, which means at some point you are storing far more data just to index those values than the original data itself. It&#8217;s why n-dimensional databases typically have hierarchies: they&#8217;re breaking down those possible values into digestible chunks.</p>
<p>This means that it takes very little data or storage to index things like sex, state, height, age, etc, because there are relatively few values for those. When you start getting into things that vary more (income, city, phone number, etc), indexing that data becomes prohibitively expensive.</p>
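<p>To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes a plain, uncompressed bitmap index (one bit per row for each distinct value); real systems compress these heavily, so treat the figures as an illustration of the growth rather than a measurement. The row count and cardinalities are made up.</p>

```python
def bitmap_index_bytes(rows, distinct_values):
    """Size of an uncompressed bitmap index: one bit per row per distinct value."""
    return rows * distinct_values // 8


rows = 1_000_000
for column, cardinality in [("sex", 2), ("state", 50), ("phone", 1_000_000)]:
    size_mb = bitmap_index_bytes(rows, cardinality) / 1_000_000
    print(f"{column}: {size_mb:,.2f} MB")
```

<p>Under those assumptions a two-value column costs about 0.25 MB to index, a 50-value column 6.25 MB, and a column where every row is distinct 125,000 MB, which is far more than the column itself.</p>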
<p>This law, the curse of cardinality, came from computer science. It really has very little to do with the underlying statistics, which don&#8217;t really care about the cardinality much at all. It&#8217;s an equipment-imposed restriction that is holding back all kinds of progress.</p>
<p>In fact, after looking at this problem for a while, I think cardinality is actually a very important piece of information that can be used to change the way data is worked with.</p>
<p>For example, if I want to know what a user finds interesting about an entity, let&#8217;s say a person, there are a lot of options. We know his name, city, state, income, height, hair color and eye color. If the user indicates that she likes this person, which features of the person does she actually like? This is a classic classification or recommendation problem.</p>
<p>Cardinality, in this case, is very helpful. If you know that income varies much more than state, you can assign more weight to it as you do your classification. Knowing that there is more variation in something makes it much more interesting, because that&#8217;s how you find the outliers. It&#8217;s virtually impossible to find outliers in, say, the state that someone is living in, because there are only 50 choices. You can look at the unique factors around the thing that you&#8217;re looking at, and find more &#8220;like that&#8221;, leaving out the general, uninteresting stuff. This is the field of &#8220;similarity indexing&#8221;, which consists of a few research papers and whatever code is hidden away in skunkworks projects.<sup class='footnote'><a href='#fn-1379-1' id='fnref-1379-1'>1</a></sup></p>
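<p>A minimal Python sketch of turning cardinality into weights might look like the following. The <code>cardinality_weights</code> function, the sample records and the choice of a base-2 logarithm of the distinct-value count are all assumptions made for illustration; any increasing function of cardinality would express the same idea.</p>

```python
import math


def cardinality_weights(records):
    """Weight each field by log2 of its distinct-value count in the sample."""
    fields = {field for record in records for field in record}
    weights = {}
    for field in fields:
        distinct = {record[field] for record in records if field in record}
        weights[field] = math.log2(len(distinct)) if distinct else 0.0
    return weights


# Hypothetical sample: state barely varies, income varies a lot.
people = [
    {"state": "IL", "income": 41000},
    {"state": "IL", "income": 58500},
    {"state": "WI", "income": 73000},
    {"state": "IL", "income": 120000},
]

weights = cardinality_weights(people)
print(weights)
```

<p>In this sample income ends up with twice the weight of state, and a field with a single value everywhere would get a weight of zero, so it would drop out of the comparison entirely.</p>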
<p>All that to say that the data that varies the most is usually the most interesting, and the data that you probably actually <em>want</em> to be paying attention to. However, cardinality is typically dealt with as an obstacle to overcome rather than a potential asset. The problem is that the tools available today don&#8217;t actually allow you to use cardinality for anything. Good news is, I think that is going to change. The maturation of some of the Big Data platforms is going to allow the types of solutions that are needed to get past this limiting mindset.</p>
<p>The nascent field of stream mining is doing some very interesting things with real-time stream sampling which allow this type of analysis to be done accurately and in real-time.<sup class='footnote'><a href='#fn-1379-2' id='fnref-1379-2'>2</a></sup> The problems are difficult and some of the solutions are still in the research stage, but those are the types of problems that are also the most interesting.</p>
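<p>For a flavor of what estimating cardinality on a stream can look like, here is a short Python sketch of one well-known technique, the k-minimum-values estimator. It is not necessarily what any particular stream-mining system uses; the function name and the default <code>k</code> are assumptions. The idea is to hash every item to a number between 0 and 1, remember only the <code>k</code> smallest hashes, and infer the number of distinct items from how small the k-th one is.</p>

```python
import hashlib
import heapq


def estimate_cardinality(stream, k=1024):
    """K-minimum-values estimate of the number of distinct items in a stream."""
    heap = []        # max-heap (negated) of the k smallest hash values seen
    members = set()  # hash values currently in the heap, to skip repeats
    for item in stream:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2.0 ** 64  # roughly uniform in [0, 1)
        if h in members:
            continue
        if len(heap) != k:        # heap not yet full
            heapq.heappush(heap, -h)
            members.add(h)
        elif -heap[0] > h:        # smaller than the current k-th smallest
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) != k:
        return len(heap)          # fewer than k distinct values: exact count
    return int((k - 1) / -heap[0])


# 30,000 events drawn from 10,000 distinct values.
events = (i % 10000 for i in range(30000))
print(estimate_cardinality(events))
```

<p>Memory use is fixed by <code>k</code> no matter how long the stream runs, and the estimate is approximate: the relative error shrinks roughly with the square root of <code>k</code>.</p>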
<p>This kind of analysis, I&#8217;m convinced, is the future of data. It is all about recognizing what makes data interesting to you, and then giving you more of that.</p>
<div class='footnotes'>
<div class='footnotedivider'></div>
<ol>
<li id='fn-1379-1'><a href="http://www.osti.gov/bridge/servlets/purl/964373-iwICev/964373.pdf">Fastbit</a> is an implementation of some bleeding-edge ideas to get around the Curse of Cardinality, but it fails to use it to build a similarity index. Also, the license sucks. <span class='footnotereverse'><a href='#fnref-1379-1'>&#8617;</a></span></li>
<li id='fn-1379-2'>One of the hardest things about working with cardinality is that you can&#8217;t work with the whole data set&#8211;it&#8217;s too big for real-time&#8211;and yet you want statistically valid results. This makes it very tricky to do things like estimate cardinality in real-time, and these are the kinds of problems that you have to solve if you actually want software that does this. <span class='footnotereverse'><a href='#fnref-1379-2'>&#8617;</a></span></li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/05/the-joys-of-cardinality.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/05/the-joys-of-cardinality.html</feedburner:origLink></item>
		<item>
		<title>The First 5 Months of Kinect Hacking: Amazing</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/Rb7fxAbBUzE/the-first-5-months-of-kinect-hacking-amazing.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/05/the-first-5-months-of-kinect-hacking-amazing.html#comments</comments>
		<pubDate>Sat, 07 May 2011 10:03:36 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.net/weblog/2011/05/the-first-5-months-of-kinect-hacking-amazing.html</guid>
		<description><![CDATA[I hadn't been keeping up on the Kinect mods that are coming out, I've been too busy doing far less glamorous backend work lately. But I came across this montage put together by Johnny Lee that compiles some of the...
]]></description>
			<content:encoded><![CDATA[<p>I hadn&#39;t been keeping up on the Kinect mods that are coming out, I&#39;ve been too busy doing far less glamorous backend work lately. But I came across this montage put together by <a href="http://procrastineering.blogspot.com/2011/05/kinect-projects-first-5-months.html" target="_self">Johnny Lee</a> that compiles some of the more interesting ones.</p>
<p>This is truly a glimpse into the future, some of this stuff is mind-blowing.</p>
<p>
<object height="390" style="height: 390px; width: 640px;" width="640"><param name="movie" value="http://www.youtube.com/v/8nlk6HhDpDw?version=3" /><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><embed allowfullscreen="true" allowscriptaccess="always" height="390" src="http://www.youtube.com/v/8nlk6HhDpDw?version=3" type="application/x-shockwave-flash" width="640" /><br />
</object>
</p>
<p>I can&#39;t wait to see what the next 5 months will bring. I am so excited to have access to the SDK now, I&#39;m really looking forward to building some interactive data visualizations with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/05/the-first-5-months-of-kinect-hacking-amazing.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<media:content url="http://feeds.jasonkolb.com/~r/Jasonkolbcom/~5/1gfiBk7UDsE/8nlk6HhDpDw" fileSize="3034" type="application/x-shockwave-flash" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>I hadn't been keeping up on the Kinect mods that are coming out, I've been too busy doing far less glamorous backend work lately. But I came across this montage put together by Johnny Lee that compiles some of the... </itunes:subtitle><itunes:author>Jason Kolb</itunes:author><itunes:summary>I hadn't been keeping up on the Kinect mods that are coming out, I've been too busy doing far less glamorous backend work lately. But I came across this montage put together by Johnny Lee that compiles some of the... </itunes:summary><itunes:keywords>technology,future,onlin,identity,enterprise,business</itunes:keywords><feedburner:origLink>http://www.jasonkolb.com/weblog/2011/05/the-first-5-months-of-kinect-hacking-amazing.html</feedburner:origLink><enclosure url="http://feeds.jasonkolb.com/~r/Jasonkolbcom/~5/1gfiBk7UDsE/8nlk6HhDpDw" length="3034" type="application/x-shockwave-flash" /><feedburner:origEnclosureLink>http://www.youtube.com/v/8nlk6HhDpDw?version=3</feedburner:origEnclosureLink></item>
		<item>
		<title>Relationship Strength</title>
		<link>http://feeds.jasonkolb.com/~r/Jasonkolbcom/~3/D6QPVKB4nto/relationship-strength.html</link>
		<comments>http://www.jasonkolb.com/weblog/2011/04/relationship-strength.html#comments</comments>
		<pubDate>Mon, 18 Apr 2011 19:59:15 +0000</pubDate>
		<dc:creator>jason+PODCAST@jasonkolb.com (Jason Kolb)</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.jasonkolb.net/weblog/2011/04/relationship-strength.html</guid>
		<description><![CDATA[The skunkworks project that I've been working on for the past month or so incorporates the idea of relationships between entities to enable automatic discovery and data recommendations. In said project there is a bit of code that watches the...
]]></description>
			<content:encoded><![CDATA[<p>The skunkworks project that I&#39;ve been working on for the past month or so incorporates the idea of relationships between entities to enable automatic discovery and data recommendations.</p>
<p>In said project there is a bit of code that watches the database for changes to an entity, and when it sees a change to that entity it automatically re-evaluates it against nearby, similar entities. I want to keep a record of the entities that are being compared against one another, so I create a relationship and store it for future reference.</p>
<p>This is pretty standard stuff. It&#39;s basically exactly what a relational database does. However, I ran up against one concept which I haven&#39;t seen much about.</p>
<p>That is, I realized that there is a whole gradient of similarities between entities, from &quot;exactly the same&quot; to &quot;completely different&quot;. If you turn this into a binary value, either related or not, you lose that entire piece of information.</p>
<p>This seems like quite a waste because if you&#39;re looking at entities that are related to a specific one that you&#39;re looking at, it makes sense to prioritize them based on their similarity. And if you&#39;re doing something like, say, using those similar entities to come up with a market price for the entity you&#39;re interested in, you probably want to use something like a weighted average based on how similar the other entities are.</p>
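<p>The weighted-average idea is easy to sketch in Python. The <code>Relationship</code> class, its fields, the identifiers and the numbers below are all hypothetical; the point is only that a strength between 0 and 1 replaces the yes/no flag and then doubles as the weight.</p>

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """A link to a comparable entity, with a strength instead of a yes/no flag."""
    other_id: str
    strength: float   # 0.0 = completely different, 1.0 = exactly the same
    price: float      # known price of the comparable entity


def estimate_price(relationships):
    """Similarity-weighted average of the comparable prices."""
    total_strength = sum(r.strength for r in relationships)
    if total_strength == 0:
        raise ValueError("no comparable entities with non-zero strength")
    return sum(r.strength * r.price for r in relationships) / total_strength


comparables = [
    Relationship("asset-1", strength=0.9, price=100.0),
    Relationship("asset-2", strength=0.5, price=120.0),
    Relationship("asset-3", strength=0.1, price=200.0),
]
print(round(estimate_price(comparables), 2))   # 113.33
```

<p>A plain average of the three prices would be 140, so the weakly related outlier at 200 pulls the estimate far less once strength is taken into account.</p>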
<p>As far as I know there is no built-in way right now to store that relationship strength in existing data systems. Relational databases in particular do nothing for you here out of the box: a foreign key says two rows are either related or not related, and anything finer you have to model yourself. RDF-based systems could, I suppose, accommodate this, but what you really want in this case is a &quot;strength&quot; attribute on the &quot;sameAs&quot; property, and I have never encountered such a thing in the wild.</p>
<p>Anyway, I thought it was an interesting and useful idea. Since I&#39;m da boss on my project and it sounded fun I went ahead and spent a day adding it in while I was in that bit of the code, so I&#39;ll let you know how it works out.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jasonkolb.com/weblog/2011/04/relationship-strength.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.jasonkolb.com/weblog/2011/04/relationship-strength.html</feedburner:origLink></item>
	<copyright>Copyright 2007 Swoosh LLC</copyright><media:credit role="author">Jason Kolb</media:credit><media:rating>nonadult</media:rating></channel>
</rss>

