Data is called the new oil, and data monetisation opportunities fire the imagination. Visible and invisible data streams fuel our economy: vehicle sensors identify road hazards ahead, agricultural machinery diverts poor-quality produce from our tables, pedestrian flow drives smart street light management, and so on. The potential of data awes and excites us. But just like oil, raw data must be ‘refined’. DATA, something relatively abstract, must become a PRODUCT, something very concrete, to deliver value. We will talk about data productisation and monetisation: how companies can create value with, and generate revenue from, their enterprise data.
When we think about web news as a source of information, it seems that there is nothing new to talk about. Google News and similar services have been around for years, and everybody can access that huge amount of information, mostly for free. However, if you want to extract real value from news articles, things get much more complicated. At SpazioDati we focus on collecting as much information as possible about all Italian companies from many different sources, and news is one of the richest, but at the same time hardest, kinds of source we deal with. So we built Sedano, our news processing pipeline, which ingests, cleans, deduplicates, annotates, classifies and clusters several thousand news articles per day and makes them available to our users. We will talk about the challenges we faced, the solutions we implemented, and the open issues we are currently working on.
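To give a flavour of one stage named above, deduplication, here is a minimal sketch. It is not Sedano's implementation, which the abstract does not describe. It assumes a common textbook approach: compare articles by the Jaccard similarity of their word shingles and drop any article too similar to one already kept. The function names, the shingle size `k` and the `threshold` value are all illustrative assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.8):
    """Keep the first article of each group of near-duplicates.

    Hypothetical helper: the pairwise comparison is quadratic, fine for a
    demo but not for thousands of articles per day.
    """
    kept, kept_shingles = [], []
    for text in articles:
        s = shingles(text)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


articles = [
    "Acme S.p.A. opens a new plant in Trento next year",
    "ACME S.p.A. opens a new plant in Trento next year!",
    "Rival firm reports record quarterly revenue in Milan",
]
unique = deduplicate(articles)
```

In this toy input the second headline differs from the first only in case and punctuation, so it is dropped. At real volumes one would likely replace the pairwise loop with an approximate technique such as MinHash with locality-sensitive hashing.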