This talk will briefly explain how THRON uses Amazon Web Services (AWS) and give a few examples of how cloud architectures delivered successful results, particularly in managing data analytics to provide real-time insights. It will describe the Lambda architecture, the data pipeline and the various AWS services involved before sharing lessons learnt. Dario will also share his vision for the future of developers.
SpazioDati collects public information about all Italian companies from many different sources, the most challenging being the World Wide Web. Our Internet Data Gathering project crawls and processes data from the entire Italian web, using distributed frameworks such as Hadoop, Nutch, Elasticsearch and Spark. This talk will give an overview of the extraction pipeline and present some of the issues we tackled during and after development.