Mad Street Den goes to PyData Delhi 2017

September 5, 2017. MadStreetDen Staff

Team Mad Street Den joined the ever-growing Python community at the PyData conference held at IIIT Delhi on September 2 and 3, 2017. We did not just attend the event: some of us also spoke during the action-packed weekend. Here is a roundup of what happened at the event and the topics our team spoke about.

Things to Know While Choosing a Deep Learning Library

ML Engineer Saurabh Agarwal spoke about the choices one needs to make when picking a framework for implementing deep learning. With a plethora of options available, a beginner is bound to get confused, and much of the terminology ends up sounding like jargon.

Saurabh talked in detail about the important considerations for making a choice and covered all the major libraries, including Theano, Caffe, Caffe2, PyTorch, TensorFlow, Chainer and MXNet. He also laid out the basic principles each library is built on and what each one is good for. The aim of the talk was to introduce beginners to the subject and compare the libraries commonly in use, so that they can make informed choices.

For more on this talk, read here.

IoT Meets Serverless

In the world of IoT, assembling the hardware of tiny sensors is just the tip of the iceberg. The real complexity lies in processing the massive amount of data these sensors generate.

Our Engineer Naren spoke about how one cannot afford to spend most of one's time building and maintaining backend servers to process this huge stream of data. The quicker you bring the data to the presentation tier, he said, the more you can experiment and find answers to new business questions. This is where the agility of serverless architecture pays off. He explained how AWS serverless architectures take care of undifferentiated heavy lifting, such as cluster and server management, leaving developers free to focus on assembling the IoT hardware, bringing data into the system and deriving meaningful business insights.
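As a rough illustration of what "focusing on the data" can look like, here is a minimal Python sketch of a function written in the AWS Lambda handler style, `lambda_handler(event, context)`. It assumes an upstream trigger (for example, an AWS IoT rule) delivers a JSON payload with a `device_id` and a list of `readings`; that payload shape, the `temp_c` field and the alert threshold are hypothetical, not details from the talk. Provisioning and scaling would be left to the platform.

```python
from statistics import mean

# Hypothetical alert threshold in degrees Celsius (an assumption, not from the talk).
TEMP_ALERT_C = 40.0


def lambda_handler(event, context):
    """Summarise one batch of sensor readings.

    Assumes a hypothetical payload shape:
    {"device_id": "...", "readings": [{"temp_c": 21.5}, ...]}
    The `context` argument is part of the Lambda handler signature but unused here.
    """
    readings = [float(r["temp_c"]) for r in event.get("readings", [])]
    summary = {"device_id": event.get("device_id", "unknown"), "count": len(readings)}
    if readings:
        summary["mean_temp_c"] = round(mean(readings), 2)
        summary["max_temp_c"] = max(readings)
        summary["alert"] = summary["max_temp_c"] > TEMP_ALERT_C
    # A real pipeline might write this summary to a data store; here it is just returned.
    return summary
```

Called locally as `lambda_handler({"device_id": "sensor-1", "readings": [{"temp_c": 21.5}, {"temp_c": 42.0}]}, None)`, it returns a count of 2, a mean of 31.75 and `alert` set to `True`. Because `context` is unused, passing `None` is enough for a local test.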

For slides to the talk, go here.

Interestingness of Interestingness Measures

Data mining is the process of discovering previously unknown, interesting relationships that can be used to increase customer engagement and boost sales, among other things. Our Data Engineer Simrat Hanspal talked about the fundamental steps in pattern mining of transactional datasets, such as extracting frequent and interesting itemsets: sets of items that frequently occur together across transactions.
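Finding frequent itemsets boils down to counting support: the fraction of transactions that contain every item in a set. Below is a minimal brute-force sketch in Python. It is not the method from the talk, and not Apriori or FP-Growth, which prune the search space instead of enumerating everything; `frequent_itemsets` is a hypothetical helper and the baskets are made-up toy data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Support is the fraction of transactions containing every item in the set.
    Brute force: every combination in every basket is counted.
    """
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # sorted so ("a", "b") and ("b", "a") share one key
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Made-up toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

for itemset, support in sorted(frequent_itemsets(baskets, min_support=0.6).items()):
    print(itemset, support)
```

With a 0.6 threshold, all four single items survive, along with the pairs (beer, diapers), (bread, diapers), (bread, milk) and (diapers, milk); pairs such as (beer, bread), seen in only 2 of 5 baskets, are dropped.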

For instance, identifying patterns in housing purchase data, such as correlations between age and income groups, can reveal explainable relationships between data points, which in turn lead to knowledge discovery. This kind of analysis is often confounded by spurious correlations and data sparsity, especially in e-commerce, where much of the traffic goes to a small percentage of the catalog.

One example of interesting rule mining she mentioned in her talk was the discovery of a strong link between diapers and beer in Walmart's transactional data. The association was far from intuitive, since you would expect customers to buy other baby products along with diapers. It turned out that these purchases were made by fathers babysitting on weekends.
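Frequency alone does not make a rule interesting: if beer were in almost every basket, "diapers, therefore beer" would say little. The sketch below computes three standard measures for a rule A → C: support, confidence (how often C appears when A does) and lift (confidence divided by C's overall frequency, so values above 1 mean A and C co-occur more than independence would predict). The talk may well have covered other measures; `rule_measures` is a hypothetical helper, and the baskets are made-up toy data, not Walmart's.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # baskets containing A
    n_c = sum(1 for t in transactions if c <= t)         # baskets containing C
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # equals P(A and C) / (P(A) * P(C))
    return support, confidence, lift


# Made-up toy baskets, for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"milk", "bread"},
    {"milk", "chips"},
    {"bread", "chips"},
    {"milk", "bread", "wipes"},
]

for lhs, rhs in [({"diapers"}, {"beer"}), ({"chips"}, {"beer"})]:
    s, conf, lift = rule_measures(baskets, lhs, rhs)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: support={s:.3f} confidence={conf:.2f} lift={lift:.2f}")
```

On this data, diapers → beer gets support 0.375, confidence 0.75 and lift 2.00, while chips → beer, which also appears in the data, gets a lift of about 0.89: chips buyers are slightly *less* likely than average to buy beer, so that rule is not interesting despite co-occurring.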

For more on this topic, read here.

Real-time Log Analytics Using Probabilistic Data Structures in Redis

There are two ways to solve any problem: Accurately or approximately.

In his talk on Sunday, September 3, our Head of Engineering, Srini, spoke about the drawbacks of exact data structures: they use too much memory and do not scale to real-time data. The talk focused on how to take advantage of the newly released Redis 4.0 and its pluggable modules to build a data pipeline that uses probabilistic data structures (PDSs) to get real-time insights. He also explained the different PDSs available in Redis, such as HyperLogLog, Bloom filters, Count-Min Sketch and Top-K, and how they can be leveraged to optimize resource utilization in log processing.
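Redis exposes these structures as server-side commands, but the idea behind one of them, the Count-Min Sketch, fits in a few lines. Below is a minimal pure-Python sketch; it is not Redis's implementation or API, and the table sizes, the salted BLAKE2b hashing and the IP-address log data are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts using a fixed width x depth table of counters.

    Each row hashes the item to one column. Estimates never undercount;
    hash collisions can only make them overcount.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One column per row, derived by salting a stable hash with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest counter across rows is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._buckets(item))


# Made-up access-log data: one heavy client, one moderate client, many one-off visitors.
log_ips = ["10.0.0.1"] * 500 + ["10.0.0.2"] * 30 + [f"192.168.1.{i}" for i in range(200)]

cms = CountMinSketch(width=500, depth=4)
for ip in log_ips:
    cms.add(ip)

print(cms.estimate("10.0.0.1"))  # never below the true count of 500
print(cms.estimate("10.0.0.2"))  # never below the true count of 30
```

The sketch always holds width * depth counters, no matter how many distinct IPs stream through, which is the memory trade-off the talk describes: an exact dictionary grows with every new key. Widening the table reduces collisions, and adding rows lowers the chance that every row collides badly for the same item.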

For slides, go here. To watch all the talks from day 2, go here.

© 2017 Mad Street Den Inc. All rights reserved.