Data Science at Home

By Francesco Gadaleta



Category: Podcasting



Description

Technology, machine learning and algorithms

Episodes
[RB] Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan)
37:47

In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed NLP model that OpenAI decided not to release because it was deemed too dangerous to publish.

We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI.
Moreover, Aaron provides some very interesting links and demos that will blow your mind!

Enjoy the show! 

References

Multimodal image-to-image translation (not all of these were mentioned in the podcast, but all are recommended by Aaron)

Pix2Pix: 
 
CycleGAN:
 
GANimorph:
 
Oct 18, 2019
What is wrong with reinforcement learning?
21:48

Reinforcement learning agents have done great at playing Atari video games and Go, trading financial assets, and modeling language. Let me tell you the real story here.
In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it?

 

Are you a listener of the Data Science at Home podcast?
A reader of the Amethix Blog? 
Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?
In any case, let’s stay in touch! 
https://amethix.com/survey/

 

 


Oct 15, 2019
Have you met Shannon? Conversation with Jimmy Soni and Rob Goodman about one of the greatest minds in history
32:21

In this episode I have an amazing conversation with Jimmy Soni and Rob Goodman, authors of “A Mind at Play”, a book entirely dedicated to the life and achievements of Claude Shannon. Claude Shannon needs no introduction. But for those who need a refresher, Shannon is the inventor of the information age.

Have you heard of binary code, entropy in information theory, data compression theory (the stuff behind mp3, mpg, zip, etc.), error-correcting codes (the stuff that makes your RAM work well), n-grams, block ciphers, the beta distribution, or the uncertainty coefficient?

All of that was invented by Claude Shannon :) 

 
Articles: 
 
Claude's papers:
 
A Mind at Play (book links): 
Oct 10, 2019
Attacking machine learning for fun and profit (conversation with the authors of SecML)
34:04

As ML plays a more and more relevant role in many domains of everyday life, it is no surprise to see more and more attacks against ML systems. In this episode we talk about the most popular attacks against machine learning systems and some mitigations designed by researchers Ambra Demontis and Marco Melis, from the University of Cagliari (Italy). The guests are also the authors of SecML, an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. Both Ambra and Marco are members of the PRAlab research group, under the supervision of Prof. Fabio Roli.
 

SecML Contributors

Marco Melis (Ph.D. Student, Project Maintainer, https://www.linkedin.com/in/melismarco/)
Ambra Demontis (Postdoc, https://pralab.diee.unica.it/it/AmbraDemontis) 
Maura Pintor (Ph.D. Student, https://it.linkedin.com/in/maura-pintor)
Battista Biggio (Assistant Professor, https://pralab.diee.unica.it/it/BattistaBiggio)

References

SecML: an open-source Python library for the security evaluation of Machine Learning (ML) algorithms https://secml.gitlab.io/.

A. Demontis et al., “Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks,” presented at the 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 321–338. https://www.usenix.org/conference/usenixsecurity19/presentation/demontis

P. W. Koh and P. Liang, “Understanding Black-box Predictions via Influence Functions,” in International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.04730

M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli, “Is Deep Learning Safe for Robot Vision? Adversarial Examples Against the iCub Humanoid,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 751–759. https://arxiv.org/abs/1708.06939

B. Biggio and F. Roli, “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning,” Pattern Recognition, vol. 84, pp. 317–331, 2018. https://arxiv.org/abs/1712.03141

B. Biggio et al., “Evasion attacks against machine learning at test time,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part III, 2013, vol. 8190, pp. 387–402. https://arxiv.org/abs/1708.06131

B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in 29th Int’l Conf. on Machine Learning, 2012, pp. 1807–1814. https://arxiv.org/abs/1206.6389

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Seattle, 2004, pp. 99–108. https://dl.acm.org/citation.cfm?id=1014066

M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.01365 

M. T. Ribeiro, S. Singh, and C. Guestrin, “Model-agnostic interpretability of machine learning,” arXiv preprint arXiv:1606.05386, 2016. https://arxiv.org/abs/1606.05386

W. Guo et al., “LEMNA: Explaining deep learning based security applications,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018. https://dl.acm.org/citation.cfm?id=3243792

S. Bach et al., “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS ONE, vol. 10, no. 7, e0130140, 2015. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140 

Oct 01, 2019
[RB] How to scale AI in your organisation
13:21

Scaling technology and scaling business processes are not the same. Since the beginning of enterprise technology, scaling software has been a difficult task to get right inside large organisations. When it comes to Artificial Intelligence and Machine Learning, it becomes vastly more complicated. 

In this episode I propose a framework - in five pillars - for the business side of artificial intelligence.

 

Sep 26, 2019
Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan)
37:47

In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed NLP model that OpenAI decided not to release because it was deemed too dangerous to publish.

We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI.
Moreover, Aaron provides some very interesting links and demos that will blow your mind!

Enjoy the show! 

References

Multimodal image-to-image translation (not all of these were mentioned in the podcast, but all are recommended by Aaron)

Pix2Pix: 
 
CycleGAN:
 
GANimorph:
 
Sep 23, 2019
Training neural networks faster without GPU [RB]
22:21

Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who can train neural networks faster by squeezing more out of the existing hardware and keeping the training pipeline busy.

Enjoy the show!
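For the curious, here is a rough sketch of the idea from the paper referenced below, known as data echoing: reuse each freshly loaded batch for a few optimizer steps, so the accelerator is not starved by the input pipeline. The training loop is a generic PyTorch placeholder, not the authors' code:

import torch

def train_with_data_echoing(model, loader, optimizer, loss_fn, echo_factor=2):
    """Repeat each batch `echo_factor` times to keep the accelerator busy."""
    model.train()
    for x, y in loader:               # expensive: I/O, decoding, augmentation
        for _ in range(echo_factor):  # cheap: extra steps on the same batch
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
# Any model, DataLoader, optimizer and loss function can be plugged in above.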

 

References

Faster Neural Network Training with Data Echoing
https://arxiv.org/abs/1907.05550

Sep 17, 2019
How to generate very large images with GANs
14:41

In this episode I explain how a research group from the University of Lübeck tamed the curse of dimensionality to generate large medical images with GANs.
The problem is not as trivial as it seems. Many researchers have failed to generate large images with GANs before. One interesting application of this approach is in medicine, for the generation of CT and X-ray images.
Enjoy the show!

 

References

Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376

Sep 06, 2019
[RB] Complex video analysis made easy with Videoflow
30:42

In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment. 

I have inspected the Videoflow repo on GitHub and some of the capabilities of this framework, and I must say that it is really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.

 

References

Videoflow official GitHub repository
https://github.com/videoflow/videoflow

 

Aug 29, 2019
[RB] Validate neural networks without data with Dr. Charles Martin
44:46

In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work. 

The questions that Charles answers in the show are essentially two:

  1. Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML?

  2. How can we master deep neural networks in a theoretically principled way?

 

References 

Aug 27, 2019
How to cluster tabular data with Markov Clustering
20:43

In this episode I explain how a community detection algorithm known as Markov clustering can be constructed by combining simple concepts like random walks, graphs, and similarity matrices. Moreover, I highlight how one can build a similarity graph and then run a community detection algorithm on that graph to find clusters in tabular data.

You can find a simple hands-on code snippet to play with on the Amethix Blog 
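If you want to experiment right away, here is a minimal NumPy sketch of Markov clustering on a toy similarity matrix. This is an illustration of the idea, not the snippet from the blog:

import numpy as np

def normalize(M):
    return M / M.sum(axis=0)                      # make columns stochastic

def mcl(S, expansion=2, inflation=2.0, iterations=50):
    M = normalize(S)
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)  # expansion: random-walk steps
        M = normalize(M ** inflation)             # inflation: sharpen strong edges
    # surviving rows act as attractors; their non-zero columns form clusters
    clusters = {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.sum() > 1e-6}
    return [list(c) for c in clusters]

# Toy similarity matrix with two obvious groups: {0, 1, 2} and {3, 4}
S = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)
print(mcl(S))   # -> [[0, 1, 2], [3, 4]] (order may vary)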

Enjoy the show! 

 

References

[1] S. Fortunato, “Community detection in graphs”, Physics Reports, volume 486, issues 3-5, pages 75-174, February 2010.

[2] Z. Yang, et al., “A Comparative Analysis of Community Detection Algorithms on Artificial Networks”, Scientific Reports volume 6, Article number: 30750 (2016)

[3] S. van Dongen, “A cluster algorithm for graphs”, Technical Report, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 2000.

[4] A. J. Enright, et al., “An efficient algorithm for large-scale detection of protein families”, Nucleic Acids Research, volume 30, issue 7, pages 1575-1584, 2002.

Aug 21, 2019
Waterfall or Agile? The best methodology for AI and machine learning
14:26

The two most widely considered software development models in modern project management are, without any doubt, the Waterfall Methodology and the Agile Methodology. In this episode I make a comparison between the two and explain what I believe is the best choice for your machine learning project.

An interesting post to read (mentioned in the episode) is How businesses can scale Artificial Intelligence & Machine Learning https://amethix.com/how-businesses-can-scale-artificial-intelligence-machine-learning/

Aug 14, 2019
Training neural networks faster without GPU
22:21

Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who can train neural networks faster by squeezing more out of the existing hardware and keeping the training pipeline busy.

Enjoy the show!

 

References

Faster Neural Network Training with Data Echoing
https://arxiv.org/abs/1907.05550

Aug 06, 2019
Validate neural networks without data with Dr. Charles Martin
44:46

In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work. 

The questions that Charles answers in the show are essentially two:

  1. Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML?

  2. How can we master deep neural networks in a theoretically principled way?

 

References 

 

 
Jul 23, 2019
Episode 69: Complex video analysis made easy with Videoflow
30:42

In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment. 

I have inspected the Videoflow repo on GitHub and some of the capabilities of this framework, and I must say that it is really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.

 

References

Videoflow official GitHub repository
https://github.com/videoflow/videoflow

 

Jul 16, 2019
Episode 68: AI and the future of banking with Chris Skinner [RB]
41:42

In this episode I have a wonderful conversation with Chris Skinner.

Chris and I recently got in touch at The Banking Scene 2019, a fintech conference held in Brussels. During that conference he talked as a real troublemaker - that’s how he defines himself - saying that “People are not educated about loans, credit, money” and that “Banks are failing at digital”.

After I got my hands on his last book Digital Human, I invited him to the show to ask him a few questions about innovation, regulation and technology in finance.

Jul 09, 2019
Episode 67: Classic Computer Science Problems in Python
28:35

Today I am with David Kopec, author of Classic Computer Science Problems in Python, published by Manning Publications.

His book deepens your knowledge of problem-solving techniques from the realm of computer science by challenging you with interesting and realistic scenarios, exercises, and of course algorithms.
There are examples in the major topics any data scientist should be familiar with, such as search, clustering, graphs, and much more.

Get the book from https://www.manning.com/books/classic-computer-science-problems-in-python and use coupon code poddatascienceathome19 to get a 40% discount.

 

References

Twitter https://twitter.com/davekopec

GitHub https://github.com/davecom

classicproblems.com

Jul 02, 2019
Episode 66: More intelligent machines with self-supervised learning
18:56

In this episode I talk about a new learning paradigm that may seem a bit blurry and not really different from the methods we already know, such as supervised and unsupervised learning. The method I introduce here is called self-supervised learning.

Enjoy the show!
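To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of a classic pretext task (rotation prediction): the labels are derived from the data itself, so no human annotation is required. This is a generic illustration, not an example from the episode:

import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation id is the label."""
    rotated, labels = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 4)               # predicts which of the 4 rotations was applied
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))

images = torch.randn(16, 1, 28, 28)  # stand-in for real unlabeled images
x, y = make_rotation_batch(images)
loss = nn.functional.cross_entropy(head(encoder(x)), y)
loss.backward()
opt.step()
# After pretraining, the encoder's features can be reused for downstream tasks.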

 

Don't forget to subscribe to our Newsletter at amethix.com and get the latest updates in AI and machine learning. We do not spam. Promise!

 

References

Deep Clustering for Unsupervised Learning of Visual Features

Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

 

Jun 25, 2019
Episode 65: AI knows biology. Or does it?
12:14

The successes of deep learning for text analytics, also introduced in a recent post about sentiment analysis published here, are undeniable. Many other tasks in NLP have also benefitted from the superiority of deep learning methods over more traditional approaches. Such extraordinary results have also been possible thanks to the neural network approach of learning meaningful character and word embeddings, that is, representation spaces in which semantically similar objects are mapped to nearby vectors.
All this is strictly related to a field one might initially find disconnected or off-topic: biology.
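As a toy illustration of what “nearby vectors” means above, here is a small cosine-similarity example. The embedding vectors are invented for illustration, not learned from data:

import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

emb = {
    "protein": np.array([0.90, 0.80, 0.10]),   # made-up 3-dimensional embeddings
    "enzyme":  np.array([0.85, 0.75, 0.20]),
    "guitar":  np.array([0.10, 0.20, 0.90]),
}
print(cosine(emb["protein"], emb["enzyme"]))   # high: semantically related
print(cosine(emb["protein"], emb["guitar"]))   # low: unrelated concepts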

 


Don't forget to subscribe to our Newsletter at amethix.com and get the latest updates in AI and machine learning. We do not spam. Promise!


 

References

[1] Rives A., et al., “Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences”, biorxiv, doi: https://doi.org/10.1101/622803

[2] Vaswani A., et al., “Attention is all you need”, Advances in neural information processing systems, pp. 5998–6008, 2017.

[3] Bahdanau D., et al., “Neural machine translation by jointly learning to align and translate”, arXiv, http://arxiv.org/abs/1409.0473.

Jun 23, 2019
Episode 64: Get the best shot at NLP sentiment analysis
12:58

The rapid diffusion of social media like Facebook and Twitter, and the massive use of different types of forums like Reddit, Quora, etc., is producing an impressive amount of text data every day. 

There is one specific activity that many business owners have been contemplating over the last five years: identifying the social sentiment of their brand by analysing the conversations of their users.

In this episode I explain how one can get the best shot at classifying sentences with deep learning and word embedding.

 

 

Additional material

Schematic representation of how to learn a word embedding matrix E by training a neural network that, given the previous M words, predicts the next word in a sentence. 
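Below is a minimal PyTorch sketch of that schematic, with a toy vocabulary and random data standing in for a real corpus; the weights of the embedding layer play the role of the matrix E:

import torch
import torch.nn as nn

V, D, M = 100, 16, 3                 # vocabulary size, embedding dim, context length

class NextWordModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.E = nn.Embedding(V, D)          # the embedding matrix E
        self.out = nn.Linear(M * D, V)       # scores over the vocabulary
    def forward(self, context):              # context: (batch, M) word ids
        e = self.E(context).flatten(1)       # concatenate the M embeddings
        return self.out(e)

model = NextWordModel()
opt = torch.optim.Adam(model.parameters())
context = torch.randint(0, V, (32, M))       # toy batch of M-word contexts
target = torch.randint(0, V, (32,))          # toy next words
loss = nn.functional.cross_entropy(model(context), target)
loss.backward()
opt.step()
# After training on a real corpus, model.E.weight is the learned matrix E.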

 


 

 

Word2Vec example source code

https://gist.github.com/rlangone/ded90673f65e932fd14ae53a26e89eee#file-word2vec_example-py

 

 

References

[1] Mikolov, T. et al., "Distributed Representations of Words and Phrases and their Compositionality", Advances in Neural Information Processing Systems 26, pages 3111-3119, 2013.

[2] The Best Embedding Method for Sentiment Classification, https://medium.com/@bramblexu/blog-md-34c5d082a8c5

[3] The state of sentiment analysis: word, sub-word and character embedding 
https://amethix.com/state-of-sentiment-analysis-embedding/

 

Jun 14, 2019
Episode 63: Financial time series and machine learning
21:08

In this episode I speak to Alexandr Honchar, data scientist and owner of the blog https://medium.com/@alexrachnog
Alexandr has written very interesting posts about time series analysis for financial data. His blog is on my personal list of best tutorial blogs. 

We discuss financial time series and machine learning, what makes predicting the price of stocks a very challenging task, and why machine learning might not be enough.
As usual, I ask Alexandr how he sees machine learning in the next 10 years. His answer - in my opinion quite futuristic - makes perfect sense. 

You can contact Alexandr on

 

Enjoy the show!

 

Jun 04, 2019
Episode 62: AI and the future of banking with Chris Skinner
42:03

In this episode I have a wonderful conversation with Chris Skinner.

Chris and I recently got in touch at The Banking Scene 2019, a fintech conference held in Brussels. During that conference he talked as a real troublemaker - that’s how he defines himself - saying that “People are not educated about loans, credit, money” and that “Banks are failing at digital”.

After I got my hands on his last book Digital Human, I invited him to the show to ask him a few questions about innovation, regulation and technology in finance.

May 28, 2019
Episode 61: The 4 best use cases of entropy in machine learning
21:35

It all starts from physics. The entropy of an isolated system never decreases… Everyone, at some point in school, learned this in a physics class. What does this have to do with machine learning?
To find out, listen to the show.
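As a small spoiler of the connection, here is Shannon entropy computed on a couple of toy distributions; this is the same quantity that shows up in cross-entropy loss, decision-tree splits and information gain:

import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -(p * np.log2(p)).sum()

print(entropy([0.5, 0.5]))            # 1.0 bit: a fair coin, maximum uncertainty
print(entropy([0.9, 0.1]))            # ~0.47 bits: a biased coin, less uncertain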

 

References

Entropy in machine learning 
https://amethix.com/entropy-in-machine-learning/

May 21, 2019
Episode 60: Predicting your mouse click (and a crash course in deeplearning)
39:50

Deep learning is the future. Get a crash course on deep learning. Now!
In this episode I speak to Oliver Zeigermann, author of Deep Learning Crash Course published by Manning Publications at https://www.manning.com/livevideo/deep-learning-crash-course

Oliver (Twitter: @DJCordhose) is a veteran of neural networks and machine learning. In addition to the course - which teaches you concepts from prototype to production - he's working on a really cool project that predicts something people do every day... clicking their mouse. 

If you use the promo code poddatascienceathome19 you get a 40% discount on all products on the Manning platform.

Enjoy the show!

 

References:

 

Deep Learning Crash Course (Manning Publications)

https://www.manning.com/livevideo/deep-learning-crash-course?a_aid=djcordhose&a_bid=e8e77cbf

 

Companion notebooks for the code samples of the video course "Deep Learning Crash Course"

https://github.com/DJCordhose/deep-learning-crash-course-notebooks/blob/master/README.md

 

Next-button-to-click predictor source code

https://github.com/DJCordhose/ux-by-tfjs

 

May 16, 2019
Episode 59: How to fool a smart camera with deep learning
24:11

In this episode I met three crazy researchers from KU Leuven (Belgium) who found a method to fool surveillance cameras and stay hidden just by holding a special t-shirt. 
We discussed the technique they used and some consequences of their findings.

They published their paper on arXiv and made their source code available at https://gitlab.com/EAVISE/adversarial-yolo

Enjoy the show!
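For intuition only, here is a heavily simplified, generic sketch of how an adversarial patch can be optimized by gradient descent against a differentiable model. The model below is just a stand-in; the authors' actual method and code live in the GitLab repository above:

import torch

# Placeholder "detector": a linear classifier over raw pixels, 2 classes.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
person_class = 1
patch = torch.rand(3, 16, 16, requires_grad=True)   # the pixels we optimize
opt = torch.optim.Adam([patch], lr=0.05)

for step in range(100):
    image = torch.rand(1, 3, 64, 64)                 # stand-in for camera frames
    image[:, :, 24:40, 24:40] = patch.clamp(0, 1)    # "wear" the patch
    score = model(image)[0, person_class]            # model's person score
    score.backward()                                 # descend on the score:
    opt.step()                                       # lower score = harder to detect
    opt.zero_grad()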

 

References

Fooling automated surveillance cameras: adversarial patches to attack person detection 
Simen Thys, Wiebe Van Ranst, Toon Goedemé

 

EAVISE Research Group, KU Leuven (Belgium)
https://iiw.kuleuven.be/onderzoek/eavise

May 07, 2019
Episode 58: There is physics in deep learning!
19:55

There is a connection between gradient-descent-based optimizers and the dynamics of damped harmonic oscillators. What does that mean? It means we now have a better theory for optimization algorithms.
In this episode I explain how all this works.

All the formulas I mention in the episode can be found in the post The physics of optimization algorithms

Enjoy the show.
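As a quick numerical teaser (with arbitrary constants, not the formulas from the post), heavy-ball gradient descent on a quadratic loss behaves like a discretized damped harmonic oscillator:

k, lr, momentum = 2.0, 0.1, 0.9       # "spring" stiffness and optimizer knobs
grad = lambda x: k * x                # gradient of the quadratic loss 0.5 * k * x**2

x, v = 5.0, 0.0                       # initial "position" and "velocity"
for _ in range(60):
    v = momentum * v - lr * grad(x)   # friction (momentum term) plus restoring force
    x = x + v                         # move along the velocity
print(x)   # close to 0: the iterate oscillated and decayed like a damped spring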

 

Apr 30, 2019
Episode 57: Neural networks with infinite layers
16:19

How are differential equations related to neural networks? What are the benefits of re-thinking neural network as a differential equation engine? In this episode we explain all this and we provide some material that is worth learning. Enjoy the show!
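Here is a minimal sketch of the core connection: a residual update x + f(x) is one explicit Euler step of the differential equation dx/dt = f(x), so a very deep stack of residual blocks approximates an ODE solver. The network f below is a toy placeholder:

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(4, 4), nn.Tanh())   # the "dynamics" f(x)

def residual_net(x, depth=100, h=0.01):
    for _ in range(depth):       # depth -> infinity, h -> 0: a true ODE solve
        x = x + h * f(x)         # one residual block == one Euler step
    return x

x0 = torch.randn(1, 4)
print(residual_net(x0))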

 

[Figure: schematic of a residual block]

 

 

References

[1] K. He, et al., “Deep Residual Learning for Image Recognition”, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778, 2016

[2] S. Hochreiter, et al., “Long short-term memory”, Neural Computation 9(8), pages 1735-1780, 1997.

[3] Q. Liao, et al., “Bridging the gaps between residual learning, recurrent neural networks and visual cortex”, arXiv preprint, arXiv:1604.03640, 2016.

[4] Y. Lu, et al., “Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equation”, Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.

[5] T. Q. Chen, et al., “Neural Ordinary Differential Equations”, Advances in Neural Information Processing Systems 31, pages 6571-6583, 2018.

Apr 23, 2019
Episode 56: The graph network
16:34

Since the beginning of AI in the 1950s and until the 1980s, symbolic AI approaches dominated the field. These approaches, also known as expert systems, used mathematical symbols to represent objects and the relationships between them, in order to depict the extensive knowledge bases built by humans.
The opposite of the symbolic AI paradigm is called connectionism, which is behind the machine learning approaches of today.

Apr 16, 2019
Episode 55: Beyond deep learning
17:23

The successes that deep learning systems have achieved in the last decade in all kinds of domains are unquestionable. Self-driving cars, skin cancer diagnostics, movie and song recommendations, language translation, automatic video surveillance, and digital assistants represent just a few examples of the ongoing revolution that affects, or is soon going to disrupt, our everyday life.
But all that glitters is not gold…

Read the full post on the Amethix Technologies blog

Apr 09, 2019
Episode 54: Reproducible machine learning
11:50

In this episode I speak about how important reproducible machine learning pipelines are.
When you collaborate with diverse teams, several tasks are distributed among different individuals. Everyone has good reasons to change parts of your pipeline, which leads to confusion and a number of variants that soon explodes.
In all those cases, tracking data and code is extremely helpful to build models that are reproducible anytime, anywhere.
Listen to the podcast and learn how.
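As a bare-bones illustration of tracking data and code (real projects would use a proper experiment tracker; the names and structure below are made up), one can fingerprint the exact dataset, configuration and seed next to each trained model:

import hashlib
import json

def fingerprint(data_bytes: bytes, config: dict) -> str:
    """Hash the exact data and settings so a run can be identified later."""
    h = hashlib.sha256()
    h.update(data_bytes)                                   # the exact training data
    h.update(json.dumps(config, sort_keys=True).encode())  # the exact settings
    return h.hexdigest()

config = {"model": "random_forest", "n_estimators": 100, "seed": 42}
data = b"...raw training data bytes..."
print(fingerprint(data, config))  # identical inputs -> identical hash, anywhere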

 

Mar 09, 2019
Episode 53: Estimating uncertainty with neural networks
15:08

Have you ever wanted to get an estimate of the uncertainty of your neural network? Clearly Bayesian modelling provides a solid framework to estimate uncertainty by design. However, there are many realistic cases in which Bayesian sampling is not really an option and ensemble models can play a role.

In this episode I describe a simple yet effective way to estimate uncertainty, without changing your neural network’s architecture or your machine learning pipeline at all.

The post with mathematical background and sample source code is published here.
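In the meantime, here is a generic deep-ensemble-style sketch of the idea: train a few copies of the same architecture with different seeds and read the spread of their predictions as uncertainty. It may differ in detail from the method in the post:

import torch
import torch.nn as nn

def make_model(seed):
    torch.manual_seed(seed)           # different seed -> different initialisation
    return nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = x ** 3 + 0.05 * torch.randn_like(x)   # toy regression data

models = []
for seed in range(5):                 # K = 5 ensemble members
    model = make_model(seed)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    models.append(model)

with torch.no_grad():
    preds = torch.stack([m(x) for m in models])
mean, std = preds.mean(dim=0), preds.std(dim=0)  # std = per-point uncertainty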

Jan 23, 2019
Episode 52: why do machine learning models fail? [RB]
15:58

The success of a machine learning model depends on several factors and events. True generalization to data that the model has never seen before is more a chimera than a reality. But under specific conditions, a well-trained machine learning model can generalize well and perform with a testing accuracy similar to the one observed during training.

In this episode I explain when and why machine learning models fail from training to testing datasets.

Jan 17, 2019
Episode 51: Decentralized machine learning in the data marketplace (part 2)
23:08

In this episode I complete the explanation of the fitchain-Ocean Protocol integration, which allows secure on-premise compute to operate in the decentralized data marketplace designed by Ocean Protocol.

As mentioned in the show, the picture below provides a 10,000-feet view of the integration.

[Diagram: SEA-ocean-fitchain integration overview]

 

I hope you enjoy the show!

Jan 08, 2019
Episode 50: Decentralized machine learning in the data marketplace
24:17

In this episode I briefly explain how two massive technologies were merged in 2018 (work in progress :) - one providing secure machine learning on isolated data, the other implementing a decentralized data marketplace.

In this episode I explain:

  • How do we make machine learning decentralized and secure?
  • How can data owners keep their data private?
  • How can we benefit from blockchain technology for AI and machine learning?

 

I hope you enjoy the show!

 

References

fitchain.io decentralized machine learning

Ocean Protocol decentralized data marketplace

Dec 26, 2018
Episode 49: The promises of Artificial Intelligence
21:00

It's always good to put all the findings in AI into perspective, in order to clear up some of the most common misunderstandings and overblown promises.
In this episode I make a list of some of the most misleading statements about what artificial intelligence can achieve in the near future.

Dec 19, 2018
Episode 48: Coffee, Machine Learning and Blockchain
28:48

In this episode - which I advise you to consume at night, in a quiet place - I speak about private machine learning and blockchain, while I sip a cup of coffee in my home office.
There are several reasons why I believe we should start thinking about private machine learning...
It doesn't really matter which approach becomes successful and gets adopted, as long as it makes private machine learning possible. If people own their data, they should also own the by-products of such data.

Decentralized machine learning makes this scenario possible.

Oct 21, 2018
Episode 47: Are you ready for AI winter? [Rebroadcast]
56:55

Today I am having a conversation with Filip Piękniewski, a researcher working on computer vision and AI at Koh Young Research America.
His adventure with AI started in the 90s, and since then a long list of experiences at the intersection of computer science and physics led him to the conclusion that deep learning might not be sufficient nor appropriate to solve the problem of intelligence, specifically artificial intelligence.
I read some of his publications and got familiar with some of his ideas. Honestly, I have been attracted by the fact that Filip does not buy the hype around AI and deep learning in particular.
He doesn't seem to share the vision of folks like Elon Musk, who claimed that we are going to see an exponential improvement in self-driving cars, among other things (he actually said that before a Tesla drove over a pedestrian).

Sep 11, 2018
Episode 46: why do machine learning models fail? (Part 2)
17:12

In this episode I continue the conversation from the previous one, about failing machine learning models.

When data scientists have access to the distributions of the training and testing datasets, it becomes relatively easy to assess whether a model will perform equally well on both. What happens with private datasets, where no access to the data can be granted?
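As an aside, here is what the “easy” case can look like in practice: a two-sample Kolmogorov-Smirnov test per feature, on synthetic data, flags the shift between training and testing distributions:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=1000)
test_feature = rng.normal(0.5, 1.0, size=1000)   # shifted on purpose

stat, p_value = ks_2samp(train_feature, test_feature)
if p_value < 0.01:
    print(f"distribution shift detected (KS={stat:.3f}, p={p_value:.1e})")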

At fitchain we might have an answer to this fundamental problem.

 

Sep 04, 2018
Episode 45: why do machine learning models fail?
16:21

The success of a machine learning model depends on several factors and events. True generalization to data that the model has never seen before is more a chimera than a reality. But under specific conditions, a well-trained machine learning model can generalize well and perform with a testing accuracy similar to the one observed during training.

In this episode I explain when and why machine learning models fail from training to testing datasets.

Aug 28, 2018
Episode 44: The predictive power of metadata
21:08

In this episode I don't talk about data. In fact, I talk about metadata.

While many machine learning models rely on certain amounts of data, e.g. text, images, audio and video, it has been proved how powerful the signal carried by metadata is - that is, all the data that is invisible to the end user.
Behind a tweet of 140 characters there are more than 140 fields of data that draw a much more detailed profile of the sender and the content she is producing... without ever considering the tweet itself.
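As a toy illustration, counting the leaf fields of a simplified, made-up tweet object already shows how much surrounds the text itself:

def count_fields(obj):
    """Recursively count leaf fields in a nested dict."""
    if isinstance(obj, dict):
        return sum(count_fields(v) for v in obj.values())
    return 1

# A made-up subset of the fields a real tweet object carries.
tweet = {
    "text": "hello world",
    "created_at": "2018-08-21T10:00:00Z",
    "source": "Twitter for iPhone",
    "lang": "en",
    "coordinates": {"lat": 50.85, "lon": 4.35},
    "user": {"id": 42, "followers_count": 1000, "verified": False,
             "created_at": "2009-01-01T00:00:00Z", "time_zone": "CET"},
}
print(count_fields(tweet) - 1, "metadata fields besides the text")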

 

References
You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information https://www.ucl.ac.uk/~ucfamus/papers/icwsm18.pdf

Aug 21, 2018