Data Crunch

By Vault Analytics



Category: Natural Sciences


Subscribers: 175
Reviews: 0

Description

If you want to learn how data science, artificial intelligence, machine learning, and deep learning are being used to change our world for the better, you’ve subscribed to the right podcast. We talk to entrepreneurs and experts about their experiences employing new technology—their approach, their successes, their failures, and the outcomes of their work. We make these difficult concepts accessible to a wide audience.

Episode Date
Building Data Products that Work in the Health and Wellness Industry
19:40
Our guest today holds a PhD in organizational psychology and has been working on data products in the health and wellness space for over a decade. We cover a lot of ground in this interview: how to create data products that work, how to avoid the unexpected consequences of poorly designed data interventions, and the importance of ethnographic thinking in data science. We'll also talk about reducing friction in data collection, the coaching data product model, and surprising things we can learn when people's routines are broken. From today's episode, you'll come away with a better understanding of how to build contextually relevant data products that make a difference in people's lives.
Jun 01, 2019
The Road to a Data-Driven Culture in Your Organization
24:17
How do you whittle the murky business of creating a data-driven culture down to a proven process? Today we talk to a guest who has done this time and time again, helping companies transform their operations. He points out the small nuances and details about the process, like questions to ask to start on the right foot, critical feedback loops to put in place along the way, and how to overcome some of the most common problems that make people give up. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Now, let's jump into our interview with Ryan Deeds, VP of technology and data management at Assurex Global. Ryan Deeds: Uh, I think it's an interesting time in the whole data experience, because I think so many people failed. You know, after the last decade or so, in this next couple of years everybody's now trying to look at root cause. And so culture is actually becoming important now, you know?
And so that's kind of a cool thing. Curtis Seare: What do you mean by that, in terms of a lot of people having failed? Ryan Deeds: I think when you look at BI projects from 2003 to 2013, companies went through a litany of failures trying to get data to a place where it made sense, was easily accessible, and had good quality. Um, but they didn't address that. They just put the visualizations on top of kind of crappy data, and they did that over and over and over again. Um, and then finally it seems like, you know, in the last year or two, we've started really having a conversation about what has to happen inside an organization to make data usable. I mean, it's just like water, right? You can't just take water from a stream and start drinking it. You've got to process it and clean it and make it valuable and make it worthy of consumption. And that's exactly the thing we've got to do with data. Curtis Seare: Sure. Maybe we can dive into that as well, because you've had this experience taking a lot of companies through those steps, right? So what do you see as the major roadblocks? How do you start this process of helping people get their hands around “How do I get value from my data?” Ryan Deeds: So it's interesting. I've done this a lot, and so I have organizations that come to me and they say, hey, you know, we're ready to start leveraging data. Um, and the typical thing is there's just a lack of expectation of the time it takes. Um, and so I threw together a timeline to try to help educate individuals on that, you know, and on the steps it would take to get to usable data, and the first is really a recognition that the organization we're in is not effectively using data as a strategic advantage.
May 01, 2019
Statistics Done Wrong—A Woeful Podcast Episode
21:28
Statistics are misused and abused, sometimes even unintentionally, in both scientific and business settings. Alex Reinhart, author of the book “Statistics Done Wrong: The Woefully Complete Guide,” talks about the most common errors people make when trying to figure things out using statistics, and what happens as a result. He shares practical insights into how both scientists and business analysts can make sure their statistical tests have high enough power, how they can avoid “truth inflation,” and how to overcome multiple comparisons problems. Ginette: In 2009, neuroscientist Craig Bennett undertook a landmark experiment in a Dartmouth lab. A high-tech fMRI machine was used on test subjects, who were “shown a series of photographs depicting human individuals in social situations with a specified emotional valence” and asked “to determine what emotion the individual in the photo must have been experiencing.” Would different parts of the brain turn out to be associated with different emotions? In fact, they were. The experiment was a success. The results came in showing brain activity changes for the different tasks, and the p-value came out to 0.001, indicating a significant result. The problem? The only participant was a 3.8-pound, 18-inch mature Atlantic salmon, who was “not alive at the time of scanning.” Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Ginette: This study was real. It was real data, robust analysis, and an actual dead fish. It even has an official-sounding scientific study name: “Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon.”
Craig Bennett did the experiment to show that statistics can be dangerous territory. They can be abused and misleading, whether or not the experimenter has nefarious intentions. Still, statistics are a legitimate and powerful tool to discover actual truths and find important insights, so they cannot be ignored. It becomes our task to wield them correctly, and to be careful when accepting or rejecting statistical assertions we come across. Today we talk to Alex Reinhart, author of the book “Statistics Done Wrong: The Woefully Complete Guide.” Alex is an expert on how to do statistics wrong, and, incidentally, how to do them right. Alex: We end up using statistical methods in science and in business to answer questions, often very simple questions, of just “does this intervention or this treatment or this change that I made, does it have an effect?” Often in a difficult situation, because there are many things going on, you know, if you're doing a medical treatment, there are many different reasons that people recover in different times, and there's a lot of variation, and it’s hard to predict these things. If you’re doing an A/B test on a website, your visitors are all different. Some of them will want to buy your product or whatever it is, and some of them won’t, and so there’s a lot of variation that happens naturally, and we’re always in the position of having to ask, “This change I made or intervention I did, does it have an effect, and can I distinguish that effect from all the other things that are going on?” And this leads to a lot of problems, so statistical methods exist to help you answer those questions by seeing how much variation there is naturally, and whether this effect I saw is more than I would have expected had my intervention not worked or not done anything, but it doesn’t give you certainty. It gives us nice words, like “statistically significant,” which sounds important, but it doesn't give you certainty.
You're often asking the question, “Is this effect that I’m seeing from my experim...
Mar 27, 2019
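The episode description above mentions the multiple comparisons problem, which the dead-salmon study is widely cited to illustrate: an fMRI analysis runs a separate test for each of many thousands of voxels, so some can cross a significance threshold by chance alone. The sketch below is a minimal illustration, assuming m independent tests whose null hypotheses are all true and a per-test level alpha. It compares the analytic chance of at least one false positive with a simulation, and applies a Bonferroni correction, which is one standard remedy and not necessarily the one discussed in the episode. The function names are illustrative, not from any library.

```python
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests
    when every null hypothesis is true: 1 - (1 - alpha)^m."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test cutoff alpha / m, which keeps the family-wise error rate at or below alpha."""
    return alpha / m


def simulate_fwer(m: int, threshold: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check (assumes independent tests): under a true null, each
    p-value is uniform on [0, 1), so draw m of them per trial and count the
    trials in which any p-value falls below the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < threshold for _ in range(m)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, m = 0.05, 20
    print(f"analytic FWER, uncorrected:  {familywise_error_rate(alpha, m):.3f}")
    print(f"simulated FWER, uncorrected: {simulate_fwer(m, alpha):.3f}")
    print(f"simulated FWER, Bonferroni:  {simulate_fwer(m, bonferroni_threshold(alpha, m)):.3f}")
```

With 20 uncorrected tests at alpha = 0.05, the analytic chance of at least one false positive is roughly 64%; the Bonferroni cutoff of 0.0025 brings it back under 5%, at the cost of statistical power.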
Getting into Data Science
22:51
What does it take to become a data scientist? We speak with three people who have become data scientists in the last three years and find out what it takes, in their opinions, to land a data science job and to be prepared for a career in the field. Curtis: We’ve talked a lot in our recent episodes about all the interesting things you can do with data science, and we’ve only talked a little bit recently about what it actually takes to get into the field, which is a topic that a lot of you have reached out to us and asked us to cover in a more thorough way. So today, we’re taking a broader approach to this topic by talking to three people who have become data scientists in the last three years. You’re going to hear all the details of their three journeys: how they got started, how they landed their jobs, and what their best advice is for getting into the field. This will give you a broad view of how to get into data science from three people who have actually done it. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: A Vault Analytics production. Ginette: Here at Data Crunch we’ve been hard at work developing a technology that allows executives and business leaders to gain insight from their data instantly, simply by talking to the air. We hook up your data to an Alexa device with custom skills built in to understand the questions you have about your business and give you answers. Figure out sales forecasts, marketing performance, operational compliance, progress on KPIs, and more by just talking to Alexa. We are officially launching the product this week and have room for three initial customers. If you're interested, head over to datacrunchcorp.com/alexa or datacrunchpodcast.com/alexa (both work), and book some time to chat with us.
We’ll assess if your company is a good fit, and if so, we look forward to working with you! Tyler Folkman: My name’s Tyler Folkman. I've gotten into data science by kind of a strange route, to be honest. I did my undergrad in economics, originally thinking of getting into computer science, but for some reason I had this thought that computer science was going to get outsourced. I don't know if that was a thing, but I think people back in the early 2000s were talking about computer science getting outsourced, so I thought about business, which ended up being economics, which I really liked, and then I ended up doing economic consulting, which is basically where, usually in large litigation cases, lawyers hire economists to value damages. So for example, when Samsung and Apple were suing each other, I worked on the Samsung side to help value how much they might sue Apple for, for patent infringement, and a lot of that involves statistical analyses, data analytics, econometrics as economists would call it. And I got really interested in this idea of data being a really powerful tool for making decisions and coming to conclusions, and so I started hearing about machine learning on the Internet and kind of dabbling with Python, which, since I was a Windows user at the time, was a huge pain to get installed, but I kind of got it up and running, played around with things like scikit-learn, read some blogs, and really got into machine learning. I found that it was really housed more in the computer science department at that time, and so I decided to apply to some computer science departments and was lucky to get in at the University of Texas at Austin, do some studies there, join a machine learning lab, and do some work at Amazon. I got a really good set of experiences to help me learn how to be both a programmer and a machine learning person, with a little bit of statistics, and jumped straight from there over here to Ancestry and was luc...
Mar 01, 2019
Automated Machine Learning with TransmogrifAI
12:49
Would you rather take a year to develop a proprietary algorithm for your company that has an accuracy of 95%, or use an open source platform that takes a day to develop an algorithm with nearly the same accuracy? In most business cases, you'd choose the latter. In this episode, we talk to Till Bergmann, who works on the team that developed TransmogrifAI, an open source project that helps you build models quickly.
Jan 31, 2019
The Data Scientist's Journey with Nic Ryan
19:34
What does it take to become a data scientist? Nic Ryan has been in the field for over a decade and has answered thousands of questions from people looking to get into the field. In this episode, he talks about his journey into data science and his experience mentoring aspiring data scientists, giving advice to both beginners and seasoned professionals. Nic Ryan: I think there's sometimes a problem in data science education, and what people find interesting is they tend to focus on the algorithms, which, as you know from doing data science projects, is really just the last little bit. There are tens or sometimes even hundreds of decision steps that are made before you get to that particular point. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: A Vault Analytics production. Curtis: Let’s introduce you to our guest: Nic Ryan. He is an experienced data scientist and LinkedIn influencer who has helped a lot of aspiring data scientists in their journey into the profession. He’s been part of many different data teams, small and large, in big companies and startups, and he wrote a book called “The Data Scientist's Journey: The Guide for Aspiring Data Scientists,” which is based on the thousands of questions he’s been asked about becoming a data scientist. Nic: It started off with failure. Originally, I wanted to go over to the States to play basketball, so I’m a failed basketball player, and there are a couple of reasons why I didn’t make it. One is I wasn’t tall enough to be a small forward, which is a bit ironic; I’m only 6’2”. But probably the more important reason is I wasn’t very good, but I didn’t know that at the time, so I didn't get a scholarship to play basketball, but I did get a scholarship to do actuarial studies. So it’s not a bad backup plan.
But from there, I ended up falling into more of the stats side of insurance, so the statistical modeling, pricing, fire, and theft. I really enjoyed that kind of stuff, so over time, I did more of that. I did some of my post-grad actuarial exams, and I was doing some reading on the weekends and finding out more about stats and a bit about code and a bit about R, and what really did it for me was having an incredibly long train ride to get to work. It was a couple hours each way, and this is, of course, the era of MOOCs, so rather than just talking to people, I ended up joining the MOOCs, and I really enjoyed that, and this whole thing of data science has just kind of grown around me. I ended up working for one of the banks doing their credit scoring and consulting with different banks for a long period of time, and then I got a call out of the blue; a guy just gave me a plane ticket and said come talk to us. So I flew there, and they offered me what was really a head of data science role, so there was a team overseas and a couple of teams in Australia doing data science, and yeah, we did some pretty awesome things with NLP and bank statements and built some pretty sophisticated risk models; it was probably the best in the country at that time. It’s about 60 miles away from Sydney where I worked, and so it was a real opportunity. It was probably two hours door to door each way, and that was the other thing as well: that was a long time away from family, which wasn’t cool. I had a couple of young kids. That’s part of the reason I have my own business now: I’ve spent too much time away from my daughters. The result of it was that I had a whole heap of dead time that I could either use or not use, and so I was able to teach myself code and teach myself some more stats and machine learning and stuff pretty quickly. When you have a couple hours of dead time each day, you become pretty good, pretty quickly,
Dec 28, 2018
Cutting-Edge Computational Chemistry Enabled by Deep Learning
17:43
Machine learning has become a bigger part of chemistry over the last two or three years. Industries need people trained in both fields, and it's taken time for those people to make their way into this sector. Olexandr Isayev is at the forefront of that wave, and he talks to us about what he's done while melding deep learning and chemistry together and about where he sees this field going with this new tech.
Nov 27, 2018
Python and the Open Source Community
24:50
Python versus R. It's a heated debate. We won't solve this raging controversy today, but we will peek into the history of Python, particularly the open source community surrounding it, and see how it came to be what it is today: a widely used and flexible programming language. Travis Oliphant: Wes McKinney did a great job in creating pandas . . . not just creating it but organizing a community around it, which are two independent steps, and both necessary, by the way. A lot of people get confused by open source. They sometimes think you're just going to get people together and open source emerges from the foam, but what ends up happening, and I’ve seen this now at least eight or nine different times, both with projects I’ve had the chance and privilege to interact with and with other people's projects, is that it really takes a core set of motivated people, usually not more than three. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: A Vault Analytics production. Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company. Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job. Go to lightpostanalytics.com/datacrunch to learn more and get some freebies. Here at Data Crunch, we love playing with artificial intelligence, machine learning, and deep learning, so we started a fun new side project.
We just launched a new podcast that tests the boundaries of what can be done with Google’s cutting-edge deep learning speech generation algorithms. We use surprisingly human-like voices to host the podcast, which reads all the unusual Wikipedia articles you haven’t had a chance to read yet, like chicken hypnosis, the history of an amusing German conspiracy theory, strange trends in Russian politics, and much more to come. It’s worth a listen to hear what this tech sounds like, and you’ll learn unique and bizarre trivia that you can share at your next dinner party. Search for a podcast called “Griswold the AI Reads Unusual Wikipedia Articles,” now found on all your favorite podcast platforms. Curtis: There has been a heated, ongoing debate about which programming language is better for machine learning and data analytics, Python or R, and while we won’t be wrestling with that particular question, we will overview a bit of history for both and then dive into the significant history behind one of these languages, Python, with a major contributor to the language, a man who significantly influenced the way that data scientists use Python today. Ginette: As a very short historical background, Python came on the scene in 1991 when Guido van Rossum developed it. His language has developed a reputation as easy to use because its syntax is simple, it’s versatile, and it has a shallow learning curve. It’s also a general-purpose language that is used beyond data analysis and is great for implementing algorithms for production use. As for R, it followed shortly after Python. In 1995, Ross Ihaka and Robert Gentleman created it as an easier way to do data analysis, statistics, and graphic models, and it was mainly used in academia and research until more recently. It’s specifically aimed at statistics, and it has extensive libraries and a solid community. As a controversial side note, according to Gregory Piatetsky-Shapiro’s KDnuggets poll, late last year,
Oct 24, 2018
Machine Learning, Big Data, and Your Family History
21:10
How can artificial intelligence, machine learning, and deep learning benefit your family? These technologies are moving into every field, industry, and hobby, including what some say is the United States' second most popular hobby, family history. Today, it's so much easier to trace your roots back to find out more about your progenitors. Tyler Folkman, senior manager at Ancestry, the leading family history company, describes to us how he and his team use convolutional neural networks, LSTMs, conditional random fields, and the like to more easily piece together the puzzle of your family tree. Ginette: Today we peek into an area rich in data that has lots of interesting AI and machine learning problems. Curtis: The second most popular hobby in the United States, some claim, is family history research. And whether that’s true or not, it has had a lot of growth recently. Personal DNA testing products have exploded in popularity over the past three years, but beyond this popular product, lots of people go a step further and start tracing their roots back to piece together the puzzle of their family tree. Today we’re going to dive into the data side of this hobby with the leading family history research company. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company. Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour.
If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job. Go to lightpostanalytics.com/datacrunch to learn more and get some freebies. Tyler: My name's Tyler Folkman. Curtis: Who is a senior manager of data science at Ancestry. Tyler: As I look across Ancestry and family history, we almost have every kind of machine learning problem you might want. I mean, probably not every kind, but we have genetically based machine learning problems on the DNA science side. We have search optimization because people need to search our databases. We have recommendation problems because we want to hint the best resources out to people or provide them. For example, if we have a hundred things we think might be relevant to a person, what order do we show them in? So we use recommendation algorithms for that. We have a lot of computer vision problems, because people upload pictures, and a lot of our documents, if they're not digitized yet, meaning the text hasn't been extracted, might just be raw photos. Or even with just the pictures people upload, we want to understand what's in them: is this a picture of a graveyard? Is it a family portrait? Is it an old photo? And so there's tons of computer vision stuff, and natural language processing. On the business side, we have marketing problems just like any other business, like how do you optimize marketing spend? How do you optimize customer experience, customer flow? And so it's really a cool place, because you can get exposed to almost any type of problem you might be interested in. Curtis: So back in the 80s, before you could easily find information on the Internet, genealogists had to spend a ton of time trekking around to libraries to try to find information on their ancestors.
Ancestry saw a business opportunity and started selling floppy disks, and eventually CDs, full of genealogical resources for genealogists to easily access in their home. Tyler: And then they grew up through the Internet age and moved out ...
Sep 26, 2018
Machine Learning Takes on Diabetes
17:16
When Bryan Mazlish's son was diagnosed with Type 1 diabetes, there were unexpected challenges. Managing diabetes on a day-to-day basis was tough, so he hacked into his son's insulin pump and continuous glucose monitor to create the world's first ambulatory real-world artificial pancreas. Now his mission is to make it available to everyone. Bryan Mazlish: A nice demo that we showed at Google I/O earlier this summer, where we showed our use case for one of their forthcoming APIs. We’re really at the vanguard of digital health medical device enterprise software, and it's an incredibly exciting but also challenging place to be. We're enthusiastic about the prospects for what we can do for a whole lot of people. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. This episode of Data Crunch is brought to you by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company. Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job. Go to lightpostanalytics.com/datacrunch to learn more and get some freebies. Curtis: Today we get to speak with a man who, after studying computer science at Harvard, went on to start a stock-trading algorithm company on Wall Street, until his life took a twist.
Now he’s the president and co-founder of one of the leading digital health medical device enterprise software companies, which employs machine learning to customize and automate medicine intake, all because of an unexpected challenge that showed up in his life. Bryan: My name is Bryan Mazlish. I’m one of the founders of Bigfoot Biomedical. My background is in quantitative finance. I spent 20 years on Wall Street, first at a large investment bank and then about a decade running a fully automated trading business where we built algorithms to buy and sell stocks in a completely automated fashion, and it was about 6 or 7 years ago that my path took a change . . . Ginette: Bryan’s son was diagnosed with Type 1 diabetes, which Bryan says wasn’t entirely unexpected because his wife has the same disease. But what was unexpected was the intensity of managing the disease on a day-to-day basis. He was surprised by how antiquated the insulin management technology was. There wasn’t technology that could anticipate his son’s insulin needs and automatically give him the insulin he needed. Bryan: You need to take insulin simply to live. This is something that needs to be delivered on a constant basis, 24 hours a day. You can take it in one of two ways: you can use an insulin pump that delivers it on a continuous basis, or you can take a once-a-day injection, and the benefit of the pump is that you can vary the delivery at different points in the day. When you take an injection, it lasts for up to 24 hours, and it doesn't have the same flexibility, but it does have the benefit of not having to wear a device to deliver the insulin. And that's just the baseline; on top of that, you need to take insulin to offset meals, primarily carbohydrates, and high glucose levels.
So when you're going to sit down to eat breakfast, lunch, or dinner, or even a snack, you need to estimate the amount of carbohydrate and glucose impact of the meal that you're about to consume, and then dose that amount of insulin, either through an insulin pump or through an injection at that time. Ginette: Figuring out how much insulin to give yourself is tough.
Aug 31, 2018
Digital Twins, the Internet of Things, and Machine Learning
21:24
In a world where so many things are Internet connected, how is machine learning playing a role? Bruce Sinclair speaks with us about the intersection of IoT, AI/ML, and the digital twin. Bruce: Where AI, and in particular machine learning, and then in particular neural networks, and then in particular deep learning neural networks, where they apply is mostly in this model making. So with IoT, there are two types of models for the digital twin: we have the analytical models that are created through more analytical techniques, and then we have the cognitive models that are being created through machine learning and artificial intelligence techniques. I kind of like to separate the two, but the impact in both cases is profound. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Today, if you haven’t guessed already, we’re talking about the intersection of data, artificial intelligence, and the Internet of Things, or IoT. So we’re talking to an expert well versed in this topic. A little bit about his background: among many other things he’s done, like founding and heading companies, he’s authored a book on the Internet of Things, created a certification program for people who want to become certified IoT professionals, and he explains all things IoT on his podcast called “The Internet of Things Business Show.” Today, we’ll learn about AI in the IoT world and more specifically digital twins, a concept Gartner named as one of the top ten strategic technology trends for both 2017 and 2018. Let’s dive into this topic with our guest. Bruce Sinclair: My name is Bruce Sinclair. I am the president of IoT Inc.
We consult for brands, manufacturers, and vendors and help them with their IoT strategies, both on the business side and on the product side, and we produce content, so part of the content is the podcast, and we do trainings, so we're training executives on how to introduce IoT within their business and, most importantly, how to be profitable with IoT. The reason I started IoT Inc. was that I saw pretty quickly that there was a lot of hype around the Internet of Things, and this hype was all around the shiny new things, in particular the technology, but like most technologies, they run out of steam if they can't make any money. And so I was very deliberate in focusing on the business aspect of IoT to try to help executives and managers understand how to apply this technology. Curtis: One of the most important concepts in IoT is the digital twin, which is a virtual reflection of a physical object. One major use of the digital twin is taking the virtual reflection of an object and changing it virtually before changing the physical object in the real world. Today a digital twin is generated from data coming from sensors embedded in a physical object. Bruce: So the Internet of Things, for everyone that’s listening, is really just the Internet being put into physical objects. The Internet being networking, things being the device. But at least when you look at it from a business perspective, that’s not where the action’s at, and not coincidentally, where the action’s at is in data analytics, data science, and a subset of that being AI, and the purpose of putting the Internet in the physical objects at the highest level is to capture data.
So we capture data in our sensors, which is more of our internal data sources, and we capture data on the Internet, and that is using business systems, that is using microservices, and coincidentally or interestingly, it's also other products, and this leads us to the most important technology for the Internet of things and this is the digital twin, and the digital twin is the virtualization of the physical into the digital, so this is where it kind of allows us to take the ...
Jul 31, 2018
Building a Machine Learning Company that Decodes Web Analytics, with Per Damgaard
15:01
The most important thing is to have an AI-enabled infrastructure. It sounds very boring, but that was the learning that I got from the bank as well. It’s actually very easy for us to build the model, but what took a long time was to have the AI infrastructure that enables us to do so. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Before we get into this episode, let’s bring you behind the scenes at Data Crunch. We’re going to show you what we’ve learned about your tastes so far. According to the podcast analytics, which are still rudimentary and can only tell us so much, you really liked our last episode on DataOps. You also enjoyed the "No PhD Necessary" episode and the "How Artificial Intelligence Might Change Your World" episode. Almost all of you have loved the history of data science series. In fact, the third one in the series is our most popular episode in terms of how much of the show you listen to. But in terms of sheer listening numbers, the Hilary Mason episode, titled "The Complex World of Data Scientists and Black-Box Algorithms," tops our charts, with the Ran Levi episode, titled "Deep Learning—A Powerful Tool with a Name that Means Nothing," coming in second place. What this seems to tell us is you like interesting data history, you like interesting projections into the future, and you like learning practical ways you can be successful with data projects. But since the podcast analytics are still rudimentary, we want to hear whether our conclusions are correct.
So if you want to steer our future seasons, let us know what you want to hear more about by filling out a short survey. Just go to datacrunchpodcast.com/survey, and we would love to hear from you! Today we talk to the cofounder and CEO of a Danish company that employs machine learning to gather insights on what content on your website leads people to take action. If you’re looking into building a company using artificial intelligence or machine learning, this episode will be of particular interest to you because he talks about the impetus for his idea, some tools he used to build his product, some challenges, how he hired his team, when he uses or discards algorithms, and how he packages his product. And you can even try a free version of his product, which he mentions at the end of the show. Per Damgaard Husted: My name is Per Damgaard Husted. I'm the founder and CEO of Canecto. Canecto is a new way of doing web analytics based on machine learning, and the reason we do machine learning is because we want to understand the intention of the users so that we can predict how they are interacting on the website. We focus a lot on how content influences people to make decisions on a website, so it sort of complements the user journey that you have and the UX and the SEO, but we focus on the content. Curtis: So how did Per come up with this idea of extracting insights from users’ interaction with content? Per: The background was that actually I needed this tool. I was a manager in one of the big Danish banks, and I was in charge of the online banking elements, and we got a lot of traffic statistics about what was going on, but I didn't really know anything about the users’ intent. I wanted to make our website better. I wanted to understand what motivates them. I wanted to understand what content we produced.
We produced a lot of content in the bank, and we had no tools that could explain how the users’ interaction with the content drove them to take specific a...
Jun 28, 2018
Why DataOps Matter
16:04
If you’re building a data product, these questions are likely occupying your mind: how do you get your customers to trust your data? How do you know your product’s something your customers will want? How do you produce those products more quickly without compromising accuracy? Today we talk with someone who has a lot of experience answering these questions. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Curtis: If you’re a company aiming to research emerging technologies, like AI, ML, IoT, or edge computing, and you find your company lacking expertise, we know where you can get the expertise to pad your research team: this team is a group of ex-Fortune 500 B2B tech product managers with in-depth market analysis, product planning, and development expertise in bringing successful products, software, and services to the market, and they have significant in-depth technology skills on their team. They drive emerging tech research, product strategy, and tech marketing that resonates with customers, and they’re good at it. If a service like this would be helpful to you for a proposal you’re writing or for a product that you’re creating, reach out to us at hello@vaultanalytics.com, and we’ll be in touch. Ginette: Now let’s jump into today’s episode. We’re talking with someone who’s worked with data teams for many years and has learned a thing or two. This is Chris Bergh. Chris: I’m Chris Bergh.
I'm head chef of a company called Data Kitchen in Cambridge, Massachusetts, and we're a company that helps teams of people who do AI or machine learning or data engineering or data visualization deliver insight faster with higher quality. So how did I get to this point, to found a company focused on what we call DataOps? Well, I guess I'm a working-class kid from Wisconsin. In the late 80s, I went to Columbia to study AI, back when AI was just a corner of the world that no one knew about, and you didn't walk through an airport and run into it, and then I worked on some AI systems at NASA and MIT to automate air traffic control, and then I sort of got into software development and managing software teams. Curtis: To fill out this picture a little more, Chris has two patents under his belt and has had two companies acquired, one by Microsoft, while he was in the C-suite. So he’s no stranger to the difficult experiences that come with a company’s growing pains. Chris: About 10 years ago I got into data and analytics, and the company I worked for was about a 60-person company. We did everything that you could do in analytics: we did data visualization, we had data scientists, we had data engineers. We even decided to build our own complete software platform that did everything in analytics. I was the chief operating officer, and I worked with a guy who was from Harvard Medical School. It was a healthcare analytics company, and he really knew health care and really could talk to customers and figure out what they wanted, but then he'd come back to me and say, “Chris, I've got this idea. Customer has this pain. Could you get some people together and figure out how to solve it?” So I would go off and pull the data scientist and maybe a data engineer and maybe someone who knew Tableau and maybe a software engineer into a room, and we’d talk it through. And I’d, I’d, you know,
Jun 13, 2018
Drones and AI
19:48
We are joined by the host of the podcast Commercial Drones FM, Ian Smith, who gives us a fascinating look at how drones are being used today and how they will be used in the future. From petri-dish-wielding drones that follow whales, to miniature drones working in warehouses, to thermal-sensing drones in the mining industry, drones are starting to be used extensively, and their use will continue to grow. We go over the technology, the use cases, the regulations, and the future. Intro: There’s never been a good way, ever, to get snot from a whale to see how healthy they are or do other types of experiments. It can hover right above the whale as it’s surfacing, and it will just have a little petri dish, and when the whale blows its blowhole, all the snot just goes on it. Then they bring it back to the boat, and then they analyze it later. Curtis: One big area that uses AI, and will use it more and more, is drone technology. One of the big things that machine learning enables drones to do is be aware of their surroundings. Computer vision classifiers help drones identify the objects they are seeing and take appropriate action, such as avoiding obstacles, performing maintenance recon, and charting autonomous flight paths. Ginette: Let’s talk to someone steeped in all things drones who can give us insights into drones and how AI currently plays a role and will continue to play a role as drones evolve. This is Ian Smith. Ian: I got into drones in 2013, but before that I had actually built and flown model aircraft, like RC aircraft with little tiny gas engines, and the balsa wood, and the glue that you have to wait overnight for it to set, and yeah it was a lot of work, and I wound up flying helicopters for my career, so I’m a commercial helicopter pilot. I was a flight instructor, and I heard in 2013 that RC aircraft, model aircraft, had come so far that there were people using them.
They were calling them drones, and they were taking pictures with them and selling them to people, but it was illegal in the United States because there was no regulation from the FAA at the time. So of course I decided to get into this as much as I could, since I wasn’t flying at the time, and since 2013 it’s been my career. I worked for a company in France called Delair, and today I work for a company in San Francisco, where I’m based now, called DroneDeploy, and I host a podcast about drones called Commercial Drones FM as a side project. Curtis: So if you’re looking for more on drones after this episode, go check out Ian’s podcast. He covers all things drone and will keep you up on the latest. Let’s take a broad look at some of the use cases for drones. Ian: Some of the use cases, some of the industries that are using drones really are . . . agriculture was one that everyone latched on to. The construction industry of course. Inspecting assets, so whether that’s oil and gas or utilities or something else entirely, like wind turbines, or something like that. There are general land surveyors that use drones for mapping activities, and of course there’s film and photography. Everybody by now has seen a YouTube video of a drone or a drone shot in a movie or TV show. . . . Then there’s the mining industry, which uses them to calculate volumetrics of stockpiles, and search and rescue, for finding people, putting crazy sensors on these drones that can sense thermal signatures. The way they’re being used, it’s really up to your imagination. Pretty much anything outside that can get a GPS signal these days. They're going to go towards more indoor things and closed, confined spaces too, so we're seeing just amazing use cases. People have these incredible imaginations, and when you ask somebody, “What would a drone do for you?” you just get these awesome responses, and it’s really cool to hear what people come up with.
They’re even using them for wildlife monitoring,
May 19, 2018
Travel AI with Pana
14:45
Travel’s an interesting industry because it’s inherently global, which makes it inherently complex, and it’s so far behind other industries when it comes to innovative and advanced technology being applied. A great example of that is when you buy a ticket on an Expedia or Priceline, etc., it’s likely that 75% of the time a fax is sent to the hotel to tell them that you’ll be staying there that night. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Ginette: Data Crunch is brought to you by data.world, the productive, secure platform for modern data teamwork. Organizations like The Associated Press, Rare, Encast, and Square Panda use data.world to replace outdated barriers with deep connections among data, people, and impact. This makes data easier to find, helps people work together better, and puts data and insights in the hands of those who need them. To learn more, visit data.world and request a demo. Curtis: Envision in your mind’s eye our globe and all the airplane flights in the sky at any given time. Now, zoom into a busy city on that globe and notice all the cars being rented by business professionals and the hotels that they’re checking into. Even in just one city, the number of transactions is dizzying. The travel industry has a lot going on, and yet, sometimes it’s surprisingly antiquated. Devon: I'm Devon Tivona. I'm a founder at Pana. My background is actually technical. I went to school for engineering, spent the first five years of my career as an engineer, then a product lead, and most recently as a founder of this company. Ginette: The founders of Pana were intrigued with the possibilities of what they could do in the professional travel space, and as they talked with travelers, they saw an opportunity. Devon: We were talking particularly to frequent traveler[s].
And we kept hearing over and over again two primary pain points. One was, “With all the newfound technology in the travel space, I still have to be my own travel agent. It was great 10 years ago when I could just email someone, and they would take care of all of the logistics for me, but now all the technology has made it so I have to do all that work.” And then the second pain point that we started hearing was, “Once I buy my plane ticket or my hotel ticket, if I need to make a change or something goes wrong and I want to get ahold of a real human being, that's like pulling teeth from these companies, particularly if I bought my ticket online.” So we kind of had this vision: could we build the 21st-century version of the travel agent, but do so, you know, in a scalable Internet business sort of way? We didn’t want to build a boutique travel agency. We wanted to build something big. Travel’s an interesting industry because it’s inherently global, which makes it inherently complex, and it’s so far behind other industries when it comes to innovative and advanced technology being applied, particularly because it’s so big, not because it doesn’t have awesome people working in the space. A great example of that is when you buy a ticket on an Expedia or Priceline, etc., it’s likely that 75% of the time a fax is sent to the hotel to tell them that you’ll be staying there that night. And for me when I heard that I was like, “Okay, this is a really interesting industry because I can always be building stuff here as a technologist.” Curtis: Pana focused on the corporate travel space in particular because it felt that space had more user pain points than other travel workflows. Devon: I think that there's a lot of varied user pain experienced throughout a travel journey, particularly I would say on the corporate travel side of things. I think that in leisure travel, there are billions of dollars being spent on optimizing conversion flows of you buy...
Apr 29, 2018
The Patent Law Land Grab
19:37
Before the airplane was invented, some people were concerned that everything that could be invented had been invented. Obviously, that was not the case then, and it's certainly not the case now. So as you create novel inventions, how do you protect them? What's the process? And what tools can help you and your team navigate the world of patents? Janal Kalis: It was like a black hole. Almost nothing got out of there alive. So it became slightly more possible to try and steer your application away by using magic words . . . it didn’t always work but sometimes it did. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. So to help keep you, our listeners, informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research. It’s on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, you can sign up on the website for our weekly newsletter highlighting the top three to four applications we find each week. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading it. And now let’s get back to today’s episode. Curtis: Today we dive into a world filled with strategy, intrigue, and artful negotiation, a world located in the wild west of innovation. Ginette: In this world, you fight for your right to own something you can’t touch: your ideas. You and your team ride out into this wild west to mark your territory, drawing a border with words.
Sometimes during this land grab, people get a lot of what they want, but generally they don’t, so you have to negotiate with the people in charge, called examiners, to decide what you can own, but what if you’re assigned someone who isn’t fair? Or what if you want to avoid someone who isn’t fair? Is there anything you can do? Maybe, but first you need to understand how the system works. Let’s dive into the world of patents and hear from Trent Ostler, a patent practitioner at Illumina. Trent: The kind of back and forth that goes on oftentimes is trying to get broad coverage for a particular invention, and chances are, the examiner, at least initially, will reject those claims. Curtis: Claims define the boundaries of the invention you’re seeking to protect. It’s like buying a plot of land. There are boundaries that come with the property. These claims define how far your ownership of the invention extends. Claims can be used to tell the examiner why he or she should allow, or approve, your exclusive rights to your idea, giving you ownership over that idea, or in other words, granting you a patent. Trent: The examiner will say that they are too broad. The claims don't deserve patent protection. And he could say that they would have been obvious. He could say that it's been done before, that it's not novel, and so what this means for anyone trying to get a patent is that it's very complex. There are thousands of pages of rules and cases that come out that further refine what it is that's too broad or what it is that makes something obvious, and oftentimes there is a balancing act of coming close to the line to get the protection that you deserve but not going overboard. Ginette: So there’s a back-and-forth volley between the inventor’s lawyers and the examiner.
The examiner says, “hey, you don’t deserve these claims,” and he or she gives you a sound reason or argument for it, and then you and your team try to persuade him or her otherwise, and hopefully overcome those rejections by arguing for why your claims are rea...
Mar 27, 2018
Exposing World Corruption with a Unique Dataset
16:11
Transparency International started when a rebellious World Bank employee quit to dedicate himself to exposing corruption. Now the organization claims the media's attention for about one week a year when it publishes its annual Corruption Perceptions Index, an index that ranks countries in order of perceived corruption. Find out how the organization sources the data, what important bias exists in that data, and how that data ultimately impacts the world. Alejandro Salas: I studied political science and I got very interested in all the topics related to good governance, to ethics in the public sector, etc., and I started working in the Mexican public sector, and, oh, the things I could see there. I was a very junior person working in the civil service, and I got all sorts of offers of presents and things in order to gain access to certain information, access to my boss. So very early on in my professional career, I started to see corruption from very close to me, and I think that's something that marked my interest in this topic. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things, and we’re noticing an explosion of real-world applications of artificial intelligence and machine learning that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry.
So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research and adding them, generally daily, to a collection available on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, you can sign up on the website for our weekly newsletter highlighting the top 3–4 applications we find each week. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading it. And now let’s get back to today’s podcast. Curtis: We’ve spent a lot of time on our episodes talking to interesting people about what creative things they’ve done with data, like detecting eye cancer in children, identifying how to save the honey bees, and catching pirates on the high seas, but today we’re going to talk about a simple measurement: a creative and clever way to measure something that is incredibly hard to measure. And powerful results come from a measurement that puts some numbers behind a murky issue so people can start to have important conversations about it. And we’re going to look at an example that’s all over the news right now. Ginette: This dataset that’s all over the news right now has an interesting history. While it draws criticism from some sources, it draws high praise from others. But before we get ahead of ourselves, let’s officially meet Alejandro, the man you heard at the beginning of this episode. Alejandro: My name is Alejandro Salas. I am the regional director for the Americas at Transparency International. I come from Mexico. I started 14 years ago, and I was hired to work mainly in the Central America region, which is also a region where there's a lot of corruption that affects mainly public security, access to health services, access to education. In general the basic public services are broadly affected by corruption. That was my point of entry to this organization.
Curtis: Something important to note here is Transparency International’s origins. It’s a surprising story because Transparency Internationa...
Feb 21, 2018
Data Science Reveals When Donald Trump Isn't Donald Trump
15:16
Few things are as controversial in these perilous times as Donald Trump's Twitter account, often laced with derogatory language, hateful invective, and fifth-grade name-calling. But not all of Trump's tweets sound like they came straight out of a dystopian dictator's mouth. Some of them are actually nice. Probably because he didn't write them. Join us on a discerning journey as two data scientists tackle Donald Trump's Twitter account and, through quantitative methods, reveal to us which hands are behind the tweets. Dave Robinson: So the original Trump analysis is certainly the most popular blog post I’ve ever written. It got more than half a million hits in the first week, and the post still gets a number of visits each week. I was able to write it up for the Washington Post and was interviewed by NPR. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Curtis: Here at Data Crunch, as we research how data and machine learning are changing things, we’re noticing an explosion of real-world applications of artificial intelligence that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry. So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research.
These are all available on a website we just launched, which Data Elixir recently recognized as a recommended website for their readers to check out. The website includes, for example, a drone taxi that will one day autonomously fly you to work, a prosthetic arm that uses AI to help a disabled pianist play again, and a pocket-sized ultrasound that uses AI to detect cancer. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, you can sign up on the website for our weekly newsletter highlighting the top 3–4 applications we find each week. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading it. And now let’s get back to today’s podcast. Ginette: Today, we’re chatting with someone who made waves over a year ago with a study he conducted, and he recently did a follow-up study that we’ll hear about. Here’s Dave Robinson. Dave: I'm a data scientist at Stack Overflow; we’re a programming question-and-answer website, and I help analyze data and build machine learning features to help get developers answers to their questions and help them move their careers forward. I originally came from an academic background where I was doing research in computational biology, and after my PhD I was really interested in what other kinds of data I could apply a combination of statistics, data analysis, and computer programming to. Curtis: Dave studied stats at Harvard and then went on to get his PhD in Quantitative and Computational Biology from Princeton. He did a study on Donald Trump’s tweets in 2016 you may have heard about and posted it to his blog, Variance Explained.
Jan 19, 2018
No PhD Necessary
13:45
The ubiquity of and demand for data has increased the need for better data tools, and as the tools get better and better, they ease the entry into data work. In turn, as more people enjoy the ease of use, data literacy becomes the norm. Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” “We have a gift for you this holiday season. We’re giving you, our listeners, a website . . . it’s a website of all the AI applications we come across or hear about in our daily research. We post bite-size snippets about the interesting applications we are finding that we can’t feature on the podcast so that you can stay informed and see how AI is changing the world right now. There are so many interesting ways that AI is being used to change the way people are doing things. For example, did you know that there is an AI application for translating chicken chatter? Or using drones to detect and prevent shark attacks on coastal waters? To experience your holiday gift, go to datacrunchpodcast.com/ai.” Curtis: “If you’ve listened to our History of Data Science series, you know about the amazing advances in technology behind the leaps we’ve seen in data science over the past several years, and how AI and machine learning are changing the way people work and live. “But there is another trend that’s also been happening that isn’t talked about as much, and it’s playing an increasingly important role in the story of how data science is changing the world. “To introduce the topic, we talked with someone who is part of this trend, Nick Goodhartz.” Nick Goodhartz: “So I went to school at Baylor University, and I studied finance and entrepreneurship and a minor in music. I ended up taking a job with a start-up as a data analyst essentially. 
So it was an ad technology company that was a broker between websites and advertisers, and I analyzed all the transactions between them and tried to find out what we were missing. “We were building out these reports in Excel, and there was a breaking point: we had this report that we all worked off of, and it got too big to even email to each other. It was this massive monolith of an Excel report, and we figured there's got to be a better way, and someone else on our team had heard of Tableau, so we got a trial of it. In 14 days, actually less than 14 days, we were able to get our data into Tableau, take a look at some things we were curious about, and pinpoint a possible customer who had popped their head out and then disappeared. We approached them and signed a half-million-dollar deal, and that paid for Tableau a hundred times over, so it was one of those moments where you really realize, ‘Man, there’s something to this.’ “That's what got me into Tableau and what changed my mind about data analysis, because at school, analyzing finance was nothing but Excel and mindless tables of stock capitalization and all this stuff. What made it fascinating was finding a way to look at it and answer questions on the fly, and then it actually changed the way I look at things around me. I find myself now watching a television show and thinking, ‘Well, this episode wasn't as interesting. I wonder what the trends of the ratings look like.’ It really has changed the way I think about data because of how easy it's been to access it.” Ginette: “Nick is a member of a growing group of people who didn’t think they’d end up doing analytics. He didn’t have specific training for it: he doesn’t have a computer science or statistics degree, and he doesn’t spend nights and weekends writing code. And yet, he was able to produce extremely useful insights from his company’s data stores and help land a large business deal.
Not only that, he found the process of finding insights from data so fascinating that it spilled over into his le...
Dec 19, 2017
How to Succeed at IoT—Amid Increasing Complexity
17:43
The growth of the Internet of Things, or IoT, is often compared with the Industrial Revolution: a completely new phase of existence. But what does it take to be part of this revolution by building an IoT product? It's complex, and Daniel Elizalde gives us a peek into what the successful process looks like. For the full episode, listen by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Donate 15 Seconds: If you liked this episode, please consider giving us a review on iTunes! It helps other people find the show and lets us know how we’re doing. Partial Transcript (for the full episode, select play above or go here) Ginette: “So, today, we’re defining an IoT product, or an Internet of Things product, as ‘a product that has a combination of hardware and software. It acquires signals from the real world, sends that information to the cloud through the Internet, and provides some value to your customers.’ “Okay, so before we introduce you to our guest, consider this: the IoT market is infernally hot. In 2016, there were 6.4 billion connected ‘things’ in use worldwide, and research firm Gartner projects that number will nearly double to 11.2 billion in 2018 and then nearly double again to 20.4 billion in 2020. For context, this last number is about two and a half times the number of people on Earth. “Let’s look at an example of IoT at work. Let’s say you’re an oyster farmer, and you need to keep your oysters under a certain temperature because harmful bacteria might grow if you don’t, which would result in people getting very sick after eating your product. If that happened, the FDA could shut your operation down. “This is where IoT products can help you. You can track water temperature with sensors. Those sensors can send that data to the cloud, where you can access it.
The system will even send you an alert if the temperature goes outside your chosen range. You can use cameras that show when the oysters are harvested and how long the oysters are out of cold water before they’re put on ice. By using these sensors and cameras to record harvest date, time, location, and temperature at all stages of harvest, you have recorded evidence that you’ve properly handled the harvest. “So, for the purposes of today’s episode, let’s now switch to the other perspective: the perspective of someone who wants to make and sell an IoT product. Imagine you and two of your friends recently launched an IoT startup. You’ve been able to secure funding to build your IoT product, and you’ve hired some team members to help you get your beta version off the ground. But you’re new to building products like this, and the rest of your team is pretty new to it as well. So you decide to talk with someone who is an expert in the IoT space who can give you and your team pointers, and you’re lucky enough to find this man.” Daniel: “My name is Daniel Elizalde. I am the founder of Tech Product Management. My company focuses on providing training for companies building IoT products; specifically, I focus on training product managers. I've really been doing IoT for over 18 years, before it was called IoT, and I've worked in small companies and large companies, consulting, and UX agencies. Most of my career has been on the product side of things, anywhere from single contributor to head of product, and most recently, I left the corporate world and founded Tech Product Management. I teach online. I have an online course for a certification program for IoT product managers. I also teach at Stanford Continuing Studies, and I do consulting and workshops for companies. “I started to get a lot of requests for an online program.
And so that's when I decided to build my online training, and it's actually a certification program where you take all the material, then you take a test, and you get a certification.”
Nov 17, 2017
After Disaster Strikes: Data in Disaster Recovery
26:29
We’ve seen photos of disasters depicting fearful and fleeing victims, ravaged properties, and despondent survivors. In this episode, we explore two ways data can help survivors heal and how data also tells their stories. For the full episode, listen by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Donate 15 Seconds: If you liked this episode, please consider giving us a review on iTunes! It helps other people find the show and lets us know how we're doing! Partial Transcript (for the full episode, select play above or go here) Aaron Titus: “I almost disbelieved my own numbers, even though I chose the most conservative ones. It's just outrageous. I'm like, ‘Really? A 233x ROI?’ That's insane.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” “Today’s episode is brought to you by Lightpost Analytics. Data skills are in intense demand and are key for organizations to remain competitive. In fact, Forbes listed the industry’s leading data visualization software, Tableau, as the number three skill with the most explosive growth in demand, so investing in yourself to stay relevant in today's hyper-competitive, data-rich, but insights-hungry world is extremely important. Lightpost Analytics is a trusted training partner to help you develop the Tableau skills you need to stay relevant. Check them out at lightpostanalytics.com and let them know that Data Crunch sent you.” “Today, we look at what it takes to understand a larger story, when many disparate voices come together to tell you something much more powerful, and specifically how it can help people deal with the large-scale devastation of natural disasters.
Let’s jump into how one man did something about his pet peeve, and it produced $300 million in savings. And then we’ll pop over to New Zealand to explore how a disaster situation affected Christchurch and what people did about it.” Aaron: “I was a disaster relief volunteer in New Jersey during Hurricanes Irma (Ginette: Here Aaron actually means Irene) and Sandy, and my area got hit very hard by Irma, and I started off as a relief volunteer and ended up directing a lot of those relief efforts for my church. While I was there, I remember standing in very long lines: a thousand of us would gather together at a field command center and spend an hour and a half waiting to get checked in, which is lightning speed for 1,000 people, but it's still an hour and a half. “And while everybody was waiting, they’d pull out their phones and start playing Angry Birds, and the technologist in me would just scream inside, ‘I could have you all checked in with your work orders in 30 seconds, not an hour and a half!’ “And I abhor inefficiency, to a fault; it's almost a little bit of a sickness. I really ought to be better, but I really abhor inefficiency, and I hate it when people waste my time, and I hate wasting people's time, especially volunteers'. As a volunteer manager, your most precious assets are your volunteers and the time that they give to you, and when you waste that, not only are you wasting an hour right now, an hour that you're not helping somebody, but that volunteer has a bad experience, and they don't come back next week. So you're not just wasting an hour; you're wasting weeks when you've wasted volunteers’ time.” Curtis: “This is Aaron Titus, the executive director of Crisis Cleanup, a platform that connects volunteers with survivors who opt in for help cleaning up their properties after a disaster.
After this moment of frustration, Aaron decides he’s going to do something about this inefficiency, and he spends over a year designing a system while tryin...
Oct 18, 2017
The Complex World of Data Scientists and Black-Box Algorithms
25:17
Hilary Mason is a huge name in the data science space, and she has an extensive understanding of what's happening in the field. Today, she answers these questions for us: What are the backgrounds of typical data scientists? What are key differences between software engineering and data science that most companies get wrong? How should you measure the effectiveness of your work or your team's work as a data scientist for the best results? What is a good approach for creating a successful data product? How can we peek behind the curtain of black-box deep learning algorithms? Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Curtis: Today we hear from one of the biggest thinkers in the data science space, someone whom DJ Patil endorses on LinkedIn for data science skills. She worked at bit.ly, the URL shortener, and is a data scientist in residence at venture capital firm Accel Partners, a firm that helped fund some companies you may know, like Facebook, Slack, Etsy, Venmo, Vox Media, Lynda.com, Cloudera, Trifacta; you get the picture. Ginette: The partner of this VC firm said that Accel wouldn’t have brought on just any data scientist. This position was specifically created because this particular data scientist might be able to join their team. Curtis: But beyond her position as data scientist in residence with Accel, she founded a company that’s doing very interesting research, and today, she shares with us some of her experiences and perspective on where AI is headed. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production.
Hilary: I'm Hilary Mason, and I'm the founder and CEO of Fast Forward Labs (please note that Hilary is now the VP of Research at Cloudera). In addition to that, I'm a data scientist in residence for Accel Partners. And I've been working in what we now call data science, or even now call AI, for about twenty years at this point. I started my career in academic machine learning, decided startups were more fun, and have been doing that for about 10 or 12 years, depending on how you count, and it's a lot of fun! Ginette: Something I’d like to note here is there’s been a very recent change: Hilary’s company, Fast Forward Labs, and Cloudera recently joined forces, and Hilary’s new position is Vice President of Research at Cloudera. Now, one thing that Hilary talks about is where the data scientists she works with come from, which is a great example of the different paths people take to get into this field. Hilary: I am a computer scientist, and I have studied computer science. It's funny because now at Fast Forward, our team has only two computer scientists on it, and one of them is our general counsel, and one is me, and I'm running the business, so most of the people doing data science here come from very different backgrounds. We have a bunch of physicists, mathematicians, a neuroscientist, a person who does brilliant machine learning design who was an English major, and one of the things I really love about data science is that people come to it from so many different backgrounds, but mine happens to be computer science. The people on our team at Fast Forward typically have a PhD in a quantitative field, such as physics, neuroscience, or electrical engineering, and then have, through that, learned sufficient programming skill.
One of the jokes I make about my team is that we're essentially a halfway house for wayward academics in the sense that we can absorb people and teach them to be good software engineers, help them understand the difference between theoretical machine learning an...
Sep 19, 2017
Deep Learning—A Powerful Tool, with a Name that Means Nothing
16:55
Tesla isn’t the only car brand in the world producing or aiming to produce self-driving cars. Nearly every major car brand is working on developing them. But what does this mean for our future? We talk about this and other interesting deep learning projects and history with Ran Levi, science and technology observer and podcaster, who explains in thought-provoking ways what we have to look forward to. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Ran Levi: “I actually had the pleasure of being invited to Google's Mountain View headquarters, and they took me for a drive in one of their autonomous vehicles, and, to tell you about that drive, it was boring, boring in a good way. Nothing happened! We were just driving around. The car was driving itself all around Mountain View. And it worked. “The first time I entered such a car, I didn't know what to expect. I mean, I didn't know how reliable those kinds of cars are. So I had the idea that maybe I should sit somewhere where I could jump and grab the wheel if necessary. You know, I was a bit dumb. They don't need me, really. And if I touched the steering wheel, I would probably make some mistake and ruin the car. It drives better without me.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “We have a great live show planned that we hope to give at SXSW 2018. It's a really awesome show about the power of niche artificial intelligence, and we’re going to share details from our research into what amazing things AI is doing right now on the fringe and in mainstream AI projects.
We're really excited to share it, so if you’re going to SXSW, or you just want to be good-hearted and help us out, please vote for our dual panel by going to panelpicker.sxsw.com, signing in, and liking our topic, which you can find by searching for ‘The Power of Niche AI: From Cucumbers to Cancer.’ “Today we get to talk to Ran Levi, who’s been researching and reporting on science and technology for the past 10 years. He’s a hugely successful science and tech podcaster in Israel, producing a Hebrew-language show called Making History, and he’s also producing two English podcasts right now for an international audience, so since he’s steeped in the subject, he has a lot of very interesting insights for us.” Ran: “I'm actually an electronics engineer by trade. I was an engineer for 15 years. I was both a hardware and software developer for several companies in Israel. And during my day job as an engineer, I wrote some books about the history of science and technology, which was always a big hobby of mine. And actually, I started a podcast about this very subject about 10 years ago, and it became quite a hit in Israel, I’m happy to say. So about four years ago, I quit my day job and started my own podcasting company, and now we are podcasting both in Israel and in the U.S. for an international audience, and I actually launched my brand new podcast last week. It's called Malicious Life, about the history of malware and cybersecurity, which is a fun topic. Actually, the day I launched the podcast, there was a big ransomware attack, mostly in Europe. So it was . . . I didn't plan it. You've got no proof against me.” Ginette: “This is a topic well worth learning more about because cyber attacks can affect anything from your access to electricity to your bank account, so check out his new podcast on the website Malicious.life. But today, we’re talking about a different topic: deep learning. This is something Ran knows quite a bit about, technically and historically.”
Aug 09, 2017
When Song Lyrics and British Lit Meet Tidy Text
17:48
When Julia Silge's personal interests meet her professional proficiencies, she discovers new meaning in Jane Austen's literature and gauges the cultural influence of locations in pop songs. Even more impressive than these finds, though, is that she and her collaborator, Dave Robinson, have developed some new, efficient ways to mine text data. Check out the book they've written, Text Mining with R: A Tidy Approach. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Transcript Julia Silge: “One that I worked on that was really fun was about song lyrics. For the last 50 years or so of pop songs, we have all these lyrics, so all this text data, and I wanted to ask the question, what places are mentioned more or less often in these pop songs?” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Whether you’re already a frequent dataset contributor or totally new to data.world, there are several resources you can use to stay in the loop on the latest features, learn new skills, and get support. Check out docs.data.world for up-to-date API documentation, tutorials on SQL and other query techniques, and much more!” Ginette: “We hope you’re enjoying some vacation time this summer. We just did, and now Data Crunch is back! To hear the latest from us, follow us on Twitter, @datacrunchpod.
Today we hear from an exciting guest, someone who is on the cutting edge of data science tool creation, someone exploring and developing new ways to slice and dice difficult data.” Julia: “My name is Julia Silge, and I'm a data scientist at Stack Overflow. My academic background is in physics and astronomy, and I’ve worked in academia, teaching and doing research; I worked at an edtech startup; and I've made a transition now into data science.” Ginette: “Stack Overflow, where Julia works, is the largest online community for programmers to learn, share knowledge, and build their careers. It's a great resource when you need to solve a coding problem or develop new skills.” Curtis: “Now there are basically two main camps in data science: people who program with R, a statistical programming language, and people who program with Python, a high-level, general-purpose language. Both languages have devoted followers, and both do excellent work. Today, we’re looking at R, and Julia is a big name in this space, as is her collaborator Dave Robinson.” Julia: “Text is increasingly a really important part of our work as people who are involved in data. Text is being generated all the time, at ever faster rates. This unstructured data is becoming a really important part of things that we do. Also, my academic background is not in text or literature or natural language processing or anything like that, but I am somebody who's always been a reader and always been interested in language, and this collection of circumstances all came together, so Dave and I decided to develop some tools for making text mining something that people can do within the idiom of people who work using the R programming language.
So we’ve developed a package called tidytext.” Ginette: “Now this particular tool is based on tidy data principles, which basically means organizing data in a uniform way, with each variable in its own column and each observation in its own row, so it’s ready for you to ferret out insights.” Julia: “There's a section of people who use tools that are built for dealing with tidy data principles,
Jul 16, 2017
How Data Is Eradicating Malaria in Zambia
17:16
According to the CDC, people have been writing descriptions of malaria, or a disease strikingly similar to it, for over 4,000 years. How is data helping Zambian officials eradicate these parasites? Tableau Foundation's Neal Myrick opens the story to us. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Neal: “When somebody walks from their village to their clinic because they're sick, health officials can now see that person as the canary in a coal mine.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “This episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Looking for a lightweight way to deliver a collection of tables in a machine-readable format? Now you can easily convert any tabular dataset into a Tabular Data Package on data.world. Just upload the file to your dataset, select 'Tabular Data Package' from the 'Download' drop-down, and now your data can be effortlessly loaded into analytics environments. Get full details at meta.data.world.” Ginette: “Today we’re talking about something that can hijack different cells in your body for what we’ve deemed nefarious purposes. It enters your bloodstream when a mosquito transfers it to you from someone else who has it. Once it’s in your body, it makes a beeline for your liver, and when safely inside your liver, it starts creating more of itself. “Sometimes, this parasite stays dormant for a long time, but usually it only takes a few days for it to get to work.
It starts replicating, and suddenly thousands of new babies burst into your bloodstream from your liver. When this happens, you might get a fever because of this parasite surge. As these new baby parasites invade your bloodstream, they hunt down and hijack red blood cells. They use these blood cells to make more of themselves, and once they’ve used the red blood cells, they leave them for dead and spread out to find more. Every time a wave of new parasites leaves the cells, it spikes the number of parasites in your blood, which may cause you to have waves of fever, since it happens every few days. “This parasite can cause very dangerous complications, even death. It can cause liver, spleen, or kidney failure, and it can also cause brain damage and a coma. To avoid detection, the parasites cause a sticky surface to develop on the red blood cell so the cell gets stuck in one spot and doesn’t head to the spleen, where it’d probably get cleaned out. When the cells stick like this, they can clog small blood vessels, which are important passageways in your body. You may have guessed it: we’re describing malaria. “It plagues little children, pregnant women, and other vulnerable people. Children in particular are incredibly vulnerable, something that’s reflected in the statistics: one child dies every two minutes from malaria. “But often outbreaks are treatable, trackable, and preventable when the data is properly captured and analyzed. The United States eradicated malaria in the 1950s, but it still plagues other areas of the world, especially sub-Saharan Africa. In 2015, 92 percent of all malaria-related deaths worldwide were in sub-Saharan Africa. “Today, we’re talking to the man who authorized a partnership aimed at eradicating malaria in one country that’s suffered heavily from it. The results, which we’ll get to, are impressive.” Neal: “My name is Neal Myrick.
I'm the director of social impact at Tableau Software and the director of Tableau Foundation.
Jun 11, 2017
How Artificial Intelligence Might Change Your World
20:17
What does the creation of new artificial intelligence products look like today, and what do experts in this field foresee realistically happening in the near future? One thing's for sure: the way we work and function in life will change as a result of growth in this field. Listen and find out more. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Transcript Irmak Sirer: “It’s kind of like a Where’s Waldo of finding an expert in this entire giant ocean of people.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. A complex dataset with a ton of files can quickly become scary and unwieldy, but you need not fear! Now you can use file labels and descriptions to manage and organize your many files on data.world. With file labels and descriptions, you can quickly see what type of file it is, view a short description, and also filter down by file type. Wanna see an example of how data.world users are using file labels and descriptions to keep their dataset organized? Search ‘data4democracy/drug-spending’ on data.world.” Ginette: “Today we’re taking a closer look at something that is starting to seep into our daily lives. In one of its forms, it’s something Stephen Hawking, Bill Gates, and Elon Musk are concerned will eventually be a threat to mankind. In another form, though, you’re probably already using it, and it’s becoming a major game changer, kind of like the early days of the desktop computer.
We’re talking about artificial intelligence. You use AI when you talk to Siri or to an in-home assistant like Alexa on an Amazon Echo, and some people are using it in the form of a self-driving car. “So daily applications of artificial intelligence are on the rise, becoming much more of a staple in our society, but AI’s definition shifts according to the source. Popular movies depict AI as having a consciousness and emotions and exhibiting human-like characteristics. Usually it’s involved in some sort of world-domination plot to kill all the humans. Although many experts doubt that artificial intelligence will ever actually think and feel like a human, the existential threat still exists. This kind of apocalyptic AI is known as ‘general AI.’ But that’s a topic for another episode. Today, we’re focusing on the kind of AI that currently exists, otherwise known as narrow AI.” Curtis: “A narrow AI is called narrow because it’s usually focused on one specific task, whereas a general AI would be good at pretty much any task thrown its way. The Google search bar is probably the most ubiquitous example of a narrow AI that most people use on a daily basis. The process usually goes like this: you give it an input like ‘How to own a llama as a pet.’ It does its processing. It gives you an output in the form of the 10 most relevant web pages to answer your question (along, of course, with some paid advertisers who are trying to sell you a pet llama). “The simplicity of the interaction belies the complexity of the cognitive work that’s going on behind the scenes. Imagine if you had to do the same cognitive task without the help of Google. What would that actually entail? You would have to individually look at and read every single website (and there are over 1 billion of them), peruse each one to see if it has anything to do with llamas, and then find the individual pages on those websites that actually answer your question. “This is a really big task!
May 28, 2017
Preventing a Honeybee Fallout
17:48
What would the world look like without honeybees? In theory, if there were no honeybees, it could drastically change our lives. Bjorn Lagerman, though, never wants to know the actual answer to that question. But the honeybee's current worst foe, Varroa destructor, is killing off honeybee hives at alarming rates. Bjorn's in the middle of a machine learning project to save the bees from the vampiric Varroa. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through iTunes, Google Play, Stitcher, and Overcast. Bjorn Lagerman: “My name is Bjorn Lagerman. I live in the middle of Sweden. When I look back on my younger days, I remember I sat in school, looked outside the window, and decided I wanted to be outside. You know, I was raised in a stone desert in the middle of Stockholm, in the old town; that's a medieval town. And inside the blocks, there was a sort of oasis of water and fountains and green in this stone desert, but the streets were very old streets. And then the contrast was that I spent the summertime in the countryside, and that was total freedom: you know, lakes, rivers, forests, and my parents let us do what we wished all day; we just had to come home for dinner. So when I was 22, I thought bees might be a reason to spend more time in nature. So I went to the nearest beekeeper . . . and he sold me my first colony, and from there on, I was really hooked.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “This episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world.
A complex dataset with a ton of files can quickly become scary and unwieldy, but you need not fear! Now you can use file labels and descriptions to manage and organize your many files on data.world. With file labels and descriptions, you can quickly see what type each file is, view a short description, and also filter down by file type. Wanna see an example of how data.world users are using file labels and descriptions to keep their datasets organized? Search ‘data4democracy/drug-spending’ on data.world.” Ginette: “Imagine for a minute what the world would look like without bees. The image is potentially pretty bleak: we’d have much less guacamole, fruit smoothies, chocolate everything, various vegetables, pumpkin pie, peach cobbler, almond butter, cashews, watermelons, coconuts, lemons, limes, and many more food products. Let’s not forget the obvious: we wouldn’t have honey, which man can’t replicate well. “But fruits, vegetables, and chocolate aren’t the only foodstuffs that would be affected. Bees support other animal life. They pollinate alfalfa, which helps feed dairy cows and boost their milk production, and on a more limited basis, alfalfa helps feed beef cows, sheep, and goats. Statistics vary, but bee pollination affects somewhere between one-third and two-thirds of the food on Americans’ plates. Beyond food, bees help grow cotton, so without bees, we’d have to rely more on synthetics for our cloth. “Honeybees in particular are incredibly hard workers. They pollinate 85 percent of all flowering plants. They collect from just one flower species at a time, and in turn, the pollen they carry fertilizes the flower’s egg cells. One industrious honeybee worker can pollinate up to 5,000 flowers a day. One hive’s worth of honeybee workers can visit up to 500 million flowers a year. “With a reduced bee population, it gets harder to produce food. Let’s take an example. California grows 85 percent of the world’s almonds, and it takes at least 1.7 million hives to pollinate them,
May 14, 2017
When a Picture Is Worth a Life
25:11
What if you found out your infant had eye cancer? That news would rock anyone’s world. But what if you had a tool that helped you catch it early enough that your baby didn’t have to lose his or her eye and didn’t have to go through chemo? You’d probably do almost anything to get it. Bryan Shaw has dedicated his time to helping parents detect this cancer sooner so their children don't have to go through what his son went through, and he’s doing it for free. With computer scientists from Baylor University, he's harnessed the power of a machine learning algorithm to detect signs of cancer that the human eye often misses. Below is a partial transcript. For the full interview, listen to the podcast episode, which is also available through iTunes, Google Play, Stitcher, and Overcast. Bryan Shaw: “The very first person who ever contacted me because our app helped them was a gentleman in Washington State, and his little girl had myelinated retinal nerve fiber layer, which is an abnormal myelination of the retina, and it can cause blindness, but it presents with white eye. And his little girl was five years old, and he kept seeing white-eye pics. He heard our story. He downloaded our app. Our app detected the white-eye pics. That emboldened him enough to grill the child's doctor. You know, ‘My camera's telling me this. Look, this app. I heard this story . . .’ The doctor takes a close look. The girl had been 75 percent blind in one of her eyes for years, and nobody had ever caught it.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “Data Crunch is again brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. 
Did you know that you can add files via URL to your datasets on data.world? Data.world APIs allow you to pull live survey data into your dataset, enable automatic file updates, and more. Get the full details on data.world APIs at docs.data.world, or search ‘Austin Cycling Survey’ on data.world to see live survey sync in action in Rafael Pereira's dataset!” Ginette: “One quick reminder that our data competition is currently up on data.world. Be sure to post your submissions by May 5. “Okay, now back to the story. If you know someone who’s about to have a child, has a child five or under, or plans to have children, you need to send them this episode, and you’re about to find out why from this man, Bryan Shaw.” Bryan: “When Noah was three months old, we started noticing that a lot of his pictures had white pupillary reflections, what doctors call leukocoria (‘leuko,’ white; ‘coria,’ pupil), and that can be a symptom of a lot of different eye diseases.” Ginette: “You probably put this together, but Noah is Bryan’s son. And to add in Noah’s mom’s perspective here, when she started noticing this strange white reflection in Noah’s eyes, like most moms today, she aggressively searched the Internet for answers. Like Bryan said, leukocoria could indicate a disease, or it could indicate nothing, but the Shaws decided they needed to tell their pediatrician about what they’d found.” Bryan: “Noah passed all his red reflex tests until we told his pediatrician that we noticed leukocoria, and he had a very good pediatrician, Pearl Riney, in Cambridge, Massachusetts. And then she looked really, really closely. And on that test, she noticed a white pupillary reflection and immediately sent us to an ophthalmologist that afternoon.” Ginette: “At this point, Bryan’s wife, Elizabeth, was freaking out because she’d done all the research about leukocoria, or white eye, and she knew what white eye might mean for their four-month-old son.” Bryan: “In Noah's case,
Apr 29, 2017
How Many Slaves Work for You?
20:15
If someone came up to you and randomly asked you, "How many slaves work for you?" maybe you'd think, "Slavery ended a long time ago, Bro." Or maybe you would take the question seriously. With 20 million to 46 million people enslaved in the world, it is a serious question, and while we don't see it daily, some of these enslaved people make things for us. Even if we're judicious about what we buy, we might be surprised by just how much global slavery goes into producing the goods we do buy. But how can we quantify it? How can we solve this? Justin Dillon, who has worked with the U.S. State Department and hundreds of businesses, thinks he has the answer. Transcript: Ginette: “Our world today is an extremely vast, complicated, and interconnected web of 7.5 billion people. We’re directly connected to some, and it’s really easy to see those connections on Facebook, Instagram, Twitter, and LinkedIn. But there’s a whole other group of people we are much more subtly connected to: people who are essentially working for us yet are basically invisible to us, 20 to 46 million of them. “Our guest today deals with this invisible web every day.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production . . .” Ginette: “Today’s episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Quickly locating data, understanding it, and combining it with other sources can be difficult. The data.world Python library allows you to bring data.world datasets straight into your workflow. Easily work with data and metadata in your Python scripts and Jupyter notebooks. Ready to dive in? Learn how to use data.world’s Python library at meta.data.world. 
Curtis: “Before we get going, one other note about data.world: starting today until May 5, we are hosting a data competition on their site, and we’d love your participation. Donald Trump’s tweets have been the source of a lot of media attention recently. Many high-profile news outlets have asserted his tweets show signs of authoritarianism, some say he’s using his Twitter account to shape the news cycle, and some have even built algorithms to make stock market decisions based on his tweets. Whatever your stance is on the subject, we’ve uploaded a dataset of every single one of his tweets to data.world, and we want to see what you can make of the data. This is a creative competition by nature: submissions can be in any format, but the point is we want to see what you can learn, assert, or create with this dataset. It’s easy to participate. Just go to data.world/datacrunch, and you’ll find the dataset and all of the details. Submit by May 5; we want to feature the submissions that tell the most compelling stories on a future podcast episode.” Ginette: “Now back to the story. A few months ago, I ran across a website. It sucked me in. It asked me a provocative question, which we’ll get to in just a second, but first, we’ll introduce you to the man who’ll situate the story for you: the main person behind the website.” Justin: “My name’s Justin Dillon. I’m the founder and CEO of Made in a Free World. We started off years ago. I would say the genesis for us was probably me getting a call from the State Department in about 2010. I’d already been doing some projects, a few websites and films that I was producing, around human trafficking and modern-day slavery.” Curtis: “Justin directed a documentary he released in 2008 called ‘Call + Response,’ which ranked as one of the top documentaries in 2011.” Justin: “And the State Department called and said, we would like to do a project with you, we like the way that you use data and tell stories,
Apr 15, 2017
Predicting the Unpredictable
21:16
We now know black swans exist, but Europeans once believed that spying one would be like stumbling across a unicorn in the woods: impossible. Then, Willem de Vlamingh spotted black swans in Australia, and this black bird, which once represented the impossible to Europeans, shifted to represent the unpredictable. One company now bears the name "Black Swan." Find out how it aims to predict what we currently consider to be unpredictable. Transcript: Ginette: “Immerse yourself in early-1600s London culture for a minute. Shakespeare’s alive and in his late career. The first permanent English settlement in the Americas has just been founded. Oxygen hasn’t been discovered yet. But a lesser-known cultural idiosyncrasy has to do with a large white bird, the swan. In Europe, the only swans anyone had seen or heard about were white, so of course, in their minds, a swan couldn’t be any other color. From this concept, a popular saying develops, originally stemming from a poem. You use it when you want to make the point that something either doesn't exist or couldn’t happen. You’d say something like this: ‘You’re not going to find out because it’s about as likely as seeing a black swan,’ meaning that thing or event was impossible. “But then a discovery blows everyone’s minds. Dutch explorer Willem de Vlamingh is sent on a highly important rescue mission. A lost ship with 325 people on it has probably run aground near Australia, and he’s needed to rescue these people and the goods on board. While Willem and the three ships under his command search Australia for this lost ship, they find lots of fish; unique trees; quokkas, cat-sized kangaroo-like creatures; and . . . black swans. This last discovery permanently shifts the meaning of the saying. After this, people start using it to describe something highly unlikely or an unpredictable moment. 
“Now this concept of an unpredictable moment is why Steve King named his company Black Swan, because they predict the seemingly unpredictable.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Steve King: “I am Steve King; I’m the CEO of Black Swan. Black Swan is 250 people who focus on trying to predict consumer behavior using data science, artificial intelligence, and big data. We have lots of large clients. We mostly work with big companies that have big problems to solve. Our work sort of splits across the US and the UK. Black Swan is absolutely full of stories. A lot of the work we really do is finding a hard problem that no one’s really solved before and then using data science to crack it, but they’re always quite interesting stories because, you know, they’re stories of a little bit of adventure, luck, and skill.” Ginette: “The UK’s Sunday Times has consistently placed Black Swan on its lists: in 2014, it was on the ‘Ones to Watch’ list in its Tech Track. In 2015, it was ranked number one on the Start-Up Track. And in 2016, it was ranked number one in the Export Track 100, because it had the fastest-growing international sales among the UK’s small to medium enterprises. “So what’s the secret sauce to the rapid growth and success of Black Swan, a company that solves problems for large companies in many different industries? It turns out, they aim to be better than anyone else at accessing and crunching a specific data source.” Steve: “The reason we’re quite broad is that it actually sits on one simple idea, and the simple idea really is that the Internet is the world’s biggest data source, and we call the Internet the world’s biggest focus group. 
So pretty much every consumer opinion, and the open data that governments are releasing, is all there for you to consume, but the trick is, can you consume it in a way that helps you find patterns so you can ...
Apr 02, 2017
The Golden Age of Data Science
25:07
How did one boy's stuffed yellow elephant permanently weave itself into history? What is a data scientist? Why is right now the golden age for data science? We take a crack at all three of these questions, the last two with the help of Gregory Piatetsky-Shapiro and Ryan Henning. Transcript: Ginette: “Over the past few years, we’ve seen these news flashes: “An article in Harvard Business Review in 2012, titled ‘Data Scientist: The Sexiest Job of the 21st Century.’ “Mashable’s article in 2015: ‘So You Wanna Be a Data Scientist? A Guide to 2015’s Hottest Profession.’ “Business Insider, 2016: ‘Data Science Was the #1 Profession as Rated by Glassdoor.’ “A data science industry observer, KDnuggets, 2017: ‘Data Scientist: Best Job in America, Again,’ which cites the most recent Glassdoor report outlining the very top jobs in America. “It turns out, four of the top five US jobs deal with data. In descending order, we find data scientist, DevOps engineer, data engineer, and analytics manager.” Curtis: “With four out of five of these top jobs orbiting data, clearly something’s going on here.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Today is a culmination of everything we’ve talked about in our series on the history of data science. This is where all the contributions of Florence Nightingale, William Playfair, Ronald Fisher, Ada Lovelace, and many others come together in one place. We’ll add a couple more people to this list to answer these two questions: ‘What is a data scientist? And why is right now the golden age of data science?’” Curtis: “According to IBM, ‘every day, we create 2.5 quintillion bytes of data.’ But what does a quintillion actually look like? 
“Well, if you take one quintillion pennies, you could actually place them face up, edge to edge, and blanket the entire surface of the Earth 1.5 times over. Or think about one quintillion ants. That would be like taking all of the ants that exist today on planet Earth, according to some estimates, and multiplying that number by 100. So, that ant pile in your front yard becomes 100 ant piles in your front yard. Basically, ants take over the Earth. And we make 2.5 quintillion bytes every single day! “The next question is, how much information does that actually represent? It’s 250,000 times the amount of information that all the printed material in the Library of Congress contains. And we make that every single day.” Ginette: “In 2013, SINTEF published this stat, quote: ‘90% of the world’s data has been created in the preceding two years.’ According to one Ph.D. technologist, this has been true for the last 30 years because every two years, we produce 10 times as much data.” Curtis: “This exponential growth is insane. As an example of this type of growth rate, take a hypothetical scenario: say the world’s population starts growing as rapidly as data is growing now. It would look like this: currently, the world’s population of 7 billion people could fit into an area the size of Texas if they lived as densely as people do in New York City. Now, in two years’ time at this growth rate, you’d have to cover the entire United States and half of Canada with people living at New York City-like density. And if you extrapolate that out ten years, keeping the growth rate the same, you’d have to cover the entire planet, including all of the oceans, with New York Cities, and then you’d have to do the same with 100 to 150 additional Earths to fit all of those people. 
That’s the kind of growth rate we’re talking about.” Ginette: “With data collection on the rise, one report goes so far as to say that only the data literate will have the chops to be executives in the future, quote: ...
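The 90 percent figure in this episode follows from the “10 times as much data every two years” claim. If each two-year period produces ten times as much data as the one before, the latest period’s share of everything ever produced is 10^n / (10^0 + 10^1 + ... + 10^n), which quickly approaches 9/10. A minimal check, assuming a perfectly constant growth factor of 10 per period (a simplification of the claim as stated in the episode):

```python
def share_of_latest_period(growth_factor: float, periods: int) -> float:
    """Fraction of all data ever produced that came from the most recent period.

    Assumes each period produces growth_factor times as much data as the
    period before it (a constant-growth simplification).
    """
    amounts = [growth_factor ** k for k in range(periods)]
    return amounts[-1] / sum(amounts)


# 30 years of two-year periods = 15 periods
for periods in (2, 5, 15):
    print(periods, round(share_of_latest_period(10, periods), 4))
# 2 0.9091
# 5 0.9
# 15 0.9
```

Under the same assumption, ten years is five periods, or a 10^5-fold (100,000-fold) increase, which is the scale behind the population analogy.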
Mar 18, 2017
The Curated History of Data Science, Part 3
19:06
From a small building in Pennsylvania to widespread usage across the world, we track the compelling story of one of the greatest technological innovations in history, setting the stage for the age of data science. Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Today our story starts at a business building.” Curtis: “The building is in Philadelphia, Pennsylvania, on Broad and Spring Garden Streets, to be precise. Envision the late 1940s.” Ginette: “You see a man absorbed in thought entering the building, and you decide to follow him in.” Curtis: “When you walk into his office, you find some bright engineering minds working on a fairly new startup in town: the Eckert-Mauchly Computer Corporation, or EMCC. It turns out, this is the very first large-scale computer business in the United States.” Ginette: “While on the surface this business environment is vibrant and innovative, behind the scenes, it’s a pressure cooker full of confusion.” Curtis: “The owners, John Mauchly (the man you followed into the office) and his business partner, J. Presper Eckert, are talking about something strange that’s been happening: most of their clients have been from the government, and now they’re quietly pulling away from doing business with EMCC without any explanation, which is both alarming and confusing to the owners. It’d be one thing if the government gave a reason each time it pulled out of a contract, but without one, they have no idea what’s wrong or how to fix the situation. It’s like going through several breakups where the only explanation offered is, ‘It’s not you; it’s me.’ “So what’s actually going on here?” Ginette: “The answer is woven into John’s backstory, a backstory that also includes the story of the ENIAC, the very first electronic general-purpose computer. 
“In John’s earlier career, he was involved with scientific clubs and academia. He started as an engineer and eventually became a professor at the prestigious Moore School of Electrical Engineering at the University of Pennsylvania. At one point, he got lucky. He essentially asked the right military person on campus this question: What if I could build a machine that would significantly cut the time it takes to calculate projectile trajectories?” Curtis: “So the military ends up formally accepting his proposal, and John and Presper team up for three years on this top-secret military project to build the ENIAC. “At the time, the ENIAC is really impressive in both size and ability. It weighs about the same as nine adult elephants, which is 27 tons, and it has about 17,500 vacuum tubes, each about the size of your average household light bulb. It has about 5 million hand-soldered joints. And it’s the size of a small house: about 1,800 square feet. And in today’s dollars, it costs about $7 million. “It’s the very first of its kind. It’s both completely electronic and general purpose, meaning you can use it to calculate almost anything as long as you give it the right parameters. The bottom line is that it’s a lot faster than anything before it. It’s 2,400 times faster than human computers and 1,000 times faster than any other type of machine computer at the time. For example, it cut the calculation of a 60-second projectile trajectory from 20 hours to just 30 seconds. To understand the magnitude of this, it's like moving from an average snail’s pace to the average speed of a car on a highway.” Ginette: “Here’s another way to look at this: if you drive your car (the ENIAC) across the country from L.A. to New York City at about 70 miles per hour without stopping, it would take you a little over a day and a half to get there. In contrast, it’d take a snail (the human computer) about 11 years without stopping.” Curtis: “So it turns out the ENIAC isn’t ready in time f...
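The speed comparisons in this episode hang together arithmetically. Twenty hours is 72,000 seconds, so cutting a calculation to 30 seconds is the 2,400-fold speedup quoted in the episode, and a snail moving 2,400 times slower than a 70 mph car would take roughly 11 years to cover a trip the car finishes in under two days. This sketch assumes a Los Angeles to New York City distance of about 2,800 miles; that figure is an assumption, not a number from the episode.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365

# 20 hours of hand calculation vs. 30 seconds on the ENIAC
human_seconds = 20 * 60 * 60
eniac_seconds = 30
speedup = human_seconds / eniac_seconds   # 2400.0

# Assumed LA-to-NYC driving distance (not from the episode): ~2,800 miles
distance_miles = 2800
car_mph = 70
car_hours = distance_miles / car_mph      # 40.0 hours
snail_hours = car_hours * speedup         # a "snail" 2,400x slower than the car

print(f"speedup: {speedup:.0f}x")                       # speedup: 2400x
print(f"car: {car_hours / HOURS_PER_DAY:.2f} days")     # car: 1.67 days
print(f"snail: {snail_hours / HOURS_PER_YEAR:.1f} years")  # snail: 11.0 years
```

Forty hours is "a little over a day and a half," and the implied snail speed, 70 / 2,400, is about 0.03 miles per hour.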
Mar 01, 2017
The Curated History of Data Science, Part 2
22:38
She isn’t your typical English girl from the early 1800s. She’s a girl who, because of her fortunate and unfortunate family circumstances, ends up perfectly situated to become part of something that will revolutionize the world. Ginette: “For many reasons, she isn’t your typical English girl from the early 1800s. She’s a girl who at one point examines birds to discover their body-to-wing ratio so she can invent a flying machine and write a book about it. These are goals that show mathematical skill, creativity, and initiative. She’s also a girl who, because of her fortunate and unfortunate family circumstances, ends up perfectly situated to become part of something that will revolutionize the world.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “In our last episode on the history of data science, we talked about the origins of charts and data visualization, which are an important part of data science, but in today’s story, we’re going to start a new thread that’s absolutely essential to the fabric of this history. We’re going to talk about some brilliant inventors who gave rise to an idea that would change the course of history, arguably one of the most powerful ideas to shape our modern world. It’s a story of triumph and innovation, but also of tragedy, because even though the ideas they moved forward had a dramatic effect on all of us in the long run, in the short term, many of these people saw their dreams fall apart before their eyes. So today and in our next episode, we pay homage to some key people who started the wave that gave us technology that makes our modern lives possible. And we’re going to do that first by getting back to the story of the girl we mentioned in the intro.” Ginette: “Interestingly enough, this episode ties into our last episode in an unexpected way. 
The little girl we introduced to you earlier is born around the same time as Florence Nightingale; she’s about five years older. “We have to understand a little bit about her parents, Annabella and George, to have a better insight into her, so here’s a peek into their lives: They’re both highly intelligent, capable, and well-educated, and they’re from high society. George is more verbal and artistic, and Annabella is more logical and mathematical. “From the start, the pair is not a good match. Annabella sees George’s flaws, but she also sees George’s potential. Beyond that, Annabella is probably attracted to his bad-boy, wild-and-woolly persona, and a lot of people describe him as very handsome. One good example of his rebellious nature and disdain for authority is how he exploits a loophole in college to flout what he considers an absolutely outrageous school rule: since the university won’t let him bring his cherished pet dog with him, he defiantly keeps a tame pet bear in his Cambridge University rooms. As loopholes work, the rule doesn’t explicitly say no pet bears, so in his mind the university can’t immediately do anything about it; this may be partly why he only lasts there a term. Anyway, these are the types of things Annabella thinks she can change about George. “On George’s side of things, he notices Annabella’s sharp intellect. She’s incredibly smart. From early childhood, her parents recognize her natural brilliance and essentially give her what most women can’t get in those days: the equivalent of a Cambridge University education. Something else George likes about Annabella is that she’s down to earth. So eventually, he proposes to her, and probably against her better judgement, she says ‘yes,’ and they get married, but within a year, things get messy. “She notices George’s strange behavior. He’s dark, he’s angry, he’s brooding. And over time, he starts doing other odd things and even lashes out at her.
Feb 16, 2017
Eyes on the Pirates, Part 2
21:59
Pirates in folk stories and popular movies conjure up strong imagery: eye patches, Jolly Rogers, parrots, swashbuckling, scruffy voices that say “Aye, Matey.” But what do the lives of successful pirates look like today? And what's being done to stop them from plundering and smuggling our oceans' precious resources? World Wildlife Fund's Detect IT: Fish project takes aim at these pirates and other illegal actors with cutting-edge technology that reduces a time-consuming tracking process from days to minutes. Ginette Methot-Seare: “After nearly 15 years of lucrative, illegal activity, he was caught and convicted. The judge in this key case stated that his business activities were an ‘astonishing display of the arrogance of wealth and power.’ He destroyed evidence, and while under investigation, even hired a private eye to follow an agent around. After serving prison time, the main perpetrator and his accomplices were ordered to pay $22.5 million in restitution to South Africa for the damage they had done.” Curtis Seare: “Who was this man? Arnold Bengis, a modern-day pirate.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Believe it or not, these episodes take hours and hours of hard work to produce, and the success of this show depends in large part on the listener reviews and ratings we get. If you like what we do, the best way to support us is to go to iTunes, Google Play, or your favorite platform for getting the episodes, and leave us a review. 
“If you’re willing to do that, a big thank you in advance, and a big thank you to those who have already done it.” “At the end of our last episode, we promised you the story of one of the biggest pirate busts in history, and we will deliver, but before we go on, if you’re new to Data Crunch, you may want to start with the last episode, which will give you more background and context. “By some accounts, this is what happened: Arnold Bengis became incredibly wealthy after growing a business in South Africa. He had a house in Bridgehampton, New York, worth several million dollars; an apartment on the 41st floor on Manhattan’s Upper West Side; and a house in Four Beaches, an exclusive neighborhood in Cape Town, South Africa. “His 6,000-plus-square-foot Bridgehampton house, a large Spanish-tile stucco villa, overlooked the beautiful Mecox Bay on one side and the Atlantic Ocean on the other. His six-bedroom, seven-full-bathroom single-family home had what you’d expect to find at a palatial place: a well-manicured golf green; a luxurious pool; large, well-decorated rooms with chandeliers; and expensive furniture. When the house last sold, it went for $10.5 million. One of the agents of the National Oceanic and Atmospheric Administration, or NOAA, who investigated Bengis’s case even said he was in partial awe of the lifestyle Bengis was living, which was supported by his illegal fishing business. “Bengis held his money, both personal and business, in a highly complex network of trusts and asset havens. The money was scattered abroad in many different places, like Switzerland, Gibraltar, the island of Jersey, and Britain. While authorities didn’t know everything about his money, what they did know was that he had vast assets. For example, in just one year, he deposited $13 million into one of his accounts. His lawyer said that one of his several trusts was worth more than $25 million, according to the book Hooked: Pirates, Poaching, and the Perfect Fish. 
“I know what you’re probably thinking: ‘How did this man make so much money from illegal fishing?’ We told you in our last episode that IUU fishing rakes in between $10 billion and $23.5 billion a year, and that’s a conservative estimate. The larger picture is this: When you consider that the entire world’s trade...
Jan 31, 2017
Eyes on the Pirates, Part 1
30:55
The history books teach that slavery ended, but it still exists; it has just changed form: different commodity, different location, same abuses. The commodity is seafood. The location, Southeast Asia. The abuses, forced servitude with all its ugly associations. Some people make a substantial living off illegal, unregulated, and unreported (IUU) fishing, which fuels a dark underground. How is big data angling to stop it? Find out in our next two episodes. Transcript: Michele Kuruc: “People who were seeking better lives and coming to look for work were kidnapped by unscrupulous dealers, who forced them into lives we can’t even imagine.” Ginette Methot: “I’m Ginette.” Curtis Seare: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Welcome back to Data Crunch! We took a bit of a break over the holidays, and we hope you were able to take one too. “So onward and upward to 2017. What are we up to this year? We’ll be finishing our data science history miniseries for you, and we’ll be meeting some really cool people from KDnuggets, Galvanize Austin, and Datascope in Chicago. But before we do those episodes, we have to pivot because, with major recent developments, this particular episode deserves to come out now. “According to the Associated Press, the lives we can’t even imagine look like this. One Burmese man left his village when he was 18 years old. He followed a recruiter who promised him a construction job. When he arrived in Thailand, his captors held him with little food or water for a month. He was then forced onto a fishing boat. He was told that he had been sold and would never be rescued. In that fishing environment, he sometimes worked 24 hours a day. He and his fellow fishers were whipped with stingray tails and shocked with electric devices. 
They were told during their time fishing that they would never be let go, not even when they died, and men in similar situations were sometimes sold from ship captain to ship captain. “If they tried to escape the work, they were locked in cages on remote islands. In the 22 years he was away from home, he asked to go home twice. The first time he asked, the company official chucked a helmet at his head, which left a bloody gash that he had to hold closed. The second time he begged to go home, he was chained to the boat deck for three days in the blistering sun, and when night came, it was rainy, and he could do little to protect himself from it. During that three-day period, he had no food. Amazingly, he fashioned a lock pick and unlocked his shackles. He knew that if he was caught, he’d be killed, so he dove into the water under cover of night and swam ashore, hiding for his life. “You might ask why he didn’t go to local officials. The answer is he couldn’t, because they might sell him back to the ship captains. So after eight years in the jungle hiding from the fishing companies, he finally got to go home because of the AP’s reporting. This is modern-day slavery. Every year, thousands of people are tricked or sold into this type of slavery in order to catch fish for lucrative markets. “If you’ve ever read Solomon Northup’s gripping autobiography, Twelve Years a Slave, the similarity is eerie. Both are free men who are lured away with promises of work and then abducted. They’re shackled, beaten into servitude, and forced to work in harsh conditions for many, many years. Both are desperate to go home to their families, and both experience miraculous escapes from tyrannical systems. But unfortunately, not everyone escapes. “This is a huge problem, and it’s frequently linked to illegal, unregulated, and unreported fishing, commonly known as IUU fishing. 
Unfortunately, IUU fishing is linked to some of the ugliest transnational crimes: modern-day slavery, human trafficking, drug trafficking,
Jan 13, 2017
The Curated History of Data Science, Part 1
12:45
Who were the people pushing the limits of their time and circumstances to bring us what we know today as data science? We examine what motivated them to do their important work and how they laid the foundations for our modern world, where algorithms and analytics affect everything from communications to transportation to health care—to basically every aspect of our lives. This is their story. Transcript: Ginette: “She was obsessed with her failure—she thought she hadn’t done enough. And it didn’t matter that the public saw her as a heroine. So she ended up writing an 830-page report in which she used some powerful graphics, and this, paired with her other efforts, ended up changing the entire system.” Ginette and Curtis: “I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production.” Ginette: “In our last three episodes, we have just thrown you into the middle of data and prediction and the explosion of data science. And some of you have had some questions, like, how did data science become a thing? “In the next three episodes, we’re doing a miniseries where we’re going to address some of these questions, and I think you’ll find it very interesting. Our story starts with an impressive woman. “It’s 1854. It’s the Crimean War, and a woman shows up at a hospital to help. She finds horrifying conditions. To paint an accurate picture for you, here’s a little bit of what she found: the sewage and ventilation systems were broken; the floor was an inch thick with waste, probably human and rodent; the water was contaminated because, as it turned out, the hospital was built over a sewer; rats were hiding under beds and scurrying past, as were bugs; the soldiers’ clothing was swarming with lice and fleas; and on top of that, there were no towels, no basins, no soap, and only 14 baths for 2,000 soldiers. Keep in mind this was 20 years before Pasteur and Koch popularized germ theory.
“So she and the 37 nurses she brought with her set to work, and they did their best to clean up the hospital and help the soldiers. Eventually, because of her, the government sent a sanitary commission. They flushed the sewers; they improved the ventilation. And this helped the situation dramatically. In the end, she reduced the death rate by two thirds. “But Florence Nightingale went home feeling like she had failed, as we mentioned at the beginning of the podcast. She felt a lot of soldiers had died needlessly. This drove her to write her famous 830-page report. And she ended up working with leading statistician William Farr, who actually helped invent medical statistics. He would say to her, ‘We don’t want impressions, we want facts.’ Working in that context, she gathered vast amounts of complex army data and analyzed it to find something rather shocking: 16,000 of 18,000 deaths in hospitals were not due to battle wounds but to preventable diseases spread by poor sanitation. “These statistics completely changed her understanding. She had thought the deaths were due to inadequate food and lack of supplies, but after the sanitary commission came in, she noticed that the mortality rate dropped significantly. So as Florence prepared her report, she was afraid that people’s eyes would glaze over at the numbers and that they wouldn’t grasp the significance of what she was trying to say. So she came up with a clever way to present her data: she used graphics, in particular what she’s known for, the rose chart, to convey her message.” Curtis: “Nowadays, charts are everywhere, but back in her day, the idea of creating a picture defined by data points was not very common, so the fact that Nightingale thought to do this was very innovative and clever, and it was important because it communicated what she needed to communicate. “Her mentor,
Dec 09, 2016
The Predictive Power of Waffles
18:06
When breakfast food takes on hurricanes, who wins? For another interesting take on the Waffle House Index, see this article on the FiveThirtyEight blog, posted December 6, 2016. Curtis: “I love waffles. I fill up each of the little squares with the precise amount of syrup so that each bite is a perfect distribution of syrupy goodness.” Nathan: “I love owl-shaped waffles.” Tiffany: “The kind you get at a hotel when they serve you those free breakfasts—they’re just perfect.” Lily: “I love waffles with strawberries.” Vince: “Liège waffles—Belgian waffles pale in comparison. They have sugar clumps in the shape of pearls, and they put these in the batter, and they don’t dissolve, and they taste really good. I didn’t even need to add syrup.” Ginette and Curtis: “I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production.” Curtis: “Today we’re talking about hurricanes, waffles, and predictions.” Ginette: “It happened in 2004. Charley, Frances, Ivan, and Jeanne were four aggressors. With the group’s combined strength, they wrecked their victims. First, Charley attacked and was the most destructive. Frances followed with a much weaker pummel, but because it came so quickly on Charley’s heels, the attack was effective. Then came Ivan with an unexpected one-two punch. And finally, Jeanne forcefully hit the same spot as Frances, but with much more intensity. “To some, this wrecking ball of an attack is known as the Year of the Four Hurricanes. These four hurricanes ruthlessly shredded Florida’s east coast, west coast, panhandle, and interior in about six weeks, leaving $29 to $41 billion in damages. As a point of comparison, if Google had to cover these costs, it would take two to three years of the company’s net income. At the time, Charley ranked second only to Hurricane Andrew, then the most destructive hurricane in US history.
“Charley obliterated mobile homes, savaged houses, knocked over water towers, collapsed carports, obstructed roads by littering them with large trees and power poles, blew over semi-trucks, crushed large trailers, and rendered areas unrecognizable. “We spoke with a couple who experienced a hurricane firsthand, and their ordeal sounds harrowing.” Melody Metts: “I don’t think we expected anything that we found when we came back. You couldn’t even recognize where you were.” Ginette: “Christopher and Melody Metts lived within twenty miles of Homestead, Florida, where Hurricane Andrew hit with full fury.” Christopher Metts: “There was nothing taller than the first floor. Any tree, any light pole, any anything that might have been higher than the first floor of a house was completely gone. Anything that would indicate where you were—a street sign, a light—it was all gone as far as you could see.” Ginette: “Like most South Florida residents, they didn’t think much of the storm predictions.” Christopher: “We saw it, and the predictions for it, for many days. Because we were in South Florida, and because every hurricane season that comes along has scares that could be very devastating, but it’s a near miss or it turns at the last minute, you get into a pattern of ‘they cry wolf too often,’ and you’re lulled into a sense of ‘well, not this time.’” Ginette: “While this was their initial feeling, eventually the predictions became serious enough that the authorities issued an evacuation order, so the Mettses prepped their house for wind damage and drove to Orlando with seven children in tow, ages one to eight, and it’s a good thing they did, because their family would otherwise have been in extreme danger. This is where we start to see the power of prediction in people’s lives. Imagine if there had been little to no ability to predict the hurricane.” Curtis: “Before modern hurricane prediction,
Nov 18, 2016
I Had to Run
22:09
Imagine you have to leave your home immediately, and you have little time to grab anything to take with you. You don't know where you are going—you just know you have to flee for your life. Many people face a similar situation—one in every 113 people on Earth, in fact. There are 65 million people living in a state of limbo; they don't know what's going to happen to them, but they do know they can't go home. After losing their homes, often their loved ones, and sometimes their identities, they desperately hope for safety and a new home. This episode is where data science meets refugees. Transcript: Hadidja Nyiransekuye: “It wasn’t until I started having as a teacher and a principal of a school when people come in the middle of the night to come attack my house. That’s when I decided I think I need to run again.” Ginette Methot-Seare: “I'm Ginette Methot-Seare, and you are listening to Data Crunch, a Vault Analytics production.” Hadidja: “Just think about something threatening you. Your first reaction would be to duck away from the noise or from whatever is threatening you. Now think about somebody coming with a gun or with a machete, threatening not only your life but the life of your loved ones. You run, you run. Everybody does.” Ginette: “And that’s exactly what Hadidja Nyiransekuye did twice.” Hadidja: “The first time I run, I run because I needed to run.” Ginette: “She was fleeing from bombs.” Hadidja: “It was a mass exodus. Everybody was running, so we run like everybody else.” Ginette: “Hadidja had to flee in her PJs with four children, one of them a baby on her back.” Hadidja: “My little girl, Lydia, was eight at the time, and I had two of my nieces.” Ginette: “Her husband, who was in imminent danger, fled first. And her boys also ran before her.” Hadidja: “It was hot. We were thirsty and hungry. And these young people were perched on . . .” Ginette: “pickup trucks.” Hadidja: “And they would say, ‘Keep moving, keep moving!
There’s a nice place called Mugunga; that’s where you’ll get food and you’ll get water and you’ll get shelter.’ And I remember saying to myself, ‘People are dying of cholera, and I’m going to Mugunga on foot—like 50 miles?’ I just didn’t think I was going to make it.” Ginette: “As a child, Hadidja had polio. One in every 200 polio infections leaves its victim permanently paralyzed. Although the virus didn’t paralyze Hadidja, it left her disabled. She walks with a cane and a leg brace.” Hadidja: “At the time, I actually ended up at the Center for People with Disability in the Congo because I had been treated there in my teens. And of course, you just wished people would just let you spread your mat or something you have on their door so you can spend the night there. But they were asking us to get out of the city, to go to that place where they were going to be building refugee camps, so in those conditions, you actually, you hear what other people are saying. Well you just follow because it’s not like you have a choice. Nobody knows where they are going when they are refugees. That’s why they’re called forced migrants.” Ginette: “Let me go back and fill in some holes for you. Hadidja’s story starts . . . ” Hadidja: “in the town of Gisenyi. That’s where I was born and raised.” Ginette: “Her town is just inside the border of Rwanda.” Hadidja: “It’s at the border of former Zaire, now Democratic Republic of Congo.” Ginette: “As she grew, she gained an education, became involved in women’s movements, and taught modern languages with an emphasis on applied linguistics. During that time, she married her husband, and they had four children. But then in the 1990s, things became precarious in her country.” Hadidja: “People tend to think that the war in Rwanda started in ’94. Actually the war started on October 1, 1990.” Ginette: “Hadidja is referencing an invasion by a group of mostly Tutsis, a minority group,
Nov 01, 2016
Take It Back
11:03
What if one day, out of the blue, you find yourself sick—really sick—and no one knows what's wrong? This is a podcast about a sleeper illness and what one team of data scientists, led by Elaine Nsoesie, is doing to reduce its reach. Sam Williamson: "It felt as if I were on some kind of hallucinogenic drug. I felt really, really hot. Really cold again. The room started spinning. I got tunnel vision. I was about to black out." Ginette Methot: "I'm Ginette Methot-Seare, and you are listening to Data Crunch, a Vault Analytics production. Today we're going to talk about something that could affect you or someone you love, if it hasn't already." Shawn Milne: "It still is a pretty vivid memory for me just because it was such a, such a terrible thing." Ginette: "This is Shawn Milne." Shawn: "Both of us just booked it for the bathroom because we were both throwing up." Ginette: "He's describing a sickness that both he and a friend suffered from." Shawn: "On the way home, we had to keep pulling the car over, and we were just both throwing up on the side of the road. It was absolutely terrible. We were just both up all night just throwing up. Just so beat." Ginette: "While Shawn's experience lasted about 48 hours, Samuel Williamson, the person you heard speak at the beginning of our podcast, had one that lasted for about a month." Sam: "I did go to a doctor for it after a while. They convinced me to go to a doctor. He in fact told me that my stomach was just tired, which I thought was a very strange diagnosis. So he suggested that I don't eat anything for a week. I think I lost about ten to twelve pounds in the first week, and so I went a week without eating anything, and came back a week later, and he asked me if the symptoms had gone away, and I told him 'no, they were about the same,' and he said, 'okay, well you can't eat anything else for another week.' I went about three days and then pigged out."
Ginette: "While everyone's body reacts differently to this type of sickness, stomach pain was one symptom that everyone we interviewed described." Amy Smart: "I remember at one point, lying on my couch in excruciating pain, and thinking, ‘this is like having a baby, only with a baby, I know it's going to end.’" Ginette: "Amy had two little girls when she got sick, and she became so ill and weak that she couldn't take care of them. Fortunately, her mom lived nearby and could take her girls during the day, and her husband was able to stay home from work to take care of her." Amy: "I couldn't, I couldn't eat. I wanted to because my body was so depleted, but I couldn't drink. I couldn't keep anything down. We went to the ER because I was so weak, and they put me on IVs and gave me morphine for the pain." Ginette: "But for Amy Smart, the person speaking here, things got a lot worse." Amy: "All that was coming out both ends was blood. And I remember feeling like, 'this is what it feels like to die.'" Ginette: "Amy told me that it literally felt like life was leaving her body." Amy: "I didn't know when it would end, when I would feel better again. If it would take days or weeks or ever. I remember thinking, 'I'm so glad it's me and not one of my little kids,' because I don't know how they would have survived it." Ginette: "Now put yourself in her shoes for a second: you're sick and only getting worse. When you go to the doctor, the doctor isn't sure what's wrong." Amy: "They first thought it was stomach flu, then maybe Giardia, then maybe salmonella, and then they cultured it and found I had E. coli." Aside: "E. coli contamination. Possible E. coli contamination. E. coli contamination." Amy: "By then, once it was diagnosed as E. coli, it was a relief because then they knew how to treat it, and they put me on Cipro. By then the Center for Disease Control gets involved and is interviewing and trying to match the strain." Ginette: "Now as an interesting side note,
Oct 13, 2016