Interview with Katharine Jarmul about Digitization
Katharine Jarmul is a data scientist and co-founder of KIProtect, a data security and privacy company for data science workflows based in Berlin, Germany. Her research and passion center on ethical machine learning, data privacy, and information security.
Gerhard Schimpf (GS): Katharine, thank you very much for accepting our invitation to speak at our symposium Being Human with Algorithms. Some of our readers might not know you. Would you like to introduce yourself and describe the work you do?
Katharine Jarmul (KJ): Sure! I’m co-founder of KIProtect (https://kiprotect.com), where we build data privacy solutions for machine learning and data science. We are launching several APIs and libraries to make data privacy and information security easier for data scientists and companies. From my experience working in this industry for the past decade or so, unless you make security and privacy easy to do, they won’t happen automatically. Since my co-founder Andreas Dewes and I are both personally passionate about making the world a more secure place, we are working on techniques that allow for data science with an emphasis on privacy and security.
GS: What steps brought you to computer science, and which effects of the ongoing digital transformation are most prominent for you today?
KJ: I have been playing with computers and code since I was 14, with some lapses in between, so it’s been really interesting to see things go from launching my website on GeoCities via dial-up Internet on my family’s Windows 95 computer to having a handheld 4G phone with a 64 GB SD card and tweeting my thoughts via a cross-platform application. Back then it would have been hard to predict how the Internet and increasing digitization would affect the global economy and political sphere; effects that stem, in part, from enabling billions of people across the world to interconnect.
I also don’t think the dangers and advantages associated with “big data” and machine learning could have been anticipated back then – and, to be quite honest, we are still at the beginning of widespread machine learning in production systems that process large aggregations of data. One of the first research papers to provide insight into the ease of de-anonymizing large datasets was the 2006 de-anonymization of private user data from the supposedly “anonymized” Netflix Prize viewing dataset by Arvind Narayanan and Vitaly Shmatikov. Since then, we have only increased the daily production and consumption of both personal and often private data. If data is the new oil, where are the regulations on its usage and consumption? Thankfully, the GDPR (the European General Data Protection Regulation) is a step in the right direction toward giving users rights to their own data.
GS: How does this affect your work, and how do you personally take part in and shape the digital transformation?
KJ: Our work at KIProtect is built on years of experience and knowledge of what businesses need, while counterbalancing those needs with the protections users deserve. I don’t want to live in a world with two kinds of citizens: those in Europe who are, as I call them, first-class data citizens, and those who live outside Europe, with no data rights. If companies are going to make money from user data, then users should have a say in how their data is used and in what they do and do not consent to. Simply saying “Well, they shouldn’t use this service unless they agree to unlimited data use” is not an option. Being alive now means leaving a digital footprint, and you should be able to choose with whom you share that footprint and how.
GS: What opportunities do you associate with the ongoing digital transformation?
KJ: Digitization of data could bring huge benefits to society as a whole: iterating more quickly on medicine and health, creating public transportation more people will use, allowing more people to go to school, vote, and live long, happy lives. These are all things that computers, data science, and machine learning can help us do faster and better than ever before. However, this means someone has to build programs to do this; someone (or some companies or organizations) has to decide to use their computing for social good. And sometimes (or, some might argue, often) social good and free-market capitalism are at odds with one another.
I hope companies and my fellow computer and data scientists see the benefits of creating more ethical computing. Writing a computer program or algorithm you know will be used to jail more people is far less rewarding than writing one that will help increase school attendance or help users find better health care. I believe people want to do ethical, meaningful work, and I hope the industry can create more opportunities for such work. Ultimately, we, as consumers, can help create those opportunities by demanding more ethical data and computing use from the companies we work with as well as the companies we purchase from.
GS: On the dark side, and in the context of Being Human with Algorithms, what risks do you see?
KJ: Unfortunately, many! Obviously, my focus right now is on the risks to our privacy and the (im)proper use of our data. When most applications or websites require the ability to track you, use your friend list, or see your location data or browsing history, what are you to do? Simply not use apps, or your phone, or your browser? This is not an option; there has to be a better way. It came as no surprise to me to hear about Cambridge Analytica’s use of data for targeted political attacks – we have made data a weapon for sale to the highest bidder, available for purchase by those who can and will use it against us.
Additionally, the security issues within data science and machine learning are vast, including the ease of fooling an algorithm or poisoning a machine learning model. Depending on the use case, this could put lives at risk, threaten our financial institutions, and hurt any individual user’s security and accounts. Developing and applying more security best practices in our data science, as well as privacy protections for our users, should be a top priority for anyone whose code or machine learning models affect human lives.
GS: In your opinion, who should be responsible for containing these risks?
KJ: On some level, if we don’t want to divide the world into first- and second-class data citizens, we, as consumers, will have to demand these data rights for everyone. I don’t think waiting for governments around the world to regulate these issues is a viable solution. As digital users, we create the data – meaning we can choose, to some degree, which companies we trust and which we do not. We can demand better policies for privacy and security and put pressure on companies that do not abide by those standards.
As with our work at KIProtect, those of us with computer skills and data science knowledge can choose to build products and tools that enable privacy and security for all, not just for the few. Democratizing privacy and security through our own demands and creations will be the only way for this to become a global movement – one that gives everyone the right to determine why, how, and when their data is used.
GS: Katharine, thank you very much; we look forward to having you with us in Heidelberg in September.