Opting Out – My Week Without Big Data, Part 2: Operation Disconnected

Security is a funny thing. Bruce Schneier, in response to a question about stopping another 9/11 attack, came up with this novel conclusion: “simply ground all the aircraft. A totally effective solution.” Although Schneier’s answer is obviously tongue-in-cheek, it raises an interesting point. Completely disengaging is always an “effective” solution, if you don’t consider collateral damage. I don’t mean to be melodramatic when I say that for most of us, the internet has become such a commonplace tool in the workplace that simply choosing not to use it is as much of a non-option as grounding all aircraft indefinitely in the U.S.

Two weeks ago I introduced you to my weeklong quest to disengage from so-called “Big Data.” The challenge was simple – over the course of one week, try my best to disengage from data tracking as much as plausible without sacrificing all social and professional life. I would need to be able to contact friends, family, and co-workers, but try as much as I could to keep other parties out of my communications.

The first day of my week without big data actually began in the hours before midnight on Wednesday, as I cut the cord on numerous services in preparation for the impending hour. Over the course of the previous week I had researched effective strategies for opting out and selected the tools best suited to allow me to continue to communicate for work. I logged out of my Facebook account, Google account and others, put the finishing touches on my in-home file and e-mail server (I even got the domain name mailcoach.org, which at least sounded somewhat like an e-mail provider), and set up my desktop, laptop, and smartphone to all automatically connect to Virtual Private Networks upon boot-up. As the hour struck midnight, I wondered if any of these insane countermeasures were actually going to be effective. How would I even know?

The whole process was actually quite liberating. One of the nice perks of opting out of data tracking meant that I had to find other things to do with my time than constantly refresh my email or trawl through my Facebook feed. I used the internet for other forms of entertainment, with a relative sense of security in doing so. Prior to the start of my one-week test, I spun out emails to co-workers and family to let them know that I wouldn’t be available on my normal email address, providing my alternative, self-hosted email instead. Google Drive, which I use regularly to store documents and assignments for students (I’m a teacher at the University) was replaced by a self-hosted, Dropbox-style cloud file server called OwnCloud. Both of these were hosted on my $35 Raspberry Pi. In those areas, I was able to relatively seamlessly transfer to non-tracking software and hardware.

The first thing that I noticed, however, was that the barriers to opting out weren’t the ones that I expected. I had spent a significant amount of time preparing for problems communicating with people or having incompatible systems – more logistical problems than anything else. I regularly use Facebook to communicate with friends, so I wondered if anyone would think that I was simply ignoring them. In reality, however, that wasn’t a problem at all. Instead, the real obstacles were technical. Setting up my extensive series of countermeasures (detailed at the bottom of the article, for the sake of efficiency) was not a simple task. I don’t consider myself the most tech-savvy person in the world, but I do regularly have friends call me for computer help, and I’ve worked a handful of web design and light programming jobs. What I mean is that I know my way around a computer. But even I found myself on occasion questioning the difficulty of setting up these systems. Is it really worth it? I thought to myself. If it takes this much trouble, what am I gaining in exchange? And therein lies a bias of the technological world. The web has a long history of exploiting under-educated users – the identity theft schemes and predatory ads claiming you need to install a “critical software update” don’t usually surprise me – but what did surprise me in the process of opting out was the kind of soft pressure that makes it inherently difficult for users to do anything online without sharing their information. Even for web “power users” such as myself, so much of the great functionality of the web is locked behind a kind of invisible boundary – one that, when crossed, means you surrender your data to the powers that be online. Even if someone has a philosophical objection to data tracking, who’s to say that they have the technical know-how to set up a system like the one that I’ve developed?

In thinking about this I was reminded of an idea by Dragnet Nation author Julia Angwin. She talks about the idea that privacy has become a “luxury item” – meaning it costs money to maintain your privacy online. The de-facto standard is for sharing to be turned “on”, and in Angwin’s argument it requires money to get that privacy back. But I would take Angwin’s statements a few steps further. There is a technical bias against privacy. Privacy is hard, especially if you don’t work with computers regularly. Although technical proficiency with computers is growing more and more common as the years go by, the threshold of technical education required to opt out of data tracking is still well outside the reach of your average American. It took me the better part of a day to set up all of the software, hardware, and varied options necessary to be sure that I’d secured my data online. I thought to myself – if I, as a relative computer geek, had some trouble setting this system up, what would it be like trying to get my grandmother on board? She certainly worries about people reading her mail – but why doesn’t she care about the possibility of someone reading her email?

To me, this represents one of the biggest problems with the Internet today. Most products on the web are designed in such a way that they actively discourage privacy. Although I believe that there are great benefits to web technologies in the integration of personal data (Google Now, for example, is one of my favorite web technologies of the last few years), the extent to which the web violates the privacy of users is astronomical. I can understand the tangible benefits of a system such as Google Now, which wants to learn more about me to help make computing more efficient. The problem arises, rather, in the assumption that user data is open to snooping by third parties by default.

Nonetheless, I was able to set up a system for opting out that worked for me. I routed my traffic through a Virtual Private Network (VPN), used my own custom email and cloud servers, and generally my life went about without too much difficulty. I communicated with my students regularly, spent a whole lot less time on social media websites, and ultimately had an experience that was not on the whole that much different than my day-to-day life.

As the week wore on, the things that I had given up began to feel more and more frivolous. The “Google Now” features I’ve discussed so much (the perfect poster child for a “good” use of data tracking) seemed superfluous. I didn’t suddenly find myself late to work just because my phone hadn’t reminded me to be there on time. I didn’t need Facebook to remind me of when events were coming up or who among my friends had a birthday this week. These were all of the things that I felt I needed these sorts of services for – the little idiosyncratic ways that they make my life easier – but at the end of my week I found myself wondering if I’d ever really needed them to begin with. Was it really worth what I was getting in return?

Ultimately not, I would say. There are some valuable cases, technologies that use data tracking to legitimately make the world an easier and better place to live. However, we need to figure out what types of openness we encourage, allow, and make “open” by default, instead of assuming that our technology needs to be either open or closed. When I first set out to opt out from data tracking I admittedly thought of the world in that binary fashion – “tracked” or “not tracked”, private or public – but moving forward, I have found it more useful to think about which technologies should be public and tracked and which should not. Just yesterday Google’s Larry Page discussed this kind of nuanced understanding of how data should be shared. He claims that thousands of lives could be saved by openly sharing anonymized health information online – and he’s right. It certainly would make things easier for doctors, who could use a couple thousand corollary cases to cross-reference. What interests Page (and me) about this approach is that it cares more about how things are shared, not whether or not they are shared. Anonymized information certainly would make a lot of people more comfortable with the idea of sharing their medical records.

But I would argue that we need to go a step further. What the current systems lack are sufficient ways for users to have a choice about how their data is being used. Most internet users are passive participants in the exchange going on between themselves, websites, and data brokers. By visiting a page they unknowingly consent to the release of their information. Like Page’s approach to the question of open medical records, we should similarly ask the how questions of third-party data tracking. Can users have granular control over their privacy? Certainly. But right now, as I found in setting up my VPN and mail server, extensions and alternative browsers, control is difficult to gain.

What I came to understand during my week without big data wasn’t the efficacy of certain systems over others, or the better capability of one tool compared to another. Instead, what I figured out is something that I knew from the very beginning. For now, the internet is designed such that concerned users have to opt out of data tracking. But that doesn’t mean that in the future we can’t have an internet where the inverse is true – where people have to knowingly opt in to data collection services. It is a small, seemingly inconsequential change, but one of which we should all be aware.

Software and Hardware Used:

The Browser

Arguably the most used piece of software on any computer, your web browser is an amazingly complicated piece of technology. If your browser could talk (and sometimes it does) it would have a whole lot to say about you. However, these browsers are often built around the same technologies that enable third-party tracking by data brokers and other parties interested in vacuuming up your personal data.

The Software: Firefox

Regular listeners of the show likely know that I am an unapologetic supporter of Google services. However, the fact of the matter is that many of the features implemented in Google’s Chrome browser lend themselves to extensive data tracking. Primary among these is the persistent login of a Google account that binds every website visit that you make to your Google web history, which is ultimately used to serve ads.

So, with a heavy heart I made the switch to Firefox. For years prior to the release of Chrome I was a Firefox diehard, so returning to the browser was not too unfamiliar. My choice of Firefox was an important one. Even when using Chrome without a user account logged in, the program regularly sends usage data back to Google’s servers. Although this data is anonymized, the presence of the feature makes it a liability for any user concerned with data tracking.

Search Engine: Duck Duck Go

What is true for the Chrome browser is also largely true for Google Search. Eli Pariser’s great TED Talk on the concept of the “Filter Bubble” details the fifty-seven different signals that the Google search engine uses to individualize itself to you, the user. Even without a user account logged in, these signals can be used to differentiate users from one another and thus personalize their search results. These range from your physical location (Fairbanks, Alaska, for instance) to the type of computer you’re using (desktop, tablet, smartphone, etc.) and more. These signals are used not only to provide you with more immediately relevant information (e.g. local results for restaurants), but also to serve up localized and personalized advertisements. Features such as these, however, have been used for more nefarious purposes, as in the 2012 revelation that the travel booking service Orbitz was steering Mac users toward pricier hotel options than it showed to other users. This is just one of the many ways that tracking through browsers and search engines can have very real repercussions on your life, and at the end of the day Google’s search tools are built on the premise of individual user tracking.

As a result, I chose to use a different search engine – Duck Duck Go. Duck Duck Go is a search engine that is designed as an alternative to Google’s highly personalized search, meaning that Duck Duck Go does not track you by location, browser, or identity. Furthermore, Duck Duck Go is dedicated to avoiding user tracking. The company keeps no logs of user searches, and as a result cannot even be subpoenaed for the release of user search logs, because they do not exist.

Extensions: Disconnect, HTTPS Everywhere, Ad Block Plus

Even with a switch to a browser that doesn’t require a persistent account and a non-tracking search engine, there are still sneaky ways that your browser can leave behind breadcrumbs throughout your daily browsing. To understand this, we need to talk a little bit about how data tracking actually happens.

The primary way that websites track users is through your web browser software using files known as cookies. These are local files stored on your computer that are used for a variety of functions – from remembering login information for banking websites to remembering your user settings on a service like Pandora internet radio. However, these files can also be accessed across numerous websites to track user browsing activity and report back to the institution that created the cookie. Most often, these cookies are used to generate the personalized ads that we are used to seeing spread throughout the web. Many of the sites on the web use the same advertising and tracking networks, and as a result those sites are able to effectively “share” your traffic on partner websites amongst themselves. That is why, for example, your Amazon browsing habits result in personalized ads in your Facebook news feed.
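To make the mechanism above concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the `ThirdPartyTracker` class, the `visitor-N` cookie IDs, and the `.example` site names are all made up for illustration, and real ad networks are far more elaborate. The point is simply that one cookie, sent back to the same tracker from many different sites, is enough to stitch together a browsing history.

```python
from collections import defaultdict


class ThirdPartyTracker:
    """Toy model of an ad network whose script is embedded on many sites."""

    def __init__(self):
        # cookie ID -> list of sites where that cookie was seen
        self.history = defaultdict(list)
        self._next_id = 1

    def on_page_load(self, site, cookie_id=None):
        """Called whenever a page that embeds the tracker is loaded.

        If the browser sent no cookie, a new ID is issued (the step a real
        server performs with a Set-Cookie header). Returns the cookie ID
        that the browser should store and send back next time.
        """
        if cookie_id is None:
            cookie_id = f"visitor-{self._next_id}"
            self._next_id += 1
        self.history[cookie_id].append(site)
        return cookie_id


tracker = ThirdPartyTracker()
cookie = tracker.on_page_load("shop.example")            # first visit: cookie issued
cookie = tracker.on_page_load("news.example", cookie)    # same cookie sent back
cookie = tracker.on_page_load("social.example", cookie)  # and again on a third site

# The tracker now holds one profile covering all three sites.
print(tracker.history[cookie])
```

If the browser refuses to store or send the cookie, every page load looks like a brand-new visitor and the history never accumulates, which is roughly what the blocking extensions below aim for.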

Extensions like Disconnect keep a running list of known third-party trackers and cookies (the type used by data brokers to get information about you) and block them at the source. Similarly, DoNotTrackMe provides a second option for blocking third-party trackers. Although neither of these services can offer complete protection, the combination of the two should keep your data safe from a large majority of third-party trackers. These are incredibly powerful tools that make everyday browsing a lot more secure and stable. HTTPS Everywhere, on the other hand, ensures that you use HTTPS (the “s” is for “secure,” meaning encrypted transmission) connections whenever possible online. Even if you enter an address like “http://www.facebook.com”, HTTPS Everywhere will automatically redirect you to the secure version of the site. Because your data travels over an encrypted connection to the server, the prying eyes of third parties along the way are less effective.
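For the curious, the two ideas in this section can be sketched in a few lines of Python. This is not how Disconnect or HTTPS Everywhere are actually implemented (the real extensions rely on curated lists and per-site rules); the `BLOCKLIST` domains and both function names are hypothetical, chosen only to illustrate the principle of checking a request against a list and rewriting an address to its encrypted form.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical tracker domains, for illustration only.
BLOCKLIST = {"tracker.example", "ads.example"}


def is_blocked(url, blocklist=BLOCKLIST):
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


def upgrade_to_https(url):
    """Rewrite an http:// address to https://, leaving other schemes untouched.

    Simplification: this assumes the site supports HTTPS at the same address.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(upgrade_to_https("http://www.facebook.com"))
print(is_blocked("http://cdn.tracker.example/pixel.gif"))
print(is_blocked("https://www.facebook.com"))
```

A browser extension would run checks like these on every request a page makes, dropping the blocked ones before they ever leave your computer.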

These extensions are really the meat and potatoes for browser security and, in conjunction with the right browser and search engine, can make privacy from data collection services plausible without too much disruption of day-to-day internet use.

Virtual Private Network: Private Internet Access

Your IP address is a unique identifying number used by the network to refer to your computer or network router. For most individuals, the string of numbers in your IP address won’t change unless you physically move to a new home or apartment. For this reason it is a primary metric for determining who you are. However, numerous users on the same network can muddy the waters. For example, if you have numerous computers in your house, each belonging to a different family member, they may all appear with the same external IP address to outside websites. Nonetheless, IP tracking is an incredibly common practice and one of the most reliable methods that companies, organizations, and governments have of tracking individuals. Most lawsuits for piracy through the BitTorrent network, for example, are substantiated by identification of a user through their IP address.

Virtual Private Networks such as the one offered through Private Internet Access (PIA) work to obfuscate your IP address by allowing you to appear (to the outside world) to be at a different IP address each time you connect. PIA offers software for Windows, Mac, and Linux with convenient features such as auto-connection, meaning you don’t have to remember to set up the VPN every time you turn on your computer. Furthermore, PIA keeps no logs on its servers and has a dedicated approach to user privacy.
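A rough sketch of what this looks like from a website’s side of the connection, again in Python. Everything here is assumed for illustration: the addresses come from ranges reserved for documentation, the page names are invented, and `visits_by_ip` is a hypothetical stand-in for whatever a site does with its access log.

```python
from collections import defaultdict

# Made-up addresses from ranges reserved for documentation.
HOME_IP = "203.0.113.7"
VPN_EXIT_IPS = ["198.51.100.10", "198.51.100.23"]


def visits_by_ip(requests):
    """Group page requests by the IP address the website sees."""
    seen = defaultdict(list)
    for ip, page in requests:
        seen[ip].append(page)
    return dict(seen)


# Without a VPN, every device in the house shows up under one external address.
without_vpn = [(HOME_IP, "/laptop-page"), (HOME_IP, "/phone-page")]

# With a VPN, the site sees whichever exit address the session was given.
with_vpn = [(VPN_EXIT_IPS[0], "/monday"), (VPN_EXIT_IPS[1], "/tuesday")]

print(visits_by_ip(without_vpn))
print(visits_by_ip(with_vpn))
```

In the first case the site can build a single profile for the household; in the second, the two days of browsing are filed under two unrelated addresses that are shared with other VPN customers.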

Email: Self-Hosted (Raspberry Pi)

Self-hosted email is not for the faint hearted. It allows users a lot of control over their email and the data that comes and goes from it, but it can also be a huge security hole. Unless you are comfortable with the idea of maintaining your own self-hosted email for years to come, it’s likely not the best idea. However, if you are, Ars Technica has a great series of articles on setting up your own email server. Keep in mind, however, that although self-hosted email isn’t being scanned through Gmail’s filters to generate ads, your email is more like a postcard than a sealed letter. Unless email is encrypted (and it likely isn’t) there is very little you can do about email interception.

Cloud Documents: OwnCloud

OwnCloud offers some great self-hosted tools for creating your own Dropbox clone, meaning that you can have total control over your cloud-hosted documents. With all of the same major features as Dropbox, OwnCloud is a no-brainer.
