Apple's app crash; the parallels to be drawn with the web

There is an equivalent technology on the web which raises equal, if not greater, concerns for privacy

Photo credit: Tayeb Mezahdia on Pixabay

Jamie Robinson

Founder of Mashoom and a mechanical engineer. My passions are teaching, questioning the status-quo and working too hard.

11 min read

A news story caught our attention today: a whole range of apps on Apple devices crashed for a few hours, with Facebook taking the blame. It has rightly raised a lot of questions about how pervasive the tech giant's data collection is.

First, I'm going to highlight a bit of detail on how the app crash likely happened, then go into how the same concept applies to the web in a less noticeable and potentially more concerning way.

What (likely) happened to cause the apps to fail

Firstly, it has nothing much to do with Apple. It happened on Apple products because Facebook publish an SDK (Software Development Kit) for Apple devices, and this is what caused the failure. SDKs are a simple means for a developer to connect to an external service. For instance, if I want to use Facebook's "Sign in using Facebook" button, I could manually send all the required information to their servers (via an API, or "Application Programming Interface"). I would then have to receive the response, figure out what to do with it, and then change all my code whenever Facebook update their API. This is time consuming and boring, which is why SDKs are so common.

An SDK is a library that accompanies an API service, to make using this service quick, simple and reliable for a developer. So all I need to do is say something like "has this user signed in with Facebook" and the SDK will do all the required API calls/processing etc to tell me the answer. They are not a bad thing in themselves.

Put simply, I expect what happened is that Facebook changed their API and, in doing so, caused a fatal error in their SDK. Oops! It goes to show that developers are still humans (ish) and anyone can forget to write a "graceful" failure for when the API doesn't return what is expected.
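
To make the "graceful" fail idea concrete, here is a minimal sketch in Python. It is not Facebook's SDK or their real response format; the function names and the `user` / `signed_in` fields are invented purely for illustration. The first function assumes the API response never changes shape, the second treats anything unexpected as "not signed in" instead of crashing.

```python
import json


def is_signed_in_fragile(raw_response: str) -> bool:
    # Hypothetical helper: assumes the response shape never changes.
    # If "user" is missing or null, this raises and takes the app down with it.
    return json.loads(raw_response)["user"]["signed_in"]


def is_signed_in(raw_response: str) -> bool:
    # Hypothetical helper: anything unexpected is treated as "not signed in".
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return False  # the server didn't even send JSON
    if not isinstance(payload, dict):
        return False
    user = payload.get("user")
    if not isinstance(user, dict):
        return False  # field missing, null, or the wrong type
    return user.get("signed_in") is True


# A response where the (made up) "user" field has unexpectedly become null.
changed_response = '{"user": null}'
print(is_signed_in(changed_response))
```

Calling `is_signed_in_fragile` with the same changed response raises a `TypeError`, which is roughly the kind of unhandled failure I suspect was behind the crashes.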

What this tells us about apps

As all the news stories are picking up on, it's always gut wrenching and surprising when many apps, apparently completely independent of one another, fail because they all share a link with Facebook. This is the smoking gun that apps like Spotify, TikTok and Tinder all actively use Facebook's services, maybe because they want to provide the convenience of a "login via Facebook" button, maybe because they simply want the marketing and sharing tools that Facebook also offer.

It's very apparent these days that Facebook don't offer these tools out of the goodness of their heart. Every service provided by Facebook (and similar platforms) is subject to a cost-benefit analysis: the cost of building and maintaining a service versus the value of the data it collects. This is all stuff we sort of know about, though. I think what people are less aware of is how this sort of thing happens on the web...

How this relates to the web

At the time of writing, 54.8% of websites have Google Analytics installed and 9.4% have the Facebook pixel installed. Think of these as SDKs that, instead of being installed in apps, are installed on websites. Apps and websites are fundamentally different in that an app is compiled (code turned into ones and zeros!) and then distributed to everyone's devices. Websites essentially send "code" (strictly, some of it is code and some is "markup") that is then compiled and run by your browser, for instance Apple's Safari, Mozilla Firefox or Google Chrome. In most cases, a website says "I need these packages" and your browser fetches and runs them, for example, Google Analytics.
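
As a rough illustration of a website saying "I need these packages", the Python sketch below reads a page's markup and lists the other hosts it asks the browser to fetch scripts from. The page and all the host names are made up, and a real browser does far more than this; it only shows that the request for third-party code is written into the page itself.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_hosts(html: str, own_host: str) -> List[str]:
    """Returns the hosts, other than own_host, that the page loads scripts from."""
    collector = ScriptCollector()
    collector.feed(html)
    hosts: List[str] = []
    for src in collector.sources:
        host = urlparse(src).netloc  # empty for relative URLs like /js/site.js
        if host and host != own_host and host not in hosts:
            hosts.append(host)
    return hosts


# A made-up page: one script of its own, two from (hypothetical) third parties.
PAGE = """
<html><head>
<script src="/js/site.js"></script>
<script src="https://analytics.example.net/tracker.js"></script>
<script src="https://social.example.org/pixel.js"></script>
</head><body><p>Hello</p></body></html>
"""

print(third_party_hosts(PAGE, "shop.example.com"))
# prints ['analytics.example.net', 'social.example.org']
```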

There is nothing hacky about this; this is how the internet is designed: to share someone else's content on your site. Us web-folk get the same sort of benefits that app developers get from their SDKs: we can view our website's traffic, we can see what pages visitors are looking at, we can even see what search term led them to our page. We could then install the Facebook pixel and see which Facebook posts people are clicking on, even see if this converted to a sale on an e-commerce site... lovely...

You can probably see where this is going; all this data is going straight to the big tech companies. OK, but we probably all know this as well... read on...

The scary bit

In the case of SDKs, if the code in the SDK fatally fails, the app will crash. This is a concern, as it raises the question "could Facebook disable a whole range of apps whenever it wants?" The answer is evidently "yes, in theory". This can't really happen on websites. Web standards are such that when a script or library fails you won't get its functionality, but otherwise (and certainly in the case of these tracking libraries) you probably won't notice the difference.

However, another difference is that an SDK in an app doesn't have as much scope to do more than the developer is expecting. At the end of the day it's compiled code; it must be the same for every app, although it can be cleverly controlled by the service's API. A web library, however, can be a blank cheque in terms of what code is downloaded and run.

To put it in perspective, someone else running code on your website would be called a cross-site scripting attack if the developer wasn't expecting it. These are really, really dangerous because an attacker can access anything you are looking at on the webpage, along with what you are doing, your identifying information (IP address, user agent etc.), often the last site you were on, and possibly your cookies and local storage, although these should have another line of defence. Yet developers install, or get asked to install, these sorts of things, often without any knowledge of the implications.

I want to be very clear that I have no evidence that what I describe next is actually happening. I'm only saying that on a technical front, in the same way as Facebook disabling apps via its SDK, it's possible, and a bit concerning as a result.

The analytics package can know who you are and what website you are on. This means its provider could send a custom packet of code designed much more specifically to monitor your activity on a website, including the information it's showing you. They wouldn't have to do this for everyone; maybe a sample of people, maybe a specific person.

They could also send custom code to everyone on a specific website; they could find out quite accurately the sales volume of an online store, for instance. This sort of business-sensitive information is very valuable to a competitor, and it's daunting to know that, in theory, it's available to a tech giant.

Why this isn't OK

In my humble opinion, this crosses the line as to what is acceptable, even as a possibility. If you look at doctors, for example, they are rightly held to a very high standard of privacy for their patients' information, yet we allow tech companies to gain probably deeper insight into our activity... because we don't see it happening?

The internet was intended to be a road that gives you access to some houses, aka, websites. If you go into a house I think it's fairly acceptable that you understand you play by the rules of the house and frankly the owners of the house can monitor you, ask you questions, collect data etc. Now, we would expect that this is made clear to us, no spy cameras in the bathroom, but at the end of the day you can choose to leave this house if you think it's dodgy. You buy into it by being there and you can leave if you like.

However, the way the internet is at the moment is that a few companies have someone in most houses on the street. We wouldn't let someone stand in the corner of a house we were visiting and simply trust they won't tell anyone what we get up to. We certainly would want to be told, right?

The implications of this level of access to data from the point of view of law enforcement are also interesting. Government agencies already realise the power the tech giants hold and, as a result, are leaning on them to provide information, for example by demanding data via court orders. Generally this can be seen as OK, as it provides evidence to prosecute bad people.

However, the concern will always be that if the scope of what data can be collected is so wide, you could pessimistically say it's a matter of time before that power gets used for something against the greater good.

Maybe it's not that bad, or at least it's improving

I know a few people whose response is "don't care", for various reasons. Good on them. In my experience they usually have a good working knowledge of all of this, so fair enough! It's a decent argument; data collection isn't by default a bad thing.

My issue is when people don't know about all of this, and would find it creepy if they did. They could still continue to use the internet as they do, that's fine, but I think what must be done is to make this all as clear as possible so people can make this choice.

So the EU got involved; bring on the epic cookie policy, single-handedly ruining the fun for users and websites alike. I feel these things are getting longer and longer, more confusing, buggy as sin and just a right pain. This is our punishment as a population for letting things get to where they are. What they are doing very effectively, though, is giving every single person exposure to this whole issue. Even if most people automatically punch "accept all", it's a start.

Another interesting force for change is Apple, and they deserve some credit for this. At WWDC 2020, they ramped up their privacy stance by introducing a privacy reporting tool into Safari. This tells the user what third party libraries are running in the background, which should help the visibility of this issue hugely.

I also think it's right to say that these massive tech giants have humans running them and I hope dearly that if they really did step over the line someone would speak out. Equally, golly they don't inspire confidence on this front...

Is this the only way to do this?

NO!

What if Google made the script they run on others' websites open source? Then what data they are collecting would be open to scrutiny, and the code would be the same for everyone. This would not be an issue for them; sure, us developers could break it or let it go out of date, but then Google could say "we won't give you analytics until you fix/update it" and then fine, we can do that. Check this out for instance; this is how Google could be doing it!

Will that affect performance? No. Google could even provide this code from their servers; there is a neat standard on the web where a developer can define a hash of the script they are importing, so it can't change from what they know. I wrote another article on hashing to explain this concept.
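
The hash idea can be sketched in a few lines of Python. I believe the standard in question is Subresource Integrity, where the script tag carries an `integrity` attribute and the browser refuses to run a script whose contents don't match it. The script contents and URL below are made up, and this only shows how the hash is produced and compared, not how a browser enforces it.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Builds an integrity value: 'sha384-' plus the base64 of the SHA-384 digest."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_tag(url: str, script_bytes: bytes) -> str:
    """Builds a script tag that pins the script to the contents we reviewed."""
    return (
        f'<script src="{url}" integrity="{sri_hash(script_bytes)}" '
        'crossorigin="anonymous"></script>'
    )


# Hypothetical script contents: the version we reviewed, and a later altered one.
reviewed = b"console.log('analytics v1');"
altered = b"console.log('analytics v1'); sendEverythingElse();"

pinned = sri_hash(reviewed)
print(script_tag("https://cdn.example.com/analytics.js", reviewed))
print(sri_hash(reviewed) == pinned)  # True: same script, same hash
print(sri_hash(altered) == pinned)   # False: any change breaks the match
```

The point of the design is that the hash lives in the website's own markup, so whoever serves the script can't quietly swap it for something else without the mismatch being caught.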

Or, websites could do it the Mashoom way: host their own analytics tools. We are very open and honest about the fact that we track visitors on our site. We are a business after all, and we need to know how many people are looking in the shop window.

The main thing is that we don't share this information with anyone other than ourselves and a few companies that we use for marketing, and we make sure to only give them the data they need to deduce how a campaign is performing etc; certainly nothing identifying. Overall this means that whilst we know how you are interacting with Mashoom, we don't, can't and don't want to know anything else about you.