Hacker News — vinext + Cloudflare Workers

▲Websites have a new way to spy on visitors: analyzing their SSD activity (arstechnica.com)

257 points by Brajeshwar 5 days ago | 93 comments

blfr 2 days ago [-]

Wait, wait, wait: browsers allow websites to store junk on my drive? They take up gigabytes of memory and still write to disk on top of this? Without even asking whether the site can use local storage?

Years and years back when laptops still had HDDs, I had a script to put the Firefox profile &c on a ramdisk and sync it on reboots so that it didn't spin up the drive constantly. I guess I should have kept doing it.

It's a sad day when Arch users are right (again) https://wiki.archlinux.org/title/Firefox/Profile_on_RAM

noirscape 2 days ago [-]

Browsers have an absolutely insane level of relatively unchecked permissions to do whatever they want on a client.

There's a lot of effort by browser developers to scope creep the browser into essentially being an OS-agnostic tech stack (one where, conveniently, code can be shipped across the network "as necessary", removing a lot of user agency over the software being run); Chrome being the biggest driver of this, while Firefox has an extremely weak spine in trying to limit it.

It's fairly dire and I wouldn't be surprised if there's a lot more of these side channel attacks in a lot of web APIs.

Tangurena2 1 days ago [-]

Flash ended up getting blocked/banned by all browsers because it turned into a giant gaping security hole.

> By January 2021, all major browsers were blocking all Flash content unconditionally.

It looks like we-the-users need to be blocking any and every one of these parasites.

https://en.wikipedia.org/wiki/Adobe_Flash

Narishma 1 days ago [-]

I have a feeling they may have pushed for that more because it was controlled by a third party, and not the browser developers themselves.

rayiner 1 days ago [-]

Now that we have AI, can we go back to real apps and native tech stacks? And revert the browser to a text-display interface?

VMG 1 days ago [-]

Unfortunately, real apps and native tech stacks can not only write data to your SSD, they can usually write data to the user directory however they want and they can read it as well!

Browsers are at least somewhat sandboxed

rayiner 1 days ago [-]

But you need relatively few native apps with those capabilities, in contrast to visiting many different websites for content consumption.

MisterTea 1 days ago [-]

Why not sandbox the process then?

lxgr 11 hours ago [-]

Sure, can you propose an equally mature and battle-tested sandbox, ideally one with multiple implementations on a wide variety of OSes?

VMG 1 days ago [-]

I don't know, maybe something about backwards compatibility, maybe nobody can agree on how to do it correctly. It hasn't happened for decades, so I'm not going to hold my breath.

hollerith 1 days ago [-]

This is a Linux-centric take. It does not apply for example to iPadOS or to AluminiumOS (coming soon to a Googlebook near you). It applies less and less over time to MacOS.

Yes, if one is committed to the standard Linux desktop, then one must hope that any proprietary apps one might need will continue to be available through the browser, but I'm ready to let the standard Linux desktop go (not right now, but eventually).

lxgr 11 hours ago [-]

It very much applies to macOS, or do you know of a way to know what permissions a sideloaded macOS application will have before opening it that's accessible to regular users?

hollerith 7 hours ago [-]

The very fact that you've qualified your question with "sideloaded" suggests that you are already aware that a non-sideloaded MacOS app is installed into a sandbox that is much more secure than anything available on a standard Linux desktop excepting possibly Qubes and Secureblue, and hardly anyone uses Qubes or Secureblue -- probably for very good reasons.

lxgr 11 hours ago [-]

Sure, once we have equivalent strong sandboxing for third-party apps I'm down to install random ones rather than visiting a website.

freedomben 1 days ago [-]

> can we go back to real apps and native tech stacks

Please God, no. If you're worried about the invasiveness of browser-based apps, native is out of the frying pan and into the fire

rayiner 1 days ago [-]

Except you’re not going to install native apps for the vast majority of things you use a browser for. You’re going to use the browser for content consumption and native apps for a few things that need system access.

lxgr 11 hours ago [-]

Are you arguing that it's better to install a small handful of highly privileged applications as opposed to none?

noelwelsh 1 days ago [-]

It's also the technology that will allow software to run without a continuous connection to the server. If you want to break out of a world where companies own your data it's the tech that is needed.

veunes 1 days ago [-]

The uncomfortable part is that each step is usually justified by a real use case

Gormo 1 days ago [-]

My shortcut for launching a "clean" Chromium session is `chromium --user-data-dir=$(mktemp -d)` -- each launch creates a new transient profile directory under /tmp, which is itself a RAM disk. Persistent settings are achieved by setting system-wide defaults in /etc/chromium, including using system-wide managed policy JSON.
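For anyone who would rather script this, here is a rough Python equivalent of the one-liner above. It is only a sketch: the function name and the `chromium-clean-` prefix are made up for illustration, and it assumes the temp directory lives on a RAM-backed filesystem, which depends on how your system mounts /tmp.

```python
import tempfile

def clean_session_command(browser="chromium"):
    """Build a launch command that points the browser at a fresh, empty profile."""
    # mkdtemp creates a new private directory, like `mktemp -d` in the shell.
    profile_dir = tempfile.mkdtemp(prefix="chromium-clean-")
    # --user-data-dir is the same flag used in the shell shortcut.
    return [browser, f"--user-data-dir={profile_dir}"]

cmd = clean_session_command()
print(cmd)
# To actually start the browser, pass `cmd` to subprocess.run().
```

The directory is not cleaned up automatically here; on a tmpfs it disappears at reboot.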

data-ottawa 1 days ago [-]

Does this maintain your browser extensions (and their settings)?

Gormo 1 days ago [-]

Yes, extensions can be installed into the system-wide config via entries in the managed policy JSON. Settings configured in a specific browser session naturally won't persist, but defaults set in an initial preferences config will be present.

sheept 2 days ago [-]

Is this surprising? Websites have long been silently writing to disk, for cache, cookies, and blobs. OPFS just provides a file-system-like API for ultimately the same functionality

runako 2 days ago [-]

Yes? From the paper:

"On Chrome and Safari, OPFS supports very large files, up to 60 % of disk space, which is more than sufficient to avoid the page cache on most typical systems, as even a small disk size of 64 GB would allow us to create a 38.4 GB OPFS file."

I am indeed surprised to learn that a random website can write a file that takes up 60% of my disk. Is this obviously a capability of Web browsers?
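The arithmetic in that quote is easy to check. A quick sketch, assuming the 60 % figure from the paper applies as a flat fraction of total disk size (the function name is just for illustration):

```python
def opfs_limit_gb(disk_gb, fraction=0.60):
    """Largest OPFS file implied by the quoted 60 % limit, in GB."""
    return disk_gb * fraction

for disk_gb in (64, 256, 1024):
    print(f"{disk_gb} GB disk -> up to {opfs_limit_gb(disk_gb):.1f} GB in one OPFS file")
```

The 64 GB case reproduces the 38.4 GB from the quote.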

rendaw 2 days ago [-]

Not only that, but they don't even provide any visibility into what's being stored. Firefox developer tools don't even have OPFS browser functionality. IIRC I even saw some stuff about going out of the way to make it inaccessible by the user.

hulitu 2 days ago [-]

> Is this obviously a capability of Web browsers?

The main capability is RCE, but it seems that they need a way to store the payload.

cinntaile 2 days ago [-]

There's a whole trend with websites not uploading anything to their servers due to privacy and whatnot. Where do you suppose the data is being saved for repeat visits?

runako 1 days ago [-]

What you're describing I would expect to be measured in kilobytes, not tens of gigabytes.

There is no reason for any person to think that a website needs to store data sized in the "full Ubuntu install" range to facilitate repeat visits.

cinntaile 1 days ago [-]

You make a reasonable point. While kilobytes might be too little, it probably shouldn't be 30 GB. 5 GB might be OK. In the settings it should be possible for the user to set their own limit. I am not familiar with browser storage, but there is hopefully a mechanism to inform the user that their limit might not be enough.

Rohansi 1 days ago [-]

> There is no reason for any person to think that a website needs to store data sized in the "full Ubuntu install" range to facilitate repeat visits.

Do you think people expect that for apps they've installed? Should those also be limited to a few MB?

runako 1 days ago [-]

I think that when I install an app, typically it will tell me up front how much disk space is required. For example, in the Mac App Store, the size of the app is at the top of the page.

> Should those also be limited to a few MB?

I also want to highlight that many/most websites that think of themselves as apps are at odds with their users in that perception. As an engineer, I know full well that e.g. the URL https://homedepot.com is powered by a sophisticated set of apps. But most users think it's a website.

This is important because people do & should have a different relationship with software they have chosen to install on their machines and websites. Yes, I know e.g. Figma.com does complex client-side stuff. Every Figma user would click a dialog to grant permission for it to do what it needs to do.

The problem is the current state of the art is that literally any website can spam up your disk and you don't even know. If I visit the website for a local radio station, or an e-commerce site, or university, or a site that will tell me what time it is now in a different time zone...I do not expect that it will download tons of data and store it on my disk in case I come back. That some engineers think that is reasonable is why the browser sandboxes need reinforcement.

Rohansi 21 hours ago [-]

> But most users think it's a website.

I'd be willing to bet most (younger?) users don't know what the difference between an app and a website is. Can't really blame them when the line between them has been blurring more and more over time. I think a growing number of users wouldn't even mention installing as a differentiator because you install PWAs (except on iOS).

Anyway, I think an important bit of information that was lost here is that browsers automatically purge data so that your disk doesn't fill up. If you're running low on space, they will clean it up for you to make room.

> I do not expect that it will download tons of data and store it on my disk in case I come back. That some engineers think that is reasonable is why the browser sandboxes need reinforcement.

It's a reasonable thing to do for an app, so why not a website/PWA? Video games are a pretty good example where some stream assets while you are playing so that you don't have long install/update times. Getting in game faster is more important.

runako 9 hours ago [-]

> most (younger?) users don't know what the difference between an app and a website is

Yeah, I doubt this because younger users overlap most heavily with "had to ask a parent for permission to install an app."

> It's a reasonable thing to do for an app, so why not a website/PWA?

Again, I chose to install an app and it has transparent install requirements. A website does not clear either of those bars.

ElProlactin 2 days ago [-]

Ten movies streaming across that, that Internet, and what happens to your own personal Internet? I just the other day got... an Internet [email] was sent by my staff at 10 o'clock in the morning on Friday. I got it yesterday [Tuesday]. Why? Because it got tangled up with all these things going on the Internet commercially. [...] They want to deliver vast amounts of information over the Internet. And again, the Internet is not something that you just dump something on. It's not a big truck. It's a series of tubes. And if you don't understand, those tubes can be filled and if they are filled, when you put your message in, it gets in line and it's going to be delayed by anyone that puts into that tube enormous amounts of material, enormous amounts of material.

AlienRobot 2 days ago [-]

That surprised me as well.

I thought the whole point of cookies, local storage, session storage, and IndexedDB was to avoid what the origin private file system is doing.

You mean I could have just saved stuff as a file this whole time instead of serializing it to a string? Why didn't we just do this from the start?

nostrademons 2 days ago [-]

It's still sandboxed and deleted when the user clears private data for the website.

The main advantage it has over things like cookies, local storage, etc. is that it provides a byte-oriented, random access API and as a result, you can use third-party libraries like SQLite that expect a file API. Which is more important now that we have tools like Emscripten and WebAssembly that let you use existing C libraries on the web. At the same time it has security guarantees such that webpages cannot write arbitrary files that will be viewed and executed by the user.

Also, in theory you could use this side-channel attack on localStorage and sessionStorage. Its only requirement is that it needs an API that writes to disk where you can measure the latency of a synchronous call, since the fingerprinting is just measuring the interference pattern between disk accesses the attacking website does vs. disk accesses that other websites do.

binyu 1 days ago [-]

> Wait, wait, wait: browsers allow websites to store junk on my drive?

Technically even a cookie is junk on your drive

> Without even asking whether the site can use local storage?

Would it be practical to ask permission for every site you visit? It would be better to periodically check the size of your home folder (where the browser profiles normally reside)

HeartStrings 2 days ago [-]

Hostile LLMs? In my browser? At this time of the year?

veunes 1 days ago [-]

The funny part is that "put your browser profile on a ramdisk" used to sound like an obsessive performance tweak, and now it starts to look like a privacy mitigation

DanielHB 2 days ago [-]

If you open an incognito window in Chromium, the profile is kept in RAM

kccqzy 2 days ago [-]

> Without even asking whether the site can use local storage?

Where did you see this in the article? I had some recollection that Firefox at least did require asking the user.

nozzlegear 2 days ago [-]

Firefox doesn't ask permission just to use localStorage; no modern browser does this. The closest thing you get is when a site wants to persist storage with "navigator.storage.persist()", which should prompt you for permission. But localStorage data usually persists anyway, and only gets deleted if the browser's storage is "under pressure", so I've never personally worked on a site or web app that had to use that API.

anygivnthursday 2 days ago [-]

You mean by default, or that it cannot be configured that way? I believe I had Chrome configured to not allow storage by default, only for sites I added to an exclusion list. I can't remember now, but isn't there also an option to change the default on Firefox to deny or always ask for permission?

nozzlegear 2 days ago [-]

Just by default - I didn't know you could configure your browser to disallow storage by default.

Vinnl 1 days ago [-]

I don't think LocalStorage allows you to store gigs of data though, and IIRC this method depended on the Origin-Private File System API.

atoav 2 days ago [-]

Btw. as per EU law (GDPR) website owners are required to acquire informed consent for any kind of client side storage if it contains information that is personal. And it has been ruled that any information that can be used to identify returning users is such.

People think the GDPR is just about cookies, but it is agnostic of the technology used.

Maximum fines: €20 million, or 4% of the company's total worldwide annual turnover of the preceding financial year — whichever is higher.

And informed consent means they need to know what data you collect/store for which purposes and there needs to be an equally easy to select No-Option.

runako 1 days ago [-]

This doesn't really address the issue here. The condition here is that a site might decide that it needs to store (say) a copy of the Red Hat server installation package on each user's local machine (20GB) to facilitate repeat visits.

The stored data is not related to the user at all. The problem is that the website gets to silently write 20GB to the user's disk.

Khaine 2 days ago [-]

And Web Developers want more and more OS features built into the browser. This is why I'm against it. Features are only ever abused.

dwedge 2 days ago [-]

> Even Meta and Yandex were recently caught joining in the privacy-invasive free-for-all.

Damn, even Meta have joined the dark side?

meindnoch 2 days ago [-]

Sic transit gloria mundi :'(

Aurornis 2 days ago [-]

I’m skeptical of these side channel attacks that rely on training a neural network on specific controlled scenarios on controlled hardware. I believe that with enough time and effort, and under the perfect circumstances where the user is only visiting their website and doing one other thing that the network was trained on, it can match.

It does not seem useful as a general purpose side channel vector.

hansvm 2 days ago [-]

It depends what you mean by "general purpose." First, these things generalize more often than you'd expect. Second, even in the absence of generalization they're still useful for, e.g., fingerprinting activities to manufacture a unique ID where none previously existed.

Aurornis 2 days ago [-]

The paper isn’t describing a unique ID fingerprint. It’s looking for specific activity patterns to match against training data of running specific commands on specific hardware.

1970-01-01 1 days ago [-]

Publish or perish. It worked once, in a controlled lab, mostly (80-90% guess). Good enough for more millions in funding..

Not really joking here.

https://hannesweissteiner.com/

https://hannesweissteiner.com/publications/frost/

ramenat2am 2 days ago [-]

That's basically just a research, theoretical attack vector. It doesn't mean it's viable for general purpose old school mass privacy invasion

user____name 40 minutes ago [-]

It’s ActiveX all over again.

gblargg 2 days ago [-]

I'm surprised their 1GB file wasn't cached entirely in RAM during the attack, eliminating the SSD from any timing. Do people keep their machines that heavily loaded that a file being constantly read from doesn't stay in the cache?

francoi8 2 days ago [-]

It should be fairly easy to mitigate, no? Simply add random access times. localStorage doesn't need to be that fast. More generally I find it very annoying how much browsers allow by default (JavaScript, localStorage, GPU access etc.) - there's only a very limited number of websites I want to be able to run GPU-accelerated shaders.

ars 2 days ago [-]

> Simply add random access times.

That doesn't work. Because the random times are uniformly distributed, it's possible to remove them from the data by additional sampling. You do make it harder because you need a lot more data, but it's still possible to extract the signal, because the noise is uniform.
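A small simulation makes the averaging argument concrete. This is only a sketch with invented numbers: the 0.10 ms and 0.30 ms "true" latencies and the 5 ms padding range are assumptions, and it assumes the attacker knows the padding range (comparing the two raw means would work even without that).

```python
import random
import statistics

def noisy_latency(true_ms, jitter_ms, rng):
    """One observation: the real latency plus uniform padding in [0, jitter_ms)."""
    return true_ms + rng.uniform(0.0, jitter_ms)

def estimate(true_ms, jitter_ms, samples, rng):
    """Average many observations, then subtract the known mean of the padding."""
    mean = statistics.fmean(
        noisy_latency(true_ms, jitter_ms, rng) for _ in range(samples)
    )
    return mean - jitter_ms / 2.0

rng = random.Random(0)  # fixed seed so the run is repeatable
idle = estimate(0.10, 5.0, 200_000, rng)  # hypothetical idle-disk latency
busy = estimate(0.30, 5.0, 200_000, rng)  # hypothetical contended latency
print(f"idle ~ {idle:.3f} ms, busy ~ {busy:.3f} ms")
```

The padding is 25 times larger than the difference being measured, yet the two cases separate cleanly once enough samples are averaged. The cost to the attacker is sample count, not feasibility.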

eterm 2 days ago [-]

The interesting mitigation would be snapping I/O to a coarse clock.

You could then set it to hold the result until the next tick.

E.g., with an I/O tick of 20 ms, it would only return on 20 ms boundaries, and then almost every SSD would look the same.

It would slow down the API a bit, but privacy has tradeoffs.

danbruc 1 days ago [-]

Probably still does not work. Assume a request takes X ms and let us look at what you will observe depending on where within a tick period it arrives.

If it arrives anywhere from 0 ms to (20 - X) ms after a tick, it will complete before the next tick, so the measured duration will be between X ms and 20 ms. If it arrives later in the tick period, it will miss the next tick and have to wait an additional tick period, so the measured duration will be between 20 ms and (20 + X) ms.

If you make N repetitions, you would normally see a spike of density 1 at X. With the 20 ms tick wait, you will see a uniform distribution of density 1/20 between X and (20 + X).

You would have to perform each request and then return the result exactly 20 ms after it was received in order to mask the request duration. But that just creates a new target: your timers and queues to delay the response, or making the load so high that requests take more than 20 ms.
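The distribution described above can be checked with a quick simulation. This is a sketch of the tick model as stated in this comment, not of any real browser behavior; the service times of 2 ms and 7 ms are arbitrary picks.

```python
import math
import random

TICK_MS = 20.0

def measured_duration(arrival_ms, service_ms, tick_ms=TICK_MS):
    """Request arrives `arrival_ms` after a tick, takes `service_ms` to finish,
    and the result is held back until the next tick boundary."""
    done = arrival_ms + service_ms
    release = math.ceil(done / tick_ms) * tick_ms
    return release - arrival_ms

def observe(service_ms, n, rng):
    """Measure n requests that arrive at uniformly random points in a tick period."""
    return [
        measured_duration(rng.uniform(0.0, TICK_MS), service_ms)
        for _ in range(n)
    ]

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {}
for x in (2.0, 7.0):
    durations = observe(x, 50_000, rng)
    results[x] = (min(durations), max(durations))
    print(f"X = {x} ms: min {results[x][0]:.3f} ms, max {results[x][1]:.3f} ms")
```

The smallest measured duration converges on X and the largest on 20 + X, so the tick hides individual samples but the edges of the distribution still give X away.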

progval 2 days ago [-]

The random times don't have to be uniformly distributed. Though it's enough for attackers to know the distribution to de-noisify it.

jjgreen 5 days ago [-]

I laugh at your spying attempts from my HD-equipped laptop, ...

progval 2 days ago [-]

The paper (https://hannesweissteiner.com/pdfs/frost.pdf) cites two earlier papers (from 2014 and 2017) that did the same for HDDs.

falsaberN1 2 days ago [-]

I got $HOME in a huge HDD because it was cheaper. I guess we belong to the cool kids club now?

gblargg 2 days ago [-]

Or mine where the entire browser profile is on a ramdisk.

lima 2 days ago [-]

Higher IO latencies in HDs might actually make this attack easier - more contention means more bits of data.

tosti 1 days ago [-]

I laugh at your spying attempts from my cURL e-mail service that I use instead of a web browser.

mrbluecoat 2 days ago [-]

For a more technical read: https://news.ycombinator.com/item?id=48345822

tovve 5 days ago [-]

Still don't really understand how it works - I put the reddit logo into your local storage and it only took 20ms to take it out again instead of 50ms so therefore you have reddit open in another tab?

nostrademons 2 days ago [-]

I assume it's something like this:

Attacking website periodically makes random reads from a large file in localStorage. Other tabs and websites open have Javascript running that periodically performs operations that will result in SSD traffic. For example, GMail has a certain polling interval to check for new mail, and each request is going to result in a cache write that makes the SSD busy and delays other conflicting IO operations. Reddit checks for new chat messages. Large memory-heavy websites get paged out of RAM.

The pattern of IO operations that a website makes creates a fingerprint of interference with the IO ops that the attacking website is doing, showing up as differing amounts of latency as the SSD is periodically busy. This fingerprint can then be reconstructed to a specific website by training a CNN on it, basically using a neural net to classify a certain pattern of delays to the IO ops that other websites are doing.

In theory it makes sense, but it seems very noisy. Anything that makes absolutely zero requests or IO operations in the background (like say HN, or most old-school text sites) wouldn't show up, and would be indistinguishable from any other zero-request site. And having other sources of IOps on the same computer - say you're running an Ethereum client that's perpetually updating the blockchain, or you're downloading a bunch of torrents, or you've got DropBox and it's syncing your directory - would introduce noise that throws off the classifier.
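To make the idea concrete, here is a toy simulation. It is my own sketch, not the paper's method: instead of a CNN it uses a much cruder folding heuristic, the traces are synthetic rather than measured, and the site names, polling periods, and latency numbers are all invented.

```python
import random

# Hypothetical background-I/O periods per site, in units of attacker probes.
SITES = {"mail": 50, "chat": 80}

def trace(period, n, rng, base=0.10, bump=0.40, noise=0.05):
    """Synthetic probe latencies: every `period`-th probe collides with the
    other tab's I/O and takes `bump` ms longer. Gaussian noise on every probe."""
    phase = rng.randrange(period)
    return [
        base + (bump if (i - phase) % period == 0 else 0.0) + rng.gauss(0.0, noise)
        for i in range(n)
    ]

def fold_score(samples, period):
    """Average samples by index modulo `period` (needs len(samples) >= period).
    If the period matches, all collisions pile into one bin that stands out."""
    sums = [0.0] * period
    counts = [0] * period
    for i, s in enumerate(samples):
        sums[i % period] += s
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return max(means) - sum(means) / len(means)

def classify(samples):
    """Pick the site whose period gives the most pronounced folded peak."""
    return max(SITES, key=lambda name: fold_score(samples, SITES[name]))

rng = random.Random(42)  # fixed seed so the run is repeatable
guess_a = classify(trace(SITES["mail"], 4000, rng))
guess_b = classify(trace(SITES["chat"], 4000, rng))
print(guess_a, guess_b)
```

Raising `noise` or mixing in unrelated I/O shrinks the folded peak, which is the "very noisy" problem mentioned above.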

lxgr 11 hours ago [-]

Usually side channels get way stronger when exploited cooperatively. Think ad scripts embedded on multiple sites communicating with each other across tracking protection and other sandboxes, for example.

doodlebugging 2 days ago [-]

That's interesting. Thanks for the explanation. If I read this right this isn't as effective against spinning HD-based systems and there is a dependence on the user maintaining more than one tab as they browse?

If that's the case then my system which is still HD-based is not threatened and since I tend to close tabs and windows and just spin up a new private window for each site while clearing cookies, etc on exit then maybe this is a non-issue for me. Or maybe just block javascript too.

nostrademons 2 days ago [-]

It'd have some effectiveness against spinning HDDs because it's really just measuring contention for the I/O subsystem, but it'd likely be less because the kernel usually buffers writes to HDDs internally. But then, the kernel also usually buffers writes to SSDs, just with lower latency between the call and the data being written.

I don't think too highly about this particular threat vector - it seems like the kind of attack where you could perhaps get a working proof of concept going in the lab to write a paper and demonstrate some results, but actually using it to attack people at scale seems prohibitively noisy. People that close all their tabs when not in use are not at risk (and the data I had was that most people don't actually use browser tabs, they're very much a power-user feature). People who have disk-intensive other processes like Bittorrent or various file-syncing services aren't really at risk, because those other processes inject similar noise into the data stream. The signal in general seems weak because of buffering and differing SSD latency and so on.

puppycodes 2 days ago [-]

That's a good explanation. It does seem extremely noisy and not at all practical for fingerprinting a user compared to other methods. If you have JavaScript enabled, assume you can be fingerprinted.

maverwa 2 days ago [-]

That’s timing the cache, that’s old stuff by now. As I understand it, this writes a relatively large file ("gigabytes") using the OPFS API, which is different from the "localStorage" API. This seems to use actual filesystem storage on the client, instead of living completely in memory (which may be reasonable given the size of files supported). This makes it possible to actually time SSD IOPS latency by doing random reads.

Collect enough of these samples, together with the information of what else runs on the host, put that in the ML-Blender and the result will be able to tell you, with some accuracy, from a given set of samples, what’s running on the host.

I am sure I misunderstood some things because there are so many caches and unknowns in that setup that I struggle to understand how there could be any correlation, but that’s my understanding so far.

botw44 1 days ago [-]

I was interested in this so I created a proof of concept: https://github.com/brammittendorff/opfs-ssd-timing

userbinator 2 days ago [-]

It's really not surprising that letting websites run arbitrary code on your machine, even in a sandbox, would lead to things like this.

sigmoid10 2 days ago [-]

There's no such thing as a sandbox "on your machine" when you really think about it. The code still runs on the same hardware and there are tons of ways to fiddle with said hardware that could be exploited (like rowhammer). The only "real" sandbox is fully dedicated hardware down to bare metal with zero connections to sensitive systems.

matheusmoreira 2 days ago [-]

And now that Google's web environment integrity is getting repackaged into captchas, it seems we won't even be able to try to block such things in the future...

pixel_popping 1 days ago [-]

Why would it be a new way? Tracking via timing has always existed, and you can also learn someone's browsing history with some DNS tricks. Nothing really new here; the article is misleading with "new way", since this has literally been possible for a decade.

zeafoamrun 1 days ago [-]

That's it, I'm turning off JavaScript for everything non essential from here on out.

everdrive 1 days ago [-]

Been doing that for years and do not regret it.

coldacid 1 days ago [-]

Remember when the web was (almost) all static content and all browsers had to do was show it to you? When applications were native? We really need to go back.

matltc 1 days ago [-]

Why is everything like this? One must do backflips to thwart these attempts at surveillance. I am getting pretty tired of it and about to go full Stallman

basilikum 2 days ago [-]

I think what would be more interesting is using this as a side channel for communication between different sandboxed contexts.

exabrial 2 days ago [-]

correction, websites have a way to spy on visitors: Javascript.

sillyboi 1 days ago [-]

Why don't SSDs trust websites anymore?

Because every time they open up, the site gives them the F̶R̶O̶S̶T̶ cold shoulder.

bzmrgonz 1 days ago [-]

I wonder if Mozilla's container tabs block this type of tracking?

ptek 2 days ago [-]

Ahhh Arstechnica, I wonder if the technical article is by Dan Goodin. (It is)

I enjoyed his C Programming books for dummies series.

dbg31415 2 days ago [-]

Maybe don’t let Google decide what its browser can and can’t do on my computer…

Why do browsers need to do this? Feels like an edge case need, at best, that was likely just a cover for some power Google wanted to exploit.

mr-pink 2 days ago [-]

sounds like nonsense. i guess this works on some test environment but not in real life. you would never know that I am running tetris, for example

opengrass 2 days ago [-]

{first.last}@tugraz.at
