TL;DR: I created sharrr.com as a proof of concept: a service that lets you share a file of up to 100 GB (technically, there is no file size limit) using a zero-knowledge encryption scheme. All code is open source on GitHub.
Disclaimer: I am not a security engineer or cryptographer. Use with caution. Feedback on security aspects or potential vulnerabilities is much appreciated.
Prerequisites & Goal
It is important to highlight the asynchronous aspect of the plan. Obviously, you could just send a file directly from A to B using FTPS or a similar protocol. In my scenario, a sender (Alice) uploads a file and gets a download link in return. She then sends the link to a recipient (Bob), who can download and decrypt the original file. This means the file has to be stored somehow, somewhere along the way.
One of the challenges was to prepare for a potential scenario in which all of the infrastructure, namely the API, the database, and the S3 storage, is compromised at the same time. Even in such a case, it must not be possible to reconstruct the original file, fulfilling the promise of zero-knowledge encryption.
Another challenge is that file encryption requires memory (RAM). This becomes a problem when you want to encrypt/decrypt a big file at once: you need roughly twice the file size reserved in memory. For huge files, this amount of memory is simply not available on many devices.
Since I wanted to support massive files (>50 GB), another factor to consider is the single-file storage limit many providers have (although AWS S3 supports objects of up to 5 TB).
And lastly, I wanted all traces to disappear after the file has been downloaded successfully.
Based on my background as a front-end engineer and because I believe in the open web, I also wanted to create a browser-based solution that is free, transparent and accessible.
Here is how it works
The following schema shows a simplified version of the implementation:
Memory and storage limitations
The solution is to break the file into chunks and encrypt/decrypt them separately. Each chunk is saved individually, not only to circumvent storage limitations but also to increase security. (From a storage perspective, each chunk is just a bunch of binary data; it is not obvious which chunks make up a specific file.) Similar to the files stored on S3, the database only contains encrypted data. There is no way to map database entries directly to files.
Only a client with a valid link is able to connect all the dots and access the original file. It is worth noting that the master key (and alias) never leaves the browser. Both strings are added to the fragment identifier of the download link. This part, commonly referred to as the hash (#) or anchor of the URL, is never sent to the server. (You can verify this yourself in your browser's dev tools: when you access a link, this information is not part of the request.)
Simplified illustration of the download link (in the final implementation there is also a nonce added to the hash):
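Building and parsing such a link can be sketched as follows; the exact fragment layout (`alias/key`) is a simplification of the real format:

```javascript
// Sketch: alias and master key live in the URL fragment (everything after '#'),
// which the browser never includes in requests to the server.
function buildDownloadLink(baseUrl, alias, masterKey) {
  return `${baseUrl}/d#${alias}/${masterKey}`;
}

function parseDownloadLink(link) {
  const url = new URL(link);
  const [alias, masterKey] = url.hash.slice(1).split('/'); // drop the leading '#'
  return { alias, masterKey };
}
```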
Side note: In this proof of concept, anyone with the download link can access the file (once). For advanced security, one could easily build a solution that requires an additional password. To increase security with the present implementation, you may manually split the link and send the encryption key separately.
Breakdown: File Upload
Encrypting the file is the easier part: First, a file is split into smaller chunks. Those chunks then get encrypted separately using the master key and stored on S3. A reference to each chunk, together with a signature and a public key is afterwards stored in the database.
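The upload loop can be sketched like this; the three callbacks (`uploadToS3`, `sign`, `saveChunkReference`) stand in for the real S3 and API calls and are assumptions, not sharrr's actual interfaces:

```javascript
// Sketch of the upload loop tying the steps together: store each encrypted
// chunk on S3, sign its reference, and persist the reference in the database.
async function uploadEncryptedChunks(encryptedChunks, { uploadToS3, sign, saveChunkReference }) {
  const references = [];
  for (const chunk of encryptedChunks) {
    const storageKey = await uploadToS3(chunk);            // store the encrypted blob on S3
    const signature = await sign(storageKey);              // sign the reference (ECDSA)
    await saveChunkReference({ storageKey, signature });   // persist in the database
    references.push({ storageKey, signature });
  }
  return references;
}
```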
Simplified schema of the file upload process:
The download and decryption is the tricky part. Not only do we need to make sure the client is allowed to request a certain file (solved with a cryptographic signature, see the blue box), but it is also not possible to reassemble the file all at once and save it to the download folder. (To be exact, for smaller files one could download the file, decrypt it in memory, and save it right away. But since we want to support large files, we need some magic from the Streams API and the Service Worker API.)
Simplified schema of the file download process:
A service worker is something like a proxy within your browser. It has access to the network and can, for example, intercept requests or modify responses. We can interact with it by sending messages back and forth, but it doesn't have direct access to the DOM.
Within our app the service worker is designed to intercept the initial file request (described above), then fetch and decrypt the chunks and finally stream the data back within the response (basically directly into your download folder).
We have to use the "request interception" approach because of a limitation in most browsers: a user has to initiate a file download with a click on `<a href="/path/to/file.txt" download>`. (This will change with the File System Access API, which is unfortunately not widely available yet.)
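The streaming part can be sketched as a `ReadableStream` that pulls one chunk at a time; `fetchChunk` and `decryptChunk` are injected here for illustration (in the real service worker they would wrap `fetch()` and `crypto.subtle.decrypt()`):

```javascript
// Sketch: a stream that fetches and decrypts one chunk per pull, so the
// whole file never sits in memory at once.
function decryptedFileStream(chunkUrls, fetchChunk, decryptChunk) {
  let i = 0;
  return new ReadableStream({
    async pull(controller) {
      if (i >= chunkUrls.length) return controller.close();
      const encrypted = await fetchChunk(chunkUrls[i++]);
      controller.enqueue(await decryptChunk(encrypted));
    },
  });
}

// In the service worker, this stream becomes the body of the intercepted
// response (headers shown are typical for forcing a download, not sharrr's exact ones):
//
// self.addEventListener('fetch', (event) => {
//   event.respondWith(new Response(decryptedFileStream(urls, fetchChunk, decryptChunk), {
//     headers: { 'Content-Disposition': 'attachment; filename="file.bin"' },
//   }));
// });
```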
The role of the service worker:
If you want to learn more about how this works in detail, I recommend this video.
For encryption/decryption, only built-in browser APIs are used, namely the Web Crypto API. As a side effect, this app won't run in legacy browsers or with older Node versions. The following algorithms are used:
AES-GCM (Advanced Encryption Standard - Galois/Counter Mode) for symmetric encryption: This is used for the master key that encrypts/decrypts all file chunks and the data stored in the database.
ECDSA (Elliptic Curve Digital Signature Algorithm) for digital signatures: This is used to sign the file chunk keys in order to make sure a later download request is allowed to access a specific chunk file.
The Web Crypto API offers a set of low-level cryptographic primitives. On top of those, I created some helper functions that might be useful to others.
This project is heavily inspired by a great online community and amazing open-source projects: