GridFS stores files in a MongoDB database. The maximum size of any one MongoDB document is 16MB, and GridFS allows much bigger files to be stored. It achieves this by breaking each file into chunks, 255 KB each by default, and storing each chunk as a separate document. One additional document is also created. It stores metadata about the file, and each chunk document refers back to it so the parts can be found and reassembled in order.
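To make the chunking idea concrete, here is a minimal in-memory sketch. It does not talk to MongoDB; it only mimics the layout described above. The function names are my own for illustration, the field names follow the GridFS convention (`files_id`, `n`, `data`, `length`, `chunkSize`), and the chunk size assumes the current GridFS default of 255 KB.

```python
CHUNK_SIZE = 255 * 1024  # assumed GridFS default chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) mimicking the GridFS layout in memory."""
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    # One extra document describes the file as a whole.
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Join the chunks back together in sequence order."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(1_000_000)  # one million zero bytes as a stand-in file
files_doc, chunks = split_into_chunks("demo-file", payload)
print(len(chunks), "chunks for", files_doc["length"], "bytes")
```

A real driver does the same bookkeeping for you, writing the chunk documents and the file document to two separate collections.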
S3 is an example of what is known, in technology terms, as object storage. It is often associated with Amazon Web Services, but alternatives exist, such as MinIO. Each object is a self-contained container comprising data and metadata, stored in a structurally flat environment. There are no real folders and no complex relationships between objects.
Both S3 and GridFS store unstructured data. There are no complex relationships as you would find in a relational database.
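A toy model may help picture the flat layout. The class below is purely illustrative and is not the S3 or MinIO API: every object lives under a single key, and anything that looks like a folder is only a shared key prefix.

```python
class FlatObjectStore:
    """Hypothetical in-memory stand-in for an object store bucket."""

    def __init__(self):
        self._objects = {}  # key -> {"data": bytes, "metadata": dict}

    def put(self, key, data, metadata=None):
        self._objects[key] = {"data": data, "metadata": dict(metadata or {})}

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are simulated by filtering on a key prefix.
        return sorted(key for key in self._objects if key.startswith(prefix))


store = FlatObjectStore()
store.put("reports/q1.pdf", b"first", {"content-type": "application/pdf"})
store.put("reports/q2.pdf", b"second", {"content-type": "application/pdf"})
store.put("logo.png", b"image", {"content-type": "image/png"})
print(store.list_keys("reports/"))
```

The slash in `reports/q1.pdf` has no special meaning to the store itself. It is simply part of the key.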
Which is faster?
Speed and performance are going to depend on several factors. Every situation is unique, so there is no single answer. I have laid out some things to consider below.
Since GridFS is part of MongoDB, memory and processor play a significant role in its performance. If you're only storing smaller files, infrequently, then GridFS may be slightly faster than Amazon S3. If you're storing and retrieving larger files more frequently, then the memory and processor available to MongoDB may become the limiting factor.
Consider where your storage platform is located in relation to your application. If your application is hosted on Amazon, then S3 will almost certainly be faster than transferring data over the internet to a managed MongoDB instance. If your application is hosted on-prem, a local MinIO instance will usually be faster than Amazon S3. Likewise, MongoDB on the same network as your application will usually be faster than an S3 or MinIO service that has to be reached over the internet.
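Because the answer depends so much on your own setup, the most reliable approach is to measure. The sketch below is one hedged way to do that. `time_round_trip` is a hypothetical helper of my own, and the "backend" here is just a Python dict so the example runs anywhere. In practice you would pass in functions that wrap your real GridFS, S3 or MinIO client calls.

```python
import time


def time_round_trip(store, fetch, payload, repeats=5):
    """Return the best (store_seconds, fetch_seconds) seen over several runs."""
    best_store = best_fetch = float("inf")
    for i in range(repeats):
        key = f"bench-{i}"
        t0 = time.perf_counter()
        store(key, payload)
        t1 = time.perf_counter()
        result = fetch(key)
        t2 = time.perf_counter()
        if result != payload:
            raise ValueError("backend returned different bytes")
        best_store = min(best_store, t1 - t0)
        best_fetch = min(best_fetch, t2 - t1)
    return best_store, best_fetch


# Stand-in backend: a dict in memory. Swap in real client calls to compare.
memory = {}
store_s, fetch_s = time_round_trip(
    memory.__setitem__, memory.__getitem__, bytes(1024 * 1024)
)
print(f"store: {store_s:.6f}s, fetch: {fetch_s:.6f}s")
```

Run it from the machine that hosts your application, with file sizes that match your real workload, otherwise the numbers will not tell you much.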
Which is cheaper?
That will depend on the size of the files you’re storing and the rate at which you’re storing them.
Processor and memory costs
MongoDB requires more memory and processing power than S3. If you’re storing large files at a fast rate, then S3 is likely to be cheaper.
If you’re storing large files, but speed is not an issue, then the equation boils down to the cost of disk space. You will have to calculate that yourself based on the price you pay for disk space vs Amazon S3 prices.
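The calculation itself is simple arithmetic. Here is a small sketch of it. The function names are mine, and the prices are made-up placeholders rather than real quotes, so substitute the figures from your own disk costs and the current price list.

```python
def monthly_storage_cost(stored_gb, price_per_gb_month):
    """Storage cost for one month at a flat per-GB price."""
    return stored_gb * price_per_gb_month


def compare_storage(stored_gb, prices):
    """Return (cheapest_name, {name: monthly_cost}) for a table of prices."""
    costs = {
        name: monthly_storage_cost(stored_gb, price)
        for name, price in prices.items()
    }
    cheapest = min(costs, key=costs.get)
    return cheapest, costs


# Placeholder prices per GB per month. These are NOT real prices.
placeholder_prices = {"own_disk": 0.05, "s3": 0.02}
cheapest, costs = compare_storage(2000, placeholder_prices)
print(cheapest, costs)
```

Real price lists are often tiered, so treat a flat per-GB rate as a first approximation only.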
Cost of support
Amazon S3 and hosted MongoDB instances are fully managed products. If you’re hosting your MongoDB or MinIO instance internally, will you need to pay extra for support and scaling?
Data transfer costs
In most cases, bandwidth is going to cost more on Amazon than with a self-hosted solution or, for that matter, a cloud server from another provider. Since we are talking about storing large files, this is likely to add up to a significant part of your costs.
There may also be indirect costs to a business. Will there be an impact on your business reputation if you can’t scale with customer demands? Amazon can scale automatically, whereas self-hosted solutions often can’t.
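To get a feel for how quickly transfer costs add up, you can estimate the monthly volume from the number of downloads and the average file size. The helper names and the numbers below are illustrative assumptions only, and the per-GB price is a placeholder, not a real tariff.

```python
def monthly_transfer_gb(downloads_per_month, average_file_mb):
    """Estimated outbound volume in GB (using 1 GB = 1024 MB)."""
    return downloads_per_month * average_file_mb / 1024


def monthly_transfer_cost(downloads_per_month, average_file_mb, price_per_gb):
    """Estimated outbound transfer cost at a flat per-GB price."""
    return monthly_transfer_gb(downloads_per_month, average_file_mb) * price_per_gb


# Assumed workload: 10,000 downloads a month of 512 MB files.
volume_gb = monthly_transfer_gb(10_000, 512)
# Placeholder price per GB transferred. This is NOT a real price.
cost = monthly_transfer_cost(10_000, 512, 0.10)
print(f"{volume_gb:.0f} GB transferred, estimated cost {cost:.2f}")
```

Even a modest per-GB price becomes noticeable at this kind of volume, which is why transfer costs deserve their own line in your comparison.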
Which is more scalable?
Both S3 and MongoDB are capable of incredible scaling.
In terms of horizontal scaling (storage space & processing nodes), MongoDB will require more processor and memory resources. Hence, as you add further nodes, it gets more expensive than Amazon S3. MinIO hosted locally will be cheaper to scale horizontally than MongoDB. On the other hand, MinIO does not have the auto-scaling capabilities of Amazon S3 or a professionally hosted MongoDB solution.
As the pressure on your storage system increases, cost may become more of an issue with MongoDB, because it needs more processor and memory than S3 to deal with higher loads.
As I am sure you will agree, there is no clear winner. Which is better depends on your circumstances, requirements and budget. As I stated at the start of this article, the choice you make may require some compromises. Only you can decide which best suits your circumstances.
Rest assured that both are excellent technology choices in the right circumstances. The key to making the right decision is to ensure you carry out a thorough assessment of your requirements.