Comments (67)
- wgjordan: Related, "The Design & Implementation of Sprites" [1] (also currently on the front page) mentioned JuiceFS in its stack:

  > The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data ("chunks") and metadata (a map of where the "chunks" are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is kept durable with Litestream. Nothing depends on local storage.

  [1] https://news.ycombinator.com/item?id=46634450
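To make the chunk/metadata split described in that quote concrete, here is a minimal sketch of the model, not JuiceFS's actual on-disk format: file data is cut into fixed-size chunks pushed to an object store keyed by content hash, while a small local SQLite table maps each path to its chunk keys. The `ObjectStore` stub, table schema, and chunk size are illustrative assumptions.

```python
# Illustrative sketch of the chunk/metadata split described above.
# NOT JuiceFS's real format; the object-store stub and schema are made up.
import hashlib
import sqlite3

CHUNK_SIZE = 4 * 1024 * 1024  # arbitrary 4 MiB chunks


class ObjectStore:
    """Stand-in for S3 or another object store."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


meta = sqlite3.connect(":memory:")  # metadata lives in fast local storage
meta.execute("CREATE TABLE chunks (path TEXT, idx INTEGER, key TEXT)")
store = ObjectStore()


def write_file(path, data):
    meta.execute("DELETE FROM chunks WHERE path = ?", (path,))
    for idx in range(0, len(data), CHUNK_SIZE):
        chunk = data[idx:idx + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()   # content-addressed chunk key
        store.put(key, chunk)                     # data goes to the object store
        meta.execute("INSERT INTO chunks VALUES (?, ?, ?)", (path, idx, key))
    meta.commit()


def read_file(path):
    rows = meta.execute(
        "SELECT key FROM chunks WHERE path = ? ORDER BY idx", (path,)).fetchall()
    return b"".join(store.get(key) for key, in rows)


write_file("/demo.bin", b"hello " * 1000)
assert read_file("/demo.bin") == b"hello " * 1000
```

Losing the SQLite side of this sketch means losing the chunk map entirely, which is why the comments below care so much about the durability of the metadata store.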
- staticassertion: Do people really trust Redis for something like this? I feel like it's sort of pointless to pair Redis with S3 like this, and it'd be better to see benchmarks with metadata stores that can provide actual guarantees for durability/availability. Unfortunately, the benchmarks use Redis. Why would I care about distributed storage on a system like S3, which is all about consistency/durability/availability guarantees, just to put my metadata into Redis? It would be nice to see benchmarks with another metadata store.
- willbeddow: Juice is cool, but tradeoffs around which metadata store you choose end up being very important. It also writes files in its own uninterpretable format to object storage, so if you lose the metadata store, you lose your data. When we tried it at Krea we ended up moving on, because we couldn't get sufficient performance to train on, and having to choose which datacenter to deploy our metadata store in essentially forced us to use it in only one location at a time.
- hsn915: This is upside down. We need a kernel-native distributed file system so that we can build distributed storage/databases on top of it. This is like building an operating system on top of a browser.
- mattbillenstein: The key with S3, I think, is using it mostly as a blobstore. We put the important metadata into Postgres so we can quickly select stuff that needs to be updated based on other things being newer; that way we don't need to touch S3 very often if we don't need the actual data. When we actually need to manipulate or generate something in Python, we download/upload to S3 and wrap it all in a tempfile.TemporaryDirectory() to clean up the local disk when we're done. If you don't do this, you eventually end up with a bunch of garbage in /tmp/ that you have to deal with. We also have some longer-lived disk caches; using the data in the db plus an os.stat() on the file, we can easily tell whether the cache is up to date without hitting S3. And to manage the size of that cache, we can just delete whatever is old according to os.stat(), since we can always fetch it from S3 again if needed.
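A minimal sketch of the download/process/upload pattern described there, assuming boto3 is configured; the bucket name, key, and process_file() are hypothetical placeholders, not anything from the comment:

```python
# Sketch of the "wrap S3 work in a TemporaryDirectory" pattern described above.
# Bucket, key, and process_file() are hypothetical placeholders.
import os
import tempfile

import boto3

s3 = boto3.client("s3")


def process_file(path):
    # Stand-in for whatever manipulation/generation happens locally.
    with open(path, "a") as f:
        f.write("\nprocessed")


def rebuild_object(bucket, key):
    # Everything lands in a temp dir that is removed when the block exits,
    # so no garbage accumulates in /tmp/.
    with tempfile.TemporaryDirectory() as tmp:
        local = os.path.join(tmp, os.path.basename(key))
        s3.download_file(bucket, key, local)
        process_file(local)
        s3.upload_file(local, bucket, key)


rebuild_object("my-bucket", "reports/example.csv")
```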
- tuhgdetzhh: I've tested various POSIX FS projects over the years, and every one has its shortcomings in one way or another. Although the maintainers of these projects disagree, I mostly consider them a workaround for smaller projects. For big data (PB range) and critical production workloads, I recommend biting the bullet and making your software natively S3-compatible rather than going through a POSIX-mounted S3 proxy.
- eru: Distributed filesystems and POSIX don't go together well.
- sabslikesobs: See also their User Stories: https://juicefs.com/en/blog/user-stories
  I'm not an enterprise-storage guy (just SQLite on a local volume for me so far!), so those really helped de-abstractify what JuiceFS is for.
- jeffbee: It is not clear that pjdfstest establishes full POSIX semantic compliance. After a short search of the repo, I did not see anything that exercises multiple unrelated processes atomically writing with O_APPEND, for example. And the fact that their graphic shows applications interfacing with JuiceFS over NFS and SMB casts further doubt, since both of those lack many POSIX semantic properties. Over the decades I have written test harnesses for many distributed filesystems, and the only one that seemed to actually offer POSIX semantics was LustreFS, which, for related reasons, is also an operability nightmare.
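For reference, a crude version of the kind of check that comment describes might look like the sketch below: several processes each open the same file independently with O_APPEND, append fixed-size records, and the parent verifies that nothing was torn or lost. This is an illustration, not pjdfstest's code, and the mount path is a hypothetical filesystem under test.

```python
# Crude sketch of an O_APPEND atomicity check across processes
# (an illustration, not part of pjdfstest).
import os
from multiprocessing import Process

PATH = "/mnt/testfs/append_test"   # hypothetical mount point under test
RECORD = 64                        # fixed record size in bytes
WRITES = 1000


def writer(tag):
    # Each process opens the file on its own, so the open file descriptions
    # are unrelated even though the processes share a parent.
    fd = os.open(PATH, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    payload = tag.encode() * RECORD  # one distinct byte repeated RECORD times
    for _ in range(WRITES):
        assert os.write(fd, payload) == RECORD
    os.close(fd)


if __name__ == "__main__":
    if os.path.exists(PATH):
        os.unlink(PATH)
    procs = [Process(target=writer, args=(t,)) for t in "abcd"]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    data = open(PATH, "rb").read()
    assert len(data) == len(procs) * WRITES * RECORD   # no lost writes
    for i in range(0, len(data), RECORD):
        rec = data[i:i + RECORD]
        assert len(set(rec)) == 1                       # no torn/interleaved record
```

On a local filesystem that honors POSIX O_APPEND semantics this should pass; on a filesystem that doesn't, records come back interleaved or the file ends up short.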
- Plasmoid: I was actually looking at using this to replace our Mongo disks so we could easily cold-store our data.
- Eikon: ZeroFS [0] outperforms JuiceFS on common small-file workloads [1] while requiring only S3 and no third-party database.
  [0] https://github.com/Barre/ZeroFS
  [1] https://www.zerofs.net/zerofs-vs-juicefs
- IshKebab: Interesting. Would this be suitable as a replacement for NFS? In my experience literally everyone in the silicon design industry uses NFS on their compute grid, and it sucks in numerous ways:
  * poor locking support (this sounds like it works better)
  * it's slow
  * no manual fence support; a bad but common way of distributing workloads is, e.g., to compile a test on one machine (on an NFS mount) and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or use horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"
  * the bloody .nfs000000 files. I think this might be fixed by NFSv4, but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)