Need help?
<- Back

Comments (74)

  • amluto
    There are many systems that take a native data structure in your favorite language and, using some sort of reflection, makes an on-disk structure that resembles it. Python pickles and Java’s serialization system are infamous examples, and rkyv is a less alarming one.I am quite strongly of the opinion that one should essentially never use these for anything that needs to work well at any scale. If you need an industrial strength on-disk format, start with a tool for defining on-disk formats, and map back to your language. This gives you far better safety, portability across languages, and often performance as well.Depending on your needs, the right tool might be Parquet or Arrow or protobuf or Cap’n Proto or even JSON or XML or ASN.1. Note that there are zero programming languages in that list. The right choice is probably not C structs or pickles or some other language’s idea of pickles or even a really cool library that makes Rust do this.(OMG I just discovered rkyv_dyn. boggle. Did someone really attempt to reproduce the security catastrophe that is Java deserialization in Rust? Hint: Java is also memory-safe, and that has not saved users of Java deserialization from all the extremely high severity security holes that have shown up over the years. You can shoot yourself in the foot just fine when you point a cannon at your foot, even if the cannon has no undefined behavior.)
  • duc_minh
    > Sometimes the best optimization is not a clever algorithm. Sometimes it is just changing the shape of the data.This is basically Rob Pike's Rule 5: If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.(https://users.ece.utexas.edu/~adnan/pike.html)
  • SoftTalker
    > But SQL schemas often look like this. Columns are nullable by default, and wide tables are common.Hard disagree. That database table was a waving red flag. I don't know enough/any rust so don't really understand the rest of the article but I have never in my life worked with a database table that had 700 columns. Or even 100.
  • jamesblonde
    Here is an article I wrote this week with a section on Feldera - how it uses its incremental compute engine to compute "rolling aggregates" (the most important real-time feature for detecting changes in user behavior/pricing/anamalies).https://www.hopsworks.ai/post/rolling-aggregations-for-real-...
  • logdahl
    Strictly speaking, Isn't there still a way to express at least one Illegal string in ArchivedString? I'm not sure how to hint to the Rust compiler which values are illegal, but if the inline length (at most 15 characers) is aliased to the pointer string length (assume little-endian), wouldnt {ptr: null, len: 16} and {inline_data: {0...}, len: 16} both technically be an illegal value?I'm not saying this is better than your solution, just curious :^)
  • porise
    Why is rust allowed to reorder fields? If I know that fields are going to be generally accessed together, this prevents me from ordering them so they fit in cache lines.
  • saghm
    I feel like I'm missing something, but the article started by talking about SQL tables, and then in-memory representations, and then on-disk representation, but...isn't storing it on a disk already what a SQL database is doing? It sounds like data is being read from a disk into memory in one format and then written back to a disk (maybe a different one?) in another format, and the second format was not as efficient as the first. I'm not sure I understand why a third format was even introduced in the first place.
  • anon
    undefined
  • jim33442
    I did read the rest, but I'm stuck on the first part where their SQL table has almost a thousand cols. Why so many?
  • anon
    undefined
  • astrostl
    I have mixed feelings about it, but I'm going to fire somebody tomorrow for using a struct just to prove a point to the author.
  • arcrwlock
  • SigmundA
    Looks like they just recreated a tuple layout in rust with null bit map and everything, next up would be storing them in pages and memmap the pages.https://www.postgresql.org/docs/current/storage-page-layout....
  • everyone
    Just cus structs and classes work differently, and classes are much more common. I tend to make everything a class, unless there is a really good reason to make it a struct.
  • anon
    undefined
  • dyauspitr
    No one has written a struct in 10 years.