Why don't Jiff's strftime and strptime routines validate the format string separately from formatting or parsing a datetime?
#560
Indeed, I have the same feeling. Your philosophical approach to this is
While perhaps surprising, performance is one of the main motivators! The big thing to notice here is that Jiff doesn't have a distinction between
It works for this because the formatting string grammar is incredibly simple.
With respect to parsing failures, it's very rare to have them. The main
Anyway... I'm rambling. For things like
This is explained in more detail on the
(The above 3 paragraphs feel a bit rambly and tangential, but also relevant. So
For parsing, this is expected. For formatting, it's very much intentional
So by this, I would argue that the API very specifically and intentionally
OK, now we're going to really get into the meat and potatoes of why the API is
What might surprise you is that I hate hate hate the
Now, who gives a fuck about POSIX? Goodness, I certainly wish it would die.
So, Jiff's implementation and API are modeled on POSIX, for better or worse.
The main reason why I didn't go all the way and promise POSIX semantics is mostly
There is some data that this strategy has paid off. The uutils project is now
And that's kind of what it comes down to. The cursed
This is also why...
... is perhaps less true than you might think. The
Now that I'm here, I feel like I've done a poor job of answering this question.
The last one is pretty cool and not something I touched on before. With

    [package]
    publish = false
    name = "jiff-play"
    version = "0.1.0"
    edition = "2024"

    [dependencies]
    anyhow = "1.0.102"
    jiff = { version = "0.2.25", default-features = false }

    [[bin]]
    name = "jiff-play"
    path = "main.rs"

    [profile.release]
    debug = true

And this

    fn main() {
        let Ok(dt) = jiff::civil::DateTime::strptime(
            "%Y-%m-%d %H:%M%P",
            "2026-05-25 5:30pm",
        ) else {
            eprintln!("parse failure");
            std::process::exit(1);
        };
        println!("{}", dt.strftime("%m/%d/%Y"));
    }

We get:
If we built up an intermediate structure internally, where would we put that?
Indeed, this is likely the entire motivation behind how and why the POSIX APIs
Finally, I'll note that there are parallels to regular expressions here.
Apologies for the very long-winded reply here, but you asked a question that
Thank you for taking the time to write such a detailed response! It was quite interesting. I should make it clear that I didn't allow this quibble to stop me from trying out
You point out that one pass is faster. Maybe we're misunderstanding each other here. It seems quite likely that the same format string is going to be applied to multiple dates. When would you need to process millions of dates each with their own unique format string? You're saying that the single-pass architecture that
Neither
To go as fast as possible, I'd imagine that the intermediate representation would be one contiguous block of memory (much like the format string itself). Each of the format specifiers could be encoded in a single-byte opcode. Additionally, there could be an opcode that means "the following byte is a literal" because single-byte literals between components are common. Finally, there could be an opcode that means "the following byte (or two) is a length and following that are the literal bytes". I'm quite confident that this would be faster because less work is being done on validating multi-byte sequences. Of course, a benchmark would be required to determine just how much of a difference this makes. I may end up writing that benchmark just for my own enjoyment!
My proposal to have a separate validation step doesn't prevent
Memory allocations might also significantly worsen the wrapper function that does validation and formatting in one step. To avoid allocating and deallocating the intermediate representation on every call, it could reuse a thread-local vector. That doesn't sound too bad but it's a bit of extra global state that wasn't there before. The alternative is to allocate and deallocate each time because if you're using that API, maybe the extra nanoseconds don't matter to you.
I still believe that the error semantics and performance (of the
I would selfishly go forth despite these sacrifices because:
Of course, you have different priorities because
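To make the opcode idea above concrete, here is a minimal sketch of what such an intermediate representation might look like. Everything in it is hypothetical: the opcode names, the `compile` and `run` functions, and the choice to support only `%Y`, `%m`, `%d` and `%%` are assumptions for illustration and are not part of `jiff`, `chrono`, or `time`. It says nothing about relative speed; that would still need the benchmark.

```rust
use std::io::Write;

// Hypothetical single-byte opcodes (illustration only, not a real library's encoding).
const OP_YEAR: u8 = 0; // %Y
const OP_MONTH: u8 = 1; // %m
const OP_DAY: u8 = 2; // %d
const OP_LIT1: u8 = 3; // the following byte is a literal
const OP_LITN: u8 = 4; // the following byte is a length, then that many literal bytes

/// Byte offset of the `%` that starts an unsupported or incomplete directive.
#[derive(Debug, PartialEq)]
struct InvalidFormat(usize);

/// Validates the format string once and encodes it as one contiguous byte program.
fn compile(fmt: &str) -> Result<Vec<u8>, InvalidFormat> {
    let bytes = fmt.as_bytes();
    let mut prog = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            match bytes.get(i + 1).copied() {
                Some(b'Y') => prog.push(OP_YEAR),
                Some(b'm') => prog.push(OP_MONTH),
                Some(b'd') => prog.push(OP_DAY),
                Some(b'%') => prog.extend_from_slice(&[OP_LIT1, b'%']),
                _ => return Err(InvalidFormat(i)),
            }
            i += 2;
        } else {
            // Collect a literal run: up to the next `%`, capped at 255 bytes
            // so the length fits in one byte.
            let start = i;
            while i < bytes.len() && bytes[i] != b'%' && i - start < 255 {
                i += 1;
            }
            let run = &bytes[start..i];
            if run.len() == 1 {
                prog.extend_from_slice(&[OP_LIT1, run[0]]);
            } else {
                prog.push(OP_LITN);
                prog.push(run.len() as u8);
                prog.extend_from_slice(run);
            }
        }
    }
    Ok(prog)
}

/// Executes a program produced by `compile`. No format validation happens here.
fn run(prog: &[u8], year: u16, month: u8, day: u8) -> String {
    let mut out: Vec<u8> = Vec::new();
    let mut pc = 0;
    while pc < prog.len() {
        match prog[pc] {
            OP_YEAR => {
                write!(out, "{:04}", year).unwrap();
                pc += 1;
            }
            OP_MONTH => {
                write!(out, "{:02}", month).unwrap();
                pc += 1;
            }
            OP_DAY => {
                write!(out, "{:02}", day).unwrap();
                pc += 1;
            }
            OP_LIT1 => {
                out.push(prog[pc + 1]);
                pc += 2;
            }
            OP_LITN => {
                let len = prog[pc + 1] as usize;
                out.extend_from_slice(&prog[pc + 2..pc + 2 + len]);
                pc += 2 + len;
            }
            _ => unreachable!("compile() only emits known opcodes"),
        }
    }
    // Literal bytes are copied in their original order, and digits are only
    // inserted at `%` positions, so the output stays valid UTF-8.
    String::from_utf8(out).expect("output is valid UTF-8")
}
```

With this sketch, `compile("%m/%d/%Y")` would be called once, and `run` would then be applied to each date with the resulting program.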
I've been considering moving from `chrono` to `jiff`. So far, I've been liking what I'm reading, especially when it comes to time zones. However, one thing stood out to me. That's the `strftime` and `strptime` functions. These functions take the format string and the data to be formatted/parsed and do the parsing of the format string together with the processing of the input data. Combining these two operations into one seems wrong to me.
The format string is typically going to be static. An invalid format string is typically going to be a bug in the program rather than invalid input data that needs to be handled. An invalid format string and invalid input data are two distinct failure modes that shouldn't be conflated. I'd like to parse the format string on program startup and crash immediately if it's invalid. When formatting or parsing a date using that format, I'd like the only errors to be due to the input data, not the format string.
What if problems in the format string somehow only revealed themselves when combined with particular input data? That's probably not possible but the API allows it to be.
`chrono` and `time` both make this separation in their APIs. I'm looking for something in the shape of this:
Of course, performance is a side benefit to this separation but it's not my primary motivation. Although, the benchmark results in the comparison document show `jiff`'s `print/strftime/oneshot/buffer` being faster than `chrono`'s `print/strftime/prebuilt/buffer` which I found surprising.
I'm interested to know your opinion on this. Is it something that you've already considered and decided against?
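As a rough illustration of the separation being asked about (validate the format string once, then format without any format-related errors), here is a minimal sketch. The `Format` and `FormatError` names are hypothetical and this is not the API of `chrono`, `time`, or `jiff`; it only supports `%Y`, `%m`, `%d` and `%%`, and it re-scans the stored string when formatting rather than building any faster representation.

```rust
/// Hypothetical pre-validated format string (illustration only).
#[derive(Debug)]
struct Format(String);

/// Errors that can only come from the format string itself.
#[derive(Debug, PartialEq)]
enum FormatError {
    UnknownDirective(char),
    DanglingPercent,
}

impl Format {
    /// Validates the format string up front, e.g. at program startup.
    fn new(fmt: &str) -> Result<Format, FormatError> {
        let mut chars = fmt.chars();
        while let Some(c) = chars.next() {
            if c == '%' {
                match chars.next() {
                    Some('Y') | Some('m') | Some('d') | Some('%') => {}
                    Some(other) => return Err(FormatError::UnknownDirective(other)),
                    None => return Err(FormatError::DanglingPercent),
                }
            }
        }
        Ok(Format(fmt.to_string()))
    }

    /// Formatting cannot fail because of the format string:
    /// every directive was already checked in `new`.
    fn format(&self, year: u16, month: u8, day: u8) -> String {
        let mut out = String::new();
        let mut chars = self.0.chars();
        while let Some(c) = chars.next() {
            if c != '%' {
                out.push(c);
                continue;
            }
            match chars.next() {
                Some('Y') => out.push_str(&format!("{:04}", year)),
                Some('m') => out.push_str(&format!("{:02}", month)),
                Some('d') => out.push_str(&format!("{:02}", day)),
                Some('%') => out.push('%'),
                _ => unreachable!("directives were validated in Format::new"),
            }
        }
        out
    }
}
```

In this shape, a program could call `Format::new(...)` at startup and `unwrap` the result, so a bad format string crashes immediately, while later calls to `format` have no format-string failure mode left to handle.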