So, of course, I've thought of a bunch of things that I wish I'd added, or done differently.
A big one is that I wish I'd thought to split it into two files: the normative Standards Track spec that defines the scheme, and an informative document covering all the non-standard stuff in Appendix E. Those are the contentious things people do (and in many cases have done for decades) that could never be included in the main standard for political reasons, but that you probably need to be able to deal with if you want to interact on the open internet anyway.
I would totally use that as the title.
The reason for two files is that the core spec, being very stable, probably isn't going to change much, whereas the informative bit, which documents the crazy stuff people do on the wacky internet, is liable to drift and warp and change over time. As it is, updating the second part means re-releasing the entire document.
And now some politics: how do you justify pushing out a document that updates or obsoletes a Standards Track spec but doesn't actually change the spec? It's much easier to replace an informational memo.
I also wish I'd been able to find a better way to address Windows' quirks and UNC strings. Some of the non-normative appendix content used to be in the main spec, but somebody on the mailing list complained that I was giving too much attention to "Windoze" (presumably because 2017 will be the year of Linux on the desktop?). As a result, all the dumb quirks about dealing with drive letters, resolving relative references and ".." segments and all that, and how many slashes to put after "file:" were relegated to an appendix – and, I regret to say, in some cases completely forgotten about.
And so a lot of text that would have eliminated edge cases, resolved quirky historical behaviour, and made "file:" URIs really widely interoperable is not actually standardised. I mean, it's written there, and sometimes I even tried to say "you probably really want to do this", but someone didn't like Windows so I couldn't make it really real.
I guess I could just write it in my blog. Yeah, that sounds cool. Here you go, an officially unofficial guide to using "file:" URIs by the guy who wrote the spec:
URIs like file:c:/foo/bar.baz are perfectly legitimate, unambiguous, and beautiful.
file:/c:/foo/bar.baz is fine, too, if you prefer that aesthetic.
URIs like file:///c:/foo/bar.baz have been working absolutely perfectly for decades, if you don't want to rock the boat.
file://c:/foo/bar.baz – and particularly file://c|/foo/bar.baz – are just... no. Don't do that. This isn't 1997. We have standards.
Never use backslashes \ as separators in a "file:" URI. Ain't nobody got time for that.
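If you want to lint for the drive-letter advice above, a regular expression gets you most of the way. This is a minimal sketch in Python, not anything the spec defines: check_drive_uri and RECOMMENDED are hypothetical names, and it only looks at local drive-letter URIs (no hosts, queries or fragments, and it expects a "/" right after the drive letter).

```python
import re

# The happy forms: file:c:/...  file:/c:/...  file:///c:/...
# There is deliberately no option for exactly two slashes, so
# file://c:/... and file://c|/... never match.
RECOMMENDED = re.compile(r"file:(?:///|/)?[A-Za-z]:/.*")

def check_drive_uri(uri: str) -> str:
    """Hypothetical lint: return "ok" for a recommended form, else a short complaint."""
    if "\\" in uri:
        # Backslashes are never path separators in a URI.
        return "backslash: use forward slashes"
    if RECOMMENDED.fullmatch(uri):
        return "ok"
    return "non-standard drive-letter form"

if __name__ == "__main__":
    for uri in ("file:c:/foo/bar.baz", "file:/c:/foo/bar.baz",
                "file:///c:/foo/bar.baz", "file://c:/foo/bar.baz",
                "file://c|/foo/bar.baz", "file:///c:\\foo\\bar.baz"):
        print(f"{uri:28} {check_drive_uri(uri)}")
```

The alternation tries "///" before "/", so all three recommended forms match, while the two-slash forms fall through because the character after the slashes has to be a drive letter.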
file:////example.org/Qux/foo/bar.baz is obviously pointing to a file on an SMB share: the share Qux on the host example.org.
file://///example.org/Qux/foo/bar.baz is acceptable, if a bit... y'know... slashy.
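Mapping those URIs back to a UNC string is mechanical. Here's a minimal sketch in Python, assuming a hypothetical helper file_uri_to_unc that only accepts the four- and five-slash forms; it percent-decodes each segment with urllib's default (UTF-8) and does no other validation.

```python
from urllib.parse import unquote

def file_uri_to_unc(uri: str) -> str:
    """Hypothetical helper: file:////host/share/... (or file://///...) -> \\\\host\\share\\..."""
    # Check the five-slash form first, because "file:////" is a prefix of it.
    for prefix in ("file://///", "file:////"):
        if uri.startswith(prefix):
            rest = uri[len(prefix):]
            break
    else:
        raise ValueError(f"not a four- or five-slash UNC URI: {uri!r}")
    segments = [unquote(seg) for seg in rest.split("/")]
    if not segments[0]:
        # e.g. six or more slashes: there's no host left to talk to.
        raise ValueError(f"missing host: {uri!r}")
    # A UNC string starts with two backslashes and uses backslash separators.
    return "\\\\" + "\\".join(segments)

if __name__ == "__main__":
    print(file_uri_to_unc("file:////example.org/Qux/foo/bar.baz"))
    print(file_uri_to_unc("file://///example.org/Qux/foo/bar.baz"))
```

Both calls print \\example.org\Qux\foo\bar.baz.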
If you're viewing file:///d:/foo/bar/baz.htm and you see a reference like <img src="/foo/bar/pong.png">, you know it should resolve to file:///d:/foo/bar/pong.png – even if your current directory is somewhere on C:\.
Likewise, <a href="/f:/oof/rab/zab.htm"> resolves to a file on drive F:, whichever drive the referring page lives on.
Anyone who writes <link href="/e:../bar.baz"> is not trying to interoperate – they're looking for exploits. Don't fall for it.
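Resolving references that way means bending the generic RFC 3986 algorithm a little, because a plain absolute-path reference would otherwise replace the drive letter along with the rest of the path. Here's a minimal sketch in Python, assuming urllib.parse.urljoin for the ordinary resolution; resolve and DRIVE are hypothetical names, and the sketch doesn't stop ".." segments from climbing above the drive letter.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical pattern: a path starting with "/" plus a drive letter and colon.
DRIVE = re.compile(r"/([A-Za-z]):(.*)")

def resolve(base: str, ref: str) -> str:
    """Resolve ref against base, keeping the base's drive for root-relative refs."""
    base_drive = DRIVE.match(urlsplit(base).path)
    # Only absolute-path references ("/..." but not "//host/...") need special care.
    if ref.startswith("/") and not ref.startswith("//"):
        ref_drive = DRIVE.match(ref)
        if ref_drive:
            # "/e:../bar.baz" and friends: a drive letter not followed by "/".
            if not ref_drive.group(2).startswith("/"):
                raise ValueError(f"drive-relative reference: {ref!r}")
        elif base_drive:
            # "/foo/bar/pong.png" stays on the base document's drive.
            ref = f"/{base_drive.group(1)}:{ref}"
    # Everything else goes through urljoin's (largely RFC 3986) resolution.
    return urljoin(base, ref)

if __name__ == "__main__":
    base = "file:///d:/foo/bar/baz.htm"
    print(resolve(base, "/foo/bar/pong.png"))    # file:///d:/foo/bar/pong.png
    print(resolve(base, "/f:/oof/rab/zab.htm"))  # file:///f:/oof/rab/zab.htm
```

Rejecting the drive-relative form outright, rather than guessing at a current directory on E:, is the point: there's no interoperable answer, so the sketch refuses to produce one.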
Whatever sits between file:// and the next / is confused and broken, and there'll always be someone who gets it wrong, so just don't write anything in there.
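One way to enforce that in a linter is to look at the authority a URI parser reports. A minimal sketch in Python, assuming urllib.parse.urlsplit (which calls the authority netloc) and a hypothetical helper authority_is_empty. Note that the four-slash UNC form still has an empty authority, because the host lives in the path.

```python
from urllib.parse import urlsplit

def authority_is_empty(uri: str) -> bool:
    """Hypothetical lint check: True when nothing sits between "file://" and the next "/"."""
    # urlsplit() reports the authority component as .netloc ("" when empty or absent).
    return urlsplit(uri).netloc == ""

if __name__ == "__main__":
    for uri in ("file:///c:/foo/bar.baz", "file:c:/foo/bar.baz",
                "file:////example.org/Qux/foo/bar.baz",
                "file://c:/foo/bar.baz", "file://example.org/Qux/foo/bar.baz"):
        print(authority_is_empty(uri), uri)
```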
<a href="/%E3%81%A1"> may mean many things to many people. (/ち in UTF-8, /Ta~ in EBCDIC, etc.) Just avoid the whole mess – use an IRI.
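To see the ambiguity for yourself, decode the same three bytes a few different ways. A quick sketch using Python's standard codecs; cp037 is just one of several EBCDIC code pages, picked here as an example. The last line goes the other way: an IRI's non-ASCII characters map to a URI through UTF-8, so /ち has exactly one percent-encoded spelling.

```python
from urllib.parse import quote, unquote_to_bytes

# The bytes behind the percent-encoded segment (the leading "/" is a
# URI delimiter, not data, so it's left out).
raw = unquote_to_bytes("%E3%81%A1")

# The same three bytes under a few plausible assumptions.
for encoding in ("utf-8", "cp437", "cp037"):
    print(f"{encoding:6} {raw.decode(encoding)}")

# IRI -> URI: quote() percent-encodes using UTF-8 and keeps "/" as-is.
print(quote("/ち"))
```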
file:c:/reçu.txt always means exactly that, even if it gets turned into 0043 003a 005c 0072 0065 00e7 0075 002e 0074 0078 0074 in NTFS's UTF-16 encoding, or 43 3a 5c 72 65 87 75 2e 74 78 74 in MS-DOS's CP-437.
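You can reproduce both of those dumps in a few lines. A minimal sketch in Python, assuming a hypothetical dos_path helper that only understands local drive-letter IRIs; the UTF-16 output is written big-endian so each 16-bit code unit reads like the text above.

```python
import re
from urllib.parse import unquote

def dos_path(iri: str) -> str:
    """Hypothetical helper: local drive-letter file IRI -> Windows path string."""
    m = re.fullmatch(r"file:(?:///|/)?([A-Za-z]):(/.*)", iri)
    if m is None:
        raise ValueError(f"not a local drive-letter file IRI: {iri!r}")
    drive, rest = m.groups()
    # Drive letters are case-insensitive; upper-case it to match the dumps above.
    return drive.upper() + ":" + unquote(rest).replace("/", "\\")

path = dos_path("file:c:/reçu.txt")  # C:\reçu.txt

# UTF-16 code units, grouped two bytes at a time.
print(path.encode("utf-16-be").hex(" ", 2))
# One byte per character in code page 437.
print(path.encode("cp437").hex(" "))
```

The IRI is the same in both cases; only the local storage encoding differs.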
<a href="~matty/.plan"> doesn't mean what it does in bash, and you know it doesn't.
The same goes for %SystemRoot% and all that sort of guff.
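A generic URI resolver treats "~" and "%...%" as ordinary characters, which is exactly what you want. A quick check in Python with urllib.parse.urljoin, reusing the D: base from the resolution example; "%SystemRoot%/bar.baz" is a made-up reference, and "%Sy" isn't even valid percent-encoding, which urljoin doesn't check.

```python
from urllib.parse import urljoin

base = "file:///d:/foo/bar/baz.htm"

# No home-directory lookup: "~matty" is just a path segment.
print(urljoin(base, "~matty/.plan"))
# No environment-variable expansion either.
print(urljoin(base, "%SystemRoot%/bar.baz"))
```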
Abide by these guidelines and, even if you're not adhering to the strictest interpretation of a Standards Track RFC, you'll at least be a well-intentioned and interoperable member of the internet community.