Do we think we really need to allow caching? From what are we trying to protect the backend systems? Feels like it would be easier to disallow caching (or recommend against). I understand the sense that a smaller company could be overwhelmed by an attacker or ill-configured legit sender, but I don't know how much caching will really help there. The caching server would be similarly overwhelmed I would imagine, and be unable to serve policies to any other requesting systems.
--
Alex Brotman
Sr. Engineer, Anti-Abuse
Comcast
From: Uta [mailto:uta-***@ietf.org] On Behalf Of Daniel Margolis
Sent: Thursday, October 05, 2017 3:53 AM
To: ***@ietf.org
Subject: Re: [Uta] Updated MTA-STS & TLSRPT
Well, so this isn't that big a change or that weird, and I think we can mostly reconcile here. :)
In section 3.3, we already say this:
Senders may wish to rate-limit the frequency of attempts to fetch the
HTTPS endpoint even if a valid TXT record for the recipient domain
exists. In the case that the HTTPS GET fails, we suggest
implementations may limit further attempts to a period of five minutes
or longer per version ID, to avoid overwhelming resource-constrained
recipients with cascading failures.
So we are already opening the door to short-term caching-*like* behavior in some cases, for basically the reasons you describe (throttling, though in the section 3.3. case, to avoid repeatedly hitting an endpoint in case of *failure*).
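To make the section 3.3 guidance concrete, here is a minimal Python sketch of throttling fetches after a failure, keyed per version ID. The class and method names are hypothetical and not taken from the draft; the five-minute interval is the lower bound the quoted text suggests.

```python
import time

RETRY_INTERVAL = 300.0  # five minutes, the lower bound suggested in section 3.3


class FetchThrottle:
    """Remembers failed policy fetches per (domain, TXT version id)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behavior can be exercised in tests.
        self._clock = clock
        self._last_failure = {}

    def may_fetch(self, domain, version_id):
        """True unless a fetch for this version id failed within the interval."""
        failed_at = self._last_failure.get((domain, version_id))
        if failed_at is None:
            return True
        return self._clock() - failed_at >= RETRY_INTERVAL

    def record_failure(self, domain, version_id):
        self._last_failure[(domain, version_id)] = self._clock()

    def record_success(self, domain, version_id):
        # A successful fetch clears any earlier failure for this version id.
        self._last_failure.pop((domain, version_id), None)
```

Because the key includes the version ID, a newly published TXT id is fetched immediately even while the old one is still being throttled.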
First, we all agree MTAs should not use HTTP caching, since it's redundant and confusing.
If we want to allow intermediate proxies to use HTTP caching, they should do so only if the cache-control headers allow it, or else their behavior will be opaque to the real server.
The real server, if it allows HTTP caching, should probably not allow caching for any meaningful period of time (say, no more than 1 minute) and should probably in practice make the cache lifetime much shorter than the DNS TTL, since the caching then becomes mostly immaterial. (I'm fudging a bit here, but it makes operational considerations easier: the HTTP cache entry expires at fetch-time + cache lifetime, whereas the DNS cache entry expires at resolve-time + TTL; since resolve-time and fetch-time are quite close together, making the DNS TTL longer than the HTTP cache lifetime removes the need to seriously consider the HTTP caching.)
As Lief said, what's the concrete language here?
"HTTP caching MAY be used by reverse proxies if allowed by the Cache-Control headers on the HTTPS endpoint. Hosts who are serving a policy that is delegated to by other domains SHOULD limit their cache lifetimes to values under one minute; further, when rotating deployed policies, they should consider that the new HTTP policy may not be visible to MTAs fetching the delegator domain's policy until the HTTP cache lifetime has expired."
Something like that?
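As a rough illustration of the proposed language, a reverse proxy could derive its cache lifetime from the Cache-Control header and clamp it. This is only a sketch: the function name and the 60-second cap are assumptions drawn from this thread, and the parsing covers just the directives that matter here rather than the full HTTP caching rules.

```python
def proxy_cache_seconds(cache_control, cap=60):
    """Return how long a reverse proxy may cache, in seconds (0 = do not cache).

    `cap` is a hypothetical proxy-side upper bound; 60s follows the
    "under one minute" suggestion discussed in this thread.
    """
    lifetimes = {}
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name in ("no-store", "no-cache", "private"):
            return 0  # the origin does not want a shared cache to reuse this
        if name in ("max-age", "s-maxage"):
            value = value.strip('"')
            if not value.isdigit():
                return 0  # malformed lifetime: be conservative
            lifetimes[name] = int(value)
    # For a shared cache, s-maxage takes precedence over max-age.
    lifetime = lifetimes.get("s-maxage", lifetimes.get("max-age", 0))
    return min(lifetime, cap)
```

With no Cache-Control lifetime at all the sketch returns 0, so the proxy only caches when the origin explicitly allows it.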
Post by Daniel Margolis
Post by Viktor Dukhovni
So I think that cache control is simply not applicable to the MTA, and
there's no need to "prohibit" it as such.
I mean, I agree that it's unlikely an MTA would *want* to do this, but I
think it's useful that we (already have) said "no honoring
cache-control".
Agreed, we should keep on saying "no honoring cache-control" for
the MTA, however redundant that may be in practice.
Post by Daniel Margolis
Are you suggesting we just say that cache-control can be honored up to a
value of 60s? I think that's fine.
I was thinking that larger values should generally not be published,
but I am open to some client-side (reverse-proxy) limits if you think
that's appropriate.
Post by Daniel Margolis
But note (as I said above) that caching
that is less than the max-age can still be problematic; it means that
someone who sees updated DNS but an old cached HTTP endpoint sees the old
policy. So the interplay between HTTP caching and DNS TTLs is weird, no?
Good point, after updating the HTTPS policy, one MUST wait at least
the duration of the cache-ttl, before making visible DNS changes.
The same also applies when HTTPS changes are made in some content
management system and take a bit of time to propagate out to the
entire server farm. The idea is to ensure that the most recent
visible change in the DNS id occurs more recently than the most
recent visible change in the underlying policy.
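The ordering rule above reduces to simple arithmetic. The sketch below is only an illustration; the function and parameter names are hypothetical, and all times are in seconds on whatever clock the operator uses.

```python
def earliest_txt_update(policy_pushed_at, http_cache_ttl, propagation_delay=0.0):
    """Earliest time at which the new TXT id may become visible in DNS.

    policy_pushed_at:  when the new HTTPS policy was pushed to the origin
    propagation_delay: assumed worst-case time to reach the whole server farm
    http_cache_ttl:    longest lifetime any HTTP cache may hold the old policy
    """
    return policy_pushed_at + propagation_delay + http_cache_ttl


def safe_to_publish_txt(now, policy_pushed_at, http_cache_ttl, propagation_delay=0.0):
    """True once every cache and server can only return the new policy."""
    return now >= earliest_txt_update(policy_pushed_at, http_cache_ttl, propagation_delay)
```

For example, with a push at t=1000, a 30-second propagation delay and a 60-second cache lifetime, the TXT id should not change before t=1090.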
Post by Daniel Margolis
Post by Viktor Dukhovni
The explicit presence of "max_age" makes possible and *invites* the possibility
of using a shorter cache-control lifetime for use-cases like reverse proxies.
OK. So to be clear, is your suggestion just to allow cache-control headers *as
long as the cache lifetime is less than the policy max_age*? Or up to a
value of 60s? Or something else?
Something on the order of 60s feels about right to me, it could be
even shorter. Basically, anything that allows a reverse proxy to
consolidate a high-volume stream of closely-spaced requests is
useful, and once it is only doing one upstream request every few
seconds, most of the gain is achieved. There's little benefit to
pushing it to one upstream request every ten minutes or every hour.
So I guess I'd like to see providers offer a cache-ttl of at least
5s and at most 60s, with the latter limit recommended to be enforced
by the reverse proxy as an upper bound. The proxy MUST start the
clock from when it *initiates* the upstream connection, not when
it receives the payload, as network delays could otherwise lead to
serving stale data for fresh requests.
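A minimal single-threaded Python sketch of that timing rule, assuming a hypothetical `fetch_upstream` callable that returns the policy body together with the published cache lifetime in seconds:

```python
import time


class PolicyCache:
    """Caches one upstream policy, timing expiry from request initiation."""

    def __init__(self, fetch_upstream, clock=time.monotonic, max_ttl=60.0):
        self._fetch = fetch_upstream  # hypothetical: returns (body, ttl_seconds)
        self._clock = clock
        self._max_ttl = max_ttl       # proxy-enforced upper bound (60s here)
        self._body = None
        self._expires_at = None

    def get(self):
        started = self._clock()
        if self._expires_at is not None and started < self._expires_at:
            return self._body
        # `started` was read BEFORE the upstream request, so a slow
        # response shortens, rather than extends, how long it is served.
        body, ttl = self._fetch()
        self._body = body
        self._expires_at = started + min(ttl, self._max_ttl)
        return body
```

If the upstream fetch takes 10 seconds and the capped lifetime is 60 seconds, the entry is served for only 50 seconds after it arrives.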
A related issue arises for MTAs: if two separate threads are doing
policy retrieval, it would be bad if policy data obtained by one
"thread" that took a long time to arrive displaced more recent data
obtained by another "thread". And worse if the MTA fails to reliably
associate the TXT id that caused each thread to run with the policy
retrieved by that thread. That is, it would be bad to squirrel
away the latest observed DNS TXT id in some shared state and then
separately obtain a policy, and at that time associate the policy
with a TXT id that may be a later one obtained by some other thread.
There needs to be a proper causal ordering of policy data and TXT
ids, where an MTA never ends up pairing a TXT id with a policy that was
already replaced when the TXT id was published.
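One way to get that ordering, sketched in Python: each retrieval carries its own TXT id and the time it started (before the DNS lookup), and the shared store refuses results from a retrieval that started earlier than the one already stored. The names are hypothetical, and ordering by start time is just one simple choice, not something the draft prescribes.

```python
import threading


class PolicyStore:
    """Keeps (TXT id, policy) pairs; older retrievals never displace newer ones."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # domain -> (started_at, txt_id, policy)

    def offer(self, domain, txt_id, policy, started_at):
        """Store a result from one retrieval; return False if it is stale.

        txt_id and policy must come from the SAME retrieval, and started_at
        is the time that retrieval began, before its TXT lookup.
        """
        with self._lock:
            current = self._entries.get(domain)
            if current is not None and current[0] >= started_at:
                return False  # a retrieval that began later already won
            self._entries[domain] = (started_at, txt_id, policy)
            return True

    def lookup(self, domain):
        with self._lock:
            entry = self._entries.get(domain)
        return None if entry is None else (entry[1], entry[2])
```

Because the TXT id travels with the policy as a single tuple, there is no shared "latest TXT id" that another thread could overwrite between the DNS lookup and the HTTPS fetch.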
Post by Daniel Margolis
I think there is some oddity around cache lifetimes greater than the DNS
TTL, though...should we worry about that? Or just advise against it on the
grounds that it can result in confusion, but leave it up to deployers if
they wish to do it?
I don't think that's an issue, provided the TXT id is never visible
before the new policy is in place, and the old has been flushed
from any HTTP caches. When TXT TTL expires, the client will just
go fetch the latest. Seeing a slightly stale TXT is never a problem,
what could be a problem is seeing a stale policy in association with
a fresh TXT.
--
Viktor.