[Uta] Eric Rescorla's Discuss on draft-ietf-uta-mta-sts-17: (with DISCUSS and COMMENT)

On Thu, May 03, 2018 at 06:14:44PM -0700, Eric Rescorla wrote:

[ Though I am not one of the authors, I was actively involved in
the evolution of the draft. Some of its features are in part a
result of my influence, both based on prior work with DANE, and
as a potential implementor of the specification in a future
Postfix release. The comments below may shed light on the
rationale for some of the choices reflected in the draft. ]

Post by Eric Rescorla
S 3.2.

What if I receive a policy with a lifetime less than that remaining in
the previously received policy

Good question, I don't recall any discussion of this. Since the
policy might be "none", canceling the cached policy entirely, it
seems logical to allow the new "max_age" to end before the old
"max_age". I hope the authors agree.
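
A minimal sketch of the cache behaviour suggested above, assuming a
freshly fetched policy always replaces the cached one even when its
expiry is earlier. The draft text discussed here does not spell this
out; the names CachedPolicy and store_policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedPolicy:
    mode: str          # "enforce", "testing" or "none"
    max_age: int       # seconds, as published in the policy
    fetched_at: float  # time of the successful fetch, seconds since epoch

    @property
    def expires_at(self) -> float:
        return self.fetched_at + self.max_age


def store_policy(cache: Dict[str, CachedPolicy], domain: str,
                 mode: str, max_age: int, now: float) -> CachedPolicy:
    """Replace whatever is cached, even if the new expiry is earlier."""
    cache[domain] = CachedPolicy(mode, max_age, now)
    return cache[domain]


DAY = 86400
cache: Dict[str, CachedPolicy] = {}
# Old policy: enforce for 30 days, fetched at t=0.
old = store_policy(cache, "example.com", "enforce", 30 * DAY, now=0.0)
# Ten days later the domain publishes "none" with a one-day lifetime.
new = store_policy(cache, "example.com", "none", 1 * DAY, now=10.0 * DAY)
print(cache["example.com"].mode, new.expires_at < old.expires_at)
```

Here the "none" policy ends on day 11 although the old policy would
have lasted until day 30, which is what lets a domain cancel MTA-STS.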

How many labels can be prepended here. Is "a.b.c.example.net" valid?

Another good observation, I am having trouble finding normative
text in the body of the draft that makes this clear. IIRC, as
evidenced by a comment in the Appendix B pseudo-code, the intention
is to support just a single label:

// Leading '.' matches a wildcard against the first part, i.e.
// .example.com matches x.example.com but not x.y.example.com.

The text should be more clear, unless we both missed where this is
specified.
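
A minimal sketch of the single-label reading, assuming a leading '.'
stands for exactly one prepended label as the Appendix B comment
suggests. The function name mx_pattern_matches is hypothetical and not
taken from the draft.

```python
def mx_pattern_matches(pattern: str, name: str) -> bool:
    """Return True if `name` matches the policy "mx" pattern.

    Assumption: '.example.com' matches x.example.com but neither
    example.com itself nor x.y.example.com.
    """
    pattern = pattern.lower().rstrip(".")
    name = name.lower().rstrip(".")
    if not pattern.startswith("."):
        # Fully specified name: exact, case-insensitive comparison.
        return name == pattern
    # Split off the first label and compare the remainder to the suffix.
    first, sep, rest = name.partition(".")
    return bool(first) and sep == "." and "." + rest == pattern


print(mx_pattern_matches(".example.com", "x.example.com"))
print(mx_pattern_matches(".example.com", "x.y.example.com"))
print(mx_pattern_matches(".example.net", "a.b.c.example.net"))
```

Under this reading the answer to the question above is that
"a.b.c.example.net" does not match ".example.net".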

Post by Eric Rescorla
S 10.2.

I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.

With MTA-STS, hardening the DNS is not enough, the policy does not
take effect until it is first verified to work. First-contact
lookup failures for the TXT record do not cause email to be deferred.

Indeed with MTA-STS, some MTAs may do *background* policy retrieval,
and the first few messages to a destination may go out unprotected.

For a downgrade-resistant mechanism, a domain can use DANE SMTP
(RFC7672). If the destination domain is signed, the first step in
that direction is already taken.

2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".

IMPORTANT: This doesn't seem like quite what you want. Consider
mx: mx1.example.com
mx: mx2.example.com
And I then attempt to send to mx1.example.com, send SNI=mx1.example.com,
and get a cert that is only valid for mx2.example.com.

[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]

An MiTM attacker can direct the traffic to any MX host of his choice
by blocking TCP SYNs, or generating RST packets for traffic to all
the other MXs, causing the desired MX host to be the only one the
client can reach. Also, for a large fraction of domains a wildcard
certificate, or a certificate with all the names is used. For
example, below are the SANs from the certificate for gmail.com:

DNS:mx.google.com
DNS:alt1.aspmx.l.google.com
DNS:alt1.gmail-smtp-in.l.google.com
DNS:alt1.gmr-smtp-in.l.google.com
DNS:alt2.aspmx.l.google.com
DNS:alt2.gmail-smtp-in.l.google.com
DNS:alt2.gmr-smtp-in.l.google.com
DNS:alt3.aspmx.l.google.com
DNS:alt3.gmail-smtp-in.l.google.com
DNS:alt3.gmr-smtp-in.l.google.com
DNS:alt4.aspmx.l.google.com
DNS:alt4.gmail-smtp-in.l.google.com
DNS:alt4.gmr-smtp-in.l.google.com
DNS:aspmx.l.google.com
DNS:aspmx2.googlemail.com
DNS:aspmx3.googlemail.com
DNS:aspmx4.googlemail.com
DNS:aspmx5.googlemail.com
DNS:gmail-smtp-in.l.google.com
DNS:gmr-mx.google.com
DNS:gmr-smtp-in.l.google.com

So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.

Post by Eric Rescorla
This seems like it's extremely undesirable and might be the basis for some kind of attack.

See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.

Post by Eric Rescorla
You look up the MXes in the DNS.
You select one that must match one of the things in the mx list in the STS

Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops when a backup MX host, having filtered the list before
looking for itself in it, fails to eliminate worse-priority MX
hosts. It also requires all sites to duplicate MX host updates from
DNS into the STS policy, disallowing the "low-maintenance"
".example.com" form.

Post by Eric Rescorla
You then connect to the MX and provide its SNI.
The certificate must match the domain you provided in the SNI

The WG considered this issue, and in the end accepted the current
design. I hope this helps.

The certificate MAY be checked for revocation via the Online
Certificate Status Protocol (OCSP) [RFC6960], certificate revocation
lists (CRLs), or some other mechanism.

Why is revocation only MAY?

Looking at e.g. the X.509 certificate for Gmail, I don't see a
"must staple OCSP" extension. So we get no meaningful security
from OCSP stapling: an attacker who misappropriates the private
key will not staple OCSP responses.

I have no intention of building an HTTP client into the Postfix
SMTP client to download CRLs from various CAs that remote peers
might use. Full CRLs might at least be cached on a per-CA basis,
while per-certificate OCSP requires a connection to the CA for each
new certificate.

I'm afraid I see too little value in CRLs to consider CRL support
in Postfix. The OS platforms that Postfix runs on don't deliver
a full intermediate CA store with regular updates of the associated
CRLs. Doing CRL management in each application is IMHO impractical.

In short, I have not implemented and don't expect to implement CRL
support in Postfix.

[ I'll endeavour to leave further comments on the above topics to
the authors. I might still chime in if a new topic comes up
where I'm one of the culprits responsible for the current text. ]
--
Viktor.

P.S. (digression on what I'd like to see replace CRLs)

If we want effective revocation for WebPKI, let's fully automate
certificate roll-over (ACME is a good start) and drive down the
maximum certificate lifetimes to be short enough that most likely
you'd have a hard time noticing that your key is compromised any
faster, and getting the CA to revoke the cert, getting sites that
cache CRLs to get fresh CRLs, ...

I'd like to see one to two week certificate lifetimes, and X.509
stacks that can reload the certificates and keys without restarting
the server.

Eric Rescorla

2018-05-04 12:11:20 UTC

Post by Eric Rescorla
S 10.2.

mode, to allow clean MTA-STS removal, as described in Section 8.3.)
Resistance to downgrade attacks of this nature--due to the ability to
authoritatively determine "lack of a record" even for
non-participating recipients--is a feature of DANE, due to its use of
DNSSEC for policy discovery.

I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.

With MTA-STS, hardening the DNS is not enough, the policy does not
take effect until it is first verified to work. First-contact
lookup failures for the TXT record do not cause email to be deferred.
Indeed with MTA-STS, some MTAs may do *background* policy retrieval,
and the first few messages to a destination may go out unprotected.
For a downgrade-resistant mechanism, a domain can use DANE SMTP
(RFC7672). If the destination domain is signed, the first step in
that direction is already taken.

2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".

[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]
An MiTM attacker can direct the traffic to any MX host of his choice
by blocking TCP SYNs, or generating RST packets for traffic to all
the other MXs, causing the desired MX host to be the only one the
client can reach. Also, for a large fraction of domains a wildcard
certificate, or a certificate with all the names is used. For
example, below are the SANs from the certificate for gmail.com:
DNS:mx.google.com
DNS:alt1.aspmx.l.google.com
DNS:alt1.gmail-smtp-in.l.google.com
DNS:alt1.gmr-smtp-in.l.google.com
DNS:alt2.aspmx.l.google.com
DNS:alt2.gmail-smtp-in.l.google.com
DNS:alt2.gmr-smtp-in.l.google.com
DNS:alt3.aspmx.l.google.com
DNS:alt3.gmail-smtp-in.l.google.com
DNS:alt3.gmr-smtp-in.l.google.com
DNS:alt4.aspmx.l.google.com
DNS:alt4.gmail-smtp-in.l.google.com
DNS:alt4.gmr-smtp-in.l.google.com
DNS:aspmx.l.google.com
DNS:aspmx2.googlemail.com
DNS:aspmx3.googlemail.com
DNS:aspmx4.googlemail.com
DNS:aspmx5.googlemail.com
DNS:gmail-smtp-in.l.google.com
DNS:gmr-mx.google.com
DNS:gmr-smtp-in.l.google.com
So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.

But not everyone is configured this way.

Post by Eric Rescorla
This seems like it's extremely undesirable and might be the basis for
some kind of attack.

See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.

This might be true, but this kind of informal reasoning is notoriously
prone to error. We have a general pattern for TLS certificate verification,
which you are deviating from, and we then need to analyze in detail. I'm
not seeing any good reason for that.

Post by Eric Rescorla
You look up the MXes in the DNS.
You select one that must match one of the things in the mx list in the STS

Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops when a backup MX host, having filtered the list before
looking for itself in it, fails to eliminate worse-priority MX
hosts.

I don't understand this claim.

It also

requires all sites to duplicate MX host updates from DNS into the
STS policy, disallowing the "low-maintenance" ".example.com" form.

I don't see why this would be true. You publish .example.com and
then you modify the MX records at will, provided that they all end
in .example.com.

Post by Eric Rescorla
The certificate MAY be checked for revocation via the Online
Certificate Status Protocol (OCSP) [RFC6960], certificate revocation
lists (CRLs), or some other mechanism.

Why is revocation only MAY?

You seem to be omitting the obvious answer: regular OCSP.

-Ekr

Viktor Dukhovni

2018-05-04 14:41:13 UTC

[ Re-ordered for clarity. Hope the below adds some context. ]

Post by Viktor Dukhovni
Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops when a backup MX host, having filtered the list before
looking for itself in it, fails to eliminate worse-priority MX
hosts.
I don't understand this claim.

A sending MTA might be a non-primary MX host for a domain, that
is trying to reach a better (lower) preference MX host. If it
prunes the MX RRset based on the STS policy, *before* dropping
all worse (higher) preference MX hosts, it is liable to create
a mail routing loop, by not taking into account the fact it is
one of the MX hosts for the destination. Ideally the domain's
MX RRset should not contain any names not matched by the STS
policy, but reality is sometimes different.

If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.
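
A minimal sketch of the ordering pitfall, assuming the hypothetical
alternative design in which the policy holds MX hostname patterns.
All names here (matches, truncate_at_self, policy_filter and the
example hosts) are illustrative assumptions, not taken from the draft
or from any MTA.

```python
from typing import List, Tuple

MX = Tuple[int, str]  # (preference, hostname); lower preference is better


def matches(pattern: str, host: str) -> bool:
    """Hypothetical MX-hostname match: '.example.com' covers one extra label."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("."):
        _, sep, rest = host.partition(".")
        return sep == "." and "." + rest == pattern
    return host == pattern


def truncate_at_self(rrset: List[MX], me: str) -> List[MX]:
    """Loop avoidance: drop MX hosts at the same or worse preference than ours."""
    mine = [pref for pref, host in rrset if host.lower() == me.lower()]
    if not mine:
        return list(rrset)
    return [(pref, host) for pref, host in rrset if pref < min(mine)]


def policy_filter(rrset: List[MX], patterns: List[str]) -> List[MX]:
    """Keep only MX hosts whose names match some policy pattern."""
    return [(pref, host) for pref, host in rrset
            if any(matches(p, host) for p in patterns)]


def safe_candidates(rrset: List[MX], me: str, patterns: List[str]) -> List[MX]:
    # Truncate at our own preference first, apply the policy second.
    return sorted(policy_filter(truncate_at_self(rrset, me), patterns))


def buggy_candidates(rrset: List[MX], me: str, patterns: List[str]) -> List[MX]:
    # Policy first: if the policy does not cover `me`, the truncation
    # step no longer finds it in the list and removes nothing.
    return sorted(truncate_at_self(policy_filter(rrset, patterns), me))


rrset = [(10, "mx1.example.com"), (20, "backup.example.org"),
         (30, "mx3.example.com")]
patterns = [".example.com"]
me = "backup.example.org"

print(safe_candidates(rrset, me, patterns))
print(buggy_candidates(rrset, me, patterns))
```

With the safe ordering only mx1.example.com remains. With the buggy
ordering mx3.example.com at preference 30 stays in the list, although
it is worse than the sender's own preference of 20, which is the
precondition for a loop.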

Post by Viktor Dukhovni
So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.

But not everyone is configured this way.

Yes, some domains have distinct per-MX certificates. Even then,
an MiTM attacker can still restrict traffic to any MX host of
his/her choice, but if the name matching were stricter, the
sending MTA would indeed know *which* MX host this was more
reliably than otherwise.

Post by Viktor Dukhovni
See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.

This might be true, but this kind of informal reasoning is notoriously
prone to error. We have a general pattern for TLS certificate verification,
which you are deviating from, and we then need to analyze in detail. I'm
not seeing any good reason for that.

Historically, because MX lookups are unauthenticated DNS, trusting
the MX hostname was not a good option. So SMTP senders would validate
the next-hop domain, rather than the MX hostname. Correspondingly,
the certificates used by MX hosts would not necessarily match the
MX hostname, some matched only the (email) destination domain.
These were called UCC certificates by some.

Of course MTA-STS is new territory, and one might require suitable
new certificates for that, that always match the MX hostname. The
current draft is more forgiving.

Post by Viktor Dukhovni
It also
requires all sites to duplicate MX host updates from DNS into the
STS policy, disallowing the "low-maintenance" ".example.com" form.

I don't see why this would be true. You publish .example.com and
then you modify the MX records at will, provided that they all end
in .example.com.

Yes, that's true, provided the field remains a pattern. It would
invite the routing loop mis-optimization, and it is not clear how
effective the text can be in the face of lazy implementors who just
read some TL;DR summary and implement without much thought. The
presented-identifier design is less prone to getting that wrong...

Post by Viktor Dukhovni
In short, I have not implemented and don't expect to implement CRL
support in Postfix.

You seem to be omitting the obvious answer: regular OCSP.

I did mention OCSP, I have problems with it:

* When OCSP lookups temp-fail, my impression is that most
clients generally continue processing. This obviates
the security benefits of OCSP. Otherwise the CA OCSP
server becomes a single point of failure I'd prefer
to avoid.

* One of the goals of DANE and MTA-STS is to increase email
transport privacy. Leaking the (sender-domain,
recipient-domain) pairs to a new third party is in
conflict with that goal.

Hope that helps.

--
Viktor.

Eric Rescorla

2018-05-04 15:45:32 UTC

Post by Viktor Dukhovni
[ Re-ordered for clarity. Hope the below adds some context. ]

A sending MTA might be a non-primary MX host for a domain, that
is trying to reach a better (lower) preference MX host. If it
prunes the MX RRset based on the STS policy, *before* dropping
all worse (higher) preference MX hosts, it is liable to create
a mail routing loop, by not taking into account the fact it is
one of the MX hosts for the destination. Ideally the domain's
MX RRset should not contain any names not matched by the STS
policy, but reality is sometimes different.
If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.

Well, in general when STS is misconfigured you can have problems.
I don't see that this case is sufficiently important to go away from
standard TLS semantics.

Post by Viktor Dukhovni
In short, I have not implemented and don't expect to implement CRL
support in Postfix.

You seem to be omitting the obvious answer: regular OCSP.

* When OCSP lookups temp-fail, my impression is that most
clients generally continue processing. This obviates
the security benefits of OCSP. Otherwise the CA OCSP
server becomes a single point of failure I'd prefer
to avoid.
* One of the goals of DANE and MTA-STS is to increase email
transport privacy. Leaking the (sender-domain,
recipient-domain) pairs to a new third party is in
conflict with that goal.

OCSP stapling (w/o must-staple) significantly decreases the privacy
load here without introducing brittleness. And of course there are
other mechanisms, such as CRLsets.

-Ekr


Viktor Dukhovni

2018-05-04 16:13:03 UTC

Post by Viktor Dukhovni
If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.

Well, in general when STS is misconfigured you can have problems.
I don't see that this case is sufficiently important to go away from
standard TLS semantics.

For the record, I'm concerned about implementation pitfalls,
not misconfiguration. A domain where not all the MX hosts
are listed in the STS policy is "normal", not
misconfigured per se: STS-aware clients would send only
via the secure MX hosts, other clients may use the full
set. This is not a recommended configuration, but it
should work, provided at least one best-preference MX host
is listed.

The basic idea is that STS is there to secure mail routing,
not trump it. As much as possible mail routing should continue
to be based on the MX host names. An MX host not listed in the
policy might never-the-less possess a certificate matching the
policy (if the policy specifies presented-id patterns rather than
MX host patterns).

Which is not to say that alternative designs can't work, they'd
emphasize doing TLS "by-the-book" over doing SMTP "by-the-book".
My instinct is to do SMTP "by the book", the goal here is to deliver
email, securely when possible.

This protocol is an opportunistic upgrade from cleartext to
unauthenticated TLS to authenticated TLS when STS policy is
located and/or cached, some caution may be appropriate to not
over-optimize for security at the expense of operational
robustness. Especially in the email space, fragile security
gets turned off, RFC7435 and all that...

One related observation (thanks for the hard questions that led
to the insight), perhaps worth mentioning in Security Considerations,
is that with MTA-STS an attacker who can forge MX records or address
RRsets of MX hosts can cause mail to bounce when the sender finds no
A/AAAA records for any of the MX hosts. The reverse path may not be
STS protected, and the bounce may return to the sender in the clear.

An implementation that naively filters the MX RRset first,
before eliminating MX hosts at the same or worse preference
than the sending host, is buggy, and I think this bug is quite
likely. These days few read a complete document cover to cover,
we tend to read the bits we think we need. Information overload and
all that.

So the warning about MX loops would likely be needed in
multiple places in the document to make MX patterns safer
for implementors with a typical attention span.

If the authors, IESG, the WG participants reading this, ...
decide to go back to MX host patterns at this point, I
won't stand in the way, I would just ask for prominent
warnings about MX RRset truncation at the sending host's
own preference (when found in the original MX RRset, forged
or not) and above happening BEFORE any policy filtering of
the MX RRset.

--
Viktor.

Daniel Margolis

2018-05-04 18:25:23 UTC

Whoah. Long thread.

For the record, I believe it's trivial to implement the hostname filtering
without applying it to the MX selection loop (and I think I've made this
observation before): if an invalid certificate is (as it must be) detected
after connecting to the chosen MX candidate (and thus cannot be used to
"prefilter" the candidate list), then, similarly, one can merely reject MX
candidates after selecting them (i.e. without modifying the loop/candidate
logic) and simulate the same control flow. That said, I always read
Viktor's argument as being that by making this a check against the
presented certificate it ensures implementers do not modify the candidate
selection logic.

I also always felt a bit ambivalent about this entire discussion, insofar
as we are trying to design so that implementers of validating MTAs--of
which there aren't all that many--don't make mistakes. Both designs run a
risk of hypothetical mistakes, either in the wildcard-to-wildcard matching
or in the MX loop traversal. But neither mistake is applicable to system
administrators, but only to the much rarer set of MTA authors. This doesn't
mean we shouldn't consider it, of course, and I think the concerns voiced
are valid--but it still seems significant to me to keep that in mind.

I think that actually lends itself to documentary fixes--i.e., calling out
the risks and potential mis-implementations in either strategy for
uncareful readers.


Alberto Bertogli

2018-05-04 14:56:12 UTC

2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".

[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]

For ease of reference, these are some of those discussions where people
(including me) raised concerns about the custom certificate matching:

https://www.ietf.org/mail-archive/web/uta/current/msg02195.html

https://www.ietf.org/mail-archive/web/uta/current/msg01922.html

https://www.ietf.org/mail-archive/web/uta/current/msg02308.html

Thanks,
Alberto

Daniel Margolis

2018-05-06 16:55:09 UTC

Hey Eric,

Thanks for the valuable comments. I've responded to most of them here:
https://mozphab-ietf.devsvcdev.mozaws.net/D4010. The revision containing
fixes can be seen at https://github.com/mrisher/smtp-sts/pull/220. (I will
let another author review my changes before merging and submitting a new
official draft.)

Comments I was unable to resolve:

* https://mozphab-ietf.devsvcdev.mozaws.net/D4010#inline-3713: How do you
suggest we clarify the terminology ("host" and "Policy Domain")?
* https://mozphab-ietf.devsvcdev.mozaws.net/D4010#inline-3716: Any
suggestions on clarifying that any max-age is valid?

I think there are two larger comments unresolved here, as well:

1. Certificate revocation (and "MAY"). My read on this is that revocation
is not widely mandated (e.g. popular Web browsers don't necessarily do it
using standard mechanisms!), some mechanisms (e.g. OCSP) don't provide the
security guarantees we would want, and so this is too muddied a space to
mandate specific behavior. As Viktor noted, some MTA developers may be very
opposed. My preference here is somewhat strongly to leave this as-is, for
those reasons.

2. Why is the "mx" pattern matched against the SANs and not the MX records
themselves? As Viktor noted and I commented briefly in passing, we debated
this a *lot* before. One point here is that this is only visible to MTA
implementors; sysadmins who mistakenly believe the "mx" field should match
the DNS records (which should themselves match the servers' certificates)
will end up making their configurations valid per the actual specification.
In other words, "match the policy against the SAN" matches a superset of
conditions which are valid in the alternative ("match the policy against
the MX records and match those records against the certificate").
Personally I would consider this edit to have been a compromise--it was not
and is still not my first choice--but, given it is the status quo, I am
fairly loath to change it.
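
The superset relationship in point 2 can be illustrated with a small
sketch, using the mx1/mx2 scenario from the DISCUSS. It assumes
single-label suffix patterns and deliberately ignores wildcard SANs
(and hence "wildcard-to-wildcard" matching); the function names are
hypothetical.

```python
from typing import List


def pattern_matches(pattern: str, name: str) -> bool:
    """Policy "mx" pattern match; a leading '.' covers one extra label."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("."):
        _, sep, rest = name.partition(".")
        return sep == "." and "." + rest == pattern
    return name == pattern


def draft_check(policy_mx: List[str], cert_sans: List[str]) -> bool:
    """Current text: some policy pattern matches some presented identity."""
    return any(pattern_matches(p, san) for p in policy_mx for san in cert_sans)


def strict_check(policy_mx: List[str], mx_host: str,
                 cert_sans: List[str]) -> bool:
    """Alternative: policy matches the MX name, and the cert matches that name."""
    host_ok = any(pattern_matches(p, mx_host) for p in policy_mx)
    cert_ok = mx_host.lower() in (san.lower() for san in cert_sans)
    return host_ok and cert_ok


policy = ["mx1.example.com", "mx2.example.com"]

# Connect to mx1, but the server presents a certificate for mx2 only.
print(draft_check(policy, ["mx2.example.com"]))
print(strict_check(policy, "mx1.example.com", ["mx2.example.com"]))
```

In this sketch anything accepted by strict_check is also accepted by
draft_check, while the mx1/mx2 case is accepted only by draft_check.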

On these points--especially #2--I continue to defer to the guidance of the
chairs on how best to resolve such issues.

Hope that helps. More feedback is welcome.

Post by Eric Rescorla
Eric Rescorla has entered the following ballot position for
draft-ietf-uta-mta-sts-17: Discuss
When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)
Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.
https://datatracker.ietf.org/doc/draft-ietf-uta-mta-sts/
----------------------------------------------------------------------
----------------------------------------------------------------------
https://mozphab-ietf.devsvcdev.mozaws.net/D4010
DETAIL
S 3.3.

Why is revocation only MAY?
S 4.

described in "MX Certificate Validation".

This doesn't seem like quite what you want. Consider the case where
S 5.

What exactly does this mean? That you have to do HTTPS or just do a
new DNS resolution despite the TTL?
S 8.2.

to the hosting organization. This can be done either by setting the
"mta-sts" record to an IP address or CNAME specified by the hosting
organization and by giving the hosting organization a TLS certificate
which is valid for that host, or by setting up a "reverse proxy"
(also known as a "gateway") server that serves as the Policy Domain's
policy the policy currently served by the hosting organization.

What certificate do I expect in this case?
----------------------------------------------------------------------
----------------------------------------------------------------------
S 1.

o whether MTAs sending mail to this domain can expect PKIX-
authenticated TLS support
o what a conforming client should do with messages when TLS cannot
be successfully negotiated

It would be nice if you stated here that you publish them in the DNS.
S 3.2.

the

path is "https://mta-sts.example.com/.well-known/mta-sts.txt".

Nit: "other than"
S 3.2.

charset=us-ascii are ignored. Additional "Content-Type" parameters
are also ignored.
o "version": (plain-text). Currently only "STSv1" is supported.

What does "plain-text" mean? I don't see a definition,
S 3.2.

o "max_age": Max lifetime of the policy (plain-text non-negative
integer seconds, maximum value of 31557600). Well-behaved clients
SHOULD cache a policy for up to this value from last policy fetch
time. To mitigate the risks of attacks at policy refresh time, it
is expected that this value typically be in the range of weeks or
greater.

What if I receive a policy with a lifetime less than that remaining in
the previously received policy
S 3.2.

indicates that mail for this domain might be handled by any MX with a
certificate valid for a host at "mail.example.com" or "example.net".
Valid patterns can be either fully specified names ("example.com") or
suffixes (".example.net") matching the right-hand parts of a server's
identity; the latter case are distinguished by a leading period. If

How many labels can be prepended here. Is "a.b.c.example.net" valid?
S 3.3.

is duplicated, all entries except for the first SHALL be ignored. If
any field is not specified, the policy SHALL be treated as invalid.
3.3. HTTPS Policy Fetching
When fetching a new policy or updating a policy, the HTTPS endpoint

You probably need a 2818 citation here.
S 4.1.

The certificate presented by the receiving MX MUST chain to a root
that is trusted by the sending MTA and be non-expired. The
certificate MUST have a subject alternative name (SAN, [RFC5280])
with a DNS-ID ([RFC6125]) matching the "mx" pattern. The MX's
certificate MAY also be checked for revocation via OCSP [RFC6960],
CRLs [RFC6818], or some other mechanism.

Why isn't this required?
S 4.1.

"wildcard-to-wildcard" matching.

If you follow my advice above, this will not be necessary.
S 8.1.

Do you mean SHOULD?
S 8.1.

Wouldn't it be easier to just to version the policies?
S 10.2.

mode, to allow clean MTA-STS removal, as described in Section 8.3.)
Resistance to downgrade attacks of this nature--due to the ability to
authoritatively determine "lack of a record" even for
non-participating recipients--is a feature of DANE, due to its use of
DNSSEC for policy discovery.

I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.
_______________________________________________
Uta mailing list
https://www.ietf.org/mailman/listinfo/uta

Viktor Dukhovni

2018-05-06 18:41:12 UTC

2. Why is the "mx" pattern matched against the SANs and not the MX records themselves? As Viktor noted and I commented briefly in passing, we debated this a *lot* before. One point here is that this is only visible to MTA implementors; sysadmins who mistakenly believe the "mx" field should match the DNS records (which should themselves match the servers' certificates) will end up making their configurations valid per the actual specification. In other words, "match the policy against the SAN" matches a superset of conditions which are valid in the alternative ("match the policy against the MX records and match those records against the certificate"). Personally I would consider this edit to have been a compromise--it was not and is still not my first choice--but, given it is the status quo, I am fairly loath to change it.
On these points--especially #2--I continue to defer to the guidance of the chairs on how best to resolve such issues.

After having to revisit this in response to the DISCUSS, I can
crystalize the issue in terms of the following dichotomy:

* Does MTA-STS secure the connections to the endpoints indicated
by a domain's MX RRset, without preempting MX-based SMTP routing?

or

* Does MTA-STS secure the MX RRset, possibly filtering it to at
most a set of names cached in the policy, with great care
taken to handle loop elimination first?

My sense is that the first option (current text) is a less invasive
change to SMTP: it changes only how the peer is authenticated.

For example, in "testing" mode, one probably SHOULD NOT trim the MX
RRset based on a "testing" policy. Or one might support multiple
authentication mechanisms for the peer MX (say key fingerprint as
a fallback if MTA-STS fails).

There are more implications to filtering the RRset than just
the presented-id matching...

--
Viktor.

Daniel Margolis

2018-05-06 19:44:44 UTC

I don't believe that *pre-filtering* the MX candidate list is the only way
to do it. You could leave the loop as-is and just refuse to connect to
(i.e. treat as a transient connection failure) any candidate which fails
the policy validation. So this is an implementation question; modifying
loop pre-filtering is probably riskier than what we might call "connection
early termination", but both are compliant with the protocol.
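
The two approaches named above can be sketched roughly as follows. This is
only a minimal Python illustration, not text from the draft or code from
any MTA: the function names and the `passes_policy` and `connect`
callbacks are hypothetical, and `mx_hosts` is assumed to be already
sorted by MX preference.

```python
from typing import Callable, List, Optional

# Hypothetical callback types: neither comes from the draft.
PolicyCheck = Callable[[str], bool]   # does this MX candidate pass the policy?
Connect = Callable[[str], bool]       # did delivery to this host succeed?


def deliver_prefilter(mx_hosts: List[str],
                      passes_policy: PolicyCheck,
                      connect: Connect) -> Optional[str]:
    """Pre-filtering: trim the candidate list before the loop runs."""
    candidates = [host for host in mx_hosts if passes_policy(host)]
    for host in candidates:
        if connect(host):
            return host
    return None  # nothing delivered; the message stays queued


def deliver_early_termination(mx_hosts: List[str],
                              passes_policy: PolicyCheck,
                              connect: Connect) -> Optional[str]:
    """Early termination: leave the loop as-is and treat a candidate that
    fails policy validation like a transient connection failure."""
    for host in mx_hosts:
        if not passes_policy(host):
            continue  # same effect as a failed connection: try the next MX
        if connect(host):
            return host
    return None  # nothing delivered; the message stays queued
```

In this simplified form both functions pick the same host; the difference
is only where the check sits relative to the existing delivery loop.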

The real difference between the two options is not, I think, this
implementation question, but that the current protocol technically allows
some valid configurations that are invalid in the MX-based
alternative--namely, the case where the certificate does not match the MX
hostname. That turns out to be fairly common (per
https://conferences.sigcomm.org/imc/2015/papers/p27.pdf), though, frankly,
I do not know that there's a good reason for admins to deliberately
configure a system in such a manner and, as a result, I don't believe
there's a strong argument for us preserving that flexibility.
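
To make the superset relationship concrete, here is a small Python sketch
of the two rules. It is an illustration under stated assumptions, not the
draft's algorithm: the function names are hypothetical, patterns use the
single-label leading-dot convention from the Appendix B comment quoted
earlier in the thread, and certificate SANs are assumed to be plain names
with no wildcards of their own.

```python
from typing import Iterable


def pattern_matches(pattern: str, name: str) -> bool:
    """Match one policy "mx" pattern against one DNS name.

    Assumes the leading-dot convention: ".example.com" covers exactly
    one extra label (x.example.com, but not x.y.example.com).
    """
    pattern = pattern.lower().rstrip(".")
    name = name.lower().rstrip(".")
    if pattern.startswith("."):
        first, sep, rest = name.partition(".")
        return bool(first) and sep == "." and "." + rest == pattern
    return name == pattern


def current_rule(policy_mx: Iterable[str], mx_host: str,
                 cert_sans: Iterable[str]) -> bool:
    """Current text: some certificate SAN matches some policy pattern.
    The MX hostname itself is not consulted."""
    sans = list(cert_sans)
    return any(pattern_matches(p, san) for p in policy_mx for san in sans)


def alternative_rule(policy_mx: Iterable[str], mx_host: str,
                     cert_sans: Iterable[str]) -> bool:
    """Alternative: the MX hostname matches the policy AND the
    certificate contains that same hostname (exact match assumed)."""
    host = mx_host.lower().rstrip(".")
    in_policy = any(pattern_matches(p, mx_host) for p in policy_mx)
    in_cert = any(host == san.lower().rstrip(".") for san in cert_sans)
    return in_policy and in_cert
```

Whenever `alternative_rule` accepts, the MX name is itself a SAN that
matches the policy, so `current_rule` accepts too. The reverse does not
hold: with a policy of `[".example.com"]`, an MX named
`mail.provider.net` presenting a certificate for `mx1.example.com`
passes `current_rule` but fails `alternative_rule`.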

I guess the tl;dr as far as I'm concerned is that I think either way really
can be done safely, that it's mostly a documentation issue, but I am
generally hesitant to change things now if we don't have to.


Viktor Dukhovni

2018-05-06 21:20:57 UTC

Post by Daniel Margolis
I don't believe that pre-filtering the MX candidate list is the only way
to do it. You could leave the loop as-is and just refuse to connect to
(i.e. treat as a transient connection failure) any candidate which fails
the policy validation. So this is an implementation question; modifying
loop pre-filtering is probably riskier than what we might call "connection
early termination", but both are compliant with the protocol.

It makes a difference with a "testing" policy. Should mail be sent via
an MX host not listed in the policy, or should it be skipped? With
"testing" the mail should probably go out, with a report of the authentication
failure (impossible success given unexpected MX name) sent per any "tlsrpt"
policy.

So at least "testing" should probably use all the MX hosts. Whether "enforce"
does so as well is then a question of whether handling the two cases
differently is a potential source of confusion/bugs, and of the need for
prominent anti-loop warnings.
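
As a rough illustration of that distinction, the Python sketch below
computes which MX hosts to attempt and which to report. It is not from
the draft: `plan_delivery` and the `in_policy` callback are hypothetical
names, and skipping unlisted hosts under "enforce" is just one of the two
behaviours under discussion, assumed here for the sake of the example.

```python
from typing import Callable, List, Tuple


def plan_delivery(mode: str,
                  mx_hosts: List[str],
                  in_policy: Callable[[str], bool]
                  ) -> Tuple[List[str], List[str]]:
    """Return (hosts to attempt, hosts to report as policy failures).

    mode is the policy mode: "none", "testing" or "enforce".
    in_policy is a hypothetical callback saying whether an MX name is
    listed in the cached policy.
    """
    if mode == "none":
        # No policy in effect: use the MX RRset unchanged, report nothing.
        return list(mx_hosts), []

    attempt: List[str] = []
    report: List[str] = []
    for host in mx_hosts:
        listed = in_policy(host)
        if not listed:
            # Authentication cannot succeed for an unexpected MX name,
            # so note it for reporting (e.g. per a "tlsrpt" policy).
            report.append(host)
        if listed or mode == "testing":
            # "testing": mail still goes out via every MX host.
            # "enforce": unlisted hosts are skipped (assumed behaviour).
            attempt.append(host)
    return attempt, report
```

With MX hosts `mx1.example.com` and `backup.example.net` and a policy
listing only names under `example.com`, "testing" attempts both hosts and
reports the backup, while "enforce" attempts only `mx1.example.com`.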

There are even some domains where connecting to the backup MX host *before*
trying a connection to the primary will cause firewall rules to be dynamically
added to block the client!

--
Viktor.

Viktor Dukhovni

2018-05-07 03:03:58 UTC