Discussion:
[Uta] Eric Rescorla's Discuss on draft-ietf-uta-mta-sts-17: (with DISCUSS and COMMENT)
Eric Rescorla
2018-05-04 01:14:44 UTC
Eric Rescorla has entered the following ballot position for
draft-ietf-uta-mta-sts-17: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-uta-mta-sts/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D4010



DETAIL
S 3.3.
character '*' as the complete left-most label within the
identifier.
The certificate MAY be checked for revocation via the Online
Certificate Status Protocol (OCSP) [RFC6960], certificate revocation
lists (CRLs), or some other mechanism.
Why is revocation only MAY?


S 4.
1. That the recipient MX supports STARTTLS and offers a valid PKIX-
based TLS certificate.
2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".
This doesn't seem like quite what you want. Consider the case where
the STS policy has:


S 5.
as though it does not have any active policy; see Section 8.3,
"Removing MTA-STS", for use of this mode value.
When a message fails to deliver due to an "enforce" policy, a
compliant MTA MUST NOT permanently fail to deliver messages before
checking for the presence of an updated policy at the Policy Domain.
What exactly does this mean? That you have to do HTTPS or just do a
new DNS resolution despite the TTL?


S 8.2.
to the hosting organization. This can be done either by setting the
"mta-sts" record to an IP address or CNAME specified by the hosting
organization and by giving the hosting organization a TLS certificate
which is valid for that host, or by setting up a "reverse proxy"
(also known as a "gateway") server that serves as the Policy Domain's
policy the policy currently served by the hosting organization.
What certificate do I expect in this case?


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

S 1.
o whether MTAs sending mail to this domain can expect PKIX-
authenticated TLS support
o what a conforming client should do with messages when TLS cannot
be successfully negotiated
It would be nice if you stated here that you publish them in the DNS.


S 3.2.
The policy itself is a set of key/value pairs (similar to [RFC5322]
header fields) served via the HTTPS GET method from the fixed
[RFC5785] "well-known" path of ".well-known/mta-sts.txt" served by
the "mta-sts" host at the Policy Domain. Thus for "example.com" the
path is "https://mta-sts.example.com/.well-known/mta-sts.txt".
This is slightly confusing text, because domains and hosts aren't
distinguished categories. I'm not sure what the correct terminology is
for DNS, but the point seems to be that you get it by prepending the
mta-sts label to the policy domain.
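As a concrete reading of that point, here is a minimal Python sketch: the URL is derived by prepending the "mta-sts" label to the Policy Domain and appending the fixed well-known path. The function name policy_url and the normalisation step (lowercasing, dropping a trailing dot) are assumptions for illustration, not requirements quoted from the draft.

```python
def policy_url(policy_domain: str) -> str:
    """Build the policy URL by prepending the "mta-sts" label (illustrative helper)."""
    # Assumption: normalise by dropping a trailing root dot and lowercasing.
    domain = policy_domain.rstrip(".").lower()
    return f"https://mta-sts.{domain}/.well-known/mta-sts.txt"


print(policy_url("example.com"))
# prints https://mta-sts.example.com/.well-known/mta-sts.txt
```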


S 3.2.
path is "https://mta-sts.example.com/.well-known/mta-sts.txt".
When fetching a policy, senders SHOULD validate that the media type
is "text/plain" to guard against cases where webservers allow
untrusted users to host non-text content (typically, HTML or images)
at a user-defined path. All parameters other charset=utf-8 or
Nit: "other than"


S 3.2.
charset=us-ascii are ignored. Additional "Content-Type" parameters
are also ignored.
o "version": (plain-text). Currently only "STSv1" is supported.
What does "plain-text" mean? I don't see a definition.


S 3.2.
o "max_age": Max lifetime of the policy (plain-text non-negative
integer seconds, maximum value of 31557600). Well-behaved clients
SHOULD cache a policy for up to this value from last policy fetch
time. To mitigate the risks of attacks at policy refresh time, it
is expected that this value typically be in the range of weeks or
greater.
What if I receive a policy with a lifetime less than that remaining in
the previously received policy?


S 3.2.
indicates that mail for this domain might be handled by any MX with a
certificate valid for a host at "mail.example.com" or "example.net".
Valid patterns can be either fully specified names ("example.com") or
suffixes (".example.net") matching the right-hand parts of a server's
identity; the latter case are distinguished by a leading period. If
How many labels can be prepended here? Is "a.b.c.example.net" valid?


S 3.3.
is duplicated, all entries except for the first SHALL be ignored. If
any field is not specified, the policy SHALL be treated as invalid.
3.3. HTTPS Policy Fetching
When fetching a new policy or updating a policy, the HTTPS endpoint
You probably need a 2818 citation here.


S 4.1.
The certificate presented by the receiving MX MUST chain to a root CA
that is trusted by the sending MTA and be non-expired. The
certificate MUST have a subject alternative name (SAN, [RFC5280])
with a DNS-ID ([RFC6125]) matching the "mx" pattern. The MX's
certificate MAY also be checked for revocation via OCSP [RFC6960],
CRLs [RFC6818], or some other mechanism.
Why isn't this required?


S 4.1.
identical to other common cases of X.509 certificate authentication
(as described, for example, in [RFC6125]). Consider the example
policy given above, with an "mx" pattern containing ".example.com".
In this case, if the MX server's X.509 certificate contains a SAN
matching "*.example.com", we are required to implement "wildcard-to-
wildcard" matching.
If you follow my advice above, this will not be necessary.


S 8.1.
may be unable to discover that a new policy exists until the DNS TTL
has passed. Recipients should therefore ensure that old policies
continue to work for message delivery during this period of time, or
risk message delays.
Recipients should also prefer to update the HTTPS policy body before
Do you mean SHOULD?


S 8.1.
continue to work for message delivery during this period of time, or
risk message delays.
Recipients should also prefer to update the HTTPS policy body before
updating the TXT record; this ordering avoids the risk that senders,
seeing a new TXT record, mistakenly cache the old policy from HTTPS.
Wouldn't it be easier just to version the policies?


S 10.2.
mode, to allow clean MTA-STS removal, as described in Section 8.3.)
Resistance to downgrade attacks of this nature--due to the ability to
authoritatively determine "lack of a record" even for non-
participating recipients--is a feature of DANE, due to its use of
DNSSEC for policy discovery.
I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.
Viktor Dukhovni
2018-05-04 05:19:46 UTC
On Thu, May 03, 2018 at 06:14:44PM -0700, Eric Rescorla wrote:

[ Though I am not one of the authors, I was actively involved in
the evolution of the draft. Some of its features are in part a
result of my influence, both based on prior work with DANE, and
as a potential implementor of the specification in a future
Postfix release. The comments below may shed light on the
rationale for some of the choices reflected in the draft. ]
Post by Eric Rescorla
S 3.2.
o "max_age": Max lifetime of the policy (plain-text non-negative
integer seconds, maximum value of 31557600). Well-behaved clients
SHOULD cache a policy for up to this value from last policy fetch
time. To mitigate the risks of attacks at policy refresh time, it
is expected that this value typically be in the range of weeks or
greater.
What if I receive a policy with a lifetime less than that remaining in
the previously received policy?
Good question, I don't recall any discussion of this. Since the
policy might be "none", canceling the cached policy entirely, it
seems logical to allow the new "max_age" to end before the old
"max_age". I hope the authors agree.
Post by Eric Rescorla
indicates that mail for this domain might be handled by any MX with a
certificate valid for a host at "mail.example.com" or "example.net".
Valid patterns can be either fully specified names ("example.com") or
suffixes (".example.net") matching the right-hand parts of a server's
identity; the latter case are distinguished by a leading period. If
How many labels can be prepended here? Is "a.b.c.example.net" valid?
Another good observation, I am having trouble finding normative
text in the body of the draft that makes this clear. IIRC, as
evidenced by a comment in the Appendix B pseudo-code, the intention
is to support just a single label:

// Leading '.' matches a wildcard against the first part, i.e.
// .example.com matches x.example.com but not x.y.example.com.

The text should be more clear, unless we both missed where this is
specified.
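To make the single-label reading concrete, here is a minimal Python sketch of the behaviour described in that Appendix B comment. It is not the draft's pseudo-code; the function name mx_pattern_matches is hypothetical, and the case and trailing-dot normalisation is an assumption.

```python
def mx_pattern_matches(pattern: str, name: str) -> bool:
    """Match a name against an "mx" pattern, allowing one extra label for suffixes."""
    # Assumption: compare case-insensitively and ignore a trailing root dot.
    pattern = pattern.rstrip(".").lower()
    name = name.rstrip(".").lower()
    if pattern.startswith("."):
        # Suffix form: exactly one label may precede the suffix.
        first, sep, rest = name.partition(".")
        return bool(first) and sep == "." and "." + rest == pattern
    # Fully specified form: the names must be identical.
    return name == pattern


print(mx_pattern_matches(".example.com", "x.example.com"))    # True
print(mx_pattern_matches(".example.com", "x.y.example.com"))  # False
print(mx_pattern_matches(".example.com", "example.com"))      # False
```

Note that this compares plain names only; a wildcard SAN such as "*.example.com" is treated as an ordinary label here, so the "wildcard-to-wildcard" case from S 4.1 is not modelled.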
Post by Eric Rescorla
S 10.2.
mode, to allow clean MTA-STS removal, as described in Section 8.3.)
Resistance to downgrade attacks of this nature--due to the ability to
authoritatively determine "lack of a record" even for non-
participating recipients--is a feature of DANE, due to its use of
DNSSEC for policy discovery.
I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.
With MTA-STS, hardening the DNS is not enough, the policy does not
take effect until it is first verified to work. First-contact
lookup failures for the TXT record do not cause email to be deferred.

Indeed with MTA-STS, some MTAs may do *background* policy retrieval,
and the first few messages to a destination may go out unprotected.

For a downgrade-resistant mechanism, a domain can use DANE SMTP
(RFC7672). If the destination domain is signed, the first step in
that direction is already taken.
Post by Eric Rescorla
2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".
IMPORTANT: This doesn't seem like quite what you want. Consider
mx: mx1.example.com
mx: mx2.example.com
And I then attempt to send to mx1.example.com, send SNI=mx1.example.com,
and get a cert that is only valid for mx2.example.com.
[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]

An MiTM attacker can direct the traffic to any MX host of his choice
by blocking TCP SYNs, or generating RST packets for traffic to all
the other MXs, causing the desired MX host to be the only one the
client can reach. Also, for a large fraction of domains a wildcard
certificate, or a certificate with all the names is used. For
example, below are the SANs from the certificate for gmail.com:

DNS:mx.google.com
DNS:alt1.aspmx.l.google.com
DNS:alt1.gmail-smtp-in.l.google.com
DNS:alt1.gmr-smtp-in.l.google.com
DNS:alt2.aspmx.l.google.com
DNS:alt2.gmail-smtp-in.l.google.com
DNS:alt2.gmr-smtp-in.l.google.com
DNS:alt3.aspmx.l.google.com
DNS:alt3.gmail-smtp-in.l.google.com
DNS:alt3.gmr-smtp-in.l.google.com
DNS:alt4.aspmx.l.google.com
DNS:alt4.gmail-smtp-in.l.google.com
DNS:alt4.gmr-smtp-in.l.google.com
DNS:aspmx.l.google.com
DNS:aspmx2.googlemail.com
DNS:aspmx3.googlemail.com
DNS:aspmx4.googlemail.com
DNS:aspmx5.googlemail.com
DNS:gmail-smtp-in.l.google.com
DNS:gmr-mx.google.com
DNS:gmr-smtp-in.l.google.com

So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.
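For concreteness, the two checks being compared can be sketched in Python as follows. draft_check follows the quoted S 4 text (any policy pattern matches any identity in the certificate); strict_check follows the flow Eric suggests (the MX hostname must match the policy, and the certificate must be valid for that hostname). Both function names and the simplified matcher are assumptions for illustration; wildcard SANs are not handled.

```python
from typing import Iterable


def matches(pattern: str, name: str) -> bool:
    """Exact match, or single-label suffix match for a leading-dot pattern."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("."):
        first, sep, rest = name.partition(".")
        return bool(first) and sep == "." and "." + rest == pattern
    return name == pattern


def draft_check(patterns: Iterable[str], cert_sans: Iterable[str]) -> bool:
    """Current draft: some policy pattern matches some presented identity."""
    sans = list(cert_sans)
    return any(matches(p, san) for p in patterns for san in sans)


def strict_check(patterns: Iterable[str], mx_host: str, cert_sans: Iterable[str]) -> bool:
    """Alternative: the MX hostname is in the policy AND the cert names that host."""
    host_in_policy = any(matches(p, mx_host) for p in patterns)
    cert_names = {san.lower() for san in cert_sans}
    return host_in_policy and mx_host.lower() in cert_names


policy = ["mx1.example.com", "mx2.example.com"]
presented = ["mx2.example.com"]  # certificate returned when connecting to mx1
print(draft_check(policy, presented))                      # True
print(strict_check(policy, "mx1.example.com", presented))  # False
```

The scenario at the bottom is the one from the DISCUSS: connecting to mx1.example.com and receiving a certificate valid only for mx2.example.com passes the first check and fails the second.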
Post by Eric Rescorla
This seems like it's extremely undesirable and might be the basis for some kind of attack.
See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.
Post by Eric Rescorla
You look up the MXes in the DNS.
You select one that must match one of the things in the mx list in the STS
Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops, when a backup MX host, not after removing itself early from
the list, fails to eliminate worse priority MX hosts. It also
requires all sites to duplicate MX host updates from DNS into the
STS policy, disallowing the "low-maintenance" ".example.com" form.
Post by Eric Rescorla
You then connect to the MX and provide its SNI.
The certificate must match the domain you provided in the SNI
The WG considered this issue, and in the end accepted the current
design. I hope this helps.
Post by Eric Rescorla
The certificate MAY be checked for revocation via the Online
Certificate Status Protocol (OCSP) [RFC6960], certificate revocation
lists (CRLs), or some other mechanism.
Why is revocation only MAY?
Looking at e.g. the X.509 certificate for Gmail, I don't see a
"must staple OCSP" extension. So we get no meaningful security
from OCSP stapling, an attacker who misappropriates the private
key will not staple OCSP responses.

I have no intention of building an HTTP client into the Postfix
SMTP client to download CRLs from various CAs that remote peers
might use. Full CRLs might at least be cached on a per-CA basis,
while per-certificate OCSP requires a connection to the CA for each
new certificate.

I'm afraid I see too little value in CRLs to consider CRL support
in Postfix. The OS platforms that Postfix runs on don't deliver
a full intermediate CA store with regular updates of the associated
CRLs. Doing CRL management in each application is IMHO impractical.

In short, I have not implemented and don't expect to implement CRL
support in Postfix.

[ I'll endeavour to leave further comments on the above topics to
the authors. I might still chime in if a new topic comes up
where I'm one of the culprits responsible for the current text. ]
--
Viktor.

P.S. (digression on what I'd like to see replace CRLs)

If we want effective revocation for WebPKI, let's fully automate
certificate roll-over (ACME is a good start) and drive down the
maximum certificate lifetimes to be short enough that most likely
you'd have a hard time noticing that your key is compromised any
faster, and getting the CA to revoke the cert, getting sites that
cache CRLs to get fresh CRLs, ...

I'd like to see one to two week certificate lifetimes, and X.509
stacks that can reload the certificates and keys without restarting
the server.
Eric Rescorla
2018-05-04 12:11:20 UTC
Post by Eric Rescorla
S 10.2.
mode, to allow clean MTA-STS removal, as described in Section 8.3.)
Resistance to downgrade attacks of this nature--due to the ability to
authoritatively determine "lack of a record" even for non-
participating recipients--is a feature of DANE, due to its use of
DNSSEC for policy discovery.
I'm surprised that you don't note that if you use DNSSEC (and the
client validates), you are in general resistant to this form of
attack.
With MTA-STS, hardening the DNS is not enough, the policy does not
take effect until it is first verified to work. First-contact
lookup failures for the TXT record do not cause email to be deferred.
Indeed with MTA-STS, some MTAs may do *background* policy retrieval,
and the first few messages to a destination may go out unprotected.
For a downgrade-resistant mechanism, a domain can use DANE SMTP
(RFC7672). If the destination domain is signed, the first step in
that direction is already taken.
Post by Eric Rescorla
2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".
IMPORTANT: This doesn't seem like quite what you want. Consider
mx: mx1.example.com
mx: mx2.example.com
And I then attempt to send to mx1.example.com, send SNI=mx1.example.com,
and get a cert that is only valid for mx2.example.com.
[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]
An MiTM attacker can direct the traffic to any MX host of his choice
by blocking TCP SYNs, or generating RST packets for traffic to all
the other MXs, causing the desired MX host to be the only one the
client can reach. Also, for a large fraction of domains a wildcard
certificate, or a certificate with all the names is used. For
example, below are the SANs from the certificate for gmail.com:
DNS:mx.google.com
DNS:alt1.aspmx.l.google.com
DNS:alt1.gmail-smtp-in.l.google.com
DNS:alt1.gmr-smtp-in.l.google.com
DNS:alt2.aspmx.l.google.com
DNS:alt2.gmail-smtp-in.l.google.com
DNS:alt2.gmr-smtp-in.l.google.com
DNS:alt3.aspmx.l.google.com
DNS:alt3.gmail-smtp-in.l.google.com
DNS:alt3.gmr-smtp-in.l.google.com
DNS:alt4.aspmx.l.google.com
DNS:alt4.gmail-smtp-in.l.google.com
DNS:alt4.gmr-smtp-in.l.google.com
DNS:aspmx.l.google.com
DNS:aspmx2.googlemail.com
DNS:aspmx3.googlemail.com
DNS:aspmx4.googlemail.com
DNS:aspmx5.googlemail.com
DNS:gmail-smtp-in.l.google.com
DNS:gmr-mx.google.com
DNS:gmr-smtp-in.l.google.com
So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.
But not everyone is configured this way.
Post by Eric Rescorla
This seems like it's extremely undesirable and might be the basis for
some kind of attack.
See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.
This might be true, but this kind of informal reasoning is notoriously
prone to error. We have a general pattern for TLS certificate verification,
which you are deviating from, and we then need to analyze in detail. I'm
not seeing any good reason for that.
Post by Eric Rescorla
You look up the MXes in the DNS.
You select one that must match one of the things in the mx list in the STS
Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops, when a backup MX host, not after removing itself early from
the list, fails to eliminate worse priority MX hosts.
I don't understand this claim.

It also
requires all sites to duplicate MX host updates from DNS into the
STS policy, disallowing the "low-maintenance" ".example.com" form.
I don't see why this would be true. You publish .example.com and
then you modify the MX records at will, provided that they all end
in .example.com.
Post by Eric Rescorla
The certificate MAY be checked for revocation via the Online
Certificate Status Protocol (OCSP) [RFC6960], certificate revocation
lists (CRLs), or some other mechanism.
Why is revocation only MAY?
Looking at e.g. the X.509 certificate for Gmail, I don't see a
"must staple OCSP" extension. So we get no meaningful security
from OCSP stapling, an attacker who misappropriates the private
key will not staple OCSP responses.
I have no intention of building an HTTP client into the Postfix
SMTP client to download CRLs from various CAs that remote peers
might use. Full CRLs might at least be cached on a per-CA basis,
while per-certificate OCSP requires a connection to the CA for each
new certificate.
I'm afraid I see too little value in CRLs to consider CRL support
in Postfix. The OS platforms that Postfix runs on don't deliver
a full intermediate CA store with regular updates of the associated
CRLs. Doing CRL management in each application is IMHO impractical.
In short, I have not implemented and don't expect to implement CRL
support in Postfix.
You seem to be omitting the obvious answer: regular OCSP.

-Ekr
Viktor Dukhovni
2018-05-04 14:41:13 UTC
[ Re-ordered for clarity. Hope the below adds some context. ]
Post by Viktor Dukhovni
Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops, when a backup MX host, not after removing itself early from
the list, fails to eliminate worse priority MX hosts.
I don't understand this claim.
A sending MTA might be a non-primary MX host for a domain, that
is trying to reach a better (lower) preference MX host. If it
prunes the MX RRset based on the STS policy, *before* dropping
all worse (higher) preference MX hosts, it is liable to create
a mail routing loop, by not taking into account the fact it is
one of the MX hosts for the destination. Ideally the domain's
MX RRset should not contain any names not matched by the STS
policy, but reality is sometimes different.

If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.
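The ordering concern can be sketched in Python, under the hypothetical design in which "mx" entries are MX hostname patterns. The function name candidate_mx_hosts, the own_names parameter and the policy_allows callback are all illustrative assumptions, not part of the draft.

```python
from typing import Callable, Iterable, List, Set, Tuple

MxRecord = Tuple[int, str]  # (preference, hostname); a lower preference is better


def candidate_mx_hosts(
    mx_rrset: Iterable[MxRecord],
    own_names: Set[str],
    policy_allows: Callable[[str], bool],
) -> List[MxRecord]:
    """Truncate the MX RRset at our own preference, THEN apply the policy filter."""
    records = sorted((pref, host.rstrip(".").lower()) for pref, host in mx_rrset)
    own = {name.rstrip(".").lower() for name in own_names}

    # Step 1: loop avoidance on the original RRset. If we are one of the MX
    # hosts, keep only hosts with a strictly better (lower) preference.
    own_prefs = [pref for pref, host in records if host in own]
    if own_prefs:
        cutoff = min(own_prefs)
        records = [(pref, host) for pref, host in records if pref < cutoff]

    # Step 2: only now drop hosts that the policy does not allow.
    return [(pref, host) for pref, host in records if policy_allows(host)]


rrset = [(10, "mx1.example.com"), (20, "backup.example.net"), (30, "mx3.example.com")]
allowed = lambda host: host.endswith(".example.com")
print(candidate_mx_hosts(rrset, {"backup.example.net"}, allowed))
# prints [(10, 'mx1.example.com')]
```

If the two steps were swapped, backup.example.net would be filtered out before the sending host could find itself in the list, and (30, "mx3.example.com") would survive as a candidate, which is the routing-loop hazard described above.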
Post by Viktor Dukhovni
Post by Viktor Dukhovni
So trying to make sure that you're reaching the MX host
you think you're reaching and not one of the others is
largely pointless and often a lost cause.
But not everyone is configured this way.
Yes, some domains have distinct per-MX certificates. Even then,
an MiTM attacker can still restrict traffic to any MX host of
his/her choice, but if the name matching were more strict indeed
the sending MTA would then know *which* MX host this was more
reliably than otherwise.
Post by Viktor Dukhovni
Post by Viktor Dukhovni
See above. If the MX host has a certificate that matches the
client's SNI, it may return it, even if that's one of the other
MX hosts. If it does not return a matching certificate, the "attack"
fails.
This might be true, but this kind of informal reasoning is notoriously
prone to error. We have a general pattern for TLS certificate verification,
which you are deviating from, and we then need to analyze in detail. I'm
not seeing any good reason for that.
Historically, because MX lookups are unauthenticated DNS, trusting
the MX hostname was not a good option. So SMTP senders would validate
the next-hop domain, rather than the MX hostname. Correspondingly,
the certificates used by MX hosts would not necessarily match the
MX hostname, some matched only the (email) destination domain.
These were called UCC certificates by some.

Of course MTA-STS is new territory, and one might require suitable
new certificates for that, that always match the MX hostname. The
current draft is more forgiving.
Post by Viktor Dukhovni
Post by Viktor Dukhovni
It also
requires all sites to duplicate MX host updates from DNS into the
STS policy, disallowing the "low-maintenance" ".example.com" form.
I don't see why this would be true. You publish .example.com and
then you modify the MX records at will, provided that they all end
in .example.com.
Yes, that's true, provided the field remains a pattern. It would
invite the routing loop mis-optimization, not clear how effective
the text can be in the face of lazy implementors who just read some
TL;DR summary and implement without much thought. The presented-
identifier design is less prone to getting that wrong...
Post by Viktor Dukhovni
Post by Viktor Dukhovni
In short, I have not implemented and don't expect to implement CRL
support in Postfix.
You seem to be omitting the obvious answer: regular OCSP.
I did mention OCSP, I have problems with it:

* When OCSP lookups temp-fail, my impression is that most
clients generally continue processing. This obviates
the security benefits of OCSP. Otherwise the CA OCSP
server becomes a single point of failure I'd prefer
to avoid.

* One of the goals of DANE and MTA-STS is to increase email
transport privacy. Leaking the (sender-domain,
recipient-domain) pairs to a new third party is in
conflict with that goal.

Hope that helps.
--
Viktor.
Eric Rescorla
2018-05-04 15:45:32 UTC
Post by Viktor Dukhovni
[ Re-ordered for clarity. Hope the below adds some context. ]
Post by Viktor Dukhovni
Preemptive removal of non-matching MX hosts is liable (in sloppy
implementations, and I expect enough to be sloppy) to cause routing
loops, when a backup MX host, not after removing itself early from
the list, fails to eliminate worse priority MX hosts.
I don't understand this claim.
A sending MTA might be a non-primary MX host for a domain, that
is trying to reach a better (lower) preference MX host. If it
prunes the MX RRset based on the STS policy, *before* dropping
all worse (higher) preference MX hosts, it is liable to create
a mail routing loop, by not taking into account the fact it is
one of the MX hosts for the destination. Ideally the domain's
MX RRset should not contain any names not matched by the STS
policy, but reality is sometimes different.
If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.
Well, in general when STS is misconfigured you can have problems.
I don't see that this case is sufficiently important to go away from
standard TLS semantics.
Post by Viktor Dukhovni
Post by Viktor Dukhovni
In short, I have not implemented and don't expect to implement CRL
support in Postfix.
You seem to be omitting the obvious answer: regular OCSP.
* When OCSP lookups temp-fail, my impression is that most
clients generally continue processing. This obviates
the security benefits of OCSP. Otherwise the CA OCSP
server becomes a single point of failure I'd prefer
to avoid.
* One of the goals of DANE and MTA-STS is to increase email
transport privacy. Leaking the (sender-domain,
recipient-domain) pairs to a new third party is in
conflict with that goal.
OCSP stapling (w/o must-staple) significantly decreases the privacy
load here without introducing brittleness. And of course there are
other mechanisms, such as CRLsets.

-Ekr
Post by Viktor Dukhovni
Hope that helps.
--
Viktor.
Viktor Dukhovni
2018-05-04 16:13:03 UTC
Post by Eric Rescorla
Post by Viktor Dukhovni
If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifier (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.
Well, in general when STS is misconfigured you can have problems.
I don't see that this case is sufficiently important to go away from
standard TLS semantics.
For the record, I'm concerned about implementation pitfalls,
not misconfiguration. A domain where not all the MX hosts
are listed in the STS policy is "normal", not
misconfigured per se: STS-aware clients would send only
via the secure MX hosts, other clients may use the full
set. This is not a recommended configuration, but it
should work, provided at least one best-preference MX host
is listed.

The basic idea is that STS is there to secure mail routing,
not trump it. As much as possible mail routing should continue
to be based on the MX host names. An MX host not listed in the
policy might nevertheless possess a certificate matching the
policy (if the policy specifies presented-id patterns rather than
MX host patterns).

Which is not to say that alternative designs can't work, they'd
emphasize doing TLS "by-the-book" over doing SMTP "by-the-book".
My instinct is to do SMTP "by the book", the goal here is to deliver
email, securely when possible.

This protocol is an opportunistic upgrade from cleartext to
unauthenticated TLS to authenticated TLS when STS policy is
located and/or cached, some caution may be appropriate to not
over-optimize for security at the expense of operational
robustness. Especially in the email space, fragile security
gets turned off, RFC7435 and all that...

One related observation (thanks for the hard questions that led
to the insight), perhaps worth mentioning in Security Considerations,
is that with MTA-STS an attacker who can forge MX records or address
RRsets of MX hosts can cause mail to bounce when the sender finds no
A/AAAA records for any of the MX hosts. The reverse path may not be
STS protected, and the bounce may return to the sender in the clear.

An implementation that naively filters the MX RRset first,
before eliminating MX hosts at the same or worse preference
than the sending host is buggy, and I think this bug is quite
likely. These days few read a complete document cover to cover,
we tend to read the bits we think we need. Information overload and
all that.

So the warning about MX loops would likely be needed in
multiple places in the document to make MX patterns safer
for implementors with a typical attention span.

If the authors, IESG, the WG participants reading this, ...
decide to go back to MX host patterns at this point, I
won't stand in the way, I would just ask for prominent
warnings about MX RRset truncation at the sending host's
own preference (when found in the original MX RRset, forged
or not) and above happening BEFORE any policy filtering of
the MX RRset.
--
Viktor.
Daniel Margolis
2018-05-04 18:25:23 UTC
Whoah. Long thread.

For the record, I believe it's trivial to implement the hostname filtering
without applying it to the MX selection loop (and I think I've made this
observation before): if an invalid certificate is (as it must be) detected
after connecting to the chosen MX candidate (and thus cannot be used to
"prefilter" the candidate list), then, similarly, one can merely reject MX
candidates after selecting them (i.e. without modifying the loop/candidate
logic) and simulate the same control flow. That said, I always read
Viktor's argument as being that by making this a check against the
presented certificate it ensures implementers do not modify the candidate
selection logic.
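A minimal Python sketch of that idea, assuming the candidate list comes from the MTA's normal (unmodified) MX selection: a policy mismatch is handled like any other per-candidate failure, by moving on to the next host. The names deliver_via_first_acceptable, policy_allows and try_deliver are hypothetical.

```python
from typing import Callable, Iterable, List, Optional


def deliver_via_first_acceptable(
    candidates: Iterable[str],
    policy_allows: Callable[[str], bool],
    try_deliver: Callable[[str], bool],
) -> Optional[str]:
    """Walk the usual candidate order, rejecting hosts only after selecting them."""
    for host in candidates:
        # Reject after selection, exactly where a certificate failure would
        # be detected; the candidate list itself is never rewritten.
        if not policy_allows(host):
            continue
        if try_deliver(host):
            return host
    return None


attempted: List[str] = []


def fake_delivery(host: str) -> bool:
    # Stand-in for a real SMTP delivery attempt; always succeeds.
    attempted.append(host)
    return True


chosen = deliver_via_first_acceptable(
    ["mx1.example.org", "mx2.example.com"],
    lambda host: host.endswith(".example.com"),
    fake_delivery,
)
print(chosen, attempted)
# prints mx2.example.com ['mx2.example.com']
```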

I also always felt a bit ambivalent about this entire discussion, insofar
as we are trying to design so that implementers of validating MTAs--of
which there aren't all that many--don't make mistakes. Both designs run a
risk of hypothetical mistakes, either in the wildcard-to-wildcard matching
or in the MX loop traversal. But neither mistake is applicable to system
administrators, but only to the much rarer set of MTA authors. This doesn't
mean we shouldn't consider it, of course, and I think the concerns voiced
are valid--but it still seems significant to me to keep that in mind.

I think that actually lends itself to documentary fixes--i.e., calling out
the risks and potential mis-implementations in either strategy for
uncareful readers.
Post by Viktor Dukhovni
Post by Eric Rescorla
Post by Viktor Dukhovni
If the meaning of the matching field were changed to be an
MX hostname pattern, rather than a presented-identifet (RFC6125)
pattern, then we'd need rather prominent warnings in the
text about routing loop avoidance.
Well, in general when STS is misconfigured you can have problems.
I don't see that this case is sufficiently important to go away from
standard TLS semantics.
For the record, I'm concerned about implementation pitfalls,
not misconfiguration. A domain where not all the MX hosts
are not listed in the STS policy is "normal" is not
misconfigured per-se, STS-aware clients would send only
via the secure MX hosts, other clients may use the full
set. This is not a recommended configuration, but it
should work, provided at least one best-preference MX host
is listed.
The basic idea is that STS is there to secure mail routing,
not trump it. As much as possible mail routing should continue
to be based on the MX host names. An MX host not listed in the
policy might never-the-less possess a certificate matching the
policy (if the policy specifies presented-id patterns rather than
MX host patterns).
Which is not to say that alternative designs can't work, they'd
emphasize doing TLS "by-the-book" over doing SMTP "by-the-book".
My instinct is to do SMTP "by the book", the goal here is to deliver
email, securely when possible.
This protocol is an opportunistic upgrade from cleartext to
unauthenticated TLS to authenticated TLS when STS policy is
located and/or cached, some caution may be appropriate to not
over-optimize for security at the expense of operational
robustness. Especially in the email space, fragile security
gets turned off, RFC7435 and all that...
One related observation (thanks for the hard questions that led
to the insight), perhaps worth mentioning in Security Considerations,
is that with MTA-STS an attacker who can forge MX records or address
RRsets of MX hosts can cause mail to bounce when the sender finds no
A/AAAA records for any of the MX hosts. The reverse path may not be
STS protected, and the bounce may return to the sender in the clear.
An implementation that naively filters the MX RRset first,
before eliminating MX hosts at the same or worse preference
than the sending host, is buggy, and I think this bug is quite
likely. These days few read a complete document cover to cover;
we tend to read the bits we think we need. Information overload and
all that.
So the warning about MX loops would likely be needed in
multiple places in the document to make MX patterns safer
for implementors with a typical attention span.
If the authors, IESG, the WG participants reading this, ...
decide to go back to MX host patterns at this point, I
won't stand in the way, I would just ask for prominent
warnings about MX RRset truncation at the sending host's
own preference (when found in the original MX RRset, forged
or not) and above happening BEFORE any policy filtering of
the MX RRset.
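
A minimal sketch of the ordering concern above, assuming the hypothetical
"MX host pattern" design. The names MX, truncate_at_own_preference,
safe_candidates and buggy_candidates are illustrative only, and
policy_allows stands in for whatever policy match an implementation uses.
It shows why truncation at the sending host's own preference has to run on
the original RRset, before any policy filtering:

```python
from typing import Callable, List, NamedTuple


class MX(NamedTuple):
    preference: int
    host: str


def _norm(host: str) -> str:
    return host.rstrip(".").lower()


def truncate_at_own_preference(rrset: List[MX], local_host: str) -> List[MX]:
    """Standard SMTP loop avoidance: if the sending host appears in the RRset,
    keep only MX hosts with a strictly better (lower) preference."""
    own = [mx.preference for mx in rrset if _norm(mx.host) == _norm(local_host)]
    if not own:
        return sorted(rrset)
    cutoff = min(own)
    return sorted(mx for mx in rrset if mx.preference < cutoff)


def safe_candidates(rrset: List[MX], local_host: str,
                    policy_allows: Callable[[str], bool]) -> List[MX]:
    # Loop elimination runs on the ORIGINAL RRset, then the policy filter.
    truncated = truncate_at_own_preference(rrset, local_host)
    return [mx for mx in truncated if policy_allows(mx.host)]


def buggy_candidates(rrset: List[MX], local_host: str,
                     policy_allows: Callable[[str], bool]) -> List[MX]:
    # Naive order: the policy filter may remove the sending host itself,
    # so the truncation step no longer finds it and keeps worse-preference hosts.
    filtered = [mx for mx in rrset if policy_allows(mx.host)]
    return truncate_at_own_preference(filtered, local_host)
```

With an RRset of mx1.example.com (10), relay.example.org (20) and
mx2.example.com (30), a sending host of relay.example.org, and a policy that
only allows hosts under example.com, safe_candidates keeps only
mx1.example.com, while buggy_candidates also keeps mx2.example.com, which is
at a worse preference than the sender.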
--
Viktor.
Alberto Bertogli
2018-05-04 14:56:12 UTC
Post by Viktor Dukhovni
Post by Eric Rescorla
2. That at least one of the policy's "mx" patterns matches at least
one of the identities presented in the MX's X.509 certificate, as
described in "MX Certificate Validation".
IMPORTANT: This doesn't seem like quite what you want. Consider
mx: mx1.example.com
mx: mx2.example.com
And I then attempt to send to mx1.example.com, send SNI=mx1.example.com,
and get a cert that is only valid for mx2.example.com.
[ This was discussed extensively in the WG. This part of the design
is substantially my doing... ]
For ease of reference, these are some of those discussions where people
(including me) raised concerns about the custom certificate matching:

https://www.ietf.org/mail-archive/web/uta/current/msg02195.html

https://www.ietf.org/mail-archive/web/uta/current/msg01922.html

https://www.ietf.org/mail-archive/web/uta/current/msg02308.html

Thanks,
Alberto
Daniel Margolis
2018-05-06 16:55:09 UTC
Hey Eric,

Thanks for the valuable comments. I've responded to most of them here:
https://mozphab-ietf.devsvcdev.mozaws.net/D4010. The revision containing
fixes can be seen at https://github.com/mrisher/smtp-sts/pull/220. (I will
let another author review my changes before merging and submitting a new
official draft.)

Comments I was unable to resolve:

* https://mozphab-ietf.devsvcdev.mozaws.net/D4010#inline-3713: How do you
suggest we clarify the terminology ("host" and "Policy Domain")?
* https://mozphab-ietf.devsvcdev.mozaws.net/D4010#inline-3716: Any
suggestions on clarifying that any max-age is valid?

I think there are two larger comments unresolved here, as well:

1. Certificate revocation (and "MAY"). My read on this is that revocation
is not widely mandated (e.g. popular Web browsers don't necessarily do it
using standard mechanisms!), some mechanisms (e.g. OCSP) don't provide the
security guarantees we would want, and so this is too muddied a space to
mandate specific behavior. As Viktor noted, some MTA developers may be very
opposed. My preference here is somewhat strongly to leave this as-is, for
those reasons.

2. Why is the "mx" pattern matched against the SANs and not the MX records
themselves? As Viktor noted and I commented briefly in passing, we debated
this a *lot* before. One point here is that this is only visible to MTA
implementors; sysadmins who mistakenly believe the "mx" field should match
the DNS records (which should themselves match the servers' certificates)
will end up making their configurations valid per the actual specification.
In other words, "match the policy against the SAN" matches a superset of
conditions which are valid in the alternative ("match the policy against
the MX records and match those records against the certificate").
Personally I would consider this edit to have been a compromise--it was not
and is still not my first choice--but, given it is the status quo, I am
fairly loath to change it.
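
To make the "superset" point concrete, here is a rough sketch comparing the
two designs. It is a simplification, not the draft's normative algorithm:
the function names are made up for illustration, a suffix pattern (leading
period) is assumed to match any number of prepended labels, and SAN
wildcards are assumed to cover exactly one left-most label.

```python
from typing import Iterable


def _norm(name: str) -> str:
    return name.rstrip(".").lower()


def host_matches_san(host: str, san: str) -> bool:
    """Reference name vs. presented DNS-ID; '*' only as the whole left-most label."""
    host, san = _norm(host), _norm(san)
    if san.startswith("*."):
        first, _, rest = host.partition(".")
        return bool(first) and rest == san[2:]
    return host == san


def pattern_matches_name(pattern: str, name: str) -> bool:
    """Policy "mx" pattern vs. a name (an MX hostname or a SAN).
    A leading period marks a suffix pattern; matching it against a wildcard
    SAN such as '*.example.com' is the "wildcard-to-wildcard" case."""
    pattern, name = _norm(pattern), _norm(name)
    if pattern.startswith("."):
        return name.endswith(pattern) and len(name) > len(pattern)
    return host_matches_san(pattern, name)


def san_design(patterns: Iterable[str], sans: Iterable[str]) -> bool:
    # Current text: some "mx" pattern matches some presented identity.
    sans = list(sans)
    return any(pattern_matches_name(p, s) for p in patterns for s in sans)


def mx_design(patterns: Iterable[str], mx_host: str, sans: Iterable[str]) -> bool:
    # Alternative: the MX hostname matches the policy AND the certificate.
    return (any(pattern_matches_name(p, mx_host) for p in patterns)
            and any(host_matches_san(mx_host, s) for s in sans))
```

Under these assumptions, the case raised in the DISCUSS (policy lists
mx1.example.com and mx2.example.com, the sender connects to mx1 and receives
a certificate valid only for mx2) is accepted by san_design and rejected by
mx_design.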

On these points--especially #2--I continue to defer to the guidance of the
chairs on how best to resolve such issues.

Hope that helps. More feedback is welcome.
Viktor Dukhovni
2018-05-06 18:41:12 UTC
2. Why is the "mx" pattern matched against the SANs and not the MX records themselves? As Viktor noted and I commented briefly in passing, we debated this a *lot* before. One point here is that this is only visible to MTA implementors; sysadmins who mistakenly believe the "mx" field should match the DNS records (which should themselves match the servers' certificates) will end up making their configurations valid per the actual specification. In other words, "match the policy against the SAN" matches a superset of conditions which are valid in the alternative ("match the policy against the MX records and match those records against the certificate"). Personally I would consider this edit to have been a compromise--it was not and is still not my first choice--but, given it is the status quo, I am fairly loath to change it.
On these points--especially #2--I continue to defer to the guidance of the chairs on how best to resolve such issues.
After having to revisit this in response to the DISCUSS, I can
crystallize the issue in terms of the following dichotomy:

* Does MTA-STS secure the connections to the endpoints indicated
by a domain's MX RRset, without preempting MX-based SMTP routing?

or

* Does MTA-STS secure the MX RRset, possibly filtering it to
at most a set of names cached in the policy, with great care
to first take care of loop elimination?

My sense is that the first option (current text) is a less invasive
change in SMTP, it changes only how the peer is authenticated.

For example, in "testing" mode, one probably SHOULD NOT trim the MX
RRset based on a "testing" policy. Or one might support multiple
authentication mechanisms for the peer MX (say key fingerprint as
a fallback if MTA-STS fails).

There are more implications to filtering the RRset than just
the presented-id matching...
--
Viktor.
Daniel Margolis
2018-05-06 19:44:44 UTC
I don't believe that *pre-filtering* the MX candidate list is the only way
to do it. You could leave the loop as-is and just refuse to connect to
(i.e. treat as a transient connection failure) any candidate which fails
the policy validation. So this is an implementation question; modifying
loop pre-filtering is probably riskier than what we might call "connection
early termination", but both are compliant with the protocol.
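
A rough sketch of what "connection early termination" could look like,
assuming the candidate list has already been sorted and loop-eliminated by
the usual SMTP logic. The callables passes_policy and send_via are
placeholders for an MTA's own validation and delivery code, not anything
defined by the draft:

```python
from typing import Callable, List, Optional, Tuple


def deliver_with_early_termination(
    mx_hosts: List[str],
    passes_policy: Callable[[str], bool],
    send_via: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Walk the unmodified MX loop; a candidate that fails policy validation
    is handled like a transient connection failure and the loop moves on."""
    rejected: List[str] = []
    for host in mx_hosts:
        if not passes_policy(host):
            rejected.append(host)  # remembered only for logging/reporting
            continue
        if send_via(host):
            return host, rejected
    # Nothing worked: the caller would queue the message and retry later.
    return None, rejected
```

The MX list itself is never rewritten here, which is the property that makes
this variant less risky than changing the pre-filtering step.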

The real difference between the two options is not, I think, this
implementation question, but that the current protocol technically allows
some valid configurations that are invalid in the MX-based
alternative--namely, the case where the certificate does not match the MX
hostname. That turns out to be fairly common (per
https://conferences.sigcomm.org/imc/2015/papers/p27.pdf), though, frankly,
I do not know that there's a good reason for admins to deliberately
configure a system in such a manner and, as a result, I don't believe
there's a strong argument for us preserving that flexibility.

I guess the tl;dr as far as I'm concerned is that I think either way really
can be done safely, that it's mostly a documentation issue, but I am
generally hesitant to change things now if we don't have to.
Viktor Dukhovni
2018-05-06 21:20:57 UTC
I don't believe that pre-filtering the MX candidate list is the only way to do it. You could leave the loop as-is and just refuse to connect to (i.e. treat as a transient connection failure) any candidate which fails the policy validation. So this is an implementation question; modifying loop pre-filtering is probably riskier than what we might call "connection early termination", but both are compliant with the protocol.
It makes a difference with a "testing" policy. Should mail be sent via
an MX host not listed in the policy, or should it be skipped? With
"testing" the mail should probably go out, with a report of the authentication
failure (impossible success given unexpected MX name) sent per any "tlsrpt"
policy.

So at least "testing" should probably use all the MX hosts. Whether "enforce"
does or does not is then a question of whether doing it differently for the
two cases is a potential source of confusion/bugs, and prominent anti-loop
warnings.

There are even some domains where connecting to the backup MX host *before*
trying a connection to the primary will cause firewall rules to be dynamically
added to block the client!
--
Viktor.
Viktor Dukhovni
2018-05-07 03:03:58 UTC
And yet, of course, we could essentially in all cases go through the motions
of considering *every* MX host, and even connect to each MX host in turn as
needed, but still authenticate the peer based on a match between the
certificate and the MX hostname, with the additional constraint that the
MX hostname match the policy "mx" list.

* In "testing" mode one would still actually connect even to MX hosts
whose names don't match the cached policy.

* In "enforce" mode, one could at the last moment optimize-out connections
to hosts which are sure to fail authentication, because the MX hostname
does not match the "mx" list. This of course after loop elimination, etc.
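
The two bullets above could be sketched as a small per-host decision,
applied only after loop elimination. The function name and the returned
action strings are invented for illustration; the mode values "enforce",
"testing" and "none" are the ones used in the draft:

```python
def mx_action(mode: str, host_in_mx_list: bool) -> str:
    """Decide what to do with one MX host, given the cached policy mode and
    whether the MX hostname matches the policy's "mx" list."""
    if mode not in ("enforce", "testing", "none"):
        raise ValueError(f"unknown policy mode: {mode!r}")
    if host_in_mx_list or mode == "none":
        return "connect"
    if mode == "testing":
        # Still deliver, but record the failure for any TLS reporting.
        return "connect-and-report"
    # "enforce": authentication is sure to fail, so skip the connection.
    return "skip"
```

Keeping this as a per-host decision, rather than trimming the RRset up
front, leaves the ordinary MX traversal untouched in both modes.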

I think that's the point that Daniel was trying to make...
--
Viktor.