Commit Graph

47 Commits

Author SHA1 Message Date
Matthew Holt
aad674cda5
ari: Fix panic when loaded cert has no RetryAfter 2024-09-05 10:53:29 -06:00
Matthew Holt
5ee48a3108
Add config option to disable ARI
This may be temporary until ARI is more mature
2024-08-08 08:08:29 -06:00
Matthew Holt
16e2e0b344
Synchronize ARI fetching (fix #297) 2024-06-28 10:33:21 -06:00
Matthew Holt
ed73243f8b
Export interface for GetRenewalInfo
We can't assume the ARI-supporting issuer types are exactly *ACMEIssuer; they may be implemented by third party packages (such as caddytls.ACMEIssuer).
2024-06-01 17:59:39 -06:00
Matt Holt
0e88b3eaa1
Initial implementation of ARI (#286)
* Initial implementation of ARI

* Enhance redundancy, robustness, and logging

* Improve ARI updating; integrate on-demand TLS; detect changed window
2024-05-07 09:46:03 -06:00
Matthew Holt
74862ff45a
Upgrade acmez to v2 beta
Adds support for customizing NotBefore/NotAfter times of certs
2024-04-08 14:05:43 -06:00
Francis Lavoie
857856663d
Demote "storage cleaning happened too recently" from WARN to INFO (#270) 2024-02-19 16:48:18 -07:00
Matthew Holt
754844673f
Don't try to decode last clean data if file does not exist 2023-11-14 07:45:43 -07:00
Matthew Holt
ee3b26a5e1
Global exclusion lock on storage cleaning
Cleaning storage now obtains a lock, and it can optionally be configured
to only happen once per interval.

This should help lower costs for expensive storage backends
that are used by clusters of CertMagic/Caddy instances.
2023-11-13 20:59:51 -07:00
Matthew Holt
93a28b732a
Make cache options updateable; new remove methods
These are useful for advanced applications (like Caddy) which would
like to remove certificates from the cache in a controlled way, and
operate the cache with new settings while running.
2023-07-08 09:56:51 -06:00
Matthew Holt
5deb7c2fb0
Make logger values required
Eliminates a bajillion nil checks and footguns
(except in tests, which bypass exported APIs, but that is expected)

Most recent #207

Logging can still be disabled via zap.NewNop(), if necessary.
(But disabling logging in CertMagic is a really bad idea.)
2022-09-26 10:19:30 -06:00
Matthew Holt
585ecc11ac
events: Remove cert_renewed, add cert_ocsp_revoked 2022-08-31 11:13:09 -06:00
Ben Burkert
871b774821
Add one second (at most) to account for NotAfter imprecision (#199)
Fix #197
2022-08-16 18:08:34 -06:00
Matthew Holt
03cffeb193
Update a couple comments 2022-03-25 10:55:29 -06:00
Alban Lecocq
915efd8fdb
Fix crash because of a zero value cert in cache (#177)
* Fix crash because of a zero value cert in cache

Check that a cert is still in the cache when trying to update its
ocsp & OCSPStaple fields.

Why: while the updateOCSPStaples() loops run, any cert can be
removed in parallel from a full cache to make room.

* Update maintain.go

Co-authored-by: Matt Holt <mholt@users.noreply.github.com>
2022-03-22 15:54:52 -06:00
Dave Henderson
9a56fcd4f9
Propagate context in the Storage interface methods (#155)
* Add context propagation to the Storage interface

Signed-off-by: Dave Henderson <dhenderson@gmail.com>

* Bump to Go 1.17

* Minor cleanup

* filestorage: Honor context cancellation in List()

Co-authored-by: Matthew Holt <mholt@users.noreply.github.com>
2022-03-07 12:26:52 -07:00
Matthew Holt
579abc82db
Finish cert revocation checking enhancements 2022-02-01 13:24:11 -07:00
Matthew Holt
bded7eab59
WIP 2022-02-01 11:04:25 -07:00
Matt Holt
eef59acc1d
Fix force-renewing revoked on-demand certs (#166)
* Fix force-renewing revoked on-demand certs

Follow-up to 9245be5a2f

* One more fix for on-demand logic of revoked certs

* OCSP revocation checks at startup, too

Required significant refactoring, hope it works.
Yet again way too late at night for this...
2022-02-01 09:05:24 -07:00
Matthew Holt
9245be5a2f
Automatically replace revoked certs managed on-demand
When I initially wrote the auto-replace feature, it was for the standard mode of operation,
which I presumed the vast majority of CertMagic deployments use. At the time, On-Demand
mode of operation was fairly niche. And at the time, it looked tricky to properly enable
this feature for on-demand certificates, so I shelved it considering it would be low-impact
anyway. So on-demand certificates didn't benefit from auto-replace in the case of revocation
(oh well, no other servers / ACME clients do that at all anyway).

I guess since that time, the use of CertMagic's exclusive on-demand feature has risen in
popularity. But there is no way to tell, and I had no real way of knowing whether any
significant use of the feature is being had since Caddy has no telemetry. (We used to
have telemetry -- benign, anonymous technical stats to help us understand usage -- but
unfortunately public backlash forced us to end the program.) Based on public feedback
forced by external events, it seems that on-demand TLS deployments are probably rare,
but each of those few deployments actually serve thousands of sites/domains. (The
true importance of this feature would have been clear months ago if Caddy had telemetry,
as Caddy is the primary importer of CertMagic.)

This commit should enable auto-replace for on-demand certificates. It required some
refactoring and some decisions that I'm not *entirely* sure are right, but that's how it
goes.

I haven't tested this. (Last time I worked on this feature it took me about 2 days to test properly.)
2022-01-30 21:58:34 -07:00
Matt Holt
07f7d0dec1
Allow forced renewals; fix renew on OCSP revoke; change key on compromise (#134)
* Begin refactor of ObtainCert and RenewCert to allow force renews

* Don't reuse private key in case of revocation due to key compromise

* Improve logging in renew

* Run OCSP check at start of cache maintenance

Otherwise we wait until first tick (currently 1 hour) which might be too long

* Fix obtain; move some things around

Obtain now tries to reuse private key if exists, but if it doesn't exist, that shouldn't be an error (so we clear the error in that case).

Moved the removal of compromised private keys to have logging make more sense.
2021-06-12 13:47:47 -06:00
Matthew Holt
d2311e1f3e
Don't maintain on-demand certs in background
On-demand certs are managed at handshake-time. Doing so in the background was
a temporary holdover until on-demand maintenance improved, which it since has.
Since background maintenance did not consult the "ask" endpoint or decision func,
it would sometimes renew certificates that were not desirable to renew.

See https://caddy.community/t/clean-up-caddy-certificates/11429/11?u=matt
2021-02-10 14:35:27 -07:00
Matthew Holt
725b69d53d
Configurable OCSP stapling
Allows disabling it entirely, or overriding responder URLs

See https://github.com/caddyserver/caddy/issues/3714
2021-01-07 15:45:22 -07:00
Matthew Holt
e7f9729bad
Renew managed on-demand certificates at handshake-time if necessary
If the machine goes to sleep or the process gets suspended, background
maintenance won't happen, so we need to check for expiration of all
managed, on-demand certificates at every handshake. Fortunately, this is
pretty cheap because it's simple date math.

https://caddy.community/t/local-certificates-not-renewing-on-demand/9482
2020-08-17 12:14:46 -06:00
Matthew Holt
e6076585c0
Convert (most of the library) to structured logs (closes #19)
Logging is now configurable through setting the Logging field on the
various relevant struct types. This is a more useful, consistent, and
higher-performing experience with logs than the std lib logger we used
before.

This isn't a 100% complete transition because there are some parts of
the code base that don't have obvious or easy access to a logger.
They are mostly fringe/edge cases though, and most are error logs, so
you shouldn't see them under normal circumstances. They still emit to
the std lib logger, so it's not like any errors get hidden: they are
just unstructured until we find a way to give them access to a logger.
2020-07-29 19:38:12 -06:00
Matt Holt
b76b76abfc
Replace lego with ACMEz (close #71) (#78) 2020-07-27 16:50:41 -06:00
Matthew Holt
fff412bb74
Restart maintenance routine if it panics 2020-05-13 11:11:27 -06:00
Matthew Holt
5ed364019b
Add nil check; recover from all goroutines 2020-05-12 09:28:56 -06:00
Matthew Holt
0a90841d31
Consider client's signature support when choosing certificates
This allows two certs (say, RSA and ECDSA) for the same names to be
loaded, and CertMagic will consider which one the client supports and
use that.

We used to extract just select fields from the leaf certificate so that
we didn't need to fill memory with more data than necessary, but in
order to use the stdlib's SupportsCertificate() method, we have to keep
the full tls.Certificate.Leaf field set for speed during handshakes.
2020-04-01 18:25:02 -06:00
Matthew Holt
e9f9f60183
Separate logic for qualifying names for any cert vs. public certs 2020-03-13 19:09:36 -06:00
Matthew Holt
5265f2bcb1
Rename function 2020-03-12 16:02:48 -06:00
Matthew Holt
e02edabc36
Ask before renewing and uncache rejected certs; fix certs path 2020-03-06 17:55:13 -07:00
Matthew Holt
8a7197beca
Change renewal window to a ratio of cert lifetime instead of hard coded
This allows CertMagic to accommodate certificates with extremely short
lifetimes (new defaults work with cert lifetimes < 24h, but I wouldn't
want to push it < 30m with these defaults).
2020-02-24 18:42:27 -07:00
Matthew Holt
37e754b40c
Major refactor to improve performance, correctness, and extensibility
Breaking changes; thank goodness we're not 1.0 yet 😅 - read on!

This change completely separates ACME-specific code from the rest of the
certificate management process, allowing pluggable sources for certs
that aren't ACME.

Notably, most of Config was spliced into ACMEManager. Similarly, there's
now Default and DefaultACME.

Storage structure had to be reconfigured. Certificates are no longer in
the acme/ subfolder since they can be obtained by ways other than ACME!
Certificates moved to a new certificates/ subfolder. The subfolders in
that folder use the path of the ACME endpoint instead of just the host,
so that also changed. Be aware that unless you move your certs over,
CertMagic will not find them and will attempt to get new ones. That is
usually fine for most users, but for extremely large deployments, you
will want to move them over first.

Old certs path:
  acme/acme-staging-v02.api.letsencrypt.org/...

New certs path:
  certificates/acme-staging-v02.api.letsencrypt.org-directory/...

That's all for significant storage changes!

But this refactor also vastly improves performance, especially at scale,
and makes CertMagic way more resilient to errors. Retries are done on
the staging endpoint by default, so they won't count against your rate
limit. If your hardware can handle it, I'm now pretty confident that you
can give CertMagic a million domain names and it will gracefully manage
them, as fast as it can within internal and external rate limits, even
in the presence of errors. Errors will of course slow some things down,
but you should be good to go if you're monitoring logs and can fix any
misconfigurations or other external errors!

Several other mostly-minor enhancements fix bugs, especially at scale.
For example, duplicated renewal tasks (that continuously fail) will not
pile up on each other: only one will operate, under exponential backoff.

Closes #50 and fixes #55
2020-02-21 14:32:57 -07:00
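For anyone moving certificates over after 37e754b40c, the new folder name can be derived from the ACME directory URL. The Go sketch below reproduces the old-path/new-path example from the commit message by joining the URL's host and path with a hyphen. The helper names issuerFolder and newCertsPrefix are hypothetical, and the exact rule CertMagic applies to other URLs is an assumption here; only the staging example is confirmed by the message.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// issuerFolder derives a single folder name from an ACME directory URL by
// joining the host and the URL path with hyphens. Hypothetical helper.
func issuerFolder(directoryURL string) (string, error) {
	u, err := url.Parse(directoryURL)
	if err != nil {
		return "", err
	}
	name := u.Host
	// Flatten the path so it cannot introduce extra folder levels.
	if p := strings.Trim(strings.ReplaceAll(u.Path, "/", "-"), "-"); p != "" {
		name += "-" + p
	}
	return name, nil
}

// newCertsPrefix returns the storage prefix under the new layout.
func newCertsPrefix(directoryURL string) (string, error) {
	folder, err := issuerFolder(directoryURL)
	if err != nil {
		return "", err
	}
	return path.Join("certificates", folder), nil
}

func main() {
	prefix, err := newCertsPrefix("https://acme-staging-v02.api.letsencrypt.org/directory")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(prefix)
}
```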
Matthew Holt
7311b4680c
Perform OCSP staple updates outside of lock on certCache
Also add some log entries when certs are replaced in cache
2020-01-10 11:18:37 -07:00
Matthew Holt
1c70bb8ce4
Update rate limiter to allow cancellation; add context to arguments
The previous rate limiter design did not allow reservation cancellation.
This became problematic with lots of config reloads in Caddy for large
numbers of domain names. While the rate limiter had a backlog, a new
config would come in and add even more to the rate limiter, and even
more over time as background maintenance (renewals) kicked in. This
leaked goroutines and memory as a side-effect, and blocked the issuance
of certificates nigh indefinitely.

The new rate limiter does not make future reservations like the previous
one did. However, this requires us to run a single scheduler goroutine
when a rate limiter is created, which requires being cleaned up when the
rate limiter is no longer needed. As rate limits are global and should
live for the life of the process, there is currently no actual cleanup
that takes place, but if it did happen, one would simply call Stop() on
the rate limiter to stop that goroutine.

With this new design, reservations are made only as the event actually
happens; implementing cancellation with the old design would have been
almost impossible to do correctly in a practical, elegant way. Although
the trade-off is an extra goroutine that needs cleaning up, this is
seldom (if ever?) needed in practice, and the benefit is that waiting
goroutines can be unblocked when their context is canceled. This allows
Caddy, for example, to reload configs often and cancel any goroutines
that were merely waiting on the rate limiter.

Now, all Obtain, Renew, and Revoke calls accept a context that can be
canceled.

We also eliminate the acmeMu, a mutex that permitted only a single ACME
operation at a time by the process, which was our early, naive form of
rate limiting, which should no longer be necessary.

On-demand obtain and renew do not yet use cancelable contexts, because
what defines the context of a TLS handshake is still unclear. We might
end up using a simple context with a timeout that is the maximum length
of a TLS handshake in practice, say, 1 minute.

This is a breaking change, but critical for larger deployments with very
dynamic configurations.
2019-12-16 13:36:48 -07:00
Matthew Holt
c52848a21d
Background cert operations; ManageSync() and ManageAsync()
Split Manage() into ManageSync() and ManageAsync().

In accordance with developing best practices, ACME operations should be
allowed to happen in the background and not block server startup in
non-interactive environments.

We also no longer return an error during batch cert renewals, because
we always treat it as a background operation. (The ManageSync() method
can perform foreground renewal if that is desired.)
2019-10-16 00:19:57 -06:00
Matthew Holt
ecf5d6b59e
Remove revoked managed certificates from cache 2019-09-18 10:53:59 -06:00
Matthew Holt
5dca2331a8
Attempt to replace managed certs when OCSP status is Revoked 2019-09-18 06:00:34 -06:00
Matthew Holt
647ffbe02d
Ability to clean out expired certificates and related assets in storage
Closes #43
2019-09-17 15:54:01 -06:00
Matthew Holt
6a42ef9fe8
Optional tags for unmanaged certificates
This allows for user-loaded certificates to be associated with arbitrary
values such as user-provided IDs or categories. This can be useful if
multiple certificates satisfy a ClientHello but a specific one still
needs to be chosen. See for example:
https://github.com/mholt/caddy/issues/2588

This is a breaking API change since we need to expose a tags parameter
to the caching functions, but we're not 1.0 yet so we will try this
API change and see how it goes.
2019-06-24 11:51:58 -06:00
Matt Holt
8f7a1caa59
Significant refactoring to improve correctness and flexibility (#39)
* Significant refactor

This refactoring expands the capabilities of the library for advanced
use cases, as well as improving the overall architecture, including
possible memory leak fixes if used over a long period with many certs
loaded into memory. This refactor enables using different configs
depending on the certificate.

The public API has changed slightly, however, and arguably it is
slightly less convenient/elegant. I have never quite found the perfect
design for this package, and this certainly isn't it, but I think it's
better than what we had before.

There is still work to be done, but this is a good step forward. I've
decoupled Storage from Cache, and made it easier and more correct for
Configs (and Storage values) to be short-lived. Cache is the only value
that should be long-lived.

Note that CertMagic no longer automatically takes care of storage (i.e.
it used to delete old OCSP staples, but now it doesn't). The functions
to do this are still there and even exported, and now we expect the
application to call the cleanup functions when it wants to.

* Fix little oopsies

* Create Manager abstraction so obtain/renew isn't limited to ACME
2019-04-20 10:44:55 -06:00
Matthew Holt
a7f18a937c
Fix nil pointer dereference and cleanup after TLS-ALPN challenge 2019-02-02 16:10:51 -07:00
Matthew Holt
318e24ccb2
Print which cache is doing maintenance in log entries 2018-12-12 15:43:34 -07:00
Matthew Holt
b2a67f0504
FileStorage: Fix List(); modify Storage interface (fixes #4)
Adding a recursive option to List(), which, if true, causes List to
act like a walk function.

Also differentiating between "terminal" keys and "non-terminal" in
KeyInfo, since sometimes directories are useful, like listing user
accounts.
2018-12-12 14:47:46 -07:00
Matthew Holt
1f94da1ed1
Compatibility with refactored lego core
lego commit: 42941ccea6b431ebff203d4cb520991fb7b47951
2018-12-10 00:26:09 -07:00
Matthew Holt
bea13a36c8
Initial commit 2018-12-09 20:15:26 -07:00