// Copyright 2015 Matthew Holt
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package certmagic
import (
"context"
2019-09-18 07:54:01 +10:00
"crypto/x509"
2023-11-14 13:59:51 +10:00
"encoding/json"
2019-09-18 07:54:01 +10:00
"encoding/pem"
2023-11-14 13:59:51 +10:00
"errors"
2019-09-18 07:54:01 +10:00
"fmt"
2023-11-14 13:59:51 +10:00
"io/fs"
2019-09-18 07:54:01 +10:00
"path"
2020-05-13 01:28:56 +10:00
"runtime"
2019-09-18 07:54:01 +10:00
"strings"
2018-12-10 13:15:26 +10:00
"time"
2024-04-09 06:05:43 +10:00
"github.com/mholt/acmez/v2/acme"
2020-07-30 11:38:12 +10:00
"go.uber.org/zap"
2018-12-10 13:15:26 +10:00
"golang.org/x/crypto/ocsp"
)
// maintainAssets is a permanently-blocking function
// that loops indefinitely and, on a regular schedule, checks
// certificates for expiration and initiates a renewal of certs
// that are expiring soon. It also updates OCSP stapling. It
// should only be called once per cache. Panics are recovered,
// and if panicCount < 10, the function is called recursively,
// incrementing panicCount each time. Initial invocation should
// start panicCount at 0.
func (certCache *Cache) maintainAssets(panicCount int) {
	log := certCache.logger.Named("maintenance")
	log = log.With(zap.String("cache", fmt.Sprintf("%p", certCache)))
	defer func() {
		if err := recover(); err != nil {
			buf := make([]byte, stackTraceBufferSize)
			buf = buf[:runtime.Stack(buf, false)]
			log.Error("panic", zap.Any("error", err), zap.ByteString("stack", buf))
			if panicCount < 10 {
				certCache.maintainAssets(panicCount + 1)
			}
		}
	}()

	certCache.optionsMu.RLock()
	renewalTicker := time.NewTicker(certCache.options.RenewCheckInterval)
	ocspTicker := time.NewTicker(certCache.options.OCSPCheckInterval)
	certCache.optionsMu.RUnlock()

	log.Info("started background certificate maintenance")
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	for {
		select {
		case <-renewalTicker.C:
			err := certCache.RenewManagedCertificates(ctx)
			if err != nil {
				log.Error("renewing managed certificates", zap.Error(err))
			}
		case <-ocspTicker.C:
			certCache.updateOCSPStaples(ctx)
		case <-certCache.stopChan:
			renewalTicker.Stop()
			ocspTicker.Stop()
			log.Info("stopped background certificate maintenance")
			close(certCache.doneChan)
			return
		}
	}
}
// RenewManagedCertificates renews managed certificates,
// including ones loaded on-demand. Note that this is done
// automatically on a regular basis; normally you will not
// need to call this. This method assumes non-interactive
// mode (i.e. operating in the background).
func (certCache *Cache) RenewManagedCertificates(ctx context.Context) error {
	log := certCache.logger.Named("maintenance")
	// configs will hold a map of certificate hash to the config
	// to use when managing that certificate
	configs := make(map[string]*Config)

	// we use the queues for a very important reason: to do any and all
	// operations that could require an exclusive write lock outside
	// of the read lock! otherwise we get a deadlock, yikes. in other
	// words, our first iteration through the certificate cache does NOT
	// perform any operations--only queues them--so that more fine-grained
	// write locks may be obtained during the actual operations.
	var renewQueue, reloadQueue, deleteQueue, ariQueue certList

	certCache.mu.RLock()
	for certKey, cert := range certCache.cache {
		if !cert.managed {
			continue
		}
		// the list of names on this cert should never be empty... programmer error?
		if len(cert.Names) == 0 {
			log.Warn("certificate has no names; removing from cache", zap.String("cert_key", certKey))
			deleteQueue = append(deleteQueue, cert)
			continue
		}
		// get the config associated with this certificate
		cfg, err := certCache.getConfig(cert)
		if err != nil {
			log.Error("unable to get configuration to manage certificate; unable to renew",
				zap.Strings("identifiers", cert.Names),
				zap.Error(err))
			continue
		}
		if cfg == nil {
			// this is bad if this happens, probably a programmer error (oops)
			log.Error("no configuration associated with certificate; unable to manage",
				zap.Strings("identifiers", cert.Names))
			continue
		}

		if cfg.OnDemand != nil {
			continue
		}
		// ACME-specific: see if the ACME Renewal Info (ARI) window needs refreshing
		if !cfg.DisableARI && cert.ari.NeedsRefresh() {
			configs[cert.hash] = cfg
			ariQueue = append(ariQueue, cert)
		}

		// if time is up or expires soon, we need to try to renew it
		if cert.NeedsRenewal(cfg) {
			configs[cert.hash] = cfg

			// see if the certificate in storage has already been renewed, possibly by another
			// instance that didn't coordinate with this one; if so, just load it (checking
			// storage first is a simple way to possibly drastically reduce rate limit problems)
			storedCertNeedsRenew, err := cfg.managedCertInStorageNeedsRenewal(ctx, cert)
			if err != nil {
				// hmm, weird, but not a big deal, maybe it was deleted or something
				log.Warn("error while checking if stored certificate is also expiring soon",
					zap.Strings("identifiers", cert.Names),
					zap.Error(err))
			} else if !storedCertNeedsRenew {
				// if the certificate does NOT need renewal and there was no error, then we
				// are good to just reload the certificate from storage instead of repeating
				// a likely-unnecessary renewal procedure
				reloadQueue = append(reloadQueue, cert)
				continue
			}
			// the certificate in storage has not been renewed yet, so we will do it
			// NOTE: It is super-important to note that the TLS-ALPN challenge requires
			// a write lock on the cache in order to complete its challenge, so it is extra
			// vital that this renew operation does not happen inside our read lock!
			renewQueue.insert(cert)
		}
	}
	certCache.mu.RUnlock()

	// Update ARI, and then for any certs where the ARI window changed,
	// be sure to queue them for renewal if necessary
	for _, cert := range ariQueue {
		cfg := configs[cert.hash]
		cert, changed, err := cfg.updateARI(ctx, cert, log)
		if err != nil {
			log.Error("updating ARI", zap.Error(err))
		}
		if changed && cert.NeedsRenewal(cfg) {
			// it's theoretically possible that another instance already got the memo
			// on the changed ARI and even renewed the cert already, making this renewal
			// wasteful; but that has never been observed in practice, so to save some
			// cycles for now we just queue it for renewal (notice how we use 'insert'
			// to avoid duplicates, in case it was already scheduled for renewal anyway)
			renewQueue.insert(cert)
		}
	}

	// Reload certificates that merely need to be updated in memory
	for _, oldCert := range reloadQueue {
		timeLeft := expiresAt(oldCert.Leaf).Sub(time.Now().UTC())
		log.Info("certificate expires soon, but is already renewed in storage; reloading stored certificate",
			zap.Strings("identifiers", oldCert.Names),
			zap.Duration("remaining", timeLeft))

		cfg := configs[oldCert.hash]
// crucially, this happens OUTSIDE a lock on the certCache
2022-03-08 05:26:52 +10:00
_ , err := cfg . reloadManagedCertificate ( ctx , oldCert )
2018-12-10 13:15:26 +10:00
if err != nil {
2022-09-27 02:19:28 +10:00
log . Error ( "loading renewed certificate" ,
zap . Strings ( "identifiers" , oldCert . Names ) ,
zap . Error ( err ) )
2019-10-16 16:19:57 +10:00
continue
2018-12-10 13:15:26 +10:00
}
}

	// Renewal queue
	for _, oldCert := range renewQueue {
		cfg := configs[oldCert.hash]
		err := certCache.queueRenewalTask(ctx, oldCert, cfg)
		if err != nil {
			log.Error("queueing renewal task",
				zap.Strings("identifiers", oldCert.Names),
				zap.Error(err))
			continue
		}
	}

	// Deletion queue
	certCache.mu.Lock()
	for _, cert := range deleteQueue {
		certCache.removeCertificate(cert)
	}
	certCache.mu.Unlock()

	return nil
}
func (certCache *Cache) queueRenewalTask(ctx context.Context, oldCert Certificate, cfg *Config) error {
	log := certCache.logger.Named("maintenance")

	timeLeft := expiresAt(oldCert.Leaf).Sub(time.Now().UTC())
	log.Info("certificate expires soon; queuing for renewal",
		zap.Strings("identifiers", oldCert.Names),
		zap.Duration("remaining", timeLeft))

	// Get the name which we should use to renew this certificate;
	// we only support managing certificates with one name per cert,
	// so this should be easy.
	renewName := oldCert.Names[0]

	// queue up this renewal job (it is a no-op if already active or queued)
	jm.Submit(cfg.Logger, "renew_"+renewName, func() error {
		timeLeft := expiresAt(oldCert.Leaf).Sub(time.Now().UTC())
		log.Info("attempting certificate renewal",
			zap.Strings("identifiers", oldCert.Names),
			zap.Duration("remaining", timeLeft))

		// perform renewal - crucially, this happens OUTSIDE a lock on certCache
		err := cfg.RenewCertAsync(ctx, renewName, false)
		if err != nil {
			if cfg.OnDemand != nil {
				// loaded dynamically, remove dynamically
				certCache.mu.Lock()
				certCache.removeCertificate(oldCert)
				certCache.mu.Unlock()
			}
			return fmt.Errorf("%v %v", oldCert.Names, err)
		}

		// successful renewal, so update in-memory cache by loading
		// renewed certificate so it will be used with handshakes
		_, err = cfg.reloadManagedCertificate(ctx, oldCert)
		if err != nil {
			return ErrNoRetry{fmt.Errorf("%v %v", oldCert.Names, err)}
		}
		return nil
	})

	return nil
}

// updateOCSPStaples updates the OCSP stapling in all
// eligible, cached certificates.
//
// OCSP maintenance strives to abide the relevant points on
// Ryan Sleevi's recommendations for good OCSP support:
// https://gist.github.com/sleevi/5efe9ef98961ecfb4da8
func (certCache *Cache) updateOCSPStaples(ctx context.Context) {
	logger := certCache.logger.Named("maintenance")

	// temporary structures to store updates or tasks
	// so that we can keep our locks short-lived
	type ocspUpdate struct {
		rawBytes []byte
		parsed   *ocsp.Response
	}
	type updateQueueEntry struct {
		cert           Certificate
		certHash       string
		lastNextUpdate time.Time
		cfg            *Config
	}
	type renewQueueEntry struct {
		oldCert Certificate
		cfg     *Config
	}
	updated := make(map[string]ocspUpdate)
	var updateQueue []updateQueueEntry // certs that need a refreshed staple
	var renewQueue []renewQueueEntry   // certs that need to be renewed (due to revocation)

	// obtain brief read lock during our scan to see which staples need updating
	certCache.mu.RLock()
	for certHash, cert := range certCache.cache {
		// no point in updating OCSP for expired or "synthetic" certificates
		if cert.Leaf == nil || cert.Expired() {
			continue
		}
		cfg, err := certCache.getConfig(cert)
		if err != nil {
			logger.Error("unable to get automation config for certificate; maintenance for this certificate will likely fail",
				zap.Strings("identifiers", cert.Names),
				zap.Error(err))
			continue
		}
		// always try to replace revoked certificates, even if OCSP response is still fresh
		if certShouldBeForceRenewed(cert) {
			renewQueue = append(renewQueue, renewQueueEntry{
				oldCert: cert,
				cfg:     cfg,
			})
			continue
		}
		// if the status is not fresh, get a new one
		var lastNextUpdate time.Time
		if cert.ocsp != nil {
			lastNextUpdate = cert.ocsp.NextUpdate
			if cert.ocsp.Status != ocsp.Unknown && freshOCSP(cert.ocsp) {
				// no need to update our staple if still fresh and not Unknown
				continue
			}
		}
		updateQueue = append(updateQueue, updateQueueEntry{cert, certHash, lastNextUpdate, cfg})
	}
	certCache.mu.RUnlock()

	// perform updates outside of any lock on certCache
	for _, qe := range updateQueue {
		cert := qe.cert
		certHash := qe.certHash
		lastNextUpdate := qe.lastNextUpdate

		if qe.cfg == nil {
			// if this happens, it is probably a programmer error (oops)
			logger.Error("no configuration associated with certificate; unable to manage OCSP staples",
				zap.Strings("identifiers", cert.Names))
			continue
		}

		err := stapleOCSP(ctx, qe.cfg.OCSP, qe.cfg.Storage, &cert, nil)
		if err != nil {
			if cert.ocsp != nil {
				// if there was no staple before, that's fine; otherwise we should log the error
				logger.Error("stapling OCSP",
					zap.Strings("identifiers", cert.Names),
					zap.Error(err))
			}
			continue
		}

		// By this point, we've obtained the latest OCSP response.
		// If there was no staple before, or if the response is updated,
		// make sure we apply the update to all names on the certificate
		// if the status is still Good.
		if cert.ocsp != nil && cert.ocsp.Status == ocsp.Good && (lastNextUpdate.IsZero() || lastNextUpdate != cert.ocsp.NextUpdate) {
			logger.Info("advancing OCSP staple",
				zap.Strings("identifiers", cert.Names),
				zap.Time("from", lastNextUpdate),
				zap.Time("to", cert.ocsp.NextUpdate))
			updated[certHash] = ocspUpdate{rawBytes: cert.Certificate.OCSPStaple, parsed: cert.ocsp}
		}

		// If the updated staple shows that the certificate was revoked, we should immediately renew it
		if certShouldBeForceRenewed(cert) {
			qe.cfg.emit(ctx, "cert_ocsp_revoked", map[string]any{
				"subjects":    cert.Names,
				"certificate": cert,
				"reason":      cert.ocsp.RevocationReason,
				"revoked_at":  cert.ocsp.RevokedAt,
			})
			renewQueue = append(renewQueue, renewQueueEntry{
				oldCert: cert,
				cfg:     qe.cfg,
			})
		}
	}
// These write locks should be brief since we have all the info we need now.
for certKey , update := range updated {
certCache . mu . Lock ( )
2022-03-23 07:54:52 +10:00
if cert , ok := certCache . cache [ certKey ] ; ok {
cert . ocsp = update . parsed
cert . Certificate . OCSPStaple = update . rawBytes
certCache . cache [ certKey ] = cert
}
2018-12-10 13:15:26 +10:00
certCache . mu . Unlock ( )
}
2019-09-18 22:00:34 +10:00
// We attempt to replace any certificates that were revoked.
// Crucially, this happens OUTSIDE a lock on the certCache.
2021-06-13 05:47:47 +10:00
for _ , renew := range renewQueue {
2022-02-02 06:24:11 +10:00
_ , err := renew . cfg . forceRenew ( ctx , logger , renew . oldCert )
2022-09-27 02:19:28 +10:00
if err != nil {
2022-02-02 02:05:24 +10:00
logger . Info ( "forcefully renewing certificate due to REVOKED status" ,
zap . Strings ( "identifiers" , renew . oldCert . Names ) ,
zap . Error ( err ) )
}
2019-09-18 22:00:34 +10:00
}
2018-12-10 13:15:26 +10:00
}
2024-05-08 01:46:03 +10:00
// storageHasNewerARI returns true if the configured storage has ARI that is newer
// than that of a certificate that is already loaded, along with the value from
// storage.
func (cfg *Config) storageHasNewerARI(ctx context.Context, cert Certificate) (bool, acme.RenewalInfo, error) {
	storedCertData, err := cfg.loadStoredACMECertificateMetadata(ctx, cert)
	if err != nil || storedCertData.RenewalInfo == nil {
		return false, acme.RenewalInfo{}, err
	}
	// prefer stored info if it has a window and the loaded one doesn't,
	// or if the one in storage has a later RetryAfter (though I suppose
	// it's not guaranteed, typically those will move forward in time)
	if (!cert.ari.HasWindow() && storedCertData.RenewalInfo.HasWindow()) ||
		(cert.ari.RetryAfter == nil || storedCertData.RenewalInfo.RetryAfter.After(*cert.ari.RetryAfter)) {
		return true, *storedCertData.RenewalInfo, nil
	}
	return false, acme.RenewalInfo{}, nil
}
// loadStoredACMECertificateMetadata loads the stored ACME certificate data
// from the cert's sidecar JSON file.
func (cfg *Config) loadStoredACMECertificateMetadata(ctx context.Context, cert Certificate) (acme.Certificate, error) {
	metaBytes, err := cfg.Storage.Load(ctx, StorageKeys.SiteMeta(cert.issuerKey, cert.Names[0]))
	if err != nil {
		return acme.Certificate{}, fmt.Errorf("loading cert metadata: %w", err)
	}
	var certRes CertificateResource
	if err = json.Unmarshal(metaBytes, &certRes); err != nil {
		return acme.Certificate{}, fmt.Errorf("unmarshaling cert metadata: %w", err)
	}
	var acmeCert acme.Certificate
	if err = json.Unmarshal(certRes.IssuerData, &acmeCert); err != nil {
		return acme.Certificate{}, fmt.Errorf("unmarshaling potential ACME issuer metadata: %v", err)
	}
	return acmeCert, nil
}
// updateARI updates the cert's ACME renewal info, first by checking storage for a newer
// one, or getting it from the CA if needed. The updated info is stored in storage and
// updated in the cache. The certificate with the updated ARI is returned. If true is
// returned, the ARI window or selected time has changed, and the caller should check if
// the cert needs to be renewed now, even if there is an error.
//
// This will always try to update ARI without checking whether it needs to be refreshed.
// Call NeedsRefresh() on the RenewalInfo first, and only call this if that returns true.
func (cfg *Config) updateARI(ctx context.Context, cert Certificate, logger *zap.Logger) (updatedCert Certificate, changed bool, err error) {
	logger = logger.With(
		zap.Strings("identifiers", cert.Names),
		zap.String("cert_hash", cert.hash),
		zap.String("ari_unique_id", cert.ari.UniqueIdentifier),
		zap.Time("cert_expiry", cert.Leaf.NotAfter))

	updatedCert = cert
	oldARI := cert.ari

	// synchronize ARI fetching; see #297
	lockName := "ari_" + cert.ari.UniqueIdentifier
	if err := acquireLock(ctx, cfg.Storage, lockName); err != nil {
		return cert, false, fmt.Errorf("unable to obtain ARI lock: %v", err)
	}
	defer func() {
		if err := releaseLock(ctx, cfg.Storage, lockName); err != nil {
			logger.Error("unable to release ARI lock", zap.Error(err))
		}
	}()

	// see if the stored value has been refreshed already by another instance
	gotNewARI, newARI, err := cfg.storageHasNewerARI(ctx, cert)

	// when we're all done, log if something about the schedule is different
	// ("WARN" level because ARI window changing may be a sign of external trouble
	// and we want to draw their attention to a potential explanation URL)
	defer func() {
		changed = !newARI.SameWindow(oldARI)
		if changed {
			logger.Warn("ARI window or selected renewal time changed",
				zap.Time("prev_start", oldARI.SuggestedWindow.Start),
				zap.Time("next_start", newARI.SuggestedWindow.Start),
				zap.Time("prev_end", oldARI.SuggestedWindow.End),
				zap.Time("next_end", newARI.SuggestedWindow.End),
				zap.Time("prev_selected_time", oldARI.SelectedTime),
				zap.Time("next_selected_time", newARI.SelectedTime),
				zap.String("explanation_url", newARI.ExplanationURL))
		}
	}()

	if err == nil && gotNewARI {
		// great, storage has a newer one we can use
		cfg.certCache.mu.Lock()
		updatedCert = cfg.certCache.cache[cert.hash]
		updatedCert.ari = newARI
		cfg.certCache.cache[cert.hash] = updatedCert
		cfg.certCache.mu.Unlock()
		logger.Info("reloaded ARI with newer one in storage",
			zap.Timep("next_refresh", newARI.RetryAfter),
			zap.Time("renewal_time", newARI.SelectedTime))
		return
	}

	if err != nil {
		logger.Error("error while checking storage for updated ARI; updating ARI now", zap.Error(err))
	}

	// of the issuers configured, hopefully one of them is the ACME CA we got the cert from
	for _, iss := range cfg.Issuers {
		if ariGetter, ok := iss.(RenewalInfoGetter); ok {
			newARI, err = ariGetter.GetRenewalInfo(ctx, cert) // be sure to use existing newARI variable so we can compare against old value in the defer
			if err != nil {
				// could be anything, but a common error might simply be the "wrong" ACME CA
				// (meaning, different from the one that issued the cert, thus the only one
				// that would have any ARI for it) if multiple ACME CAs are configured
				logger.Error("failed updating renewal info from ACME CA",
					zap.String("issuer", iss.IssuerKey()),
					zap.Error(err))
				continue
			}

			// when we get the latest ARI, the acme package will select a time within the window
			// for us; of course, since it's random, it's likely different from the previously-
			// selected time; but if the window doesn't change, there's no need to change the
			// selected time (the acme package doesn't know the previous window to know better)
			// ... so if the window hasn't changed we'll just put back the selected time
			if newARI.SameWindow(oldARI) && !oldARI.SelectedTime.IsZero() {
				newARI.SelectedTime = oldARI.SelectedTime
			}

			// then store the updated ARI (even if the window didn't change, the Retry-After
			// likely did) in cache and storage;
			// be sure we get the cert from the cache while inside a lock to avoid logical races
			cfg.certCache.mu.Lock()
			updatedCert = cfg.certCache.cache[cert.hash]
			updatedCert.ari = newARI
			cfg.certCache.cache[cert.hash] = updatedCert
			cfg.certCache.mu.Unlock()

			// update the ARI value in storage
			var certData acme.Certificate
			certData, err = cfg.loadStoredACMECertificateMetadata(ctx, cert)
			if err != nil {
				err = fmt.Errorf("got new ARI from %s, but failed loading stored certificate metadata: %v", iss.IssuerKey(), err)
				return
			}
			certData.RenewalInfo = &newARI
			var certDataBytes, certResBytes []byte
			certDataBytes, err = json.Marshal(certData)
			if err != nil {
				err = fmt.Errorf("got new ARI from %s, but failed marshaling certificate ACME metadata: %v", iss.IssuerKey(), err)
				return
			}
			certResBytes, err = json.MarshalIndent(CertificateResource{
				SANs:       cert.Names,
				IssuerData: certDataBytes,
			}, "", "\t")
			if err != nil {
				err = fmt.Errorf("got new ARI from %s, but could not re-encode certificate metadata: %v", iss.IssuerKey(), err)
				return
			}
			if err = cfg.Storage.Store(ctx, StorageKeys.SiteMeta(cert.issuerKey, cert.Names[0]), certResBytes); err != nil {
				err = fmt.Errorf("got new ARI from %s, but could not store it with certificate metadata: %v", iss.IssuerKey(), err)
				return
			}

			logger.Info("updated ACME renewal information",
				zap.Time("selected_time", newARI.SelectedTime),
				zap.Timep("next_update", newARI.RetryAfter),
				zap.String("explanation_url", newARI.ExplanationURL))

			return
		}
	}

	err = fmt.Errorf("could not fully update ACME renewal info: either no issuer supporting ARI is configured for certificate, or all such failed (make sure the ACME CA that issued the certificate is configured)")

	return
}
// CleanStorageOptions specifies how to clean up a storage unit.
type CleanStorageOptions struct {
	// Optional custom logger.
	Logger *zap.Logger

	// Optional ID of the instance initiating the cleaning.
	InstanceID string

	// If set, cleaning will be skipped if it was performed
	// more recently than this interval.
	Interval time.Duration

	// Whether to clean cached OCSP staples.
	OCSPStaples bool

	// Whether to clean up expired certificates, and if so,
	// how long to let them stay after they've expired.
	ExpiredCerts           bool
	ExpiredCertGracePeriod time.Duration
}
// CleanStorage removes assets which are no longer useful,
// according to opts.
func CleanStorage(ctx context.Context, storage Storage, opts CleanStorageOptions) error {
	const (
		lockName   = "storage_clean"
		storageKey = "last_clean.json"
	)

	if opts.Logger == nil {
		opts.Logger = defaultLogger.Named("clean_storage")
	}
	opts.Logger = opts.Logger.With(zap.Any("storage", storage))

	// storage cleaning should be globally exclusive
	if err := acquireLock(ctx, storage, lockName); err != nil {
		return fmt.Errorf("unable to acquire %s lock: %v", lockName, err)
	}
	defer func() {
		if err := releaseLock(ctx, storage, lockName); err != nil {
			opts.Logger.Error("unable to release lock", zap.Error(err))
			return
		}
	}()

	// cleaning should not happen more often than the interval
	if opts.Interval > 0 {
		lastCleanBytes, err := storage.Load(ctx, storageKey)
		if !errors.Is(err, fs.ErrNotExist) {
			if err != nil {
				return fmt.Errorf("loading last clean timestamp: %v", err)
			}
			var lastClean lastCleanPayload
			err = json.Unmarshal(lastCleanBytes, &lastClean)
			if err != nil {
				return fmt.Errorf("decoding last clean data: %v", err)
			}
			lastTLSClean := lastClean["tls"]
			if time.Since(lastTLSClean.Timestamp) < opts.Interval {
				nextTime := time.Now().Add(opts.Interval)
				opts.Logger.Info("storage cleaning happened too recently; skipping for now",
					zap.String("instance", lastTLSClean.InstanceID),
					zap.Time("try_again", nextTime),
					zap.Duration("try_again_in", time.Until(nextTime)),
				)
				return nil
			}
		}
	}

	opts.Logger.Info("cleaning storage unit")
	if opts.OCSPStaples {
		err := deleteOldOCSPStaples(ctx, storage, opts.Logger)
		if err != nil {
			opts.Logger.Error("deleting old OCSP staples", zap.Error(err))
		}
	}
	if opts.ExpiredCerts {
		err := deleteExpiredCerts(ctx, storage, opts.Logger, opts.ExpiredCertGracePeriod)
		if err != nil {
			opts.Logger.Error("deleting expired certificates", zap.Error(err))
		}
	}

	// TODO: delete stale locks?

	// update the last-clean time
	lastCleanBytes, err := json.Marshal(lastCleanPayload{
		"tls": lastCleaned{
			Timestamp:  time.Now(),
			InstanceID: opts.InstanceID,
		},
	})
	if err != nil {
		return fmt.Errorf("encoding last cleaned info: %v", err)
	}
	if err := storage.Store(ctx, storageKey, lastCleanBytes); err != nil {
		return fmt.Errorf("storing last clean info: %v", err)
	}

	return nil
}
type lastCleanPayload map[string]lastCleaned

type lastCleaned struct {
	Timestamp  time.Time `json:"timestamp"`
	InstanceID string    `json:"instance_id,omitempty"`
}
func deleteOldOCSPStaples(ctx context.Context, storage Storage, logger *zap.Logger) error {
	ocspKeys, err := storage.List(ctx, prefixOCSP, false)
	if err != nil {
		// maybe just hasn't been created yet; no big deal
		return nil
	}
	for _, key := range ocspKeys {
		// if context was cancelled, quit early; otherwise proceed
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		ocspBytes, err := storage.Load(ctx, key)
		if err != nil {
			logger.Error("while deleting old OCSP staples, unable to load staple file", zap.Error(err))
			continue
		}
		resp, err := ocsp.ParseResponse(ocspBytes, nil)
		if err != nil {
			// contents are invalid; delete it
			err = storage.Delete(ctx, key)
			if err != nil {
				logger.Error("purging corrupt staple file", zap.String("storage_key", key), zap.Error(err))
			}
			continue
		}
		if time.Now().After(resp.NextUpdate) {
			// response has expired; delete it
			err = storage.Delete(ctx, key)
			if err != nil {
				logger.Error("purging expired staple file", zap.String("storage_key", key), zap.Error(err))
			}
		}
	}
	return nil
}
func deleteExpiredCerts(ctx context.Context, storage Storage, logger *zap.Logger, gracePeriod time.Duration) error {
	issuerKeys, err := storage.List(ctx, prefixCerts, false)
	if err != nil {
		// maybe just hasn't been created yet; no big deal
		return nil
	}

	for _, issuerKey := range issuerKeys {
		siteKeys, err := storage.List(ctx, issuerKey, false)
		if err != nil {
			logger.Error("listing contents", zap.String("issuer_key", issuerKey), zap.Error(err))
			continue
		}

		for _, siteKey := range siteKeys {
			// if context was cancelled, quit early; otherwise proceed
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}

			siteAssets, err := storage.List(ctx, siteKey, false)
			if err != nil {
				logger.Error("listing site contents", zap.String("site_key", siteKey), zap.Error(err))
				continue
			}

			for _, assetKey := range siteAssets {
				if path.Ext(assetKey) != ".crt" {
					continue
				}

				certFile, err := storage.Load(ctx, assetKey)
				if err != nil {
					return fmt.Errorf("loading certificate file %s: %v", assetKey, err)
				}
				block, _ := pem.Decode(certFile)
				if block == nil || block.Type != "CERTIFICATE" {
					return fmt.Errorf("certificate file %s does not contain PEM-encoded certificate", assetKey)
				}
				cert, err := x509.ParseCertificate(block.Bytes)
				if err != nil {
					return fmt.Errorf("certificate file %s is malformed; error parsing PEM: %v", assetKey, err)
				}

				if expiredTime := time.Since(expiresAt(cert)); expiredTime >= gracePeriod {
					logger.Info("certificate expired beyond grace period; cleaning up",
						zap.String("asset_key", assetKey),
						zap.Duration("expired_for", expiredTime),
						zap.Duration("grace_period", gracePeriod))
					baseName := strings.TrimSuffix(assetKey, ".crt")
					for _, relatedAsset := range []string{
						assetKey,
						baseName + ".key",
						baseName + ".json",
					} {
						logger.Info("deleting asset because resource expired", zap.String("asset_key", relatedAsset))
						err := storage.Delete(ctx, relatedAsset)
						if err != nil {
							logger.Error("could not clean up asset related to expired certificate",
								zap.String("base_name", baseName),
								zap.String("related_asset", relatedAsset),
								zap.Error(err))
						}
					}
				}
			}

			// update listing; if folder is empty, delete it
			siteAssets, err = storage.List(ctx, siteKey, false)
			if err != nil {
				continue
			}
			if len(siteAssets) == 0 {
				logger.Info("deleting site folder because key is empty", zap.String("site_key", siteKey))
				err := storage.Delete(ctx, siteKey)
				if err != nil {
					return fmt.Errorf("deleting empty site folder %s: %v", siteKey, err)
				}
			}
		}
	}
	return nil
}
2022-02-02 02:05:24 +10:00
// forceRenew forcefully renews cert and replaces it in the cache, and returns the new certificate. It is intended
// for use primarily in the case of cert revocation. This MUST NOT be called within a lock on cfg.certCacheMu.
func ( cfg * Config ) forceRenew ( ctx context . Context , logger * zap . Logger , cert Certificate ) ( Certificate , error ) {
2022-09-27 02:19:28 +10:00
if cert . ocsp != nil && cert . ocsp . Status == ocsp . Revoked {
logger . Warn ( "OCSP status for managed certificate is REVOKED; attempting to replace with new certificate" ,
zap . Strings ( "identifiers" , cert . Names ) ,
zap . Time ( "expiration" , expiresAt ( cert . Leaf ) ) )
} else {
logger . Warn ( "forcefully renewing certificate" ,
zap . Strings ( "identifiers" , cert . Names ) ,
zap . Time ( "expiration" , expiresAt ( cert . Leaf ) ) )
Automatically replace revoked certs managed on-demand
When I initially wrote the auto-replace feature, it was for the standard mode of operation,
which I presumed the vast majority of CertMagic deployments use. At the time, On-Demand
mode of operation was fairly niche. And at the time, it looked tricky to properly enable this feature for on-demand certificates, so I shelved it considering it would be low-impact anyway.
So on-demand certificates didn't benefit from auto-replace in the case of revocation (oh well,
no other servers / ACME clients do that at all anyway).
I guess since that time, the use of CertMagic's exclusive on-demand feature has risen in
popularity. But there is no way to tell, and I had no real way of knowing whether any
significant use of the feature is being had since Caddy has no telemetry. (We used to
have telemetry -- benign, anonymous technical stats to help us understand usage -- but
unfortunately public backlash forced us to end the program.) Based on public feedback
forced by external events, it seems that on-demand TLS deployments are probably rare,
but each of those few deployments actually serve thousands of sites/domains. (The
true importance of this feature would have been clear months ago if Caddy had telemetry,
as Caddy is the primary importer of CertMagic.)
This commit should enable auto-replace for on-demand certificates. It required some
refactoring and some decisions that aren't *entirely* clear are right, but that's how it
goes.
I haven't tested this. (Last time I worked on this feature it took me about 2 days to test properly.)
2022-01-31 14:58:34 +10:00
}
renewName := cert . Names [ 0 ]
// if revoked for key compromise, we can't be sure whether the storage of
// the key is still safe; however, we KNOW the old key is not safe, and we
// can only hope by the time of revocation that storage has been secured;
// key management is not something we want to get into, but in this case
// it seems prudent to replace the key - and since renewal requires reuse
// of a prior key, we can't do a "renew" to replace the cert if we need a
// new key, so we'll have to do an obtain instead
var obtainInsteadOfRenew bool
if cert.ocsp != nil && cert.ocsp.RevocationReason == acme.ReasonKeyCompromise {
err := cfg.moveCompromisedPrivateKey(ctx, cert, logger)
if err != nil {
logger.Error("could not remove compromised private key from use",
zap.Strings("identifiers", cert.Names),
zap.String("issuer", cert.issuerKey),
zap.Error(err))
}
obtainInsteadOfRenew = true
}
var err error
if obtainInsteadOfRenew {
err = cfg.ObtainCertAsync(ctx, renewName)
} else {
// notice that we force renewal; otherwise, it might see that the
// certificate isn't close to expiring and return, but we really
// need a replacement certificate! see issue #4191
err = cfg.RenewCertAsync(ctx, renewName, true)
}
if err != nil {
if cert.ocsp != nil && cert.ocsp.Status == ocsp.Revoked {
// probably better to not serve a revoked certificate at all
logger.Error("unable to obtain new certificate after OCSP status of REVOKED; removing from cache",
zap.Strings("identifiers", cert.Names),
zap.Error(err))
cfg.certCache.mu.Lock()
cfg.certCache.removeCertificate(cert)
cfg.certCache.mu.Unlock()
}
return cert, fmt.Errorf("unable to forcefully get new certificate for %v: %w", cert.Names, err)
}
return cfg.reloadManagedCertificate(ctx, cert)
}
// moveCompromisedPrivateKey moves the private key for cert to a ".compromised" file
// by copying the data to the new file, then deleting the old one.
func (cfg *Config) moveCompromisedPrivateKey(ctx context.Context, cert Certificate, logger *zap.Logger) error {
privKeyStorageKey := StorageKeys.SitePrivateKey(cert.issuerKey, cert.Names[0])
privKeyPEM, err := cfg.Storage.Load(ctx, privKeyStorageKey)
if err != nil {
return err
}
compromisedPrivKeyStorageKey := privKeyStorageKey + ".compromised"
err = cfg.Storage.Store(ctx, compromisedPrivKeyStorageKey, privKeyPEM)
if err != nil {
// better safe than sorry: as a last resort, try deleting the key so it won't be reused
cfg.Storage.Delete(ctx, privKeyStorageKey)
return err
}
err = cfg.Storage.Delete(ctx, privKeyStorageKey)
if err != nil {
return err
}
logger.Info("removed certificate's compromised private key from use",
zap.String("storage_path", compromisedPrivKeyStorageKey),
zap.Strings("identifiers", cert.Names),
zap.String("issuer", cert.issuerKey))
return nil
}
// certShouldBeForceRenewed returns true if cert should be forcefully renewed
// (like if it is revoked according to its OCSP response).
func certShouldBeForceRenewed(cert Certificate) bool {
return cert.managed &&
len(cert.Names) > 0 &&
cert.ocsp != nil &&
cert.ocsp.Status == ocsp.Revoked
}
type certList []Certificate
// insert appends cert to the list if it is not already in the list.
// Efficiency: O(n)
func (certs *certList) insert(cert Certificate) {
for _, c := range *certs {
if c.hash == cert.hash {
return
}
}
*certs = append(*certs, cert)
}
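// Illustrative usage sketch (not part of the package API): because insert
// deduplicates by certificate hash, adding the same certificate twice is a
// no-op. Assumes certA and certB are certificates with distinct hashes:
//
//	var list certList
//	list.insert(certA)
//	list.insert(certA) // ignored: same hash as the first
//	list.insert(certB)
//	// len(list) == 2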
const (
// DefaultRenewCheckInterval is how often to check certificates for expiration.
// Scans are very lightweight, so this can be semi-frequent. This default should
// be smaller than <Minimum Cert Lifetime>*DefaultRenewalWindowRatio/3, which
// gives certificates plenty of chance to be renewed on time.
DefaultRenewCheckInterval = 10 * time.Minute
// DefaultRenewalWindowRatio is how much of a certificate's lifetime becomes the
// renewal window. The renewal window is the span of time at the end of the
// certificate's validity period in which it should be renewed. A default value
// of ~1/3 is pretty safe and recommended for most certificates.
DefaultRenewalWindowRatio = 1.0 / 3.0
// DefaultOCSPCheckInterval is how often to check if OCSP stapling needs updating.
DefaultOCSPCheckInterval = 1 * time.Hour
)
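// Illustrative sketch of the renewal-window arithmetic described by the
// constants above (not part of the package API): for a hypothetical 90-day
// certificate with the default window ratio of 1/3, renewal begins once
// fewer than 30 days of validity remain, so the 10-minute check interval
// scans the window thousands of times before expiry:
//
//	lifetime := 90 * 24 * time.Hour
//	window := time.Duration(float64(lifetime) * DefaultRenewalWindowRatio) // 30 days
//	checksInWindow := int(window / DefaultRenewCheckInterval)              // 4320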