// Copyright 2015 Matthew Holt
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package certmagic

import (
	"context"
	"log"
	"runtime"
	"sync"
	"time"
)

// NewRateLimiter returns a rate limiter that allows up to maxEvents
// in a sliding window of size window. If maxEvents and window are
// both 0, or if maxEvents is non-zero and window is 0, rate limiting
// is disabled. This function panics if maxEvents is less than 0 or
// if maxEvents is 0 and window is non-zero, which is considered to be
// an invalid configuration, as it would never allow events.
func NewRateLimiter(maxEvents int, window time.Duration) *RingBufferRateLimiter {
	if maxEvents < 0 {
		panic("maxEvents cannot be less than zero")
	}
	if maxEvents == 0 && window != 0 {
		panic("NewRateLimiter: invalid configuration: maxEvents = 0 and window != 0 would not allow any events")
	}
	rbrl := &RingBufferRateLimiter{
		window:  window,
		ring:    make([]time.Time, maxEvents),
		started: make(chan struct{}),
		stopped: make(chan struct{}),
		ticket:  make(chan struct{}),
	}
	go rbrl.loop()
	<-rbrl.started // make sure loop is ready to receive before we return
	return rbrl
}
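
// A minimal usage sketch (illustrative only): construct one limiter per
// distinct rate limit and share it among goroutines; call Stop only if
// the limiter will never be needed again.
//
//	rl := NewRateLimiter(10, time.Minute) // up to 10 events per sliding minute
//	defer rl.Stop()
//	if rl.Allow() {
//		// ...perform the rate-limited event now...
//	}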

// RingBufferRateLimiter uses a ring to enforce rate limits
// consisting of a maximum number of events within a single
// sliding window of a given duration. An empty value is
// not valid; use NewRateLimiter to get one.
type RingBufferRateLimiter struct {
	window  time.Duration
	ring    []time.Time // maxEvents == len(ring)
	cursor  int         // always points to the oldest timestamp
	mu      sync.Mutex  // protects ring, cursor, and window
	started chan struct{} // a successful receive confirms the loop is running
	stopped chan struct{} // closed by Stop to terminate the loop
	ticket  chan struct{} // each receive from this channel permits one event
}
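
// An illustrative walkthrough (the timestamps are made up): with
// maxEvents == 3, window == 1 minute, and the ring holding
// [12:00:05, 12:00:20, 12:00:40] with the cursor at the first entry,
// the next event cannot be permitted until 12:01:05 (oldest timestamp
// plus window); once it is, the cursor advances and 12:00:20 becomes
// the oldest event counted against the window.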

// Stop cleans up r's scheduling goroutine.
func (r *RingBufferRateLimiter) Stop() {
	close(r.stopped)
}

// loop is the rate limiter's scheduler: it grants tickets as capacity
// becomes available in the sliding window, and returns when the rate
// limiter is stopped.
func (r *RingBufferRateLimiter) loop() {
	defer func() {
		if err := recover(); err != nil {
			buf := make([]byte, stackTraceBufferSize)
			buf = buf[:runtime.Stack(buf, false)]
			log.Printf("panic: ring buffer rate limiter: %v\n%s", err, buf)
		}
	}()

	for {
		// if we've been stopped, return
		select {
		case <-r.stopped:
			return
		default:
		}

		if len(r.ring) == 0 {
			if r.window == 0 {
				// rate limiting is disabled; always allow immediately
				r.permit()
				continue
			}
			panic("invalid configuration: maxEvents = 0 and window != 0 does not allow any events")
		}

		// wait until next slot is available or until we've been stopped
		r.mu.Lock()
		then := r.ring[r.cursor].Add(r.window)
		r.mu.Unlock()
		waitDuration := time.Until(then)
		waitTimer := time.NewTimer(waitDuration)
		select {
		case <-waitTimer.C:
			r.permit()
		case <-r.stopped:
			waitTimer.Stop()
			return
		}
	}
}
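
// The scheduling handoff, distilled (a sketch only: r.permit is defined
// elsewhere in this file, but since Allow and Wait below receive from
// r.ticket, granting a slot must ultimately take a form like this):
//
//	select {
//	case r.ticket <- struct{}{}:
//		// a waiter claimed the slot; record time.Now() in the ring
//	case <-r.stopped:
//		// shutting down; the slot is never granted
//	}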

// Allow returns true if the event is allowed to
// happen right now. It does not wait. If the event
// is allowed, a ticket is claimed.
func (r *RingBufferRateLimiter) Allow() bool {
	select {
	case <-r.ticket:
		return true
	default:
		return false
	}
}
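
// For example (illustrative; doWork and shedWork are hypothetical
// helpers), Allow suits callers that would rather skip work than queue
// behind the limiter:
//
//	if rl.Allow() {
//		doWork()
//	} else {
//		shedWork() // over the limit; drop or defer instead of blocking
//	}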

// Wait blocks until the event is allowed to occur. It returns an
// error only if ctx is canceled or its deadline passes before a
// slot becomes available.
func (r *RingBufferRateLimiter) Wait(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-r.ticket:
		return nil
	}
}
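
// For example (illustrative), bounding the wait so a caller blocks at
// most five seconds for a slot:
//
//	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
//	defer cancel()
//	if err := rl.Wait(ctx); err != nil {
//		return err // canceled or timed out before a slot opened
//	}
//	// ...perform the rate-limited event...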

// MaxEvents returns the maximum number of events that
// are allowed within the sliding window.
func (r *RingBufferRateLimiter) MaxEvents() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.ring)
}

// SetMaxEvents changes the maximum number of events that are
// allowed in the sliding window. If the new limit is lower,
// the oldest events will be forgotten. If the new limit is
// higher, the window will suddenly have capacity for new
// reservations. It panics if maxEvents is 0 and the window
// size is not zero; to set both the events limit and the
// window size to 0, call SetWindow() first.
func (r *RingBufferRateLimiter) SetMaxEvents(maxEvents int) {
	newRing := make([]time.Time, maxEvents)
	r.mu.Lock()
	defer r.mu.Unlock()

	if r.window != 0 && maxEvents == 0 {
		panic("SetMaxEvents: invalid configuration: maxEvents = 0 and window != 0 would not allow any events")
	}

	// only make the change if the new limit is different
	if maxEvents == len(r.ring) {
		return
	}

	// the new ring may be smaller; fast-forward to the
	// oldest timestamp that will be kept in the new
	// ring so the oldest ones are forgotten and the
	// newest ones will be remembered
	sizeDiff := len(r.ring) - maxEvents
	for i := 0; i < sizeDiff; i++ {
		r.advance()
	}

	if len(r.ring) > 0 {
		// copy timestamps into the new ring until we
		// have either copied all of them or have reached
		// the capacity of the new ring
		startCursor := r.cursor
		for i := 0; i < len(newRing); i++ {
			newRing[i] = r.ring[r.cursor]
			r.advance()
			if r.cursor == startCursor {
				// the new ring is larger than the old one;
				// we've come full circle
				break
			}
		}
	}

	r.ring = newRing
	r.cursor = 0
}
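
// For example (an illustrative walkthrough): if the ring holds 5
// timestamps and SetMaxEvents(3) is called, the 2 oldest timestamps are
// forgotten and the 3 newest are kept, so the limiter behaves as if
// only those 3 events had occurred within the window:
//
//	rl := NewRateLimiter(5, time.Minute)
//	// ...5 events are permitted in quick succession...
//	rl.SetMaxEvents(3) // keeps only the 3 newest timestamps; no new
//	                   // capacity until those age out of the window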

// Window returns the size of the sliding window.
func (r *RingBufferRateLimiter) Window() time.Duration {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.window
}
|
|
|
|
|
2019-10-22 03:01:59 +10:00
|
|
|
// SetWindow changes r's sliding window duration to window.
|
|
|
|
// Goroutines that are already blocked on a call to Wait()
|
Update rate limiter to allow cancellation; add context to arguments
The previous rate limiter design did not allow reservation cancellation.
This became problematic with lots of config reloads in Caddy for large
numbers of domain names. While the rate limiter had a backlog, a new
config would come in and add even more to the rate limiter, and even
more over time as background maintenance (renewals) kicked in. This
leaked goroutines and memory as a side-effect, and blocked the issuance
of certificates nigh indefinitely.
The new rate limiter does not make future reservations like the previous
one did. However, this requires us to run a single scheduler goroutine
when a rate limiter is created, which requires being cleaned up when the
rate limiter is no longer needed. As rate limits are global and should
live up to the life of the process, there is currently no actual cleanup
that takes place, but if it did happen, one would simply call Stop() on
the rate limiter to stop that goroutine.
With this new design, reservations are made only as the event actually
happens; implementing cancellation with the old design would have been
almost impossible to do correctly in a practical, elegant way. Although
the trade-off is an extra goroutine that needs cleaning up, this is
seldom (if ever?) needed in practice, and the benefit is that waiting
goroutines can be unblocked when their context is canceled. This allows
Caddy, for example, to reload configs often and cancel any goroutines
that were merely waiting on the rate limiter.
Now, all Obtain, Renew, and Revoke calls accept a context that can be
cancelled.
We also eliminate the acmeMu, a mutex that permitted only a single ACME
operation at a time by the process, which was our early, naive form of
rate limiting, which should no longer be necessary.
On-demand obtain and renew do not yet use cancelable contexts, because
what defines the context of a TLS handshake is still unclear. We might
end up using a simple context with a timeout that is the maximum length
of a TLS handshake in practice, say, 1 minute.
This is a breaking change, but critical for larger deployments with very
dynamic configurations.
2019-12-17 06:36:41 +10:00
|
|
|
// will not be affected. It panics if window is non-zero
|
|
|
|
// but the max event limit is 0.
|
2019-10-22 03:01:59 +10:00
|
|
|
func (r *RingBufferRateLimiter) SetWindow(window time.Duration) {
|
|
|
|
r.mu.Lock()
|
	defer r.mu.Unlock()
	if window != 0 && len(r.ring) == 0 {
		panic("SetWindow: invalid configuration: maxEvents = 0 and window != 0 would not allow any events")
	}
	r.window = window
}
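
// Hedged usage sketch (an editorial addition; NewRateLimiter is the
// constructor documented at the top of this file):
//
//	rl := NewRateLimiter(10, time.Minute) // up to 10 events per minute
//	rl.SetWindow(time.Hour)               // now up to 10 events per hour
//	_ = rl.Window()                       // reports 1h0m0s
//
// Note that SetWindow with a non-zero window panics on a limiter created
// with maxEvents = 0, since a sliding window with no event capacity could
// never admit an event.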

// permit allows one event through the throttle. This method
// blocks until a goroutine is waiting for a ticket or until
// the rate limiter is stopped.
func (r *RingBufferRateLimiter) permit() {
	for {
		select {
		case r.started <- struct{}{}:
			// notify parent goroutine that we've started; should
			// only happen once, before constructor returns
			continue
		case <-r.stopped:
			return
		case r.ticket <- struct{}{}:
			r.mu.Lock()
			defer r.mu.Unlock()
			if len(r.ring) > 0 {
				r.ring[r.cursor] = time.Now()
				r.advance()
			}
			return
		}
	}
}
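
// waitSketch is a hedged illustration of the consumer side of permit; it is
// an editorial addition, and the file's actual Wait implementation may
// differ. A waiting goroutine blocks until the scheduler goroutine hands it
// a ticket, or until its context is canceled, which is the cancellation
// behavior this design was built to support.
func (r *RingBufferRateLimiter) waitSketch(ctx context.Context) error {
	select {
	case <-r.ticket:
		// the scheduler records time.Now() in the ring
		// once the ticket is handed off
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}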

// advance moves the cursor to the next position.
// It is NOT safe for concurrent use, so it must
// be called inside a lock on r.mu.
func (r *RingBufferRateLimiter) advance() {
	r.cursor++
	if r.cursor >= len(r.ring) {
		r.cursor = 0
	}
}
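
// Design note (editorial): advance is behaviorally equivalent to the modulo
// form
//
//	r.cursor = (r.cursor + 1) % len(r.ring)
//
// for any non-empty ring, but the explicit branch avoids an integer division
// and, unlike the modulo form, does not panic when len(r.ring) == 0.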