[1/1] support http header 'Cache-Control: no-cache' for soft failure

Message ID 20220919125210.63357-1-list@eworm.de
State New
Headers show
Series [1/1] support http header 'Cache-Control: no-cache' for soft failure | expand

Commit Message

Christian Hesse Sept. 19, 2022, 12:52 p.m. UTC
From: Christian Hesse <mail@eworm.de>

By setting the HTTP header 'Cache-Control: no-cache' when returning with
the status code 404 (not found) the server can indicate that this is a
soft failure. No error message is shown, and server's error count is
not increased.

This can be used by servers that are not expected to be complete, for
example when serving a local cache [0]. In nginx this can be achived by
adding a single directive in location block:

    add_header Cache-Control "no-cache";

Also this is a perfect match for pacredir [1].

[0] https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Network_shared_pacman_cache
[1] https://git.eworm.de/cgit/pacredir/about/

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 lib/libalpm/dload.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Christian Hesse Sept. 19, 2022, 12:54 p.m. UTC | #1
Christian Hesse <list@eworm.de> on Mon, 2022/09/19 14:52:
> By setting the HTTP header 'Cache-Control: no-cache' when returning with
> the status code 404 (not found) the server can indicate that this is a
> soft failure. No error message is shown, and server's error count is
> not increased.
> 
> This can be used by servers that are not expected to be complete, for
> example when serving a local cache [0]. In nginx this can be achived by
> adding a single directive in location block:
> 
>     add_header Cache-Control "no-cache";

Yes, I know.
We had a very similar patch in May 2021 [2]. That one received a negative
review, possibly because back then I wanted to introduce a custom and
non-standard http header 'X-Pacman-Expected-Failure'.

In January 2022 I send a patch to implement CacheServer [3], that one was
neither reviewed nor merged. At least not yet, and it is a lot more complex.

As this is still a problem in general (and not just a "kind of esoteric
setup", see the number of caching solutions listed on pacman's wiki page) [0]
I decided to make another try.

In contrast to the original patch I decided to go with a standard http
header. If the server sends 'Cache-Control: no-cache' in its response header
pacman will honor that as a soft failure and will not increase server's error
limit. That said... I am open to other suggestions for a header that matches
even better. Could not find one myself.

I am running this code myself for month now (with a custom built pacman
package) without issue. It does exactly what I need, and will help others as
well.

> Also this is a perfect match for pacredir [1].

... which will receive a corresponding commit as soon as this is merged.

BTW, did you know that pacredir is in our official repositories since 2017?
It gives an advice to install pacman (the custom build I mentioned above)
from an unofficial repository since version 6.0 entered the repositories.
This could finally be dropped.

> [0] https://wiki.archlinux.org/title/Pacman/Tips_and_tricks#Network_shared_pacman_cache
> [1] https://git.eworm.de/cgit/pacredir/about/

[2] https://lists.archlinux.org/archives/list/pacman-dev@lists.archlinux.org/message/HHPPC26E36M766IXDUZGYBW4R2CDL5QO/
[3] https://lists.archlinux.org/archives/list/pacman-dev@lists.archlinux.org/message/A5IWFRTNYYYAXGRDWZEBOKQSIFOUJUHR/
Allan McRae Sept. 26, 2022, 10:44 a.m. UTC | #2
On 19/9/22 22:52, Christian Hesse wrote:
> From: Christian Hesse <mail@eworm.de>
> 
> By setting the HTTP header 'Cache-Control: no-cache' when returning with
> the status code 404 (not found) the server can indicate that this is a
> soft failure. No error message is shown, and server's error count is
> not increased.
> 
> This can be used by servers that are not expected to be complete, for
> example when serving a local cache [0]. In nginx this can be achived by
> adding a single directive in location block:
> 
>      add_header Cache-Control "no-cache";

I am not accepting this patch.
1) "no-cache" to indicate this a cache of packages...
2) using a directive that has a completely different meaning.

Allan
Christian Hesse Sept. 26, 2022, 11:54 a.m. UTC | #3
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
> On 19/9/22 22:52, Christian Hesse wrote:
> > From: Christian Hesse <mail@eworm.de>
> > 
> > By setting the HTTP header 'Cache-Control: no-cache' when returning with
> > the status code 404 (not found) the server can indicate that this is a
> > soft failure. No error message is shown, and server's error count is
> > not increased.
> > 
> > This can be used by servers that are not expected to be complete, for
> > example when serving a local cache [0]. In nginx this can be achived by
> > adding a single directive in location block:
> > 
> >      add_header Cache-Control "no-cache";  
> 
> I am not accepting this patch.
> 1) "no-cache" to indicate this a cache of packages...

You missed the point. This does not tell not to cache packages.
It does tell not to cache an error. ;)

Fun fact: Searching for the keywords yields a lot of matches for some well
known cloud providers and their proxy software. Looks like the same is used
there not to cache errors.

> 2) using a directive that has a completely different meaning.

So still unanswered... Could there be a directive that we can agree on?

Anyway - still expected. :-p
So hoping for the CacheServer thing now.
Allan McRae Sept. 26, 2022, 12:35 p.m. UTC | #4
On 26/9/22 21:54, Christian Hesse wrote:
> Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:
>> On 19/9/22 22:52, Christian Hesse wrote:
>>> From: Christian Hesse <mail@eworm.de>
>>>
>>> By setting the HTTP header 'Cache-Control: no-cache' when returning with
>>> the status code 404 (not found) the server can indicate that this is a
>>> soft failure. No error message is shown, and server's error count is
>>> not increased.
>>>
>>> This can be used by servers that are not expected to be complete, for
>>> example when serving a local cache [0]. In nginx this can be achived by
>>> adding a single directive in location block:
>>>
>>>       add_header Cache-Control "no-cache";
>>
>> I am not accepting this patch.
>> 1) "no-cache" to indicate this a cache of packages...
> 
> You missed the point. This does not tell not to cache packages.
> It does tell not to cache an error. ;)
> 
> Fun fact: Searching for the keywords yields a lot of matches for some well
> known cloud providers and their proxy software. Looks like the same is used
> there not to cache errors.

This is the sort of information that would make considering this as an 
option more likely.  Can you expand on this with some examples?

Allan
Christian Hesse Sept. 26, 2022, 2:24 p.m. UTC | #5
Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 22:35:
> On 26/9/22 21:54, Christian Hesse wrote:
> > Allan McRae <allan@archlinux.org> on Mon, 2022/09/26 20:44:  
> >> On 19/9/22 22:52, Christian Hesse wrote:  
> >>> From: Christian Hesse <mail@eworm.de>
> >>>
> >>> By setting the HTTP header 'Cache-Control: no-cache' when returning with
> >>> the status code 404 (not found) the server can indicate that this is a
> >>> soft failure. No error message is shown, and server's error count is
> >>> not increased.
> >>>
> >>> This can be used by servers that are not expected to be complete, for
> >>> example when serving a local cache [0]. In nginx this can be achived by
> >>> adding a single directive in location block:
> >>>
> >>>       add_header Cache-Control "no-cache";  
> >>
> >> I am not accepting this patch.
> >> 1) "no-cache" to indicate this a cache of packages...  
> > 
> > You missed the point. This does not tell not to cache packages.
> > It does tell not to cache an error. ;)
> > 
> > Fun fact: Searching for the keywords yields a lot of matches for some well
> > known cloud providers and their proxy software. Looks like the same is
> > used there not to cache errors.  
> 
> This is the sort of information that would make considering this as an 
> option more likely.  Can you expand on this with some examples?

Oh, did not expect that to decide the thing.

Here is a selection of links on the topic...

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html#HTTPStatusCodes-cached-errors
https://cloud.google.com/cdn/docs/using-negative-caching
https://community.cloudflare.com/t/how-to-avoid-caching-404/196262
https://community.cloudflare.com/t/cache-everything-and-dont-cache-404-at-all/108912

For Cloudflare I found the forum discussions only, not sure if the relevant
parts have been removed from documentation... There must have been more. :)

Default caching on 404 happens for something between several
seconds and minutes by default. This can be changed to different time
(CacheControl: age=...) or disabled (CacheControl: no-cache).
kpcyrd Sept. 26, 2022, 3:47 p.m. UTC | #6
On 9/19/22 14:52, Christian Hesse wrote:
> In nginx this can be achived by
> adding a single directive in location block:
> 
>      add_header Cache-Control "no-cache";

If it's added directly to the location block this would disable caching 
for all responses.
Christian Hesse Sept. 26, 2022, 7:17 p.m. UTC | #7
kpcyrd <kpcyrd@rxv.cc> on Mon, 2022/09/26 17:47:
> On 9/19/22 14:52, Christian Hesse wrote:
> > In nginx this can be achived by
> > adding a single directive in location block:
> > 
> >      add_header Cache-Control "no-cache";  
> 
> If it's added directly to the location block this would disable caching 
> for all responses.

Yes, true... Though pacman ignores this. :)

And I found another issue: By default nginx does not add headers
when return code is 404, but it can be forced with extra parameter
"always" [0].

So we should show something like this in the commit message for a complete
server block:

    server {
        server name _default;
        listen 8080;
        root /var/cache/pacman/pkg;
        error_page 404 = @no-cache;
        location @no-cache {
            add_header CacheControl "no-cache" always;
        }
    }

Allan, if you decide to accept the patch I would like to re-send with an
updated commit message.

[0] https://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header

Patch

diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
index 4fa17b35..23034584 100644
--- a/lib/libalpm/dload.c
+++ b/lib/libalpm/dload.c
@@ -274,8 +274,10 @@  static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u
 {
 	size_t realsize = size * nmemb;
 	const char *fptr, *endptr = NULL;
+	const char * const cc_header = "Cache-Control:";
 	const char * const cd_header = "Content-Disposition:";
 	const char * const fn_key = "filename=";
+	const char * const nc_key = "no-cache";
 	struct dload_payload *payload = (struct dload_payload *)user;
 	long respcode;
 
@@ -302,6 +304,15 @@  static size_t dload_parseheader_cb(void *ptr, size_t size, size_t nmemb, void *u
 		}
 	}
 
+	/* By setting the HTTP header 'Cache-Control: no-cache' the server can indicate
+	   that this is a soft failure which should not be cached. No error message is
+	   shown, and server's error count is not increased. */
+	if(_alpm_raw_ncmp(cc_header, ptr, strlen(cc_header)) == 0) {
+		if(strstr(ptr, nc_key)) {
+			payload->errors_ok = 1;
+		}
+	}
+
 	curl_easy_getinfo(payload->curl, CURLINFO_RESPONSE_CODE, &respcode);
 	if(payload->respcode != respcode) {
 		payload->respcode = respcode;