[dbscripts,2/2] fixup: fix potential bsdtar stream close error by grep

Message ID 20180903115016.12171-2-anthraxx@archlinux.org
State Superseded, archived
Series [dbscripts,1/2] readme: switch to travis-ci.com build status badge

Commit Message

Emil Velikov via arch-projects Sept. 3, 2018, 11:50 a.m. UTC
From: anthraxx <anthraxx@archlinux.org>

bsdtar doesn't like it when the stream gets closed before it finishes,
which can happen when grep finds its match early in a potentially huge
archive. Instead of suppressing stderr entirely, we pipe the output
through cat, which keeps the stream open for bsdtar while still letting
us catch and see useful messages on stderr.
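A minimal bash sketch of the failure mode, assuming GNU grep and using `yes` as a hypothetical stand-in for `bsdtar -xOqf` streaming a large member (bsdtar itself is not invoked): once `grep -m 1` has its match, it exits and closes the read end of the pipe, and the producer's next write fails.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `bsdtar -xOqf pkg .PKGINFO` on a huge member:
# `yes` repeats one line forever, so it only stops when its reader goes away.
producer() {
	yes 'pkgname = example'
}

# grep -m 1 exits right after the first match, closing its end of the pipe.
producer | grep -m 1 '^pkgname = ' >/dev/null
statuses=("${PIPESTATUS[@]}")

# With default signal handling the producer is killed by SIGPIPE
# (status 128 + 13 = 141); if SIGPIPE is ignored, its write fails with
# EPIPE and it exits non-zero anyway. grep itself succeeds.
echo "producer: ${statuses[0]}, grep: ${statuses[1]}"
```

With SIGPIPE at its default disposition this typically reports a producer status of 141 while grep reports success, which is the kind of early close the commit message is describing.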
---
 db-functions | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Emil Velikov via arch-projects Sept. 9, 2018, 3:20 p.m. UTC | #1
On 9/3/18 7:50 AM, Levente Polyak via arch-projects wrote:
> [...]

This is functionally 23c2b82c336bf19b7a29a90d19bca4423d8b8839 again, but
for more locations. I'm never going to understand why some people get
this SIGPIPE and I don't, but I guess it makes sense to make this change,
especially as we do the same elsewhere.

(We need to buffer it somehow with an extra command; grep has no way to
output only the first result without propagating SIGPIPE. Why does
bsdtar care about this anyway...)

Accepted.
Emil Velikov via arch-projects Sept. 9, 2018, 4:20 p.m. UTC | #2
On 9/9/18 11:20 AM, Eli Schwartz wrote:
> [...]

As discussed on IRC, I can finally reproduce this, e.g.
bsdtar xOf /path/to/file | grep --binary-files=text a
(many matches for the string "a")

With grep -q or grep -m1, it errors.

With cat | grep -q/-m1, it still errors, because the buffer going *into*
grep has insufficient room :(
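To see why an extra `cat` does not help when the stream is larger than the pipe buffers, here is a bash sketch assuming GNU grep and coreutils, with `yes` as a hypothetical endless stand-in for bsdtar output: when grep exits early, `cat` is the first to hit the closed pipe, and the producer then fails writing into `cat`'s pipe.

```shell
#!/usr/bin/env bash
# Hypothetical endless stand-in for bsdtar output (more than any pipe buffer holds).
producer() {
	yes 'pkgname = example'
}

# Both early-exit grep modes, each behind an extra cat.
producer | cat | grep -m 1 '^pkgname = ' >/dev/null
with_m1=("${PIPESTATUS[@]}")

producer | cat | grep -q '^pkgname = '
with_q=("${PIPESTATUS[@]}")

# cat only adds one more pipe buffer: after grep exits, cat's write fails,
# cat dies, and the producer's writes into cat's pipe fail next.
echo "-m 1: producer=${with_m1[0]} cat=${with_m1[1]} grep=${with_m1[2]}"
echo "-q:   producer=${with_q[0]} cat=${with_q[1]} grep=${with_q[2]}"
```

In both variants grep succeeds while `cat` and the producer end with non-zero statuses, matching the observation that the buffer in front of grep runs out of room.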

With grep | head -1, it only errors if the matching output going *out*
of grep is too large. For our uses it should only ever match exactly
once, so this is what we should do.
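A bash sketch of the `grep | head -1` idea, assuming GNU grep and coreutils; both producers are hypothetical stand-ins for bsdtar. grep without `-m` drains its entire input, so only grep's *output* pipe can close early. With exactly one match that is harmless, but with an unbounded number of matches the failure comes back.

```shell
#!/usr/bin/env bash
# Hypothetical member larger than a pipe buffer with exactly one matching key,
# as a .PKGINFO/.BUILDINFO key is expected to appear.
one_match() {
	seq 1 200000
	echo 'pkgname = example'
	seq 1 200000
}

# grep reads to EOF, writes a single line, and has nothing left to write
# when head -n 1 exits, so every stage finishes normally.
one_match | grep '^pkgname = ' | head -n 1 >/dev/null
single=("${PIPESTATUS[@]}")

# Counter-case: endless matching output. After head exits, grep's next write
# fails and grep dies, and then the producer's writes into grep fail too.
yes 'pkgname = example' | grep '^pkgname = ' | head -n 1 >/dev/null
many=("${PIPESTATUS[@]}")

echo "one match: ${single[*]}"
echo "many:      ${many[*]}"
```

This is why the approach relies on each key matching exactly once: the producer is never cut off, and grep only risks a broken pipe when it has more than one write's worth of matches to emit.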

(Or complain to libarchive. :p)
Emil Velikov via arch-projects Sept. 9, 2018, 4:31 p.m. UTC | #3
On 9/9/18 12:20 PM, Eli Schwartz wrote:
> [...]

As discussed on IRC, I'm applying a modified version of the patch (with
an amended commit message) that uses tail instead, since tail should
never close the pipe early. Thanks.
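A sketch of what a tail-based variant might look like, assuming bash and GNU grep. `_grep_field` is a hypothetical helper modeled on `_grep_pkginfo` that reads the metadata from stdin instead of calling `/usr/bin/bsdtar`, so it runs without a package file. The applied commit is not shown on this page and may differ. Neither grep (without `-m`) nor `tail` exits before EOF, so nothing in the pipeline closes early.

```shell
#!/usr/bin/env bash
# Hypothetical helper modeled on _grep_pkginfo; in db-functions the input
# would come from `/usr/bin/bsdtar -xOqf "$1" .PKGINFO` instead of stdin.
_grep_field() {
	local _ret
	# grep without -m 1 and tail -n 1 both read until EOF, so the writer
	# feeding this pipeline is never cut off early.
	_ret="$(grep "^${1} = " | tail -n 1)"
	# Strip the literal "key = " prefix (quoted so the key is not a glob).
	echo "${_ret#"${1} = "}"
}

printf '%s\n' 'pkgname = example' 'pkgver = 1.0-1' 'arch = x86_64' | _grep_field pkgver
```

One behavioral difference from `grep -m 1`: if a key appears more than once, `tail -n 1` returns the last match rather than the first. For keys that occur exactly once, as expected in .PKGINFO and .BUILDINFO, the results are the same.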

Patch

diff --git a/db-functions b/db-functions
index 0491c22..f0a6453 100644
--- a/db-functions
+++ b/db-functions
@@ -174,7 +174,7 @@  repo_unlock () { #repo_unlock <repo-name> <arch>
 _grep_pkginfo() {
 	local _ret
 
-	_ret="$(/usr/bin/bsdtar -xOqf "$1" .PKGINFO | grep -m 1 "^${2} = ")"
+	_ret="$(/usr/bin/bsdtar -xOqf "$1" .PKGINFO | cat | grep -m 1 "^${2} = ")"
 	echo "${_ret#${2} = }"
 }
 
@@ -182,7 +182,7 @@  _grep_pkginfo() {
 _grep_buildinfo() {
 	local _ret
 
-	_ret="$(/usr/bin/bsdtar -xOqf "$1" .BUILDINFO | grep -m 1 "^${2} = ")"
+	_ret="$(/usr/bin/bsdtar -xOqf "$1" .BUILDINFO | cat | grep -m 1 "^${2} = ")"
 	echo "${_ret#${2} = }"
 }