[pacman-dev,1/3] libmakepkg: use extraction commands instead of file to find archive type

Message ID 20191126212957.29031-1-e5ten.arch@gmail.com
State Rejected, archived
Headers show
Series [pacman-dev,1/3] libmakepkg: use extraction commands instead of file to find archive type | expand

Commit Message

Ethan Sommer Nov. 26, 2019, 9:29 p.m. UTC
Previously, to determine which command we should use to extract an
archive, we would run file and match the output against our list of
possible extraction commands

Instead, run the archive through each extraction command's -t (--test)
flag, if this succeeds then we know that the command is able to extract
the file and is the one to use

Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com>
---
 scripts/libmakepkg/source/file.sh.in | 39 ++++++++--------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

Comments

Eli Schwartz Nov. 27, 2019, 4:55 p.m. UTC | #1
On 11/26/19 4:29 PM, Ethan Sommer wrote:
> Previously, to determine which command we should use to extract an
> archive, we would run file and match the output against our list of
> possible extraction commands
> 
> Instead, run the archive through each extraction command's -t (--test)
> flag, if this succeeds then we know that the command is able to extract
> the file and is the one to use

Missing rationale why we care.

>  scripts/libmakepkg/source/file.sh.in | 39 ++++++++--------------------
>  1 file changed, 11 insertions(+), 28 deletions(-)
> 
> diff --git a/scripts/libmakepkg/source/file.sh.in b/scripts/libmakepkg/source/file.sh.in
> index 7297a1c6..faace79b 100644
> --- a/scripts/libmakepkg/source/file.sh.in
> +++ b/scripts/libmakepkg/source/file.sh.in
> @@ -96,35 +96,18 @@ extract_file() {
>  	fi
>  
>  	# do not rely on extension for file type
> -	local file_type=$(@FILECMD@ -bizL -- "$file")
> -	local ext=${file##*.}
>  	local cmd=''
> -	case "$file_type" in
> -		*application/x-tar*|*application/zip*|*application/x-zip*|*application/x-cpio*)
> -			cmd="bsdtar" ;;
> -		*application/x-gzip*|*application/gzip*)
> -			case "$ext" in
> -				gz|z|Z) cmd="gzip" ;;
> -				*) return;;
> -			esac ;;
> -		*application/x-bzip*)
> -			case "$ext" in
> -				bz2|bz) cmd="bzip2" ;;
> -				*) return;;
> -			esac ;;
> -		*application/x-xz*)
> -			case "$ext" in
> -				xz) cmd="xz" ;;
> -				*) return;;
> -			esac ;;
> -		*)
> -			# See if bsdtar can recognize the file
> -			if bsdtar -tf "$file" -q '*' &>/dev/null; then
> -				cmd="bsdtar"
> -			else
> -				return 0
> -			fi ;;
> -	esac
> +	if bsdtar -tf "$file" -q '*'; then
> +		cmd='bsdtar'
> +	elif gzip -t "$file"; then
> +		cmd='gzip'
> +	elif bzip2 -t "$file"; then
> +		cmd='bzip2'
> +	elif xz -t "$file"; then
> +		cmd='xz'
> +	else
> +		return 0
> +	fi &>/dev/null

I had to look at this three times before I realized you were redirecting
both stdout and stderr of the entire if/elif block.

Because these commands check to see if the file is "uncorrupted", not to
see if they are "formatted in this compression format", I imagine this
would falsely fail files if they have some class of recoverable error,
but I'm not sure if we care.

What we do care about, I think, is that you're silencing stderr due to
the fact that it will noisily print diagnostic messages when the file is
not compressed or is in a different compression format, and bsdtar will
print listed filenames to stdout. So now, if these commands fail for
*any* reason, we treat this as a successful analysis that the file is in
a different format. One reason for it failing might be that xz is not
installed... this is hardly a successful analysis, and printing an
"error: command not found" message during the decompression, then dying
with "Failed to extract %s" is a desirable feature.

>  	local ret=0
>  	msg2 "$(gettext "Extracting %s with %s")" "$file" "$cmd"
>
Allan McRae Dec. 2, 2019, 4:08 a.m. UTC | #2
On 27/11/19 7:29 am, Ethan Sommer wrote:
> Previously, to determine which command we should use to extract an
> archive, we would run file and match the output against our list of
> possible extraction commands
> 
> Instead, run the archive through each extraction command's -t (--test)
> flag, if this succeeds then we know that the command is able to extract
> the file and is the one to use
> 


Apart from the issues Eli brought up, there is a large performance
regression on large source files.


$ time file -bizL libreoffice-6.3.4.1.tar.xz
application/x-tar; charset=binary compressed-encoding=application/x-xz;
charset=binary

real	0m0.034s
user	0m0.031s
sys	0m0.004s


$ time xz -t libreoffice-6.3.4.1.tar.xz

real	0m11.970s
user	0m11.898s
sys	0m0.057s


Allan

Patch

diff --git a/scripts/libmakepkg/source/file.sh.in b/scripts/libmakepkg/source/file.sh.in
index 7297a1c6..faace79b 100644
--- a/scripts/libmakepkg/source/file.sh.in
+++ b/scripts/libmakepkg/source/file.sh.in
@@ -96,35 +96,18 @@  extract_file() {
 	fi
 
 	# do not rely on extension for file type
-	local file_type=$(@FILECMD@ -bizL -- "$file")
-	local ext=${file##*.}
 	local cmd=''
-	case "$file_type" in
-		*application/x-tar*|*application/zip*|*application/x-zip*|*application/x-cpio*)
-			cmd="bsdtar" ;;
-		*application/x-gzip*|*application/gzip*)
-			case "$ext" in
-				gz|z|Z) cmd="gzip" ;;
-				*) return;;
-			esac ;;
-		*application/x-bzip*)
-			case "$ext" in
-				bz2|bz) cmd="bzip2" ;;
-				*) return;;
-			esac ;;
-		*application/x-xz*)
-			case "$ext" in
-				xz) cmd="xz" ;;
-				*) return;;
-			esac ;;
-		*)
-			# See if bsdtar can recognize the file
-			if bsdtar -tf "$file" -q '*' &>/dev/null; then
-				cmd="bsdtar"
-			else
-				return 0
-			fi ;;
-	esac
+	if bsdtar -tf "$file" -q '*'; then
+		cmd='bsdtar'
+	elif gzip -t "$file"; then
+		cmd='gzip'
+	elif bzip2 -t "$file"; then
+		cmd='bzip2'
+	elif xz -t "$file"; then
+		cmd='xz'
+	else
+		return 0
+	fi &>/dev/null
 
 	local ret=0
 	msg2 "$(gettext "Extracting %s with %s")" "$file" "$cmd"