[pacman-dev] Provides generation for files (a.k.a. rpm fileattrs for makepkg)

Message ID 20200308041216.6815-1-uhhadd@gmail.com
State Rejected, archived
Headers show
Series [pacman-dev] Provides generation for files (a.k.a. rpm fileattrs for makepkg) | expand

Commit Message

Carson Black March 8, 2020, 4:12 a.m. UTC
Whenever I'm working in a relatively fresh Arch environment, my work
flow looks like this:

- cmake .. -DFLAGS=blahblahblah
- Wait a few seconds
- Package providing KF5whatever not found
- Try and deduce Arch package name from CMake package name

Often, trying to figure out the relation between the build system package names
and the Arch package names of missing dependencies ends up being the longest
part of whatever task I was working on, which isn't very efficient.

Compare this to me working in an RPM distro:

- cmake .. -DFLAGS=blahblahblah
- Wait a few seconds
- Package providing KF5whatever not found
- dnf install cmake(KF5whatever)

The latter workflow is a lot more efficient, and is exactly what this
commit introduces to packages generated by makepkg.

Every file is iterated over at the end of the build process to generate
additional provides without the packager needing to manually specify
them.

The code is architected in a manner designed to make it trivial to add
new provider autogenerators.

So far, there are autogenerated providers for:
- pkgconfig(package)
- cmake(package)
- desktop files
  * app(desktopfilename)
  * can-open-mimetype(mimetype)
  * can-open-file-extension(filetype)

While these automatically generated provides can be used for packaging,
this is intended more for interactive usage rather than an attempt to
change how Arch packaging works.

Signed-off-by: Carson Black <uhhadd@gmail.com>
---
 scripts/makepkg.sh.in | 108 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

Comments

Eli Schwartz March 8, 2020, 6:12 a.m. UTC | #1
On 3/7/20 11:12 PM, Carson Black wrote:
> Whenever I'm working in a relatively fresh Arch environment, my work
> flow looks like this:
> 
> - cmake .. -DFLAGS=blahblahblah
> - Wait a few seconds
> - Package providing KF5whatever not found
> - Try and deduce Arch package name from CMake package name

$ pacman -S pkgfile; pkgfile -u; pkgfile -v KF5whatever.cmake
extra/kwhatever 1.0-1     /usr/lib/cmake/KF5Whatever/KF5Whatev
erConfig.cmake

Alternatively, pacman -Fy; pacman -F provides similar functionality to
pkgfile, these days.

> - dnf install cmake(KF5whatever)
> 
> The latter workflow is a lot more efficient, and is exactly what this
> commit introduces to packages generated by makepkg.
> 
> Every file is iterated over at the end of the build process to generate
> additional provides without the packager needing to manually specify
> them.
> 
> The code is architected in a manner designed to make it trivial to add
> new provider autogenerators.
> 
> So far, there are autogenerated providers for:
> - pkgconfig(package)
> - cmake(package)
> - desktop files
>   * app(desktopfilename)
>   * can-open-mimetype(mimetype)
>   * can-open-file-extension(filetype)
> 
> While these automatically generated provides can be used for packaging,
> this is intended more for interactive usage rather than an attempt to
> change how Arch packaging works.

I consider this rpm functionality to be an anti-feature, there are far
too many ways to depend on the exact same thing and none of it is
opt-in. Also, given there are very simple, intuitive tools like pkgfile
or pacman -F, I don't see why such complexity is needed.

> Signed-off-by: Carson Black <uhhadd@gmail.com>
> ---
>  scripts/makepkg.sh.in | 108 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 108 insertions(+)
> 
> diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
> index ca3e7459..10703a91 100644
> --- a/scripts/makepkg.sh.in
> +++ b/scripts/makepkg.sh.in
> @@ -520,6 +520,111 @@ find_libdepends() {
>  	(( ${#libdepends[@]} )) && printf '%s\n' "${libdepends[@]}"
>  }
>  
> +pkgconfig_fileattr() {
> +	case $1 in
> +		--generate-provides)
> +			case $2 in
> +				*.pc)
> +					directory="`dirname ${2}`"
> +					export PKG_CONFIG_PATH="$DIR:$DIR/../../share/pkgconfig"
> +					pkg-config --print-provides "$2" 2>/dev/null | while read name _ _; do
> +						[ -n "$name" ] || continue
> +						echo "pkgconfig($name)"
> +					done
> +					;;
> +			esac
> +			;;
> +		--generate-depends)
> +			;;
> +	esac
> +}
> +
> +cmake_fileattr() {
> +	case $1 in
> +		--generate-provides)
> +			case $2 in
> +				*.cmake)
> +					base="$(basename $2)"
> +					case $2 in
> +						*Config.cmake)
> +							echo "cmake(${base%Config.cmake})"
> +							;;
> +						*-config.cmake)
> +							echo "cmake(${base%-config.cmake})"
> +							;;
> +					esac
> +					;;
> +			esac
> +			;;
> +		--generate-depends)
> +			;;
> +	esac
> +}
> +
> +desktop_file_attr() {
> +	case $1 in
> +		--generate-provides)
> +			case "$2" in
> +				*.desktop)
> +					grep -q "Type=Application" "$2" || return
> +					grep -q "Exec=" "$2" || return
> +					base="$(basename $2)"
> +					echo "app(${base%.desktop})"
> +
> +					IFS=';'
> +					mimetypes=`grep MimeType= $2 | cut -d '=' -f 2`
> +					for mimetype in $mimetypes; do
> +						echo "can-open-mimetype($mimetype)"
> +
> +						IFS=' '

... did you really just overwrite IFS (twice), then not restore it?

> +						filetypes=`grep -v ^\# /etc/mime.types | grep $mimetype | cut -d ' ' | xargs`
> +						for filetype in $filetypes; do

That is a lot of grep and sed, using a deprecated shell construct (you
should never use ``, always use $() instead) and if you're going to
build a list of multiple things then iterate over it, use a real
datatype like arrays, then iterate over "${filetypes[@]}" -- and mind
your quoting.

This would mean you no longer overwrote IFS -- using arrays means you'd
have to properly split them upfront (mapfile has options to specify
custom delimiters, so that works too).

> +							echo "can-open-file-extension($filetype)"
> +						done
> +					done
> +					;;
> +			esac
> +			;;
> +		--generate-depends)
> +			;;
> +	esac
> +}
> +
> +check_pkg_dir() {
> +	if [[ ! -d $pkgdir ]]; then
> +		error "$(gettext "Missing %s directory.")" "\$pkgdir/"
> +		plain "$(gettext "Aborting...")"
> +		exit $E_MISSING_PKGDIR
> +	fi
> +}

I don't even understand what the purpose of this function is supposed to
be, since $pkgdir is a directory created by makepkg and cannot
(normally) fail to exist.

In the event that a user-created PKGBUILD deletes it, create_package()
performs the exact same check after all user-provided functions are run,
after which it cd's into that directory, and then, from within that
directory, executes the write_pkginfo function which you want to extend.

> +find_fileattrs_provides() {
> +	check_pkg_dir
> +
> +	for file in `find $pkgdir`; do

There is a lot of existing, example code in makepkg which operates on
all files in the pkgdir. It is completely unacceptable to do
word-splitting like this, because filenames can and do contain
whitespace. There are 143 existing packages in the official repositories
that will die on this, including core/linux-firmware and... cmake itself
(in /usr/share/cmake-3.16/Help/generator/).

The common bash programming methodology for iterating over filenames
looks like this:

while IFS= read -rd '' filename; do
    ...
done < <(find "$pkgdir" -type f -print0)


Note also that "$pkgdir" itself must be quoted.

> +		for function in $(declare -F); do

... no. Just no.

Under no circumstances will we be iterating over every shell function
*ever* and checking if they exist. You know exactly which ones exist,
run them manually. This is far, far too leaky as-is.

Another method would be as demonstrated in scripts/libmakepkg/tidy.sh by
registering functions in the arrays tidy_remove and tidy_modify. This is
the pluginized approach to extending something.

> +			case $function in
> +				*_fileattr)
> +					$function --generate-provides $file
> +					;;
> +			esac
> +		done
> +	done
> +}
> +
> +find_fileattrs_depends() {
> +	check_pkg_dir
> +
> +	for file in `find $pkgdir`; do
> +		for function in $(declare -F); do
> +			case $function in
> +				*_fileattr)
> +					$function --generate-depends $file
> +					;;
> +			esac
> +		done
> +	done
> +}

It seems like your proposed code has this being dead functionality?

>  find_libprovides() {
>  	local p libprovides missing
> @@ -613,6 +718,9 @@ write_pkginfo() {
>  	mapfile -t provides < <(find_libprovides)
>  	mapfile -t depends < <(find_libdepends)
>  
> +	mapfile -t provides < <(find_fileattrs_provides)
> +	mapfile -t depends < <(find_fileattrs_depends)

This seems like you are deleting all existing provides and depends, and
then recreating them with only the autogenerated fileattrs ones?

If you look at the existing find_lib* functions, they actually rebuild
the existing provides/depends arrays, taking explicit care to return the
existing ones, but in-place modifying any "libfoo.so" style dependencies
with additional versioning metadata.

Perhaps you meant to append, not overwrite? You could start adding at
the end like this: mapfile -t -O ${#provides[@]} provides

>  	write_kv_pair "license"     "${license[@]}"
>  	write_kv_pair "replaces"    "${replaces[@]}"
>  	write_kv_pair "group"       "${groups[@]}"
>
Carson Black March 8, 2020, 8:04 p.m. UTC | #2
Your feedback on the code should have been addressed.

> I consider this rpm functionality to be an anti-feature, there are far
> too many ways to depend on the exact same thing and none of it is
> opt-in. Also, given there are very simple, intuitive tools like pkgfile
> or pacman -F, I don't see why such complexity is needed.

I really don't consider this "too many ways to depend on the exact same
thing." The entire point is that you're describing what capabilities you 
want from a package instead of naming the package explicitly.

And yes, I'm aware of the ability to query packages providing files,
but looking through filenames is not as efficient a workflow as throwing a
provides at pacman and letting it find a package that provides it.
Eli Schwartz March 8, 2020, 9:46 p.m. UTC | #3
On 3/8/20 4:04 PM, Carson Black wrote:
> Your feedback on the code should have been addressed.
> 
>> I consider this rpm functionality to be an anti-feature, there are far
>> too many ways to depend on the exact same thing and none of it is
>> opt-in. Also, given there are very simple, intuitive tools like pkgfile
>> or pacman -F, I don't see why such complexity is needed.
> 
> I really don't consider this "too many ways to depend on the exact same
> thing." The entire point is that you're describing what capabilities you 
> want from a package instead of naming the package explicitly.

But that is in fact the exact same thing. You're describing "why" you
want the package, not "what" you want. This is the problem that I have,
because there's an infinite number of "why"s possible.

An rpm can do something like

depends=("/usr/bin/sh")

which seems pretty silly to me! Where does it end?

> And yes, I'm aware of the ability to query packages providing files,
> but looking through filenames is not as efficient a workflow as throwing a
> provides at pacman and letting it find a package that provides it.

pkgfile -q KF5whatever.cmake | pacman -S -

Seems more or less "efficient" to me, but I'm not even sure why that's a
goal. package dependencies are created once, so the focus should be less
on how *fast* you can pacman -S 'the-dependency' and more on the
technical merits of conveying the information that way. And I'm
unconvinced that automatically generating "reasons" to install a package
is the right way to go here.

Patch

diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
index ca3e7459..10703a91 100644
--- a/scripts/makepkg.sh.in
+++ b/scripts/makepkg.sh.in
@@ -520,6 +520,111 @@  find_libdepends() {
 	(( ${#libdepends[@]} )) && printf '%s\n' "${libdepends[@]}"
 }
 
+pkgconfig_fileattr() {
+	case $1 in
+		--generate-provides)
+			case $2 in
+				*.pc)
+					directory="`dirname ${2}`"
+					export PKG_CONFIG_PATH="$DIR:$DIR/../../share/pkgconfig"
+					pkg-config --print-provides "$2" 2>/dev/null | while read name _ _; do
+						[ -n "$name" ] || continue
+						echo "pkgconfig($name)"
+					done
+					;;
+			esac
+			;;
+		--generate-depends)
+			;;
+	esac
+}
+
+cmake_fileattr() {
+	case $1 in
+		--generate-provides)
+			case $2 in
+				*.cmake)
+					base="$(basename $2)"
+					case $2 in
+						*Config.cmake)
+							echo "cmake(${base%Config.cmake})"
+							;;
+						*-config.cmake)
+							echo "cmake(${base%-config.cmake})"
+							;;
+					esac
+					;;
+			esac
+			;;
+		--generate-depends)
+			;;
+	esac
+}
+
+desktop_file_attr() {
+	case $1 in
+		--generate-provides)
+			case "$2" in
+				*.desktop)
+					grep -q "Type=Application" "$2" || return
+					grep -q "Exec=" "$2" || return
+					base="$(basename $2)"
+					echo "app(${base%.desktop})"
+
+					IFS=';'
+					mimetypes=`grep MimeType= $2 | cut -d '=' -f 2`
+					for mimetype in $mimetypes; do
+						echo "can-open-mimetype($mimetype)"
+
+						IFS=' '
+						filetypes=`grep -v ^\# /etc/mime.types | grep $mimetype | cut -d ' ' | xargs`
+						for filetype in $filetypes; do
+							echo "can-open-file-extension($filetype)"
+						done
+					done
+					;;
+			esac
+			;;
+		--generate-depends)
+			;;
+	esac
+}
+
+check_pkg_dir() {
+	if [[ ! -d $pkgdir ]]; then
+		error "$(gettext "Missing %s directory.")" "\$pkgdir/"
+		plain "$(gettext "Aborting...")"
+		exit $E_MISSING_PKGDIR
+	fi
+}
+
+find_fileattrs_provides() {
+	check_pkg_dir
+
+	for file in `find $pkgdir`; do
+		for function in $(declare -F); do
+			case $function in
+				*_fileattr)
+					$function --generate-provides $file
+					;;
+			esac
+		done
+	done
+}
+
+find_fileattrs_depends() {
+	check_pkg_dir
+
+	for file in `find $pkgdir`; do
+		for function in $(declare -F); do
+			case $function in
+				*_fileattr)
+					$function --generate-depends $file
+					;;
+			esac
+		done
+	done
+}
 
 find_libprovides() {
 	local p libprovides missing
@@ -613,6 +718,9 @@  write_pkginfo() {
 	mapfile -t provides < <(find_libprovides)
 	mapfile -t depends < <(find_libdepends)
 
+	mapfile -t provides < <(find_fileattrs_provides)
+	mapfile -t depends < <(find_fileattrs_depends)
+
 	write_kv_pair "license"     "${license[@]}"
 	write_kv_pair "replaces"    "${replaces[@]}"
 	write_kv_pair "group"       "${groups[@]}"