[devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

Message ID 20190318095251.g2gnwxeiz4md22lt@maximbaz.com
State Rejected
Headers show
Series [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf | expand

Commit Message

Emil Velikov via arch-projects March 18, 2019, 9:52 a.m. UTC
This allows using faster multicore compression tools like pixz

Signed-off-by: Maxim Baz <archlinux@maximbaz.com>
---
 archbuild.in     |  2 +-
 makechrootpkg.in | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 6 deletions(-)

--
2.21.0

Comments

Emil Velikov via arch-projects March 18, 2019, 9:56 a.m. UTC | #1
I did see an earlier patch proposing the same, which was rejected because it might break delta packages [1].
However, someone mentioned that pacman is dropping support for delta packages altogether [2] referring to a removal commit in pacman repo [3], so I wanted to bring this up again :)

I'm creating a new thread because according to my tests the earlier patch [1] is incomplete.


1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
2: https://github.com/AladW/aurutils/issues/547
3: https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e

--
Maxim Baz
Emil Velikov via arch-projects March 18, 2019, 9:58 a.m. UTC | #2
On March 18, 2019 10:56:34 AM GMT+01:00, Maxim Baz via arch-projects <arch-projects@archlinux.org> wrote:
>I did see an earlier patch proposing the same, which was rejected
>because it might break delta packages [1].
>However, someone mentioned that pacman is dropping support for delta
>packages altogether [2] referring to a removal commit in pacman repo
>[3], so I wanted to bring this up again :)
>
>I'm creating a new thread because according to my tests the earlier
>patch [1] is incomplete.
>
>
>1:
>https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
>2: https://github.com/AladW/aurutils/issues/547
>3:
>https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e
>
>--



Messing with this in devtools used for building official repo packages kills reproducible builds when not agreed on very specific flags and tools to compress.
This is not a good idea to do and we won't kill reproducibility by going wild west.
Emil Velikov via arch-projects March 18, 2019, 10:11 a.m. UTC | #3
On Mon, Mar 18, 2019 at 10:58:57AM +0100, Levente Polyak via arch-projects wrote:
> Messing with this in devtools used for building official repo packages kills reproducible builds when not agreed on very specific flags and tools to compress.
> This is not a good idea to do and we won't kill reproducibility by going wild west.

True, sad but valid point.

I've just confirmed that pixz produces different results when run twice on the same data, so I can't even propose to just switch the default to pixz without breaking reproducible builds :(

Thanks for the quick feedback!

--
Maxim Baz
Emil Velikov via arch-projects March 18, 2019, 5:13 p.m. UTC | #4
On 3/18/19 5:56 AM, Maxim Baz via arch-projects wrote:
> I did see an earlier patch proposing the same, which was rejected because it might break delta packages [1].
> However, someone mentioned that pacman is dropping support for delta packages altogether [2] referring to a removal commit in pacman repo [3], so I wanted to bring this up again :)
> 
> I'm creating a new thread because according to my tests the earlier patch [1] is incomplete.
> 
> 
> 1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
> 2: https://github.com/AladW/aurutils/issues/547
> 3: https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e

Well, I would ideally like to add delta packages back into pacman,
whether that is via xdelta3 or something else, and it is also worth
noting it got removed at least in large part because archlinux.org
itself has never really used it, and therefore:
- it doesn't exactly get well tested
- for our use cases we don't ultimately notice the difference

(I would like to one day add delta updates to dbscripts. :( But before
that happens, I would need to convince Allan with a tested, safe delta
implementation that someone actually understands and is willing to
maintain.)

Supporting packages in the official repos which are not byte-identical
to the default COMPRESSXZ makes it quite difficult to ever add delta
support back.

...

The fundamental problem behind delta updates remains the same, though,
which is that you cannot actually generate the same package twice using
the same data (even from a byte-identical compressed tar, which is
basically what xdelta3 did). As Levente pointed out, this completely
nukes reproducible builds.

And it is not a pixz issue either. For example xz -T $num will compress
with up to $num cores, as opposed to the default 1, but basic testing
shows that it produces the same output with 2, 3, 4 etc. cores but
different output from what you get when using 1 core. You cannot even
hardcode -T0, since (in addition to requiring packagers to use more
cores than they want to) that may still get interpreted as 1, which is
then unreproducible. You can verify this using taskset -c 1...

(This has been discussed already in #archlinux-pacman.)

Without a guarantee that a given compressor will always be reproducible
no matter how many cores are used, we cannot take advantage of
multithreaded compression on any level.

More generally, without a variation on
https://patchwork.archlinux.org/patch/692/ we cannot take advantage of
any sort of configurability. But my patch was rejected.

Specific sub-issues with unknown configurability that isn't recorded in
the .BUILDINFO:

User-configurable implementations.
Without a guarantee that any implementation of a format retains
byte-level compatibility with the standard format (in this case xz), we
cannot support this as we do not know what was being used.

User-configurable options.
These are inherently unreproducible as long as they include the
compression level.

...

For the topic of configurable implementations, there is an additional
hurdle. Since compression happens inside the chroot, even if we did
support it, it would need some way to look up the required binary and
install it in the chroot, but we have no tooling for this and in fact
makepkg does not even know how to check that the binary exists, so you
will probably just get:

  -> Compressing package...
/usr/share/makepkg/util/compress.sh: line 39: pixz: command not found
bsdtar: Write error
==> ERROR: Failed to create package file.

The solution would have to require either a known, embedded list of
"good" ones (if pixz is not reproducible when running the same command
line twice then we'd need to be able to blacklist that), or else assume
the user installed it when adding it to their makepkg.conf and using
pacman -Qo. It is probably overkill to use pkgfile and support
compressors that aren't installed on the host either...
Emil Velikov via arch-projects March 18, 2019, 7:26 p.m. UTC | #5
Thanks for the detailed reply Eli :)

Before I go further, I should mention that this thread has sparked some interest among our teammates,
and folks on IRC are researching now a possibility of switching from xz to zstd entirely.

So I think we will return to this topic soon with some updates :)

> And it is not a pixz issue either. For example xz -T $num will compress
> with up to $num cores, as opposed to the default 1, but basic testing
> shows that it produces the same output with 2, 3, 4 etc. cores but
> different output from what you get when using 1 core. You cannot even
> hardcode -T0, since (in addition to requiring packagers to use more
> cores than they want to) that may still get interpreted as 1, which is
> then unreproducible. You can verify this using taskset -c 1...
>
> (This has been discussed already in #archlinux-pacman.)

Turns out it is possible to make xz use more threads and produce reproducible results,
check out this message: https://lists.debian.org/debian-dpkg/2016/10/msg00011.html

I've confirmed (with real VMs) that -T2, -T3, -T8, all produce the same result, on both single-core and multi-core CPUs,
and -T0 produces the same result on multi-core CPU, but -T0 produces a different result on a single-core CPU.

Therefore, the simplest workaround that produces reproducible results on all types of CPU is this:

-T$([ $(nproc) -lt 2  ] && echo '2' || echo '0')

Or maybe even something like this:

-T$(( $(nproc) + 1  ))

> For the topic of configurable implementations, there is an additional
> hurdle. Since compression happens inside the chroot, even if we did
> support it, it would need some way to look up the required binary and
> install it in the chroot, but we have no tooling for this and in fact
> makepkg does not even know how to check that the binary exists, so you
> will probably just get:
>
>   -> Compressing package...
> /usr/share/makepkg/util/compress.sh: line 39: pixz: command not found
> bsdtar: Write error
> ==> ERROR: Failed to create package file.
>
> The solution would have to require either a known, embedded list of
> "good" ones (if pixz is not reproducible when running the same command
> line twice then we'd need to be able to blacklist that), or else assume
> the user installed it when adding it to their makepkg.conf and using
> pacman -Qo. It is probably overkill to use pkgfile and support
> compressors that aren't installed on the host either...

That is correct.

One final remark, pixz does produce the same result when is run twice, I made a mistake when I was testing it.
But it doesn't matter, since we'll probably settle for zstd or xz -T.


--
Maxim Baz

Patch

diff --git a/archbuild.in b/archbuild.in
index bd5706d..e3d53ec 100644
--- a/archbuild.in
+++ b/archbuild.in
@@ -39,7 +39,7 @@  while getopts 'hcr:' arg; do
 	esac
 done

-check_root SOURCE_DATE_EPOCH,SRCDEST,SRCPKGDEST,PKGDEST,LOGDEST,MAKEFLAGS,PACKAGER,GNUPGHOME
+check_root SOURCE_DATE_EPOCH,SRCDEST,SRCPKGDEST,PKGDEST,LOGDEST,MAKEFLAGS,PACKAGER,GNUPGHOME,COMPRESSGZ,COMPRESSXZ,COMPRESSLRZ,COMPRESSZ,COMPRESSBZ2,COMPRESSLZO

 # Pass all arguments after -- right to makepkg
 makechrootpkg_args+=("${@:$OPTIND}")
diff --git a/makechrootpkg.in b/makechrootpkg.in
index 4b72a36..e4b7ca3 100644
--- a/makechrootpkg.in
+++ b/makechrootpkg.in
@@ -29,9 +29,9 @@  usage() {
 	echo 'command:'
 	echo '    mkarchroot <chrootdir>/root base-devel'
 	echo ''
-	echo 'This script reads {SRC,SRCPKG,PKG,LOG}DEST, MAKEFLAGS and PACKAGER'
-	echo 'from makepkg.conf(5), if those variables are not part of the'
-	echo 'environment.'
+	echo 'This script reads {SRC,SRCPKG,PKG,LOG}DEST, MAKEFLAGS, PACKAGER'
+	echo 'and COMPRESS{{G,X,LR,}Z,BZ2,LZO} from makepkg.conf(5),'
+	echo 'if those variables are not part of the environment.'
 	echo ''
 	echo "Default makepkg args: ${default_makepkg_args[*]}"
 	echo ''
@@ -63,12 +63,18 @@  usage() {
 #  - LOGDEST
 #  - MAKEFLAGS
 #  - PACKAGER
+#  - COMPRESSGZ
+#  - COMPRESSXZ
+#  - COMPRESSLRZ
+#  - COMPRESSZ
+#  - COMPRESSBZ2
+#  - COMPRESSLZO
 load_vars() {
 	local makepkg_conf="$1" var

 	[[ -f $makepkg_conf ]] || return 1

-	for var in {SRC,SRCPKG,PKG,LOG}DEST MAKEFLAGS PACKAGER; do
+	for var in {SRC,SRCPKG,PKG,LOG}DEST MAKEFLAGS PACKAGER COMPRESS{{G,X,LR,}Z,BZ2,LZO}; do
 		[[ -z ${!var:-} ]] && eval "$(grep -a "^${var}=" "$makepkg_conf")"
 	done

@@ -185,7 +191,9 @@  prepare_chroot() {

 	sed -e '/^MAKEFLAGS=/d' -e '/^PACKAGER=/d' -i "$copydir/etc/makepkg.conf"
 	for x in BUILDDIR=/build PKGDEST=/pkgdest SRCPKGDEST=/srcpkgdest SRCDEST=/srcdest LOGDEST=/logdest \
-		"MAKEFLAGS='${MAKEFLAGS:-}'" "PACKAGER='${PACKAGER:-}'"
+		"MAKEFLAGS='${MAKEFLAGS:-}'" "PACKAGER='${PACKAGER:-}'" "COMPRESSGZ=(${COMPRESSGZ[*]:-})" \
+		"COMPRESSXZ=(${COMPRESSXZ[*]:-})" "COMPRESSLRZ=(${COMPRESSLRZ[*]:-})" "COMPRESSZ=(${COMPRESSZ[*]:-})" \
+		"COMPRESSBZ2=(${COMPRESSBZ2[*]:-})" "COMPRESSLZO=(${COMPRESSLZO[*]:-})"
 	do
 		grep -q "^$x" "$copydir/etc/makepkg.conf" && continue
 		echo "$x" >>"$copydir/etc/makepkg.conf"