[dbscripts,v2,2/2] Add reproducible archive of packages.

Message ID 20190108234037.20059-3-eschwartz@archlinux.org
State Superseded, archived
Headers show
Series Replacing | expand

Commit Message

Emil Velikov via arch-projects Jan. 8, 2019, 11:40 p.m. UTC
Whenever adding new package files to the pool of distributed packages,
copy the file into a longterm archive. This is the first step to merging
the functionality of archivetools, as this implements the shared pool
while also guaranteeing that all packages are archived at the time of
entry rather than once per day if they still exist.

Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
---
 config                    |  2 ++
 db-archive                | 21 +++++++++++++++++++++
 db-functions              | 39 +++++++++++++++++++++++++++++++--------
 db-update                 |  4 ++++
 test/cases/db-update.bats |  6 ++++++
 5 files changed, 64 insertions(+), 8 deletions(-)
 create mode 100755 db-archive

Comments

Emil Velikov via arch-projects Jan. 9, 2019, 1 p.m. UTC | #1
On Tue, Jan 08, 2019 at 06:40:37PM -0500, Eli Schwartz via arch-projects <arch-projects@archlinux.org> wrote:
> diff --git a/db-archive b/db-archive
> new file mode 100755
> index 00000000..5680b9de
> --- /dev/null
> +++ b/db-archive
> @@ -0,0 +1,21 @@
> +#!/bin/bash
> +
> +. "$(dirname "$(readlink -e "$0")")/config"

This uses $0 (see below).

> +
> +if (( $# != 1 )); then
> +	echo "usage: %s <pkgfile>" "${0##*/}"
> +	exit 1
> +fi
> +
> +if [[ -n ${ARCHIVEUSER} ]]; then
> +	exec sudo -u "${ARCHIVEUSER}" bash "${BASH_SOURCE[0]}" "${@}"

This uses $BASH_SOURCE instead of $0 as used above. Is this intentional,
if so why? I'd argue that this should also use $0, but maybe I'm missing
something?

> +fi
> +
> +pkgfile=${1##*/}
> +pkgname=${pkgfile%-*-*-*}
> +archive_dir="${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}"
> +
> +if [[ ! -f ${archive_dir}/${pkgfile} ]]; then
> +	mkdir -p "${archive_dir}"
> +	cp -np "${1}"{,.sig} "${archive_dir}/"
> +fi
> diff --git a/db-functions b/db-functions
> index 7aeedced..b8a00b90 100644
> --- a/db-functions
> +++ b/db-functions
> @@ -444,4 +447,24 @@ arch_repo_modify() {
>  	REPO_MODIFIED=1
>  }
>  
> +# Verify the existence of dependent packages needed by a given pkgfile
> +# usage: check_reproducible pkgfile
> +check_reproducible() {
> +	local pkg dir pkgs=() pkgfile pkgfiles=()
> +
> +	mapfile -t pkgs < <(_grep_all_info "${1}" .BUILDINFO installed)
> +
> +	for pkg in "${pkgs[@]}"; do
> +		local pkgname=${pkg%-*-*-*}
> +		for dir in "${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}" "${STAGING}"/**/; do
> +			if pkgfile="$(getpkgfile "${dir}/${pkg}"${PKGEXTS} 2>/dev/null)"; then
> +				pkgfiles+=("${pkgfile}")
> +				continue 2
> +			fi
> +		done
> +		error "could not find existing package for %s" "${pkg}"


I imagine that I'd be confused if I ever saw this error. How about
clarifying it like this? "could not find package for dependency %s in
reproducibility archive or your staging directory"

> +		return 1
> +	done
> +}
> +
>  . "$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")/db-functions-${VCS}"
> diff --git a/db-update b/db-update
> index 313fb999..04a29bf3 100755
> --- a/db-update
> +++ b/db-update
> @@ -61,6 +61,9 @@ for repo in "${repos[@]}"; do
>  			if ! check_builddir "${pkg}"; then
>  				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
>  			fi
> +			if ! check_reproducible "${pkg}"; then
> +				die "Package %s is not reproducible" "${pkg}"

Same as above. I'd suggest something like this:

"Package %s depends on packages that are missing in the reproducibility
archive and your staging directory. Ensure that all dependencies either
exist in the repositories or reproducibility archive already or that
they are added together with the package in a single call to db-update."

Florian
Emil Velikov via arch-projects Jan. 9, 2019, 2:49 p.m. UTC | #2
On 1/9/19 8:00 AM, Florian Pritz wrote:
> On Tue, Jan 08, 2019 at 06:40:37PM -0500, Eli Schwartz via arch-projects <arch-projects@archlinux.org> wrote:
>> diff --git a/db-archive b/db-archive
>> new file mode 100755
>> index 00000000..5680b9de
>> --- /dev/null
>> +++ b/db-archive
>> @@ -0,0 +1,21 @@
>> +#!/bin/bash
>> +
>> +. "$(dirname "$(readlink -e "$0")")/config"
> 
> This uses $0 (see below).
> 
>> +
>> +if (( $# != 1 )); then
>> +	echo "usage: %s <pkgfile>" "${0##*/}"
>> +	exit 1
>> +fi
>> +
>> +if [[ -n ${ARCHIVEUSER} ]]; then
>> +	exec sudo -u "${ARCHIVEUSER}" bash "${BASH_SOURCE[0]}" "${@}"
> 
> This uses $BASH_SOURCE instead of $0 as used above. Is this intentional,
> if so why? I'd argue that this should also use $0, but maybe I'm missing
> something?

BASH_SOURCE explicitly refers to the file it was sourced/executed from,
it's like the __file__ macro other languages have.

$0 can be anything since programs can modify their argv0 freely, but by
default it is the toplevel script name, which is why it gets used so
much. It's still not technically correct, and I prefer to use BASH_SOURCE.

In short, this is a copy-paste error above.

>> +fi
>> +
>> +pkgfile=${1##*/}
>> +pkgname=${pkgfile%-*-*-*}
>> +archive_dir="${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}"
>> +
>> +if [[ ! -f ${archive_dir}/${pkgfile} ]]; then
>> +	mkdir -p "${archive_dir}"
>> +	cp -np "${1}"{,.sig} "${archive_dir}/"
>> +fi
>> diff --git a/db-functions b/db-functions
>> index 7aeedced..b8a00b90 100644
>> --- a/db-functions
>> +++ b/db-functions
>> @@ -444,4 +447,24 @@ arch_repo_modify() {
>>  	REPO_MODIFIED=1
>>  }
>>  
>> +# Verify the existence of dependent packages needed by a given pkgfile
>> +# usage: check_reproducible pkgfile
>> +check_reproducible() {
>> +	local pkg dir pkgs=() pkgfile pkgfiles=()
>> +
>> +	mapfile -t pkgs < <(_grep_all_info "${1}" .BUILDINFO installed)
>> +
>> +	for pkg in "${pkgs[@]}"; do
>> +		local pkgname=${pkg%-*-*-*}
>> +		for dir in "${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}" "${STAGING}"/**/; do
>> +			if pkgfile="$(getpkgfile "${dir}/${pkg}"${PKGEXTS} 2>/dev/null)"; then
>> +				pkgfiles+=("${pkgfile}")
>> +				continue 2
>> +			fi
>> +		done
>> +		error "could not find existing package for %s" "${pkg}"
> 
> 
> I imagine that I'd be confused if I ever saw this error. How about
> clarifying it like this? "could not find package for dependency %s in
> reproducibility archive or your staging directory"

Maybe "existing or staged package for dependency %s"?

>> +		return 1
>> +	done
>> +}
>> +
>>  . "$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")/db-functions-${VCS}"
>> diff --git a/db-update b/db-update
>> index 313fb999..04a29bf3 100755
>> --- a/db-update
>> +++ b/db-update
>> @@ -61,6 +61,9 @@ for repo in "${repos[@]}"; do
>>  			if ! check_builddir "${pkg}"; then
>>  				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
>>  			fi
>> +			if ! check_reproducible "${pkg}"; then
>> +				die "Package %s is not reproducible" "${pkg}"
> 
> Same as above. I'd suggest something like this:
> 
> "Package %s depends on packages that are missing in the reproducibility
> archive and your staging directory. Ensure that all dependencies either
> exist in the repositories or reproducibility archive already or that
> they are added together with the package in a single call to db-update."

The two errors will only be called together. I think expanding the
message when printing the missing dependency should be enough.
Emil Velikov via arch-projects Jan. 9, 2019, 3:34 p.m. UTC | #3
On Wed, Jan 09, 2019 at 09:49:26AM -0500, Eli Schwartz via arch-projects <arch-projects@archlinux.org> wrote:
> >> diff --git a/db-functions b/db-functions
> >> index 7aeedced..b8a00b90 100644
> >> --- a/db-functions
> >> +++ b/db-functions
> >> @@ -444,4 +447,24 @@ arch_repo_modify() {
> >>  	REPO_MODIFIED=1
> >>  }
> >>  
> >> +# Verify the existence of dependent packages needed by a given pkgfile
> >> +# usage: check_reproducible pkgfile
> >> +check_reproducible() {
> >> +	local pkg dir pkgs=() pkgfile pkgfiles=()
> >> +
> >> +	mapfile -t pkgs < <(_grep_all_info "${1}" .BUILDINFO installed)
> >> +
> >> +	for pkg in "${pkgs[@]}"; do
> >> +		local pkgname=${pkg%-*-*-*}
> >> +		for dir in "${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}" "${STAGING}"/**/; do
> >> +			if pkgfile="$(getpkgfile "${dir}/${pkg}"${PKGEXTS} 2>/dev/null)"; then
> >> +				pkgfiles+=("${pkgfile}")
> >> +				continue 2
> >> +			fi
> >> +		done
> >> +		error "could not find existing package for %s" "${pkg}"
> > 
> > 
> > I imagine that I'd be confused if I ever saw this error. How about
> > clarifying it like this? "could not find package for dependency %s in
> > reproducibility archive or your staging directory"
> 
> Maybe "existing or staged package for dependency %s"?

"could not find existing or staged package for dependency %s" is fine by
me.

> 
> >> +		return 1
> >> +	done
> >> +}
> >> +
> >>  . "$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")/db-functions-${VCS}"
> >> diff --git a/db-update b/db-update
> >> index 313fb999..04a29bf3 100755
> >> --- a/db-update
> >> +++ b/db-update
> >> @@ -61,6 +61,9 @@ for repo in "${repos[@]}"; do
> >>  			if ! check_builddir "${pkg}"; then
> >>  				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
> >>  			fi
> >> +			if ! check_reproducible "${pkg}"; then
> >> +				die "Package %s is not reproducible" "${pkg}"
> > 
> > Same as above. I'd suggest something like this:
> > 
> > "Package %s depends on packages that are missing in the reproducibility
> > archive and your staging directory. Ensure that all dependencies either
> > exist in the repositories or reproducibility archive already or that
> > they are added together with the package in a single call to db-update."
> 
> The two errors will only be called together. I think expanding the
> message when printing the missing dependency should be enough.

I get that, but I think that a user that sees these two message may not
understand that a missing dependency is related to the package being
reproducible. To be honest, I actually expected db-update to run all
checks and show all errors at once instead of terminating after the
first one. I now know that this is not the case. I'm pretty sure that I
would have treat these two messages as separate errors and that I'd then
be confused as to what the "second error" is actually about.

I think that it may save a lot of time and confusion if this error
message is clear about what is wrong and how it can be fixed. Most of
the other error messages in db-update are rather clear about the actual
problem. Maybe not as clear as the message I propose here, but clearer
than "Package %s is not reproducible". Apart from that, does it really
hurt to have a more verbose error message? It will only be shown if
there is an actual error and it doesn't influence normal usage. I'd say
we can afford to be more verbose in that case.

If you still think that this message should not be made more verbose,
I'd argue that it should be removed entirely. If we have just the
message about a dependency not being found, it is quite clear to a user
what is wrong and how they could fix the error. I'd say that is much
less confusing than if there were a second message about reproducibility
that some people may or may not consider to be a different, additional
error as I've explained above.

Florian
Emil Velikov via arch-projects Jan. 9, 2019, 3:45 p.m. UTC | #4
On 1/9/19 10:34 AM, Florian Pritz wrote:
> On Wed, Jan 09, 2019 at 09:49:26AM -0500, Eli Schwartz via arch-projects <arch-projects@archlinux.org> wrote:
>>>> diff --git a/db-functions b/db-functions
>>>> index 7aeedced..b8a00b90 100644
>>>> --- a/db-functions
>>>> +++ b/db-functions
>>>> @@ -444,4 +447,24 @@ arch_repo_modify() {
>>>>  	REPO_MODIFIED=1
>>>>  }
>>>>  
>>>> +# Verify the existence of dependent packages needed by a given pkgfile
>>>> +# usage: check_reproducible pkgfile
>>>> +check_reproducible() {
>>>> +	local pkg dir pkgs=() pkgfile pkgfiles=()
>>>> +
>>>> +	mapfile -t pkgs < <(_grep_all_info "${1}" .BUILDINFO installed)
>>>> +
>>>> +	for pkg in "${pkgs[@]}"; do
>>>> +		local pkgname=${pkg%-*-*-*}
>>>> +		for dir in "${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}" "${STAGING}"/**/; do
>>>> +			if pkgfile="$(getpkgfile "${dir}/${pkg}"${PKGEXTS} 2>/dev/null)"; then
>>>> +				pkgfiles+=("${pkgfile}")
>>>> +				continue 2
>>>> +			fi
>>>> +		done
>>>> +		error "could not find existing package for %s" "${pkg}"
>>>
>>>
>>> I imagine that I'd be confused if I ever saw this error. How about
>>> clarifying it like this? "could not find package for dependency %s in
>>> reproducibility archive or your staging directory"
>>
>> Maybe "existing or staged package for dependency %s"?
> 
> "could not find existing or staged package for dependency %s" is fine by
> me.
> 
>>
>>>> +		return 1
>>>> +	done
>>>> +}
>>>> +
>>>>  . "$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")/db-functions-${VCS}"
>>>> diff --git a/db-update b/db-update
>>>> index 313fb999..04a29bf3 100755
>>>> --- a/db-update
>>>> +++ b/db-update
>>>> @@ -61,6 +61,9 @@ for repo in "${repos[@]}"; do
>>>>  			if ! check_builddir "${pkg}"; then
>>>>  				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
>>>>  			fi
>>>> +			if ! check_reproducible "${pkg}"; then
>>>> +				die "Package %s is not reproducible" "${pkg}"
>>>
>>> Same as above. I'd suggest something like this:
>>>
>>> "Package %s depends on packages that are missing in the reproducibility
>>> archive and your staging directory. Ensure that all dependencies either
>>> exist in the repositories or reproducibility archive already or that
>>> they are added together with the package in a single call to db-update."
>>
>> The two errors will only be called together. I think expanding the
>> message when printing the missing dependency should be enough.
> 
> I get that, but I think that a user that sees these two message may not
> understand that a missing dependency is related to the package being
> reproducible. To be honest, I actually expected db-update to run all
> checks and show all errors at once instead of terminating after the
> first one. I now know that this is not the case. I'm pretty sure that I
> would have treat these two messages as separate errors and that I'd then
> be confused as to what the "second error" is actually about.
> 
> I think that it may save a lot of time and confusion if this error
> message is clear about what is wrong and how it can be fixed. Most of
> the other error messages in db-update are rather clear about the actual
> problem. Maybe not as clear as the message I propose here, but clearer
> than "Package %s is not reproducible". Apart from that, does it really
> hurt to have a more verbose error message? It will only be shown if
> there is an actual error and it doesn't influence normal usage. I'd say
> we can afford to be more verbose in that case.
> 
> If you still think that this message should not be made more verbose,
> I'd argue that it should be removed entirely. If we have just the
> message about a dependency not being found, it is quite clear to a user
> what is wrong and how they could fix the error. I'd say that is much
> less confusing than if there were a second message about reproducibility
> that some people may or may not consider to be a different, additional
> error as I've explained above.

Well, db-update check doesn't necessarily know what check_reproducible()
cannot find, I guess we could exit directly in there, but...

Hmm, what about:

Package %s is not reproducible. Ensure that all dependencies are
available in the repositories or are added in the same db-update.
Emil Velikov via arch-projects Jan. 9, 2019, 3:57 p.m. UTC | #5
On Wed, Jan 09, 2019 at 10:45:12AM -0500, Eli Schwartz via arch-projects <arch-projects@archlinux.org> wrote:
> >>>> diff --git a/db-update b/db-update
> >>>> index 313fb999..04a29bf3 100755
> >>>> --- a/db-update
> >>>> +++ b/db-update
> >>>> @@ -61,6 +61,9 @@ for repo in "${repos[@]}"; do
> >>>>  			if ! check_builddir "${pkg}"; then
> >>>>  				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
> >>>>  			fi
> >>>> +			if ! check_reproducible "${pkg}"; then
> >>>> +				die "Package %s is not reproducible" "${pkg}"
> >>>
> >>> Same as above. I'd suggest something like this:
> >>>
> >>> "Package %s depends on packages that are missing in the reproducibility
> >>> archive and your staging directory. Ensure that all dependencies either
> >>> exist in the repositories or reproducibility archive already or that
> >>> they are added together with the package in a single call to db-update."
> 
> Hmm, what about:
> 
> Package %s is not reproducible. Ensure that all dependencies are
> available in the repositories or are added in the same db-update.

Fine by me.

Florian

Patch

diff --git a/config b/config
index 1cfe11f4..57a2cc47 100644
--- a/config
+++ b/config
@@ -1,6 +1,8 @@ 
 #!/hint/bash
 
 FTP_BASE="/srv/ftp"
+ARCHIVE_BASE="/srv/archive"
+ARCHIVEUSER='archive'
 PKGREPOS=()
 PKGPOOL=''
 SRCPOOL=''
diff --git a/db-archive b/db-archive
new file mode 100755
index 00000000..5680b9de
--- /dev/null
+++ b/db-archive
@@ -0,0 +1,21 @@ 
+#!/bin/bash
+
+. "$(dirname "$(readlink -e "$0")")/config"
+
+if (( $# != 1 )); then
+	echo "usage: %s <pkgfile>" "${0##*/}"
+	exit 1
+fi
+
+if [[ -n ${ARCHIVEUSER} ]]; then
+	exec sudo -u "${ARCHIVEUSER}" bash "${BASH_SOURCE[0]}" "${@}"
+fi
+
+pkgfile=${1##*/}
+pkgname=${pkgfile%-*-*-*}
+archive_dir="${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}"
+
+if [[ ! -f ${archive_dir}/${pkgfile} ]]; then
+	mkdir -p "${archive_dir}"
+	cp -np "${1}"{,.sig} "${archive_dir}/"
+fi
diff --git a/db-functions b/db-functions
index 7aeedced..b8a00b90 100644
--- a/db-functions
+++ b/db-functions
@@ -165,20 +165,23 @@  repo_unlock () { #repo_unlock <repo-name> <arch>
 	fi
 }
 
+# usage: _grep_all_info pkgfile infofile key
+_grep_all_info() {
+	local _ret=()
+
+	mapfile -t _ret < <(/usr/bin/bsdtar -xOqf "$1" "${2}" | grep "^${3} = ")
+
+	printf '%s\n' "${_ret[@]#${3} = }"
+}
+
 # usage: _grep_pkginfo pkgfile pattern
 _grep_pkginfo() {
-	local _ret
-
-	_ret="$(/usr/bin/bsdtar -xOqf "$1" .PKGINFO | grep "^${2} = " | tail -1)"
-	echo "${_ret#${2} = }"
+	_grep_all_info "${1}" .PKGINFO "${2}" | tail -1
 }
 
 # usage: _grep_buildinfo pkgfile pattern
 _grep_buildinfo() {
-	local _ret
-
-	_ret="$(/usr/bin/bsdtar -xOqf "$1" .BUILDINFO | grep "^${2} = " | tail -1)"
-	echo "${_ret#${2} = }"
+	_grep_all_info "${1}" .BUILDINFO "${2}" | tail -1
 }
 
 # Get the package base or name as fallback
@@ -444,4 +447,24 @@  arch_repo_modify() {
 	REPO_MODIFIED=1
 }
 
+# Verify the existence of dependent packages needed by a given pkgfile
+# usage: check_reproducible pkgfile
+check_reproducible() {
+	local pkg dir pkgs=() pkgfile pkgfiles=()
+
+	mapfile -t pkgs < <(_grep_all_info "${1}" .BUILDINFO installed)
+
+	for pkg in "${pkgs[@]}"; do
+		local pkgname=${pkg%-*-*-*}
+		for dir in "${ARCHIVE_BASE}/packages/${pkgname:0:1}/${pkgname}" "${STAGING}"/**/; do
+			if pkgfile="$(getpkgfile "${dir}/${pkg}"${PKGEXTS} 2>/dev/null)"; then
+				pkgfiles+=("${pkgfile}")
+				continue 2
+			fi
+		done
+		error "could not find existing package for %s" "${pkg}"
+		return 1
+	done
+}
+
 . "$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")/db-functions-${VCS}"
diff --git a/db-update b/db-update
index 313fb999..04a29bf3 100755
--- a/db-update
+++ b/db-update
@@ -61,6 +61,9 @@  for repo in "${repos[@]}"; do
 			if ! check_builddir "${pkg}"; then
 				die "Package %s was not built in a chroot" "$repo/${pkg##*/}"
 			fi
+			if ! check_reproducible "${pkg}"; then
+				die "Package %s is not reproducible" "${pkg}"
+			fi
 		done
 		if ! check_splitpkgs "${repo}" "${pkgs[@]}"; then
 			die "Missing split packages for %s" "$repo"
@@ -82,6 +85,7 @@  for repo in "${repos[@]}"; do
 			# any packages might have been moved by the previous run
 			if [[ -f ${pkg} ]]; then
 				mv "${pkg}" "$FTP_BASE/${PKGPOOL}"
+				"$(dirname "$(readlink -e "$0")")/db-archive" "${FTP_BASE}/${PKGPOOL}/${pkg##*/}"
 			fi
 			ln -s "../../../${PKGPOOL}/${pkgfile}" "$FTP_BASE/$repo/os/${pkgarch}"
 			# also move signatures
diff --git a/test/cases/db-update.bats b/test/cases/db-update.bats
index 9ee06321..bc978302 100644
--- a/test/cases/db-update.bats
+++ b/test/cases/db-update.bats
@@ -87,6 +87,12 @@  load ../lib/common
 	checkPackage testing pkg-any-a 1-2
 }
 
+@test "archive package when releasing" {
+	releasePackage extra pkg-any-a
+	db-update
+	[[ -f ${ARCHIVE_BASE}/packages/p/pkg-any-a/pkg-any-a-1-1-any${PKGEXT} ]]
+}
+
 @test "update same any package to same repository fails" {
 	releasePackage extra pkg-any-a
 	db-update