[dbscripts,2/3] ftpdir-cleanup, sourceballs: replace external find command with bash globbing

Message ID 20180216034504.27610-3-eschwartz@archlinux.org
State Superseded
Headers show
Series Fix ambiguous uses of | expand

Commit Message

Emil Velikov via arch-projects Feb. 16, 2018, 3:45 a.m. UTC
This fully removes the use of find from the codebase, leads to a
micro-optimization in a couple cases,  and ensures that $PKGEXT is
consistently treated as a shell globbing character (which is important
because it is used as one).

Of the eight instances in these files:

- One was unnecessary as `cat` can natively consume all files passed to
  it and no directory traversal was in use.

- Two were unnecessary as they were hardcoded to read a single file....

- Another four were only being used to strip leading directory paths,
  and can be replaced by globstar and ${filepath##*/}

- The final two were checking the modification time of the files, and
  can be replaced with touch(1) and [[ -nt ]]. Although this introduces
  an additional temporary file, this is not such a big deal.

Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
---
 cron-jobs/ftpdir-cleanup | 24 ++++++++++++++++++------
 cron-jobs/sourceballs    | 21 ++++++++++++++++-----
 2 files changed, 34 insertions(+), 11 deletions(-)

Comments

Luke Shumaker Feb. 17, 2018, 7:29 p.m. UTC | #1
On Thu, 15 Feb 2018 22:45:03 -0500,
Eli Schwartz via arch-projects wrote:
> diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
> index 2f3d5aa..2d33047 100755
> --- a/cron-jobs/ftpdir-cleanup
> +++ b/cron-jobs/ftpdir-cleanup
...
> -if [ ${#old_pkgs[@]} -ge 1 ]; then
> +if (( ${#old_pkgs[@]} > 1 )); then

That should either be >= 1 or > 0.

> diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
> index 9ab4e98..5844817 100755
> --- a/cron-jobs/sourceballs
> +++ b/cron-jobs/sourceballs
...
> -if [ ${#old_pkgs[@]} -ge 1 ]; then
> +if (( ${#old_pkgs[@]} > 1 )); then

Likewise.
Emil Velikov via arch-projects Feb. 18, 2018, 2:06 a.m. UTC | #2
On 02/17/2018 02:29 PM, Luke Shumaker wrote:
> On Thu, 15 Feb 2018 22:45:03 -0500,
> Eli Schwartz via arch-projects wrote:
>> diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
>> index 2f3d5aa..2d33047 100755
>> --- a/cron-jobs/ftpdir-cleanup
>> +++ b/cron-jobs/ftpdir-cleanup
> ...
>> -if [ ${#old_pkgs[@]} -ge 1 ]; then
>> +if (( ${#old_pkgs[@]} > 1 )); then
> 
> That should either be >= 1 or > 0.
> 
>> diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
>> index 9ab4e98..5844817 100755
>> --- a/cron-jobs/sourceballs
>> +++ b/cron-jobs/sourceballs
> ...
>> -if [ ${#old_pkgs[@]} -ge 1 ]; then
>> +if (( ${#old_pkgs[@]} > 1 )); then
> 
> Likewise.

Thanks. I've also noticed this whole patchset terribly breaks the
testsuite. Which is sort of expected.

We are overloading PKGEXT to mean something dbscripts specific, but that
then (I think?) gets imported into makepkg during the testsuite builds.
I'm going to rename it to PKGEXTS as that serves a number of purposes:
it avoids clashing with makepkg, it is more descriptive of its actual
purpose, and it provides a free semantic warning to future readers of
the code that this variable is meant to be more than one thing, and
extra care *must* be taken when using it.

But the real issue is that we then use this variable to complete
${pkgnames[@]/%/${PKGEXT}} which works, sort of, as it coincidentally
globs okay with ? but is technically quite wrong for the above mentioned
reasons.

Really, once pacman 5.1 is released containing my fix that makes
--packagelist finally useful for the first time ever, this will
automatically be fixed, as the use of print_all_package_names will
simply return full filename paths and there will be no need to glob
something that matches both filenames from your patch "Update tests to
check for glob regression".

The testsuite keeps looking for files that match some random dbscripts
glob which has nothing to do with the hardcoded .pkg.tar.xz in a stock
makepkg.conf, and the testsuite seems to be subtly buggy. Looks like
PKGEXT is also used for complicated things in checkPackageDB, with more
unquoted  [ ] paths as well as grep -q "${pkgfile%${PKGEXT}}" which
actually *breaks* with extglob. Because extended globs don't fall back
on being a string literal -- which is behavior I approve of. So this
needs to use ${pkgname}-${pkgver}-${pkgarch}.

We can either have $pkgfile not include $PKGEXT, use more is_globfile
(which is not actually available in the testsuite as db-functions is not
sourced and will break everything if you try since it runs mktemp with
the bats TMPDIR or something), or rename PKGEXT, and have the testsuite
use the PKGEXT from makepkg.conf since that is what it will use anyway
when running makepkg...

Patch

diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
index 2f3d5aa..2d33047 100755
--- a/cron-jobs/ftpdir-cleanup
+++ b/cron-jobs/ftpdir-cleanup
@@ -38,7 +38,11 @@  for repo in ${PKGREPOS[@]}; do
 			continue
 		fi
 		# get a list of actual available package files
-		find "${FTP_BASE}/${repo}/os/${arch}" -xtype f -name "*${PKGEXT}" -printf '%f\n' | sort > "${WORKDIR}/repo-${repo}-${arch}"
+		for f in "${FTP_BASE}"/${repo}/os/${arch}/*${PKGEXT}; do
+			if [[ -f $f ]]; then
+				printf '%s\n' "${f##*/}"
+			fi
+		done | sort > "${WORKDIR}/repo-${repo}-${arch}"
 		# get a list of package files defined in the repo db
 		bsdtar -xOf "${FTP_BASE}/${repo}/os/${arch}/${repo}${DBEXT}" | awk '/^%FILENAME%/{getline;print}' | sort > "${WORKDIR}/db-${repo}-${arch}"
 
@@ -61,10 +65,12 @@  for repo in ${PKGREPOS[@]}; do
 	done
 done
 
-# get a list of all available packages in the pacakge pool
-find "$FTP_BASE/${PKGPOOL}" -name "*${PKGEXT}" -printf '%f\n' | sort > "${WORKDIR}/pool"
+# get a list of all available packages in the package pool
+for f in "$FTP_BASE/${PKGPOOL}"/*${PKGEXT}; do
+	printf '%s\n' "${f##*/}"
+done | sort > "${WORKDIR}/pool"
 # create a list of packages in our db
-find "${WORKDIR}" -maxdepth 1 -type f -name 'db-*' -exec cat {} \; | sort -u > "${WORKDIR}/db"
+cat "${WORKDIR}"/db-* 2>/dev/null | sort -u > "${WORKDIR}/db"
 
 old_pkgs=($(comm -23 "${WORKDIR}/pool" "${WORKDIR}/db"))
 if [ ${#old_pkgs[@]} -ge 1 ]; then
@@ -75,8 +81,14 @@  if [ ${#old_pkgs[@]} -ge 1 ]; then
 	done
 fi
 
-old_pkgs=($(find ${CLEANUP_DESTDIR} -type f -name "*${PKGEXT}" -mtime +${CLEANUP_KEEP} -printf '%f\n'))
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+unset old_pkgs
+touch -d "${CLEANUP_KEEP} days ago"  "${WORKDIR}/cleanup_timestamp"
+for f in "${CLEANUP_DESTDIR}"/**/*${PKGEXT}; do
+	if [[ ${WORKDIR}/cleanup_timestamp -nt $f ]]; then
+		old_pkgs+=("${f##*/}")
+	fi
+done
+if (( ${#old_pkgs[@]} > 1 )); then
 	msg "Removing old packages from the cleanup directory..."
 	for old_pkg in ${old_pkgs[@]}; do
 		msg2 "${old_pkg}"
diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
index 9ab4e98..5844817 100755
--- a/cron-jobs/sourceballs
+++ b/cron-jobs/sourceballs
@@ -46,7 +46,11 @@  for repo in ${PKGREPOS[@]}; do
 done
 
 # Create a list of all available source package file names
-find "${FTP_BASE}/${SRCPOOL}" -xtype f -name "*${SRCEXT}" -printf '%f\n' | sort -u > "${WORKDIR}/available-src-pkgs"
+for f in "${FTP_BASE}"/${SRCPOOL}/*${SRCEXT}; do
+	if [[ -f $f ]]; then
+		printf '%s\n' "${f##*/}"
+	fi
+done | sort -u > "${WORKDIR}/available-src-pkgs"
 
 # Check for all packages if we need to build a source package
 for repo in ${PKGREPOS[@]}; do
@@ -117,8 +121,8 @@  for repo in ${PKGREPOS[@]}; do
 done
 
 # Cleanup old source packages
-find "${WORKDIR}" -maxdepth 1 -type f -name 'expected-src-pkgs' -exec cat {} \; | sort -u > "${WORKDIR}/expected-src-pkgs.sort"
-find "${WORKDIR}" -maxdepth 1 -type f -name 'available-src-pkgs' -exec cat {} \; | sort -u > "${WORKDIR}/available-src-pkgs.sort"
+sort -u "${WORKDIR}/expected-src-pkgs" > "${WORKDIR}/expected-src-pkgs.sort"
+sort -u "${WORKDIR}/available-src-pkgs" > "${WORKDIR}/available-src-pkgs.sort"
 old_pkgs=($(comm -23 "${WORKDIR}/available-src-pkgs.sort" "${WORKDIR}/expected-src-pkgs.sort"))
 
 if [ ${#old_pkgs[@]} -ge 1 ]; then
@@ -133,8 +137,15 @@  if [ ${#old_pkgs[@]} -ge 1 ]; then
 	done
 fi
 
-old_pkgs=($(find ${SOURCE_CLEANUP_DESTDIR} -type f -name "*${SRCEXT}" -mtime +${SOURCE_CLEANUP_KEEP} -printf '%f\n'))
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+unset old_pkgs
+touch -d "${SOURCE_CLEANUP_KEEP} days ago"  "${WORKDIR}/cleanup_timestamp"
+for f in "${SOURCE_CLEANUP_DESTDIR}"/*${SRCEXT}; do
+	if [[ ${WORKDIR}/cleanup_timestamp -nt $f ]]; then
+		old_pkgs+=("${f##*/}")
+	fi
+done
+
+if (( ${#old_pkgs[@]} > 1 )); then
 	msg "Removing old source packages from the cleanup directory..."
 	for old_pkg in ${old_pkgs[@]}; do
 		msg2 "${old_pkg}"