Message ID | 20220102002843.2104841-2-foxboron@archlinux.org |
---|---|
State | Accepted, archived |
Headers | show |
Series | Use debugedit instead of AWK to parse source files | expand |
On 2/1/22 10:28, Morten Linderud wrote: > From: Morten Linderud <morten@linderud.pw> > > This moves us from the fairly ugly AWK parsing line to debugedit which > originally comes out of the rpm project. > > The original code has issues parsing anything that was not straight > C/C++ and languages like Rust or Go would return invalid source code > files. debugedit handles all these cases better. > > Signed-off-by: Morten Linderud <morten@linderud.pw> > --- I'm really happy this was split into a separate project! The original plan was to use debugedit, but it could not be build without (from memory) depending on librpm, which did not seem appropriate! Add a check for debugedit in scripts/libmakepkg/executable/strip.sh.in > scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in > index 92a6fb15..c1d8ee3c 100644 > --- a/scripts/libmakepkg/tidy/strip.sh.in > +++ b/scripts/libmakepkg/tidy/strip.sh.in > @@ -36,8 +36,11 @@ build_id() { > } > > source_files() { > - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ > - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}' > + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" > + local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" I really do not like making temporary files. Particularly not in ${startdir}, which can be readonly provided BUILDDIR, SRCDIR, etc are set. I may accept using $srcdir if writing this to a file is *essential*. > + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null > + sort -zu "${dbgsrclist}" | tr '\0' '\n' > + rm -f "$dbgsrclist" > } > > strip_file() { > @@ -58,9 +61,9 @@ strip_file() { > # copy source files to debug directory > local file dest t > while IFS= read -r t; do > - file=${t/${dbgsrcdir}/"$srcdir"} > - dest="${dbgsrc/"$dbgsrcdir"/}$t" > - if ! [[ -f $dest ]]; then > + file="${srcdir}/${t}" > + dest="${dbgsrc}/${t}" > + if [[ -f "$file" ]] && ! [[ -f $dest ]]; then > mkdir -p "${dest%/*}" > cp -- "$file" "$dest" > fi
On 02/01/2022 06.08, Allan McRae wrote: > On 2/1/22 10:28, Morten Linderud wrote: >> scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- >> 1 file changed, 8 insertions(+), 5 deletions(-) >> >> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in >> index 92a6fb15..c1d8ee3c 100644 >> --- a/scripts/libmakepkg/tidy/strip.sh.in >> +++ b/scripts/libmakepkg/tidy/strip.sh.in >> @@ -36,8 +36,11 @@ build_id() { >> } >> source_files() { >> - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ >> - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}' >> + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >> + local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" > > I really do not like making temporary files. Particularly not in ${startdir}, which can be readonly provided BUILDDIR, SRCDIR, etc are set. > > I may accept using $srcdir if writing this to a file is *essential*> >> + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null >> + sort -zu "${dbgsrclist}" | tr '\0' '\n' >> + rm -f "$dbgsrclist" I haven't been able to produce any unrelated output on stdout, so just using /dev/stdout for --list-file would probably work ok - to make extra sure, this also works: debugedit --list-file=/dev/fd/3 "$1" 3>&1 >/dev/null | sort [...] Tangentially related, what's the opinion on using short vs. long options? Personally, I find that long options make scripts much easier to grok because they save a lot of looking at man pages/--help outputs, but I can also see how they might make things too crowded and busy, especially for common commands. Either way, might be something to include in HACKING. >> } >> strip_file() { >> @@ -58,9 +61,9 @@ strip_file() { >> # copy source files to debug directory >> local file dest t >> while IFS= read -r t; do >> - file=${t/${dbgsrcdir}/"$srcdir"} >> - dest="${dbgsrc/"$dbgsrcdir"/}$t" >> - if ! [[ -f $dest ]]; then >> + file="${srcdir}/${t}" >> + dest="${dbgsrc}/${t}" >> + if [[ -f "$file" ]] && ! [[ -f $dest ]]; then >> mkdir -p "${dest%/*}" >> cp -- "$file" "$dest" >> fi >
On 02/01/2022 01.28, Morten Linderud wrote: > From: Morten Linderud <morten@linderud.pw> > > This moves us from the fairly ugly AWK parsing line to debugedit which > originally comes out of the rpm project. > > The original code has issues parsing anything that was not straight > C/C++ and languages like Rust or Go would return invalid source code > files. debugedit handles all these cases better. > > Signed-off-by: Morten Linderud <morten@linderud.pw> > --- > scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in > index 92a6fb15..c1d8ee3c 100644 > --- a/scripts/libmakepkg/tidy/strip.sh.in > +++ b/scripts/libmakepkg/tidy/strip.sh.in > @@ -36,8 +36,11 @@ build_id() { > } > > source_files() { > - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ > - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}' > + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" > + local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" > + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null -d/--dest-dir actually causes paths embedded in the binary to be *rewritten* from -b/--base-dir. Normally, this shouldn't happen because all paths are already translated by -fdebug-prefix-map before they end up in the binary (making it equivalent to `debugedit --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for whatever reason, this modifies the binary. If this is something we actually want to do here, I think this behaviour is at least worth a source comment. > + sort -zu "${dbgsrclist}" | tr '\0' '\n' > + rm -f "$dbgsrclist" > } > > strip_file() { > @@ -58,9 +61,9 @@ strip_file() { > # copy source files to debug directory > local file dest t > while IFS= read -r t; do > - file=${t/${dbgsrcdir}/"$srcdir"} > - dest="${dbgsrc/"$dbgsrcdir"/}$t" > - if ! [[ -f $dest ]]; then > + file="${srcdir}/${t}" > + dest="${dbgsrc}/${t}" > + if [[ -f "$file" ]] && ! [[ -f $dest ]]; then > mkdir -p "${dest%/*}" > cp -- "$file" "$dest" > fi
On 2/1/22 23:34, Xiretza wrote: > On 02/01/2022 01.28, Morten Linderud wrote: >> From: Morten Linderud <morten@linderud.pw> >> >> This moves us from the fairly ugly AWK parsing line to debugedit which >> originally comes out of the rpm project. >> >> The original code has issues parsing anything that was not straight >> C/C++ and languages like Rust or Go would return invalid source code >> files. debugedit handles all these cases better. >> >> Signed-off-by: Morten Linderud <morten@linderud.pw> >> --- >> scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- >> 1 file changed, 8 insertions(+), 5 deletions(-) >> >> diff --git a/scripts/libmakepkg/tidy/strip.sh.in >> b/scripts/libmakepkg/tidy/strip.sh.in >> index 92a6fb15..c1d8ee3c 100644 >> --- a/scripts/libmakepkg/tidy/strip.sh.in >> +++ b/scripts/libmakepkg/tidy/strip.sh.in >> @@ -36,8 +36,11 @@ build_id() { >> } >> source_files() { >> - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ >> - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name >> == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", >> $NF}}{print name}}' >> + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >> + local dbgsrclist="$(mktemp >> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >> + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l >> "${dbgsrclist}" "$1" > /dev/null > > -d/--dest-dir actually causes paths embedded in the binary to be > *rewritten* from -b/--base-dir. Normally, this shouldn't happen because > all paths are already translated by -fdebug-prefix-map before they end > up in the binary (making it equivalent to `debugedit --base-dir > "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for > whatever reason, this modifies the binary. > > If this is something we actually want to do here, I think this behaviour > is at least worth a source comment. > Does it? During testing the patch I added checksums before and after getting the source file list. This command causes no change to the file. A
On 02/01/2022 14.44, Allan McRae wrote: > On 2/1/22 23:34, Xiretza wrote: >> On 02/01/2022 01.28, Morten Linderud wrote: >>> From: Morten Linderud <morten@linderud.pw> >>> >>> This moves us from the fairly ugly AWK parsing line to debugedit which >>> originally comes out of the rpm project. >>> >>> The original code has issues parsing anything that was not straight >>> C/C++ and languages like Rust or Go would return invalid source code >>> files. debugedit handles all these cases better. >>> >>> Signed-off-by: Morten Linderud <morten@linderud.pw> >>> --- >>> scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- >>> 1 file changed, 8 insertions(+), 5 deletions(-) >>> >>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in >>> index 92a6fb15..c1d8ee3c 100644 >>> --- a/scripts/libmakepkg/tidy/strip.sh.in >>> +++ b/scripts/libmakepkg/tidy/strip.sh.in >>> @@ -36,8 +36,11 @@ build_id() { >>> } >>> source_files() { >>> - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ >>> - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}' >>> + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >>> + local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >>> + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null >> >> -d/--dest-dir actually causes paths embedded in the binary to be *rewritten* from -b/--base-dir. Normally, this shouldn't happen because all paths are already translated by -fdebug-prefix-map before they end up in the binary (making it equivalent to `debugedit --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for whatever reason, this modifies the binary. >> >> If this is something we actually want to do here, I think this behaviour is at least worth a source comment. >> > > Does it? During testing the patch I added checksums before and after getting the source file list. This command causes no change to the file. > > A It does for me: $ pwd /tmp/srcdir $ cat main.c int main() {return 0;} $ gcc -g main.c $ debugedit -l /dev/stdout a.out | tr '\0' '\n' /tmp/srcdir/main.c /tmp/srcdir/main.c $ sha256sum a.out 893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e a.out $ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir /tmp/dbgsrcdir a.out | tr '\0' '\n' main.c main.c $ sha256sum a.out 6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2 a.out $ debugedit -l /dev/stdout a.out | tr '\0' '\n' /tmp/dbgsrcdir/main.c /tmp/dbgsrcdir/main.c
On 2/1/22 23:51, Xiretza wrote: > > On 02/01/2022 14.44, Allan McRae wrote: >> On 2/1/22 23:34, Xiretza wrote: >>> On 02/01/2022 01.28, Morten Linderud wrote: >>>> From: Morten Linderud <morten@linderud.pw> >>>> >>>> This moves us from the fairly ugly AWK parsing line to debugedit which >>>> originally comes out of the rpm project. >>>> >>>> The original code has issues parsing anything that was not straight >>>> C/C++ and languages like Rust or Go would return invalid source code >>>> files. debugedit handles all these cases better. >>>> >>>> Signed-off-by: Morten Linderud <morten@linderud.pw> >>>> --- >>>> scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- >>>> 1 file changed, 8 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in >>>> b/scripts/libmakepkg/tidy/strip.sh.in >>>> index 92a6fb15..c1d8ee3c 100644 >>>> --- a/scripts/libmakepkg/tidy/strip.sh.in >>>> +++ b/scripts/libmakepkg/tidy/strip.sh.in >>>> @@ -36,8 +36,11 @@ build_id() { >>>> } >>>> source_files() { >>>> - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ >>>> - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name >>>> == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", >>>> $NF}}{print name}}' >>>> + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >>>> + local dbgsrclist="$(mktemp >>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >>>> + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l >>>> "${dbgsrclist}" "$1" > /dev/null >>> >>> -d/--dest-dir actually causes paths embedded in the binary to be >>> *rewritten* from -b/--base-dir. Normally, this shouldn't happen >>> because all paths are already translated by -fdebug-prefix-map before >>> they end up in the binary (making it equivalent to `debugedit >>> --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths >>> left for whatever reason, this modifies the binary. >>> >>> If this is something we actually want to do here, I think this >>> behaviour is at least worth a source comment. >>> >> >> Does it? During testing the patch I added checksums before and after >> getting the source file list. This command causes no change to the file. >> >> A > > It does for me: > > $ pwd > /tmp/srcdir > $ cat main.c > int main() {return 0;} > $ gcc -g main.c > $ debugedit -l /dev/stdout a.out | tr '\0' '\n' > /tmp/srcdir/main.c > /tmp/srcdir/main.c > > $ sha256sum a.out > 893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e a.out > $ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir > /tmp/dbgsrcdir a.out | tr '\0' '\n' > main.c > main.c > $ sha256sum a.out > 6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2 a.out > > $ debugedit -l /dev/stdout a.out | tr '\0' '\n' > /tmp/dbgsrcdir/main.c > /tmp/dbgsrcdir/main.c > . Add -n. Allan
On 3/1/22 00:20, Allan McRae wrote: > On 2/1/22 23:51, Xiretza wrote: >> >> On 02/01/2022 14.44, Allan McRae wrote: >>> On 2/1/22 23:34, Xiretza wrote: >>>> On 02/01/2022 01.28, Morten Linderud wrote: >>>>> From: Morten Linderud <morten@linderud.pw> >>>>> >>>>> This moves us from the fairly ugly AWK parsing line to debugedit which >>>>> originally comes out of the rpm project. >>>>> >>>>> The original code has issues parsing anything that was not straight >>>>> C/C++ and languages like Rust or Go would return invalid source code >>>>> files. debugedit handles all these cases better. >>>>> >>>>> Signed-off-by: Morten Linderud <morten@linderud.pw> >>>>> --- >>>>> scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++----- >>>>> 1 file changed, 8 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in >>>>> b/scripts/libmakepkg/tidy/strip.sh.in >>>>> index 92a6fb15..c1d8ee3c 100644 >>>>> --- a/scripts/libmakepkg/tidy/strip.sh.in >>>>> +++ b/scripts/libmakepkg/tidy/strip.sh.in >>>>> @@ -36,8 +36,11 @@ build_id() { >>>>> } >>>>> source_files() { >>>>> - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ >>>>> - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if >>>>> (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", >>>>> $NF}}{print name}}' >>>>> + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >>>>> + local dbgsrclist="$(mktemp >>>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >>>>> + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l >>>>> "${dbgsrclist}" "$1" > /dev/null >>>> >>>> -d/--dest-dir actually causes paths embedded in the binary to be >>>> *rewritten* from -b/--base-dir. Normally, this shouldn't happen >>>> because all paths are already translated by -fdebug-prefix-map >>>> before they end up in the binary (making it equivalent to `debugedit >>>> --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths >>>> left for whatever reason, this modifies the binary. >>>> >>>> If this is something we actually want to do here, I think this >>>> behaviour is at least worth a source comment. >>>> >>> >>> Does it? During testing the patch I added checksums before and after >>> getting the source file list. This command causes no change to the file. >>> >>> A >> >> It does for me: >> >> $ pwd >> /tmp/srcdir >> $ cat main.c >> int main() {return 0;} >> $ gcc -g main.c >> $ debugedit -l /dev/stdout a.out | tr '\0' '\n' >> /tmp/srcdir/main.c >> /tmp/srcdir/main.c >> >> $ sha256sum a.out >> 893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e a.out >> $ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir >> /tmp/dbgsrcdir a.out | tr '\0' '\n' >> main.c >> main.c >> $ sha256sum a.out >> 6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2 a.out >> >> $ debugedit -l /dev/stdout a.out | tr '\0' '\n' >> /tmp/dbgsrcdir/main.c >> /tmp/dbgsrcdir/main.c >> . > > Add -n. > For a better example, here is my testing code: source_files() { dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" echo $1 >> $startdir/dbginfo echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null sort -zu "${dbgsrclist}" | tr '\0' '\n' sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo rm -f "$dbgsrclist" } and a snippet of the output: ./usr/bin/vercmp sha256sum-orig: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp pacman/builddir/<artificial> pacman/builddir/<built-in> pacman/lib/libalpm/version.c pacman/src/util/vercmp.c sha256sum-after: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp
On 02/01/2022 15.24, Allan McRae wrote: > On 3/1/22 00:20, Allan McRae wrote: >> >> Add -n. >> No change. > > For a better example, here is my testing code: > > source_files() { > dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" > local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" > echo $1 >> $startdir/dbginfo > echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo > LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null > sort -zu "${dbgsrclist}" | tr '\0' '\n' > sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo > echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo > rm -f "$dbgsrclist" > } > > > and a snippet of the output: > > ./usr/bin/vercmp > sha256sum-orig: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp > pacman/builddir/<artificial> > pacman/builddir/<built-in> > pacman/lib/libalpm/version.c > pacman/src/util/vercmp.c > sha256sum-after: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp > > Is it possible that there are simply no source file entries referencing $srcdir (because -fdebug-prefix-map is working as expected)? As I said, if that's the case, the binary is not modified because there are no occurrences of $srcdir to be rewritten to $dbgsrcdir.
On 3/1/22 00:29, Xiretza wrote: > > On 02/01/2022 15.24, Allan McRae wrote: >> On 3/1/22 00:20, Allan McRae wrote: >>> >>> Add -n. >>> > > No change. > >> >> For a better example, here is my testing code: >> >> source_files() { >> dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >> local dbgsrclist="$(mktemp >> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >> echo $1 >> $startdir/dbginfo >> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo >> LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l >> "${dbgsrclist}" "$1" > /dev/null >> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo >> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo >> rm -f "$dbgsrclist" >> } >> >> >> and a snippet of the output: >> >> ./usr/bin/vercmp >> sha256sum-orig: >> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 >> ./usr/bin/vercmp >> pacman/builddir/<artificial> >> pacman/builddir/<built-in> >> pacman/lib/libalpm/version.c >> pacman/src/util/vercmp.c >> sha256sum-after: >> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 >> ./usr/bin/vercmp > > Is it possible that there are simply no source file entries referencing > $srcdir (because -fdebug-prefix-map is working as expected)? As I said, > if that's the case, the binary is not modified because there are no > occurrences of $srcdir to be rewritten to $dbgsrcdir. That is possible - I mostly tested packages I know obey CFLAGS. Saying that, any change would be a good thing, or else those source files placed in ${dbgsrcdir} are kindof useless! So I'm happy to have this rewrite paths - we are not regenerating the build-id, so I'm assuming this does not cause package reproducibility issues... Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files. With, I just get the package source files. A
On Mon, Jan 03, 2022 at 12:48:05AM +1000, Allan McRae wrote: > On 3/1/22 00:29, Xiretza wrote: > > > > On 02/01/2022 15.24, Allan McRae wrote: > > > On 3/1/22 00:20, Allan McRae wrote: > > > > > > > > Add -n. > > > > > > > > No change. > > > > > > > > For a better example, here is my testing code: > > > > > > source_files() { > > > dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" > > > local dbgsrclist="$(mktemp > > > "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" > > > echo $1 >> $startdir/dbginfo > > > echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo > > > LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l > > > "${dbgsrclist}" "$1" > /dev/null > > > sort -zu "${dbgsrclist}" | tr '\0' '\n' > > > sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo > > > echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo > > > rm -f "$dbgsrclist" > > > } > > > > > > > > > and a snippet of the output: > > > > > > ./usr/bin/vercmp > > > sha256sum-orig: > > > 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 > > > ./usr/bin/vercmp > > > pacman/builddir/<artificial> > > > pacman/builddir/<built-in> > > > pacman/lib/libalpm/version.c > > > pacman/src/util/vercmp.c > > > sha256sum-after: > > > 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 > > > ./usr/bin/vercmp > > > > Is it possible that there are simply no source file entries referencing > > $srcdir (because -fdebug-prefix-map is working as expected)? As I said, > > if that's the case, the binary is not modified because there are no > > occurrences of $srcdir to be rewritten to $dbgsrcdir. > > That is possible - I mostly tested packages I know obey CFLAGS. Saying > that, any change would be a good thing, or else those source files placed in > ${dbgsrcdir} are kindof useless! So I'm happy to have this rewrite paths - > we are not regenerating the build-id, so I'm assuming this does not cause > package reproducibility issues... > > Also, I get a very different list of files with and without -b/-d. Without > using them I get a lot of system files. With, I just get the package source > files. > > A Xiretza is correct. It will rewrite the Go binaries and add correct paths since we do not have a prefix-map option in the Go compiler. However it won't change anything that utilizies prefix-map correctly. There might be a catch here, but I haven't found it.
On 02/01/2022 15.48, Allan McRae wrote: > On 3/1/22 00:29, Xiretza wrote: >> >> On 02/01/2022 15.24, Allan McRae wrote: >>> On 3/1/22 00:20, Allan McRae wrote: >>>> >>>> Add -n. >>>> >> >> No change. >> >>> >>> For a better example, here is my testing code: >>> >>> source_files() { >>> dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >>> local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >>> echo $1 >> $startdir/dbginfo >>> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo >>> LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null >>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo >>> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo >>> rm -f "$dbgsrclist" >>> } >>> >>> >>> and a snippet of the output: >>> >>> ./usr/bin/vercmp >>> sha256sum-orig: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp >>> pacman/builddir/<artificial> >>> pacman/builddir/<built-in> >>> pacman/lib/libalpm/version.c >>> pacman/src/util/vercmp.c >>> sha256sum-after: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp >> >> Is it possible that there are simply no source file entries referencing $srcdir (because -fdebug-prefix-map is working as expected)? As I said, if that's the case, the binary is not modified because there are no occurrences of $srcdir to be rewritten to $dbgsrcdir. > > That is possible - I mostly tested packages I know obey CFLAGS. Saying that, any change would be a good thing, or else those source files placed in ${dbgsrcdir} are kindof useless! So I'm happy to have this rewrite paths - we are not regenerating the build-id, so I'm assuming this does not cause package reproducibility issues... Makes sense, I can't think of any issues this would cause either - the source file paths have to be deterministic anyway, applying a deterministic transform on top won't change that. Still, having a function called "source_files" actually modify the passed binary deserves a comment, I think. > > Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files. With, I just get the package source files. Right, without -b/-d, it just lists all source files. With just -b, it prints all source files rooted in the specified directory, but strips the prefix from the output. With both -b and -d, it replaces any base-dir prefixes with dest-dir (modifying the binary), then prints all paths rooted in the dest-dir (again, stripping the dest-dir prefix in the output), regardless if they've always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were just transformed from base-dir. I hope that made sense. > > A
On 3/1/22 01:00, Xiretza wrote: > On 02/01/2022 15.48, Allan McRae wrote: >> On 3/1/22 00:29, Xiretza wrote: >>> >>> On 02/01/2022 15.24, Allan McRae wrote: >>>> On 3/1/22 00:20, Allan McRae wrote: >>>>> >>>>> Add -n. >>>>> >>> >>> No change. >>> >>>> >>>> For a better example, here is my testing code: >>>> >>>> source_files() { >>>> dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" >>>> local dbgsrclist="$(mktemp >>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" >>>> echo $1 >> $startdir/dbginfo >>>> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo >>>> LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l >>>> "${dbgsrclist}" "$1" > /dev/null >>>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >>>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo >>>> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo >>>> rm -f "$dbgsrclist" >>>> } >>>> >>>> >>>> and a snippet of the output: >>>> >>>> ./usr/bin/vercmp >>>> sha256sum-orig: >>>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 >>>> ./usr/bin/vercmp >>>> pacman/builddir/<artificial> >>>> pacman/builddir/<built-in> >>>> pacman/lib/libalpm/version.c >>>> pacman/src/util/vercmp.c >>>> sha256sum-after: >>>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 >>>> ./usr/bin/vercmp >>> >>> Is it possible that there are simply no source file entries >>> referencing $srcdir (because -fdebug-prefix-map is working as >>> expected)? As I said, if that's the case, the binary is not modified >>> because there are no occurrences of $srcdir to be rewritten to >>> $dbgsrcdir. >> >> That is possible - I mostly tested packages I know obey CFLAGS. >> Saying that, any change would be a good thing, or else those source >> files placed in ${dbgsrcdir} are kindof useless! So I'm happy to have >> this rewrite paths - we are not regenerating the build-id, so I'm >> assuming this does not cause package reproducibility issues... > > Makes sense, I can't think of any issues this would cause either - the > source file paths have to be deterministic anyway, applying a > deterministic transform on top won't change that. > > Still, having a function called "source_files" actually modify the > passed binary deserves a comment, I think. > Alternatively, happy for the function to be renamed something better. A
On Sun, Jan 02, 2022 at 04:00:56PM +0100, Xiretza wrote: > On 02/01/2022 15.48, Allan McRae wrote: > > Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files. With, I just get the package source files. > > Right, without -b/-d, it just lists all source files. With just -b, it prints > all source files rooted in the specified directory, but strips the prefix from > the output. With both -b and -d, it replaces any base-dir prefixes with > dest-dir (modifying the binary), then prints all paths rooted in the dest-dir > (again, stripping the dest-dir prefix in the output), regardless if they've > always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were > just transformed from base-dir. > > I hope that made sense. We can have one function `rewrite_source_list` and one `source_list`? `rewrite_source_list` would do the explicit rewrite, if needed. `source_list` just uses --base-dir and returns the files.
On 02/01/2022 16.18, Morten Linderud wrote: > On Sun, Jan 02, 2022 at 04:00:56PM +0100, Xiretza wrote: >> On 02/01/2022 15.48, Allan McRae wrote: >>> Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files. With, I just get the package source files. >> >> Right, without -b/-d, it just lists all source files. With just -b, it prints >> all source files rooted in the specified directory, but strips the prefix from >> the output. With both -b and -d, it replaces any base-dir prefixes with >> dest-dir (modifying the binary), then prints all paths rooted in the dest-dir >> (again, stripping the dest-dir prefix in the output), regardless if they've >> always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were >> just transformed from base-dir. >> >> I hope that made sense. > > We can have one function `rewrite_source_list` and one `source_list`? > > `rewrite_source_list` would do the explicit rewrite, if needed. `source_list` just > uses --base-dir and returns the files. > How would it know which one to run? I don't think it's worth the hassle of figuring that out, it's probably fine as-is.
diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in index 92a6fb15..c1d8ee3c 100644 --- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -36,8 +36,11 @@ build_id() { } source_files() { - LANG=C readelf "$1" --debug-dump 2>/dev/null | \ - awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}' + dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}" + local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")" + LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null + sort -zu "${dbgsrclist}" | tr '\0' '\n' + rm -f "$dbgsrclist" } strip_file() { @@ -58,9 +61,9 @@ strip_file() { # copy source files to debug directory local file dest t while IFS= read -r t; do - file=${t/${dbgsrcdir}/"$srcdir"} - dest="${dbgsrc/"$dbgsrcdir"/}$t" - if ! [[ -f $dest ]]; then + file="${srcdir}/${t}" + dest="${dbgsrc}/${t}" + if [[ -f "$file" ]] && ! [[ -f $dest ]]; then mkdir -p "${dest%/*}" cp -- "$file" "$dest" fi