[1/1] strip: Use debugedit instead of AWK to parse source files

Message ID 20220102002843.2104841-2-foxboron@archlinux.org
State Accepted, archived
Headers show
Series Use debugedit instead of AWK to parse source files | expand

Commit Message

Morten Linderud Jan. 2, 2022, 12:28 a.m. UTC
From: Morten Linderud <morten@linderud.pw>

This moves us from the fairly ugly AWK parsing line to debugedit which
originally comes out of the rpm project.

The original code has issues parsing anything that was not straight
C/C++ and languages like Rust or Go would return invalid source code
files. debugedit handles all these cases better.

Signed-off-by: Morten Linderud <morten@linderud.pw>
---
 scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Allan McRae Jan. 2, 2022, 5:08 a.m. UTC | #1
On 2/1/22 10:28, Morten Linderud wrote:
> From: Morten Linderud <morten@linderud.pw>
> 
> This moves us from the fairly ugly AWK parsing line to debugedit which
> originally comes out of the rpm project.
> 
> The original code has issues parsing anything that was not straight
> C/C++ and languages like Rust or Go would return invalid source code
> files. debugedit handles all these cases better.
> 
> Signed-off-by: Morten Linderud <morten@linderud.pw>
> ---

I'm really happy this was split into a separate project!  The original 
plan was to use debugedit, but it could not be build without (from 
memory) depending on librpm, which did not seem appropriate!

Add a check for debugedit in scripts/libmakepkg/executable/strip.sh.in

>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in
> index 92a6fb15..c1d8ee3c 100644
> --- a/scripts/libmakepkg/tidy/strip.sh.in
> +++ b/scripts/libmakepkg/tidy/strip.sh.in
> @@ -36,8 +36,11 @@ build_id() {
>   }
>   
>   source_files() {
> -	LANG=C readelf "$1" --debug-dump 2>/dev/null | \
> -		awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}'
> +	dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
> +	local dbgsrclist="$(mktemp  "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"

I really do not like making temporary files.  Particularly not in 
${startdir}, which can be readonly provided BUILDDIR, SRCDIR, etc are set.

I may accept using $srcdir if writing this to a file is *essential*.

> +	LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
> +	sort -zu "${dbgsrclist}" | tr '\0' '\n' > +	rm -f "$dbgsrclist"
>   }
>   
>   strip_file() {
> @@ -58,9 +61,9 @@ strip_file() {
>   		# copy source files to debug directory
>   		local file dest t
>   		while IFS= read -r t; do
> -			file=${t/${dbgsrcdir}/"$srcdir"}
> -			dest="${dbgsrc/"$dbgsrcdir"/}$t"
> -			if ! [[ -f $dest ]]; then
> +			file="${srcdir}/${t}"
> +			dest="${dbgsrc}/${t}"
> +			if [[ -f "$file" ]] && ! [[ -f $dest ]]; then
>   				mkdir -p "${dest%/*}"
>   				cp -- "$file" "$dest"
>   			fi
Xiretza Jan. 2, 2022, 1:25 p.m. UTC | #2
On 02/01/2022 06.08, Allan McRae wrote:
> On 2/1/22 10:28, Morten Linderud wrote:
>>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in
>> index 92a6fb15..c1d8ee3c 100644
>> --- a/scripts/libmakepkg/tidy/strip.sh.in
>> +++ b/scripts/libmakepkg/tidy/strip.sh.in
>> @@ -36,8 +36,11 @@ build_id() {
>>   }
>>   source_files() {
>> -    LANG=C readelf "$1" --debug-dump 2>/dev/null | \
>> -        awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}'
>> +    dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>> +    local dbgsrclist="$(mktemp  "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
> 
> I really do not like making temporary files.  Particularly not in ${startdir}, which can be readonly provided BUILDDIR, SRCDIR, etc are set.
> 
> I may accept using $srcdir if writing this to a file is *essential*>
>> +    LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
>> +    sort -zu "${dbgsrclist}" | tr '\0' '\n'
>> +    rm -f "$dbgsrclist"

I haven't been able to produce any unrelated output on stdout, so just using /dev/stdout for --list-file would probably work ok - to make extra sure, this also works:

debugedit --list-file=/dev/fd/3 "$1" 3>&1 >/dev/null | sort [...]

Tangentially related, what's the opinion on using short vs. long options? Personally, I find that long options make scripts much easier to grok because they save a lot of looking at man pages/--help outputs, but I can also see how they might make things too crowded and busy, especially for common commands. Either way, might be something to include in HACKING.

>>   }
>>   strip_file() {
>> @@ -58,9 +61,9 @@ strip_file() {
>>           # copy source files to debug directory
>>           local file dest t
>>           while IFS= read -r t; do
>> -            file=${t/${dbgsrcdir}/"$srcdir"}
>> -            dest="${dbgsrc/"$dbgsrcdir"/}$t"
>> -            if ! [[ -f $dest ]]; then
>> +            file="${srcdir}/${t}"
>> +            dest="${dbgsrc}/${t}"
>> +            if [[ -f "$file" ]] && ! [[ -f $dest ]]; then
>>                   mkdir -p "${dest%/*}"
>>                   cp -- "$file" "$dest"
>>               fi
>
Xiretza Jan. 2, 2022, 1:34 p.m. UTC | #3
On 02/01/2022 01.28, Morten Linderud wrote:
> From: Morten Linderud <morten@linderud.pw>
> 
> This moves us from the fairly ugly AWK parsing line to debugedit which
> originally comes out of the rpm project.
> 
> The original code has issues parsing anything that was not straight
> C/C++ and languages like Rust or Go would return invalid source code
> files. debugedit handles all these cases better.
> 
> Signed-off-by: Morten Linderud <morten@linderud.pw>
> ---
>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in
> index 92a6fb15..c1d8ee3c 100644
> --- a/scripts/libmakepkg/tidy/strip.sh.in
> +++ b/scripts/libmakepkg/tidy/strip.sh.in
> @@ -36,8 +36,11 @@ build_id() {
>   }
>   
>   source_files() {
> -	LANG=C readelf "$1" --debug-dump 2>/dev/null | \
> -		awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}'
> +	dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
> +	local dbgsrclist="$(mktemp  "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
> +	LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null

-d/--dest-dir actually causes paths embedded in the binary to be *rewritten* from -b/--base-dir. Normally, this shouldn't happen because all paths are already translated by -fdebug-prefix-map before they end up in the binary (making it equivalent to `debugedit --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for whatever reason, this modifies the binary.

If this is something we actually want to do here, I think this behaviour is at least worth a source comment.

> +	sort -zu "${dbgsrclist}" | tr '\0' '\n'
> +	rm -f "$dbgsrclist"
>   }
>   
>   strip_file() {
> @@ -58,9 +61,9 @@ strip_file() {
>   		# copy source files to debug directory
>   		local file dest t
>   		while IFS= read -r t; do
> -			file=${t/${dbgsrcdir}/"$srcdir"}
> -			dest="${dbgsrc/"$dbgsrcdir"/}$t"
> -			if ! [[ -f $dest ]]; then
> +			file="${srcdir}/${t}"
> +			dest="${dbgsrc}/${t}"
> +			if [[ -f "$file" ]] && ! [[ -f $dest ]]; then
>   				mkdir -p "${dest%/*}"
>   				cp -- "$file" "$dest"
>   			fi
Allan McRae Jan. 2, 2022, 1:44 p.m. UTC | #4
On 2/1/22 23:34, Xiretza wrote:
> On 02/01/2022 01.28, Morten Linderud wrote:
>> From: Morten Linderud <morten@linderud.pw>
>>
>> This moves us from the fairly ugly AWK parsing line to debugedit which
>> originally comes out of the rpm project.
>>
>> The original code has issues parsing anything that was not straight
>> C/C++ and languages like Rust or Go would return invalid source code
>> files. debugedit handles all these cases better.
>>
>> Signed-off-by: Morten Linderud <morten@linderud.pw>
>> ---
>>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in 
>> b/scripts/libmakepkg/tidy/strip.sh.in
>> index 92a6fb15..c1d8ee3c 100644
>> --- a/scripts/libmakepkg/tidy/strip.sh.in
>> +++ b/scripts/libmakepkg/tidy/strip.sh.in
>> @@ -36,8 +36,11 @@ build_id() {
>>   }
>>   source_files() {
>> -    LANG=C readelf "$1" --debug-dump 2>/dev/null | \
>> -        awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name 
>> == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", 
>> $NF}}{print name}}'
>> +    dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>> +    local dbgsrclist="$(mktemp  
>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>> +    LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l 
>> "${dbgsrclist}" "$1" > /dev/null
> 
> -d/--dest-dir actually causes paths embedded in the binary to be 
> *rewritten* from -b/--base-dir. Normally, this shouldn't happen because 
> all paths are already translated by -fdebug-prefix-map before they end 
> up in the binary (making it equivalent to `debugedit --base-dir 
> "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for 
> whatever reason, this modifies the binary.
> 
> If this is something we actually want to do here, I think this behaviour 
> is at least worth a source comment.
>

Does it?  During testing the patch I added checksums before and after 
getting the source file list. This command causes no change to the file.

A
Xiretza Jan. 2, 2022, 1:51 p.m. UTC | #5
On 02/01/2022 14.44, Allan McRae wrote:
> On 2/1/22 23:34, Xiretza wrote:
>> On 02/01/2022 01.28, Morten Linderud wrote:
>>> From: Morten Linderud <morten@linderud.pw>
>>>
>>> This moves us from the fairly ugly AWK parsing line to debugedit which
>>> originally comes out of the rpm project.
>>>
>>> The original code has issues parsing anything that was not straight
>>> C/C++ and languages like Rust or Go would return invalid source code
>>> files. debugedit handles all these cases better.
>>>
>>> Signed-off-by: Morten Linderud <morten@linderud.pw>
>>> ---
>>>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in
>>> index 92a6fb15..c1d8ee3c 100644
>>> --- a/scripts/libmakepkg/tidy/strip.sh.in
>>> +++ b/scripts/libmakepkg/tidy/strip.sh.in
>>> @@ -36,8 +36,11 @@ build_id() {
>>>   }
>>>   source_files() {
>>> -    LANG=C readelf "$1" --debug-dump 2>/dev/null | \
>>> -        awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}'
>>> +    dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>> +    local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>>> +    LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
>>
>> -d/--dest-dir actually causes paths embedded in the binary to be *rewritten* from -b/--base-dir. Normally, this shouldn't happen because all paths are already translated by -fdebug-prefix-map before they end up in the binary (making it equivalent to `debugedit --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths left for whatever reason, this modifies the binary.
>>
>> If this is something we actually want to do here, I think this behaviour is at least worth a source comment.
>>
> 
> Does it?  During testing the patch I added checksums before and after getting the source file list. This command causes no change to the file.
> 
> A

It does for me:

$ pwd
/tmp/srcdir
$ cat main.c
int main() {return 0;}
$ gcc -g main.c
$ debugedit -l /dev/stdout a.out | tr '\0' '\n'
/tmp/srcdir/main.c
/tmp/srcdir/main.c

$ sha256sum a.out
893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e  a.out
$ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir /tmp/dbgsrcdir a.out | tr '\0' '\n'
main.c
main.c
$ sha256sum a.out
6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2  a.out

$ debugedit -l /dev/stdout a.out | tr '\0' '\n'
/tmp/dbgsrcdir/main.c
/tmp/dbgsrcdir/main.c
Allan McRae Jan. 2, 2022, 2:20 p.m. UTC | #6
On 2/1/22 23:51, Xiretza wrote:
> 
> On 02/01/2022 14.44, Allan McRae wrote:
>> On 2/1/22 23:34, Xiretza wrote:
>>> On 02/01/2022 01.28, Morten Linderud wrote:
>>>> From: Morten Linderud <morten@linderud.pw>
>>>>
>>>> This moves us from the fairly ugly AWK parsing line to debugedit which
>>>> originally comes out of the rpm project.
>>>>
>>>> The original code has issues parsing anything that was not straight
>>>> C/C++ and languages like Rust or Go would return invalid source code
>>>> files. debugedit handles all these cases better.
>>>>
>>>> Signed-off-by: Morten Linderud <morten@linderud.pw>
>>>> ---
>>>>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in 
>>>> b/scripts/libmakepkg/tidy/strip.sh.in
>>>> index 92a6fb15..c1d8ee3c 100644
>>>> --- a/scripts/libmakepkg/tidy/strip.sh.in
>>>> +++ b/scripts/libmakepkg/tidy/strip.sh.in
>>>> @@ -36,8 +36,11 @@ build_id() {
>>>>   }
>>>>   source_files() {
>>>> -    LANG=C readelf "$1" --debug-dump 2>/dev/null | \
>>>> -        awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name 
>>>> == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", 
>>>> $NF}}{print name}}'
>>>> +    dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>>> +    local dbgsrclist="$(mktemp 
>>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>>>> +    LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l 
>>>> "${dbgsrclist}" "$1" > /dev/null
>>>
>>> -d/--dest-dir actually causes paths embedded in the binary to be 
>>> *rewritten* from -b/--base-dir. Normally, this shouldn't happen 
>>> because all paths are already translated by -fdebug-prefix-map before 
>>> they end up in the binary (making it equivalent to `debugedit 
>>> --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths 
>>> left for whatever reason, this modifies the binary.
>>>
>>> If this is something we actually want to do here, I think this 
>>> behaviour is at least worth a source comment.
>>>
>>
>> Does it?  During testing the patch I added checksums before and after 
>> getting the source file list. This command causes no change to the file.
>>
>> A
> 
> It does for me:
> 
> $ pwd
> /tmp/srcdir
> $ cat main.c
> int main() {return 0;}
> $ gcc -g main.c
> $ debugedit -l /dev/stdout a.out | tr '\0' '\n'
> /tmp/srcdir/main.c
> /tmp/srcdir/main.c
> 
> $ sha256sum a.out
> 893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e  a.out
> $ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir 
> /tmp/dbgsrcdir a.out | tr '\0' '\n'
> main.c
> main.c
> $ sha256sum a.out
> 6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2  a.out
> 
> $ debugedit -l /dev/stdout a.out | tr '\0' '\n'
> /tmp/dbgsrcdir/main.c
> /tmp/dbgsrcdir/main.c
> .

Add -n.

Allan
Allan McRae Jan. 2, 2022, 2:24 p.m. UTC | #7
On 3/1/22 00:20, Allan McRae wrote:
> On 2/1/22 23:51, Xiretza wrote:
>>
>> On 02/01/2022 14.44, Allan McRae wrote:
>>> On 2/1/22 23:34, Xiretza wrote:
>>>> On 02/01/2022 01.28, Morten Linderud wrote:
>>>>> From: Morten Linderud <morten@linderud.pw>
>>>>>
>>>>> This moves us from the fairly ugly AWK parsing line to debugedit which
>>>>> originally comes out of the rpm project.
>>>>>
>>>>> The original code has issues parsing anything that was not straight
>>>>> C/C++ and languages like Rust or Go would return invalid source code
>>>>> files. debugedit handles all these cases better.
>>>>>
>>>>> Signed-off-by: Morten Linderud <morten@linderud.pw>
>>>>> ---
>>>>>   scripts/libmakepkg/tidy/strip.sh.in | 13 ++++++++-----
>>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/scripts/libmakepkg/tidy/strip.sh.in 
>>>>> b/scripts/libmakepkg/tidy/strip.sh.in
>>>>> index 92a6fb15..c1d8ee3c 100644
>>>>> --- a/scripts/libmakepkg/tidy/strip.sh.in
>>>>> +++ b/scripts/libmakepkg/tidy/strip.sh.in
>>>>> @@ -36,8 +36,11 @@ build_id() {
>>>>>   }
>>>>>   source_files() {
>>>>> -    LANG=C readelf "$1" --debug-dump 2>/dev/null | \
>>>>> -        awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if 
>>>>> (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", 
>>>>> $NF}}{print name}}'
>>>>> +    dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>>>> +    local dbgsrclist="$(mktemp 
>>>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>>>>> +    LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l 
>>>>> "${dbgsrclist}" "$1" > /dev/null
>>>>
>>>> -d/--dest-dir actually causes paths embedded in the binary to be 
>>>> *rewritten* from -b/--base-dir. Normally, this shouldn't happen 
>>>> because all paths are already translated by -fdebug-prefix-map 
>>>> before they end up in the binary (making it equivalent to `debugedit 
>>>> --base-dir "${dbgsrcdir}"`) but if there are any $srcdir-based paths 
>>>> left for whatever reason, this modifies the binary.
>>>>
>>>> If this is something we actually want to do here, I think this 
>>>> behaviour is at least worth a source comment.
>>>>
>>>
>>> Does it?  During testing the patch I added checksums before and after 
>>> getting the source file list. This command causes no change to the file.
>>>
>>> A
>>
>> It does for me:
>>
>> $ pwd
>> /tmp/srcdir
>> $ cat main.c
>> int main() {return 0;}
>> $ gcc -g main.c
>> $ debugedit -l /dev/stdout a.out | tr '\0' '\n'
>> /tmp/srcdir/main.c
>> /tmp/srcdir/main.c
>>
>> $ sha256sum a.out
>> 893824faaf62af0cdd5c23883e45b0bd6f1f526fbe5aa89d357c9acd34bbce3e  a.out
>> $ debugedit -l /dev/stdout --base-dir /tmp/srcdir --dest-dir 
>> /tmp/dbgsrcdir a.out | tr '\0' '\n'
>> main.c
>> main.c
>> $ sha256sum a.out
>> 6e4eebd6863150cb6568d99335ff073a5ddd9585e22c5de5c4dd159e5b6bc0b2  a.out
>>
>> $ debugedit -l /dev/stdout a.out | tr '\0' '\n'
>> /tmp/dbgsrcdir/main.c
>> /tmp/dbgsrcdir/main.c
>> .
> 
> Add -n.
> 


For a better example, here is my testing code:

source_files() {
	dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
	local dbgsrclist="$(mktemp 
"${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
echo $1 >> $startdir/dbginfo
echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
	LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" 
"$1" > /dev/null
	sort -zu "${dbgsrclist}" | tr '\0' '\n'
sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
	rm -f "$dbgsrclist"
}


and a snippet of the output:

./usr/bin/vercmp
sha256sum-orig: 
844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
./usr/bin/vercmp
pacman/builddir/<artificial>
pacman/builddir/<built-in>
pacman/lib/libalpm/version.c
pacman/src/util/vercmp.c
sha256sum-after: 
844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
./usr/bin/vercmp
Xiretza Jan. 2, 2022, 2:29 p.m. UTC | #8
On 02/01/2022 15.24, Allan McRae wrote:
> On 3/1/22 00:20, Allan McRae wrote:
>>
>> Add -n.
>>

No change.

> 
> For a better example, here is my testing code:
> 
> source_files() {
>      dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>      local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
> echo $1 >> $startdir/dbginfo
> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
>      LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
>      sort -zu "${dbgsrclist}" | tr '\0' '\n'
> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
>      rm -f "$dbgsrclist"
> }
> 
> 
> and a snippet of the output:
> 
> ./usr/bin/vercmp
> sha256sum-orig: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp
> pacman/builddir/<artificial>
> pacman/builddir/<built-in>
> pacman/lib/libalpm/version.c
> pacman/src/util/vercmp.c
> sha256sum-after: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp
> 
> 

Is it possible that there are simply no source file entries referencing $srcdir (because -fdebug-prefix-map is working as expected)? As I said, if that's the case, the binary is not modified because there are no occurrences of $srcdir to be rewritten to $dbgsrcdir.
Allan McRae Jan. 2, 2022, 2:48 p.m. UTC | #9
On 3/1/22 00:29, Xiretza wrote:
> 
> On 02/01/2022 15.24, Allan McRae wrote:
>> On 3/1/22 00:20, Allan McRae wrote:
>>>
>>> Add -n.
>>>
> 
> No change.
> 
>>
>> For a better example, here is my testing code:
>>
>> source_files() {
>>      dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>      local dbgsrclist="$(mktemp 
>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>> echo $1 >> $startdir/dbginfo
>> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
>>      LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l 
>> "${dbgsrclist}" "$1" > /dev/null
>>      sort -zu "${dbgsrclist}" | tr '\0' '\n'
>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
>> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
>>      rm -f "$dbgsrclist"
>> }
>>
>>
>> and a snippet of the output:
>>
>> ./usr/bin/vercmp
>> sha256sum-orig: 
>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
>> ./usr/bin/vercmp
>> pacman/builddir/<artificial>
>> pacman/builddir/<built-in>
>> pacman/lib/libalpm/version.c
>> pacman/src/util/vercmp.c
>> sha256sum-after: 
>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
>> ./usr/bin/vercmp
> 
> Is it possible that there are simply no source file entries referencing 
> $srcdir (because -fdebug-prefix-map is working as expected)? As I said, 
> if that's the case, the binary is not modified because there are no 
> occurrences of $srcdir to be rewritten to $dbgsrcdir.

That is possible - I mostly tested packages I know obey CFLAGS.  Saying 
that, any change would be a good thing, or else those source files 
placed in ${dbgsrcdir} are kindof useless!  So I'm happy to have this 
rewrite paths - we are not regenerating the build-id, so I'm assuming 
this does not cause package reproducibility issues...

Also, I get a very different list of files with and without -b/-d. 
Without using them I get a lot of system files.  With, I just get the 
package source files.

A
Morten Linderud Jan. 2, 2022, 2:58 p.m. UTC | #10
On Mon, Jan 03, 2022 at 12:48:05AM +1000, Allan McRae wrote:
> On 3/1/22 00:29, Xiretza wrote:
> > 
> > On 02/01/2022 15.24, Allan McRae wrote:
> > > On 3/1/22 00:20, Allan McRae wrote:
> > > > 
> > > > Add -n.
> > > > 
> > 
> > No change.
> > 
> > > 
> > > For a better example, here is my testing code:
> > > 
> > > source_files() {
> > >      dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
> > >      local dbgsrclist="$(mktemp
> > > "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
> > > echo $1 >> $startdir/dbginfo
> > > echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
> > >      LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l
> > > "${dbgsrclist}" "$1" > /dev/null
> > >      sort -zu "${dbgsrclist}" | tr '\0' '\n'
> > > sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
> > > echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
> > >      rm -f "$dbgsrclist"
> > > }
> > > 
> > > 
> > > and a snippet of the output:
> > > 
> > > ./usr/bin/vercmp
> > > sha256sum-orig:
> > > 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407
> > > ./usr/bin/vercmp
> > > pacman/builddir/<artificial>
> > > pacman/builddir/<built-in>
> > > pacman/lib/libalpm/version.c
> > > pacman/src/util/vercmp.c
> > > sha256sum-after:
> > > 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407
> > > ./usr/bin/vercmp
> > 
> > Is it possible that there are simply no source file entries referencing
> > $srcdir (because -fdebug-prefix-map is working as expected)? As I said,
> > if that's the case, the binary is not modified because there are no
> > occurrences of $srcdir to be rewritten to $dbgsrcdir.
> 
> That is possible - I mostly tested packages I know obey CFLAGS.  Saying
> that, any change would be a good thing, or else those source files placed in
> ${dbgsrcdir} are kindof useless!  So I'm happy to have this rewrite paths -
> we are not regenerating the build-id, so I'm assuming this does not cause
> package reproducibility issues...
> 
> Also, I get a very different list of files with and without -b/-d. Without
> using them I get a lot of system files.  With, I just get the package source
> files.
> 
> A

Xiretza is correct. It will rewrite the Go binaries and add correct paths since
we do not have a prefix-map option in the Go compiler. However it won't change
anything that utilizies prefix-map correctly.

There might be a catch here, but I haven't found it.
Xiretza Jan. 2, 2022, 3 p.m. UTC | #11
On 02/01/2022 15.48, Allan McRae wrote:
> On 3/1/22 00:29, Xiretza wrote:
>>
>> On 02/01/2022 15.24, Allan McRae wrote:
>>> On 3/1/22 00:20, Allan McRae wrote:
>>>>
>>>> Add -n.
>>>>
>>
>> No change.
>>
>>>
>>> For a better example, here is my testing code:
>>>
>>> source_files() {
>>>      dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>>      local dbgsrclist="$(mktemp "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>>> echo $1 >> $startdir/dbginfo
>>> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
>>>      LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
>>>      sort -zu "${dbgsrclist}" | tr '\0' '\n'
>>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
>>> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
>>>      rm -f "$dbgsrclist"
>>> }
>>>
>>>
>>> and a snippet of the output:
>>>
>>> ./usr/bin/vercmp
>>> sha256sum-orig: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp
>>> pacman/builddir/<artificial>
>>> pacman/builddir/<built-in>
>>> pacman/lib/libalpm/version.c
>>> pacman/src/util/vercmp.c
>>> sha256sum-after: 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 ./usr/bin/vercmp
>>
>> Is it possible that there are simply no source file entries referencing $srcdir (because -fdebug-prefix-map is working as expected)? As I said, if that's the case, the binary is not modified because there are no occurrences of $srcdir to be rewritten to $dbgsrcdir.
> 
> That is possible - I mostly tested packages I know obey CFLAGS.  Saying that, any change would be a good thing, or else those source files placed in ${dbgsrcdir} are kindof useless!  So I'm happy to have this rewrite paths - we are not regenerating the build-id, so I'm assuming this does not cause package reproducibility issues...

Makes sense, I can't think of any issues this would cause either - the source file paths have to be deterministic anyway, applying a deterministic transform on top won't change that.

Still, having a function called "source_files" actually modify the passed binary deserves a comment, I think.

> 
> Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files.  With, I just get the package source files.

Right, without -b/-d, it just lists all source files.
With just -b, it prints all source files rooted in the specified directory, but strips the prefix from the output.
With both -b and -d, it replaces any base-dir prefixes with dest-dir (modifying the binary), then prints all paths rooted in the dest-dir (again, stripping the dest-dir prefix in the output), regardless if they've always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were just transformed from base-dir.

I hope that made sense.

> 
> A
Allan McRae Jan. 2, 2022, 3:05 p.m. UTC | #12
On 3/1/22 01:00, Xiretza wrote:
> On 02/01/2022 15.48, Allan McRae wrote:
>> On 3/1/22 00:29, Xiretza wrote:
>>>
>>> On 02/01/2022 15.24, Allan McRae wrote:
>>>> On 3/1/22 00:20, Allan McRae wrote:
>>>>>
>>>>> Add -n.
>>>>>
>>>
>>> No change.
>>>
>>>>
>>>> For a better example, here is my testing code:
>>>>
>>>> source_files() {
>>>>      dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
>>>>      local dbgsrclist="$(mktemp 
>>>> "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
>>>> echo $1 >> $startdir/dbginfo
>>>> echo sha256sum-orig: $(sha256sum $1) >> $startdir/dbginfo
>>>>      LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l 
>>>> "${dbgsrclist}" "$1" > /dev/null
>>>>      sort -zu "${dbgsrclist}" | tr '\0' '\n'
>>>> sort -zu "${dbgsrclist}" | tr '\0' '\n' >> $startdir/dbginfo
>>>> echo sha256sum-after: $(sha256sum $1) >> $startdir/dbginfo
>>>>      rm -f "$dbgsrclist"
>>>> }
>>>>
>>>>
>>>> and a snippet of the output:
>>>>
>>>> ./usr/bin/vercmp
>>>> sha256sum-orig: 
>>>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
>>>> ./usr/bin/vercmp
>>>> pacman/builddir/<artificial>
>>>> pacman/builddir/<built-in>
>>>> pacman/lib/libalpm/version.c
>>>> pacman/src/util/vercmp.c
>>>> sha256sum-after: 
>>>> 844e2a18277df5d46544fc977a028b02b58d642bc9754d7d9868197d23f42407 
>>>> ./usr/bin/vercmp
>>>
>>> Is it possible that there are simply no source file entries 
>>> referencing $srcdir (because -fdebug-prefix-map is working as 
>>> expected)? As I said, if that's the case, the binary is not modified 
>>> because there are no occurrences of $srcdir to be rewritten to 
>>> $dbgsrcdir.
>>
>> That is possible - I mostly tested packages I know obey CFLAGS.  
>> Saying that, any change would be a good thing, or else those source 
>> files placed in ${dbgsrcdir} are kindof useless!  So I'm happy to have 
>> this rewrite paths - we are not regenerating the build-id, so I'm 
>> assuming this does not cause package reproducibility issues...
> 
> Makes sense, I can't think of any issues this would cause either - the 
> source file paths have to be deterministic anyway, applying a 
> deterministic transform on top won't change that.
> 
> Still, having a function called "source_files" actually modify the 
> passed binary deserves a comment, I think.
> 
Alternatively, happy for the function to be renamed something better.

A
Morten Linderud Jan. 2, 2022, 3:18 p.m. UTC | #13
On Sun, Jan 02, 2022 at 04:00:56PM +0100, Xiretza wrote:
> On 02/01/2022 15.48, Allan McRae wrote:
> > Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files.  With, I just get the package source files.
> 
> Right, without -b/-d, it just lists all source files.  With just -b, it prints
> all source files rooted in the specified directory, but strips the prefix from
> the output.  With both -b and -d, it replaces any base-dir prefixes with
> dest-dir (modifying the binary), then prints all paths rooted in the dest-dir
> (again, stripping the dest-dir prefix in the output), regardless if they've
> always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were
> just transformed from base-dir.
> 
> I hope that made sense.

We can have one function `rewrite_source_list` and one `source_list`?

`rewrite_source_list` would do the explicit rewrite, if needed. `source_list` just
uses --base-dir and returns the files.
Xiretza Jan. 2, 2022, 3:34 p.m. UTC | #14
On 02/01/2022 16.18, Morten Linderud wrote:
> On Sun, Jan 02, 2022 at 04:00:56PM +0100, Xiretza wrote:
>> On 02/01/2022 15.48, Allan McRae wrote:
>>> Also, I get a very different list of files with and without -b/-d. Without using them I get a lot of system files.  With, I just get the package source files.
>>
>> Right, without -b/-d, it just lists all source files.  With just -b, it prints
>> all source files rooted in the specified directory, but strips the prefix from
>> the output.  With both -b and -d, it replaces any base-dir prefixes with
>> dest-dir (modifying the binary), then prints all paths rooted in the dest-dir
>> (again, stripping the dest-dir prefix in the output), regardless if they've
>> always been rooted in dest-dir (due to -fdebug-prefix-map) or if they were
>> just transformed from base-dir.
>>
>> I hope that made sense.
> 
> We can have one function `rewrite_source_list` and one `source_list`?
> 
> `rewrite_source_list` would do the explicit rewrite, if needed. `source_list` just
> uses --base-dir and returns the files.
> 

How would it know which one to run? I don't think it's worth the hassle of figuring that out, it's probably fine as-is.

Patch

diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in
index 92a6fb15..c1d8ee3c 100644
--- a/scripts/libmakepkg/tidy/strip.sh.in
+++ b/scripts/libmakepkg/tidy/strip.sh.in
@@ -36,8 +36,11 @@  build_id() {
 }
 
 source_files() {
-	LANG=C readelf "$1" --debug-dump 2>/dev/null | \
-		awk '/DW_AT_name +:/{name=$NF}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $NF}}{print name}}'
+	dbgsrcdir="${DBGSRCDIR:-/usr/src/debug}"
+	local dbgsrclist="$(mktemp  "${startdir}/dbgsrclist.${binary##*/}.XXXXXXXXX")"
+	LANG=C debugedit -n -b "${srcdir}" -d "${dbgsrcdir}" -l "${dbgsrclist}" "$1" > /dev/null
+	sort -zu "${dbgsrclist}" | tr '\0' '\n'
+	rm -f "$dbgsrclist"
 }
 
 strip_file() {
@@ -58,9 +61,9 @@  strip_file() {
 		# copy source files to debug directory
 		local file dest t
 		while IFS= read -r t; do
-			file=${t/${dbgsrcdir}/"$srcdir"}
-			dest="${dbgsrc/"$dbgsrcdir"/}$t"
-			if ! [[ -f $dest ]]; then
+			file="${srcdir}/${t}"
+			dest="${dbgsrc}/${t}"
+			if [[ -f "$file" ]] && ! [[ -f $dest ]]; then
 				mkdir -p "${dest%/*}"
 				cp -- "$file" "$dest"
 			fi