[pacman-dev] Replace MD5 with SHA-256 as a default file integrity check in PKGBUILDs

Message ID 87k15j55hg.fsf@arch2.laptop.juraszek.xyz
State Rejected, archived
Headers show
Series [pacman-dev] Replace MD5 with SHA-256 as a default file integrity check in PKGBUILDs | expand

Commit Message

Artur Juraszek Jan. 23, 2020, 1:25 a.m. UTC
Hi all,

While poking through Arch's package system, I noticed that despite its
bad reputation, MD5 remains a default, and even some kind of a "recommendation", due
to its presence in the example PKBUILDs, hashing algorithm for file integrity verification.

Is there a reason to not have it changed to a more future-proof one? I mean, at least for now,
it seems good enough to protect before a so-called "2nd preimage attack", which is the primary
concern in the classic file verification scenario, BUT:

a) given the huge size of AUR and its rather chaotic nature, it is not that hard to imagine
_a_ malicious upstream which could try to sneak some nasty changes in its own files,
with AUR maintainer not noticing anything - leveraging flaws which do exist and are quite
well-explored even today.

b) it's already shown its weaknesses and it is not going to be any better - the only research direction
is to found more (practical) attacks against MD5, so faster the change, fewer the people possibly
affected in the future

Attaching a patch which, I think, replaces MD5 with SHA256 as a default completely - it's my first
change in ABS-related code, though, so please do not hesitate to criticize if something's wrong ;]

--
Artur Juraszek

Comments

Allan McRae Jan. 23, 2020, 1:35 a.m. UTC | #1
On 23/1/20 11:25 am, Artur Juraszek wrote:
> Hi all,
> 
> While poking through Arch's package system, I noticed that despite its
> bad reputation, MD5 remains a default, and even some kind of a "recommendation", due
> to its presence in the example PKBUILDs, hashing algorithm for file integrity verification.
> 
> Is there a reason to not have it changed to a more future-proof one? I mean, at least for now,
> it seems good enough to protect before a so-called "2nd preimage attack", which is the primary
> concern in the classic file verification scenario, BUT:
> 
> a) given the huge size of AUR and its rather chaotic nature, it is not that hard to imagine
> _a_ malicious upstream which could try to sneak some nasty changes in its own files,
> with AUR maintainer not noticing anything - leveraging flaws which do exist and are quite
> well-explored even today.
> 
> b) it's already shown its weaknesses and it is not going to be any better - the only research direction
> is to found more (practical) attacks against MD5, so faster the change, fewer the people possibly
> affected in the future
> 
> Attaching a patch which, I think, replaces MD5 with SHA256 as a default completely - it's my first
> change in ABS-related code, though, so please do not hesitate to criticize if something's wrong ;]
> 

This change is not happening.   Any checksum is insecure when added to a
PKGBUILD using "makepkg -g", which is all the default value does.  The
person writing a PKGBUILD needs to use what is provided upstream (or
even a PGP signature), in which case the default in makepkg does not
make a difference.

Allan
Filipe Laíns Jan. 23, 2020, 1:36 a.m. UTC | #2
On Thu, 2020-01-23 at 02:25 +0100, Artur Juraszek wrote:
> Hi all,
> 
> While poking through Arch's package system, I noticed that despite its
> bad reputation, MD5 remains a default, and even some kind of a "recommendation", due
> to its presence in the example PKBUILDs, hashing algorithm for file integrity verification.
> 
> Is there a reason to not have it changed to a more future-proof one? I mean, at least for now,
> it seems good enough to protect before a so-called "2nd preimage attack", which is the primary
> concern in the classic file verification scenario, BUT:
> 
> a) given the huge size of AUR and its rather chaotic nature, it is not that hard to imagine
> _a_ malicious upstream which could try to sneak some nasty changes in its own files,
> with AUR maintainer not noticing anything - leveraging flaws which do exist and are quite
> well-explored even today.
> 
> b) it's already shown its weaknesses and it is not going to be any better - the only research direction
> is to found more (practical) attacks against MD5, so faster the change, fewer the people possibly
> affected in the future
> 
> Attaching a patch which, I think, replaces MD5 with SHA256 as a default completely - it's my first
> change in ABS-related code, though, so please do not hesitate to criticize if something's wrong ;]
> 
> --
> Artur Juraszek

I think we should change it to sha512 instead. sha256 and sha512 are
pretty similar but sha512 is faster on 64-bit machine. Since 64-bit is
the new standard for high-power computing, and the only architecture we
support, it would be more beneficial to chose sha512.

A quick benchmark on my machine confirms this:

$ dd if=/dev/zero of=example.img bs=4096 count=512000
512000+0 records in
512000+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.77283 s, 756 MB/s

$ time sha256sum example.img
274fbb979251bcaceab594dd89d5adfec310e8851e320b5b5f90fd5f18d76149  examp
le.img
real 4.79
user 4.47
sys 0.30

$ time sha512sum example.img
241497cb61e24fcdaf33a13f5635951ff7c21cb27904e6f3de7b221031b0216800cbce1
a667a66aafbdb7ffbfe2a39564b4cb48efea1d3721093fa7663e7a8c9  example.img
real 3.33
user 3.09
sys 0.21

sha512 is ~1.5s than sha256 when calculating the checksum of a 2GiB
zero-ed file.

Thank you,
Filipe Laíns
Filipe Laíns Jan. 23, 2020, 1:38 a.m. UTC | #3
On Thu, 2020-01-23 at 01:36 +0000, Filipe Laíns wrote:
> On Thu, 2020-01-23 at 02:25 +0100, Artur Juraszek wrote:
> > Hi all,
> > 
> > While poking through Arch's package system, I noticed that despite its
> > bad reputation, MD5 remains a default, and even some kind of a "recommendation", due
> > to its presence in the example PKBUILDs, hashing algorithm for file integrity verification.
> > 
> > Is there a reason to not have it changed to a more future-proof one? I mean, at least for now,
> > it seems good enough to protect before a so-called "2nd preimage attack", which is the primary
> > concern in the classic file verification scenario, BUT:
> > 
> > a) given the huge size of AUR and its rather chaotic nature, it is not that hard to imagine
> > _a_ malicious upstream which could try to sneak some nasty changes in its own files,
> > with AUR maintainer not noticing anything - leveraging flaws which do exist and are quite
> > well-explored even today.
> > 
> > b) it's already shown its weaknesses and it is not going to be any better - the only research direction
> > is to found more (practical) attacks against MD5, so faster the change, fewer the people possibly
> > affected in the future
> > 
> > Attaching a patch which, I think, replaces MD5 with SHA256 as a default completely - it's my first
> > change in ABS-related code, though, so please do not hesitate to criticize if something's wrong ;]
> > 
> > --
> > Artur Juraszek
> 
> I think we should change it to sha512 instead. sha256 and sha512 are
> pretty similar but sha512 is faster on 64-bit machine. Since 64-bit is
> the new standard for high-power computing, and the only architecture we
> support, it would be more beneficial to chose sha512.
> 
> A quick benchmark on my machine confirms this:
> 
> $ dd if=/dev/zero of=example.img bs=4096 count=512000
> 512000+0 records in
> 512000+0 records out
> 2097152000 bytes (2.1 GB, 2.0 GiB) copied, 2.77283 s, 756 MB/s
> 
> $ time sha256sum example.img
> 274fbb979251bcaceab594dd89d5adfec310e8851e320b5b5f90fd5f18d76149  examp
> le.img
> real 4.79
> user 4.47
> sys 0.30
> 
> $ time sha512sum example.img
> 241497cb61e24fcdaf33a13f5635951ff7c21cb27904e6f3de7b221031b0216800cbce1
> a667a66aafbdb7ffbfe2a39564b4cb48efea1d3721093fa7663e7a8c9  example.img
> real 3.33
> user 3.09
> sys 0.21
> 
> sha512 is ~1.5s than sha256 when calculating the checksum of a 2GiB
                  ^ *faster
> zero-ed file.
> 
> Thank you,
> Filipe Laíns
Eli Schwartz Jan. 23, 2020, 2:30 a.m. UTC | #4
On 1/22/20 8:25 PM, Artur Juraszek wrote:
> Hi all,
> 
> While poking through Arch's package system, I noticed that despite its
> bad reputation, MD5 remains a default, and even some kind of a "recommendation", due
> to its presence in the example PKBUILDs, hashing algorithm for file integrity verification.
> 
> Is there a reason to not have it changed to a more future-proof one? I mean, at least for now,
> it seems good enough to protect before a so-called "2nd preimage attack", which is the primary
> concern in the classic file verification scenario, BUT:
> 
> a) given the huge size of AUR and its rather chaotic nature, it is not that hard to imagine
> _a_ malicious upstream which could try to sneak some nasty changes in its own files,
> with AUR maintainer not noticing anything - leveraging flaws which do exist and are quite
> well-explored even today.
> 
> b) it's already shown its weaknesses and it is not going to be any better - the only research direction
> is to found more (practical) attacks against MD5, so faster the change, fewer the people possibly
> affected in the future
> 
> Attaching a patch which, I think, replaces MD5 with SHA256 as a default completely - it's my first
> change in ABS-related code, though, so please do not hesitate to criticize if something's wrong ;]

This point has been raised a number of times. Here's the standard answer:

checksums are not intended to prove that the download is trusted. It is
an "integrity checksum", not an "authenticity checksum". Its purpose is
to validate that you didn't have an interrupted download, or have a
cached version of an unversioned filename, or that gamma rays didn't
modify your disk and corrupt everything, etc. If you want authenticity,
that is what PGP is for. PGP actually proves that the source code was
created or authorized by a given *identity*.

The default checksums are not used by security-conscious people, and
aren't used in PKGBUILDs that define a different checksum anyway.
Security conscious people also do out of band checks, or inspect the
sources (maybe diffing it against the known previous version).

The opinion of Allan, the lead developer, is that defaulting to md5sums
is a good thing -- it makes it easy to spot people who just use the
defaults and run updpkgsums without checking anything, whereas
defaulting to sha256sums would result in him ironically having *less*
trust in the package.

"Securely verifying the unknown is still the unknown."

...

I happen to think that there is merit to using sha256sums by default, on
one specific rationale: it provides TOFU (Trust On First Use). For
people who are totally careless about their packaging, you cannot trust
the sources they list, but you can at least make sure that everyone is
using the same sources, and that if an attacker attacked the source
*after* the PKGBUILD is uploaded, you cannot maliciously change the
sources which *some* people use. (But attackers can still successfully
attack the source before it is first used, and then you're screwed).

...

So ultimately that is what this discussion will always devolve to:

- Do we want to ensure TOFU?
- Do we want to give PKGBUILDs the default black mark "uses md5sums
  because maintainer doesn't care about researching sources"?
Giancarlo Razzolini Jan. 23, 2020, 1:32 p.m. UTC | #5
Em janeiro 22, 2020 23:30 Eli Schwartz escreveu:
> So ultimately that is what this discussion will always devolve to:
> 
> - Do we want to ensure TOFU?

Yes.

> - Do we want to give PKGBUILDs the default black mark "uses md5sums
>   because maintainer doesn't care about researching sources"?
> 

No. Encouraging best packaging practices can and should be done right
from the start.

This discussion is pointless though. Let's continue to use md5sums until
it's completely broken, then we can switch to something else.

Regards,
Giancarlo Razzolini
Eli Schwartz Jan. 23, 2020, 2:59 p.m. UTC | #6
On 1/23/20 8:32 AM, Giancarlo Razzolini wrote:
> Em janeiro 22, 2020 23:30 Eli Schwartz escreveu:
>> So ultimately that is what this discussion will always devolve to:
>>
>> - Do we want to ensure TOFU?
> 
> Yes.
> 
>> - Do we want to give PKGBUILDs the default black mark "uses md5sums
>>   because maintainer doesn't care about researching sources"?
>>
> 
> No. Encouraging best packaging practices can and should be done right
> from the start.
> 
> This discussion is pointless though. Let's continue to use md5sums until
> it's completely broken, then we can switch to something else.

Then I'm sure you'll be delighted to know that the last time this
discussion was brought up (a couple years ago?) Allan said he wanted to
add "cksum" support and switch to that for a default. Rationale: both
md5sum and cksum are already completely broken, but no one deludes
themselves when they see "cksum" into thinking that it is anything but
deliberate, and no one deludes themselves into thinking that there is
any possibility it is secure.

(The same thing is true of md5sum, both that its presence in makepkg is
deliberate, and that it's not even intended to be secure. The difference
is that with md5sum, people can lie to themselves about both.)

And, sure enough, someone brought up the discussion again, and, sure
enough, Allan has fulfilled on his promise with the patch submission
which is a response to this thread:

"makepkg: add CRC checksums and set these to be the default"
Giancarlo Razzolini Jan. 23, 2020, 3:07 p.m. UTC | #7
Em janeiro 23, 2020 11:59 Eli Schwartz escreveu:
> 
> Then I'm sure you'll be delighted to know that the last time this
> discussion was brought up (a couple years ago?) Allan said he wanted to
> add "cksum" support and switch to that for a default. Rationale: both
> md5sum and cksum are already completely broken, but no one deludes
> themselves when they see "cksum" into thinking that it is anything but
> deliberate, and no one deludes themselves into thinking that there is
> any possibility it is secure.
>

That's the opposite of encouraging best practices, but this horse is long
dead, and there's nothing else to beat.

> 
> "makepkg: add CRC checksums and set these to be the default"
> 

No comment on this one.

Regards,
Giancarlo Razzolini

Patch

From fa26959e403f5f66c9f8da00d4111145c7c0cac2 Mon Sep 17 00:00:00 2001
From: Artur Juraszek <artur@juraszek.xyz>
Date: Thu, 23 Jan 2020 02:13:05 +0100
Subject: [PATCH] Replace MD5 with SHA-256 as a default file integrity check in
 PKGBUILDs

---
 doc/PKGBUILD-example.txt        |  4 ++--
 doc/PKGBUILD.5.asciidoc         | 12 ++++++------
 doc/makepkg-template.1.asciidoc |  2 +-
 etc/makepkg.conf.in             |  2 +-
 proto/PKGBUILD-split.proto      |  2 +-
 proto/PKGBUILD-vcs.proto        |  2 +-
 proto/PKGBUILD.proto            |  2 +-
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/doc/PKGBUILD-example.txt b/doc/PKGBUILD-example.txt
index 910fd068..d4e1c9c1 100644
--- a/doc/PKGBUILD-example.txt
+++ b/doc/PKGBUILD-example.txt
@@ -12,8 +12,8 @@  depends=('glibc')
 makedepends=('ed')
 optdepends=('ed: for "patch -e" functionality')
 source=("ftp://ftp.gnu.org/gnu/$pkgname/$pkgname-$pkgver.tar.xz"{,.sig})
-md5sums=('e9ae5393426d3ad783a300a338c09b72'
-         'SKIP')
+sha256sums=('9124ba46db0abd873d0995c2ca880e81252676bb6c03e0a37dfc5f608a9b0ceb'
+            'SKIP')
 
 build() {
 	cd "$srcdir/$pkgname-$pkgver"
diff --git a/doc/PKGBUILD.5.asciidoc b/doc/PKGBUILD.5.asciidoc
index ef53c0ee..7b5d6d2b 100644
--- a/doc/PKGBUILD.5.asciidoc
+++ b/doc/PKGBUILD.5.asciidoc
@@ -146,17 +146,17 @@  contain whitespace characters.
 	listed here will not be extracted with the rest of the source files. This
 	is useful for packages that use compressed data directly.
 
-*md5sums (array)*::
-	This array contains an MD5 hash for every source file specified in the
+*sha256sums (array)*::
+	This array contains a SHA-256 hash for every source file specified in the
 	source array (in the same order). makepkg will use this to verify source
 	file integrity during subsequent builds. If 'SKIP' is put in the array
 	in place of a normal hash, the integrity check for that source file will
-	be skipped. To easily generate md5sums, run ``makepkg -g >> PKGBUILD''.
-	If desired, move the md5sums line to an appropriate location.
+	be skipped. To easily generate sha256sums, run ``makepkg -g >> PKGBUILD''.
+	If desired, move the sha256sums line to an appropriate location.
 
-*sha1sums, sha224sums, sha256sums, sha384sums, sha512sums, b2sums (arrays)*::
+*md5sums, sha1sums, sha224sums, sha384sums, sha512sums, b2sums (arrays)*::
 	Alternative integrity checks that makepkg supports; these all behave
-	similar to the md5sums option described above. To enable use and generation
+	similar to the sha256sums option described above. To enable use and generation
 	of these checksums, be sure to set up the `INTEGRITY_CHECK` option in
 	linkman:makepkg.conf[5].
 
diff --git a/doc/makepkg-template.1.asciidoc b/doc/makepkg-template.1.asciidoc
index 1cf39fb2..1b7476c2 100644
--- a/doc/makepkg-template.1.asciidoc
+++ b/doc/makepkg-template.1.asciidoc
@@ -85,7 +85,7 @@  Example PKGBUILD
 	license=('PerlArtistic' 'GPL')
 	depends=('perl')
 	source=("http://search.cpan.org/CPAN/authors/id/S/SH/SHERZODR/Config-Simple-${pkgver}.tar.gz")
-	md5sums=('f014aec54f0a1e2e880d317180fce502')
+	sha256sums=('dd9995706f0f9384a15ccffe116c3b6e22f42ba2e58d8f24ed03c4a0e386edb4')
 	_distname="Config-Simple"
 
 	# template start; name=perl-module; version=1.0;
diff --git a/etc/makepkg.conf.in b/etc/makepkg.conf.in
index caf5114b..6ca49366 100644
--- a/etc/makepkg.conf.in
+++ b/etc/makepkg.conf.in
@@ -90,7 +90,7 @@  BUILDENV=(!distcc color !ccache check !sign)
 OPTIONS=(strip docs libtool staticlibs emptydirs zipman purge !debug)
 
 #-- File integrity checks to use. Valid: md5, sha1, sha224, sha256, sha384, sha512, b2
-INTEGRITY_CHECK=(md5)
+INTEGRITY_CHECK=(sha256)
 #-- Options to be used when stripping binaries. See `man strip' for details.
 STRIP_BINARIES="@STRIP_BINARIES@"
 #-- Options to be used when stripping shared libraries. See `man strip' for details.
diff --git a/proto/PKGBUILD-split.proto b/proto/PKGBUILD-split.proto
index 9898ef81..eea97e56 100644
--- a/proto/PKGBUILD-split.proto
+++ b/proto/PKGBUILD-split.proto
@@ -28,7 +28,7 @@  changelog=
 source=("$pkgbase-$pkgver.tar.gz"
         "$pkgname-$pkgver.patch")
 noextract=()
-md5sums=()
+sha256sums=()
 validpgpkeys=()
 
 prepare() {
diff --git a/proto/PKGBUILD-vcs.proto b/proto/PKGBUILD-vcs.proto
index ae9956a9..49c6759f 100644
--- a/proto/PKGBUILD-vcs.proto
+++ b/proto/PKGBUILD-vcs.proto
@@ -25,7 +25,7 @@  options=()
 install=
 source=('FOLDER::VCS+URL#FRAGMENT')
 noextract=()
-md5sums=('SKIP')
+sha256sums=('SKIP')
 
 # Please refer to the 'USING VCS SOURCES' section of the PKGBUILD man page for
 # a description of each element in the source array.
diff --git a/proto/PKGBUILD.proto b/proto/PKGBUILD.proto
index a2c600d5..9aff797c 100644
--- a/proto/PKGBUILD.proto
+++ b/proto/PKGBUILD.proto
@@ -27,7 +27,7 @@  changelog=
 source=("$pkgname-$pkgver.tar.gz"
         "$pkgname-$pkgver.patch")
 noextract=()
-md5sums=()
+sha256sums=()
 validpgpkeys=()
 
 prepare() {
-- 
2.25.0