Correctly handle package sources which do not validate as an url

Message ID 20190106235605.22167-1-eschwartz@archlinux.org
State Under Review
Headers show
Series Correctly handle package sources which do not validate as an url | expand

Commit Message

Eli Schwartz Jan. 6, 2019, 11:56 p.m. UTC
php's parse_url does not handle proper rfc3986 URIs, specifically, it
does not handle the case of an empty authority such as file:/// or
local:/// and only handles the case of file by applying a special case
for file itself. These URIs are deemed "malformed" and return false.

When such URIs were used, we would end up always treating the package
source as a filename (despite that this is incorrect, since plain files
will be correctly handled by parse_url, we will correctly determine that
there is no schema, and we will go to the source_file_uri).

Instead, handle the case of a "malformed" URI by treating it as another
example of a source with a schema, and linking it as-is.

See https://lists.archlinux.org/pipermail/aur-general/2019-January/034782.html
for details.

Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
---

This fixes the case of local:///, but are there other cases where php
would claim a malformed url where we would actually want to link to
cgit?

Should we just be dumb like makepkg and git/update.py, and check if it
has the string literal '://'? Given the other two places where a source
url might be handled don't even make a pretense of being proper rfc3986
parsers, this would at least mean we're highly consistent in our
behavior.

 web/lib/pkgfuncs.inc.php | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lukas Fleischer April 27, 2019, 12:39 p.m. UTC | #1
On Sun, 06 Jan 2019 at 18:56:04, Eli Schwartz wrote:
> php's parse_url does not handle proper rfc3986 URIs, specifically, it
> does not handle the case of an empty authority such as file:/// or
> local:/// and only handles the case of file by applying a special case
> for file itself. These URIs are deemed "malformed" and return false.
> 
> When such URIs were used, we would end up always treating the package
> source as a filename (despite that this is incorrect, since plain files
> will be correctly handled by parse_url, we will correctly determine that
> there is no schema, and we will go to the source_file_uri).
> 
> Instead, handle the case of a "malformed" URI by treating it as another
> example of a source with a schema, and linking it as-is.

Sorry for replying only now, this somehow slipped through the cracks.
But I realized it's not yet in master, so it's probably not too late!

What happens if somebody uses "javascript:alert('XSS!')" in their
sources? I hope it is not converted to a link?

I think we shouldn't create links for anything other than HTTP and HTTPs
schemes (and maybe FTP as well). These links are just for convenience
and probably not used very often. So it's likely a good idea to err on
the safe side.

Lukas
Eli Schwartz April 28, 2019, 3:49 a.m. UTC | #2
On 4/27/19 8:39 AM, Lukas Fleischer wrote:
> On Sun, 06 Jan 2019 at 18:56:04, Eli Schwartz wrote:
>> php's parse_url does not handle proper rfc3986 URIs, specifically, it
>> does not handle the case of an empty authority such as file:/// or
>> local:/// and only handles the case of file by applying a special case
>> for file itself. These URIs are deemed "malformed" and return false.
>>
>> When such URIs were used, we would end up always treating the package
>> source as a filename (despite that this is incorrect, since plain files
>> will be correctly handled by parse_url, we will correctly determine that
>> there is no schema, and we will go to the source_file_uri).
>>
>> Instead, handle the case of a "malformed" URI by treating it as another
>> example of a source with a schema, and linking it as-is.
> 
> Sorry for replying only now, this somehow slipped through the cracks.
> But I realized it's not yet in master, so it's probably not too late!
> 
> What happens if somebody uses "javascript:alert('XSS!')" in their
> sources? I hope it is not converted to a link?
> 
> I think we shouldn't create links for anything other than HTTP and HTTPs
> schemes (and maybe FTP as well). These links are just for convenience
> and probably not used very often. So it's likely a good idea to err on
> the safe side.

Ah... yeah, that is a pretty good point. I'd probably want to display
that as straight up plaintext.

It definitely should not be appended to the cgit url if it is a valid
schema, though. And regarding not making a link at all (e.g. for the
common git:// protocol), how would that play with renamed sources like
"$pkgname::git://example.com/project-something"?

I wish php had a schema validator that wasn't broken... python's
urlparse cleverly handles all this nonsense, and you can just refuse to
print urls with ParseResult(scheme='javascript',...). Maybe we should do
string comparisons to reject javascript schemes? Is there anything else
which matters in this context?
Lukas Fleischer April 28, 2019, 12:26 p.m. UTC | #3
On Sat, 27 Apr 2019 at 23:49:11, Eli Schwartz wrote:
> Ah... yeah, that is a pretty good point. I'd probably want to display
> that as straight up plaintext.
> 
> It definitely should not be appended to the cgit url if it is a valid
> schema, though. And regarding not making a link at all (e.g. for the
> common git:// protocol), how would that play with renamed sources like
> "$pkgname::git://example.com/project-something"?

What's the problem with that?

We always remove the "$pkgname::" prefix. And then, this source would be
shown as "git://example.com/project-something" without linking to
anything.

> I wish php had a schema validator that wasn't broken... python's
> urlparse cleverly handles all this nonsense, and you can just refuse to
> print urls with ParseResult(scheme='javascript',...). Maybe we should do
> string comparisons to reject javascript schemes? Is there anything else
> which matters in this context?

We could do that. But why blacklist instead of whitelist for such a
minor feature? I suggest to convert local links to cgit URLs, convert
HTTP/HTTPs/FTP to absolute links, and display everything else in plain
text.

Lukas
Eli Schwartz April 28, 2019, 1:54 p.m. UTC | #4
On 4/28/19 8:26 AM, Lukas Fleischer wrote:
> On Sat, 27 Apr 2019 at 23:49:11, Eli Schwartz wrote:
>> Ah... yeah, that is a pretty good point. I'd probably want to display
>> that as straight up plaintext.
>>
>> It definitely should not be appended to the cgit url if it is a valid
>> schema, though. And regarding not making a link at all (e.g. for the
>> common git:// protocol), how would that play with renamed sources like
>> "$pkgname::git://example.com/project-something"?
> 
> What's the problem with that?
> 
> We always remove the "$pkgname::" prefix. And then, this source would be
> shown as "git://example.com/project-something" without linking to
> anything.

We could do as you say. But currently, we are setting the link text to
"$pkgname", which I guess would not have an analogue for text-only
sources without an href. It would be a touch inconsistent.

>> I wish php had a schema validator that wasn't broken... python's
>> urlparse cleverly handles all this nonsense, and you can just refuse to
>> print urls with ParseResult(scheme='javascript',...). Maybe we should do
>> string comparisons to reject javascript schemes? Is there anything else
>> which matters in this context?
> 
> We could do that. But why blacklist instead of whitelist for such a
> minor feature? I suggest to convert local links to cgit URLs, convert
> HTTP/HTTPs/FTP to absolute links, and display everything else in plain
> text.

I don't know. :D It seems like it would be reasonable to go either way:
fix some current urls, or drop support for all the irregular ones.

Do we have any statistics on whether people actually use different urls
for anything? I mean, I suppose in theory one could have a custom scheme
handler for git:// which would open in some GUI shell for git
repositories. But that be sort of reaching for an excuse.

Patch

diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php
index ced1f8e..126b5c3 100644
--- a/web/lib/pkgfuncs.inc.php
+++ b/web/lib/pkgfuncs.inc.php
@@ -481,7 +481,7 @@  function pkg_source_link($url, $arch, $package) {
 	$url = explode('::', $url);
 	$parsed_url = parse_url($url[0]);
 
-	if (isset($parsed_url['scheme']) || isset($url[1])) {
+	if ($parsed_url === false || isset($parsed_url['scheme']) || isset($url[1])) {
 		$link = '<a href="' .  htmlspecialchars((isset($url[1]) ? $url[1] : $url[0]), ENT_QUOTES) . '">' . htmlspecialchars($url[0]) . '</a>';
 	} else {
 		$file_url = sprintf(config_get('options', 'source_file_uri'), htmlspecialchars($url[0]), $package);