rendercomment: safer auto-linkification of URLs

Message ID 20200201180741.GA141839@tsubame.mg0.fr
State New

Commit Message

Frédéric Mangano-Tarumi Feb. 1, 2020, 6:07 p.m. UTC
Fixes a few edge cases:

- URLs within code blocks used to get redundant <> added, breaking bash
  code snippets like `curl https://...` into `curl <https://...>`.

- Links written with markdown’s <https://...> syntax also used to get an
  extra pair of brackets.
---
 aurweb/scripts/rendercomment.py | 19 +++++++++++--------
 test/t2600-rendercomment.sh     | 15 +++++++++++++--
 2 files changed, 24 insertions(+), 10 deletions(-)
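
The two edge cases in the commit message can be reproduced without Markdown at all. This is a minimal sketch of the substitution that the removed preprocessor applied to every raw line. The regular expression is copied from the patch; the names `URL_RE` and `old_linkify` are hypothetical and exist only for this illustration.

```python
import re

# URL pattern copied from the patch (the removed LinkifyPreprocessor._urlre).
URL_RE = re.compile(r'(\b(?:https?|ftp):\/\/[\w\/\#~:.?+=&%@!\-;,]+?'
                    r'(?=[.:?\-;,]*(?:[^\w\/\#~:.?+=&%@!\-;,]|$)))')


def old_linkify(line):
    """Hypothetical helper: wrap every URL in <>, as the old preprocessor did."""
    return URL_RE.sub(r'<\1>', line)


samples = [
    "Visit https://www.archlinux.org/.",       # bare URL: the intended case
    "Visit <https://www.archlinux.org/>.",     # already bracketed: gets a second pair
    "Run `curl https://www.archlinux.org/`.",  # code span: the command is altered
]

for sample in samples:
    print(old_linkify(sample))
# Visit <https://www.archlinux.org/>.
# Visit <<https://www.archlinux.org/>>.
# Run `curl <https://www.archlinux.org/>`.
```

Because the substitution runs on raw text before Markdown parsing, it cannot tell a code span or an existing autolink from ordinary prose, which is why the patch moves the logic into an inline pattern.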

Comments

Lukas Fleischer Feb. 2, 2020, 11:10 a.m. UTC | #1
On Sat, 01 Feb 2020 at 19:07:41, Frédéric Mangano-Tarumi wrote:
> Fixes a few edge cases:
> 
> - URLs within code blocks used to get redundant <> added, breaking bash
>   code snippets like `curl https://...` into `curl <https://...>`.
> 
> - Links written with markdown’s <https://...> syntax also used to get an
>   extra pair of brackets.
> ---
>  aurweb/scripts/rendercomment.py | 19 +++++++++++--------
>  test/t2600-rendercomment.sh     | 15 +++++++++++++--
>  2 files changed, 24 insertions(+), 10 deletions(-)

Great! Merged into pu, thanks!
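
For readers who want to try the inline-pattern approach outside aurweb, here is a rough standalone sketch. It assumes Python-Markdown 3.x is installed and uses the `register()` call and the single-argument `extendMarkdown(self, md)` signature instead of the `add(..., '_end')` form and `md_globals` argument seen in the patch. The names `LinkifySketch` and `render`, and the priority value `0`, are choices made for this illustration, not part of aurweb.

```python
import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import AutolinkInlineProcessor

# URL pattern copied from the patch; group 1 is the URL itself.
URL_RE = (r'(\b(?:https?|ftp):\/\/[\w\/\#~:.?+=&%@!\-;,]+?'
          r'(?=[.:?\-;,]*(?:[^\w\/\#~:.?+=&%@!\-;,]|$)))')


class LinkifySketch(Extension):
    """Hypothetical stand-in for the patch's LinkifyExtension."""

    def extendMarkdown(self, md):
        processor = AutolinkInlineProcessor(URL_RE, md)
        # Assumption: priority 0 is lower than every built-in inline pattern,
        # so code spans, <...> autolinks and [text](url) links are consumed
        # before this pattern ever sees the remaining text.
        md.inlinePatterns.register(processor, 'linkify_sketch', 0)


def render(text):
    """Render Markdown text with the sketch extension enabled."""
    return markdown.markdown(text, extensions=[LinkifySketch()])


print(render("Visit https://www.archlinux.org/."))
print(render("Visit <https://www.archlinux.org/>."))
print(render("Visit `https://www.archlinux.org/`."))
```

If the sketch behaves like the patch, the first two lines should each contain a single `<a>` element and the third should keep the URL untouched inside `<code>`, matching the expectations in the test below.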

Patch

diff --git a/aurweb/scripts/rendercomment.py b/aurweb/scripts/rendercomment.py
index ad39ceb..ba28486 100755
--- a/aurweb/scripts/rendercomment.py
+++ b/aurweb/scripts/rendercomment.py
@@ -13,17 +13,20 @@  repo_path = aurweb.config.get('serve', 'repo-path')
 commit_uri = aurweb.config.get('options', 'commit_uri')
 
 
-class LinkifyPreprocessor(markdown.preprocessors.Preprocessor):
-    _urlre = re.compile(r'(\b(?:https?|ftp):\/\/[\w\/\#~:.?+=&%@!\-;,]+?'
-                        r'(?=[.:?\-;,]*(?:[^\w\/\#~:.?+=&%@!\-;,]|$)))')
-
-    def run(self, lines):
-        return [self._urlre.sub(r'<\1>', line) for line in lines]
+class LinkifyExtension(markdown.extensions.Extension):
+    """
+    Turn URLs into links, even without explicit markdown.
+    Do not linkify URLs in code blocks.
+    """
 
+    # Captures http(s) and ftp URLs until the first non URL-ish character.
+    # Excludes trailing punctuation.
+    _urlre = (r'(\b(?:https?|ftp):\/\/[\w\/\#~:.?+=&%@!\-;,]+?'
+              r'(?=[.:?\-;,]*(?:[^\w\/\#~:.?+=&%@!\-;,]|$)))')
 
-class LinkifyExtension(markdown.extensions.Extension):
     def extendMarkdown(self, md, md_globals):
-        md.preprocessors.add('linkify', LinkifyPreprocessor(md), '_end')
+        processor = markdown.inlinepatterns.AutolinkInlineProcessor(self._urlre, md)
+        md.inlinePatterns.add('linkify', processor, '_end')
 
 
 class FlysprayLinksPreprocessor(markdown.preprocessors.Preprocessor):
diff --git a/test/t2600-rendercomment.sh b/test/t2600-rendercomment.sh
index 7b3a4a8..b0209eb 100755
--- a/test/t2600-rendercomment.sh
+++ b/test/t2600-rendercomment.sh
@@ -51,11 +51,22 @@  test_expect_success 'Test HTML sanitizing.' '
 
 test_expect_success 'Test link conversion.' '
 	cat <<-EOD | sqlite3 aur.db &&
-	INSERT INTO PackageComments (ID, PackageBaseID, Comments, RenderedComment) VALUES (4, 1, "Visit https://www.archlinux.org/.", "");
+	INSERT INTO PackageComments (ID, PackageBaseID, Comments, RenderedComment) VALUES (4, 1, "
+		Visit https://www.archlinux.org/.
+		Visit <https://www.archlinux.org/>.
+		Visit \`https://www.archlinux.org/\`.
+		Visit [Arch Linux](https://www.archlinux.org/).
+		Visit [Arch Linux][arch].
+		[arch]: https://www.archlinux.org/
+	", "");
 	EOD
 	"$RENDERCOMMENT" 4 &&
 	cat <<-EOD >expected &&
-	<p>Visit <a href="https://www.archlinux.org/">https://www.archlinux.org/</a>.</p>
+		<p>Visit <a href="https://www.archlinux.org/">https://www.archlinux.org/</a>.
+		Visit <a href="https://www.archlinux.org/">https://www.archlinux.org/</a>.
+		Visit <code>https://www.archlinux.org/</code>.
+		Visit <a href="https://www.archlinux.org/">Arch Linux</a>.
+		Visit <a href="https://www.archlinux.org/">Arch Linux</a>.</p>
 	EOD
 	cat <<-EOD | sqlite3 aur.db >actual &&
 	SELECT RenderedComment FROM PackageComments WHERE ID = 4;