Support conjunctive keyword search in RPC interface

Message ID 20200705051007.30145-1-kevr.gtalk@gmail.com
State New
Headers show
Series Support conjunctive keyword search in RPC interface | expand

Commit Message

Kevin Morris July 5, 2020, 5:10 a.m. UTC
Newly supported API Version 6 modifies `type=search` functionality; it
now behaves the same as `name` or `name-desc` search through the
https://aur.archlinux.org/packages/ search page.

Search for packages containing the literal keyword `blah blah` AND `haha`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg="blah blah"%20haha

Search for packages containing the literal keyword `abc 123`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg="abc 123"

The following example searches for packages that contain `blah` AND `abc`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg=blah%20abc

The legacy method still searches for packages that contain `blah abc`:
https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc

API Version 6 is currently only considered during a `search` of `name` or
`name-desc`.

Note: This change was written as a solution to
https://bugs.archlinux.org/task/49133.

PS: + Some spacing issues fixed in comments.

Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
---
 doc/rpc.txt               |  4 ++++
 web/lib/aurjson.class.php | 29 +++++++++++++++++++++--------
 web/lib/pkgfuncs.inc.php  | 10 +++++-----
 3 files changed, 30 insertions(+), 13 deletions(-)

Comments

Kevin Morris July 5, 2020, 5:13 a.m. UTC | #1
Alright, patch sent; I used `-v1` this time with `git send-email`... I
can't find documentation for that switch yet. I've tested basic search
through both paths; the only refactoring that needed to be done was to
remove the extra "AND (" and ")" from the generic function, and add it
where we need it in `pkg_search_page`. Then, we can reuse the same
`construct_keyword_search` in rpc.
Lukas Fleischer July 5, 2020, 1:29 p.m. UTC | #2
On Sun, 05 Jul 2020 at 01:10:07, Kevin Morris wrote:
> Newly supported API Version 6 modifies `type=search` functionality; it
> now behaves the same as `name` or `name-desc` search through the
> https://aur.archlinux.org/packages/ search page.
> 
> Search for packages containing the literal keyword `blah blah` AND `haha`:
> https://aur.archlinux.org/rpc/?v=6&type=search&arg="blah blah"%20haha
> 
> Search for packages containing the literal keyword `abc 123`:
> https://aur.archlinux.org/rpc/?v=6&type=search&arg="abc 123"
> 
> The following example searches for packages that contain `blah` AND `abc`:
> https://aur.archlinux.org/rpc/?v=6&type=search&arg=blah%20abc
> 
> The legacy method still searches for packages that contain `blah abc`:
> https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
> https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
> 
> API Version 6 is currently only considered during a `search` of `name` or
> `name-desc`.
> 
> Note: This change was written as a solution to
> https://bugs.archlinux.org/task/49133.
> 
> PS: + Some spacing issues fixed in comments.
> 
> Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
> ---
>  doc/rpc.txt               |  4 ++++
>  web/lib/aurjson.class.php | 29 +++++++++++++++++++++--------
>  web/lib/pkgfuncs.inc.php  | 10 +++++-----
>  3 files changed, 30 insertions(+), 13 deletions(-)
> [...]
> @@ -492,13 +492,26 @@ class AurJSON {
>                         if (strlen($keyword_string) < 2) {
>                                 return $this->json_error('Query arg too small.');
>                         }
> -                       $keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%");
>  
>                         if ($search_by === 'name') {
> -                               $where_condition = "(Packages.Name LIKE $keyword_string)";
> +                               if ($this->version >= 6) {
> +                                       $where_condition = construct_keyword_search($this->dbh,
> +                                               $keyword_string, false);

Do name and name-desc now behave exactly the same (see below)? If so,
this change in behavior should be documented at least. I would have
expected some more refactoring in construct_keyword_search() and one
additional parameter being passed here though.

Should we include pkgfuncs.inc.php from aurjson.class.php now? How does
this currently get imported?

Also, since the code for both name and name-desc are now very similar,
can we refactor and share most of the code instead of copy pasting? To
this end, it might be easier to switch the blocks (i.e. check for API
version first, then check for request type). That would allow us to
reuse the same assignment to $keyword_string as before and possibly the
same construct_keyword_search() invocation too.

> +                               } else {
> +                                       $keyword_string = $this->dbh->quote(
> +                                               "%" . addcslashes($keyword_string, '%_') . "%");
> +                                       $where_condition = "(Packages.Name LIKE $keyword_string)";
> +                               }
>                         } else if ($search_by === 'name-desc') {
> -                               $where_condition = "(Packages.Name LIKE $keyword_string OR ";
> -                               $where_condition .= "Description LIKE $keyword_string)";
> +                               if ($this->version >= 6) {
> +                                       $where_condition = construct_keyword_search($this->dbh,
> +                                               $keyword_string, true);

See above.

> +                               } else {
> +                                       $keyword_string = $this->dbh->quote(
> +                                               "%" . addcslashes($keyword_string, '%_') . "%");
> +                                       $where_condition = "(Packages.Name LIKE $keyword_string ";
> +                                       $where_condition .= "OR Description LIKE $keyword_string)";
> +                               }
>                         }
>                 } else if ($search_by === 'maintainer') {
>                         if (empty($keyword_string)) {
> diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php
> index 8c915711..f6108e5a 100644
> --- a/web/lib/pkgfuncs.inc.php
> +++ b/web/lib/pkgfuncs.inc.php
> @@ -697,7 +697,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") {
>                 }
>                 elseif (isset($params["SeB"]) && $params["SeB"] == "k") {
>                         /* Search by keywords. */
> +                       $q_where .= " AND ( ";

Do we need the space at the end?

>                         $q_where .= construct_keyword_search($dbh, $params['K'], false);
> +                       $q_where .= " )";

Same here, do we need the space before the closing parentheses? More
importantly though, do we want to add a space after the parenthesis
instead to match the previous behavior of SQL strings always ending with
a space, so we can immediately append without a space?

>                 }
>                 elseif (isset($params["SeB"]) && $params["SeB"] == "N") {
>                         /* Search by name (exact match). */
> @@ -709,7 +711,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") {
>                 }
>                 else {
>                         /* Keyword search (default). */
> +                       $q_where .= " AND ( ";
>                         $q_where .= construct_keyword_search($dbh, $params['K'], true);
> +                       $q_where .= " )";

Same here.

>                 }
>         }
>  
> @@ -876,11 +880,7 @@ function construct_keyword_search($dbh, $keywords, $namedesc) {
>                 $op = "AND ";
>         }
>  
> -       if (!empty($q_keywords)) {
> -               $where_part = "AND (" . $q_keywords . ") ";
> -       }
> -
> -       return $where_part;
> +       return $q_keywords;
>  }
>  
>  /**
> -- 
> 2.20.1
Lukas Fleischer July 5, 2020, 1:33 p.m. UTC | #3
On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote:
> Alright, patch sent; I used `-v1` this time with `git send-email`... I
> can't find documentation for that switch yet. I've tested basic search
> through both paths; the only refactoring that needed to be done was to
> remove the extra "AND (" and ")" from the generic function, and add it
> where we need it in `pkg_search_page`. Then, we can reuse the same
> `construct_keyword_search` in rpc.

Thanks for the revision! I added some comments inline.

It's common to not use any versioning for the first revision of a patch
(e.g. send without -v1) but then start using (-v2, -v3, ...) for
followup revisions. The -v argument is documented in the
git-format-patch(1) man page, if you are using git-format-patch and
git-send-email separately, you need to specify the argument when
formatting the patch (however, in most cases, you can also run `git
format patch` directly).
Kevin Morris July 5, 2020, 6:51 p.m. UTC | #4
>
> Do name and name-desc now behave exactly the same (see below)? If so,
> this change in behavior should be documented at least. I would have
> expected some more refactoring in construct_keyword_search() and one
> additional parameter being passed here though.
>
> Should we include pkgfuncs.inc.php from aurjson.class.php now? How does
> this currently get imported?
>
> Also, since the code for both name and name-desc are now very similar,
> can we refactor and share most of the code instead of copy pasting? To
> this end, it might be easier to switch the blocks (i.e. check for API
> version first, then check for request type). That would allow us to
> reuse the same assignment to $keyword_string as before and possibly the
> same construct_keyword_search() invocation too.
>

I noticed an issue with pre-sanitizing $keyword_string as per `master` --
when searching by maintainer in rpc, a WHERE Username = $keyword_string is
used without wildcards (%...%). As far as I know, wildcards should only be
used in a LIKE expression? Could be wrong, but that's what I thought when
originally modifying `aurjson.inc.php`'s `function search`. Thoughts?

On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org>
wrote:

> On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote:
> > Alright, patch sent; I used `-v1` this time with `git send-email`... I
> > can't find documentation for that switch yet. I've tested basic search
> > through both paths; the only refactoring that needed to be done was to
> > remove the extra "AND (" and ")" from the generic function, and add it
> > where we need it in `pkg_search_page`. Then, we can reuse the same
> > `construct_keyword_search` in rpc.
>
> Thanks for the revision! I added some comments inline.
>
> It's common to not use any versioning for the first revision of a patch
> (e.g. send without -v1) but then start using (-v2, -v3, ...) for
> followup revisions. The -v argument is documented in the
> git-format-patch(1) man page, if you are using git-format-patch and
> git-send-email separately, you need to specify the argument when
> formatting the patch (however, in most cases, you can also run `git
> format patch` directly).
>
Kevin Morris July 5, 2020, 6:57 p.m. UTC | #5
>
> Same here, do we need the space before the closing parentheses? More
> importantly though, do we want to add a space after the parenthesis
> instead to match the previous behavior of SQL strings always ending with
> a space, so we can immediately append without a space?
>

This one can't hurt at all, the only thing it'd do is make
construct_keyword_search more flexible. ++ on this, change will be in the
next rev.

On Sun, Jul 5, 2020 at 11:51 AM Kevin Morris <kevr.gtalk@gmail.com> wrote:

> Do name and name-desc now behave exactly the same (see below)? If so,
>> this change in behavior should be documented at least. I would have
>> expected some more refactoring in construct_keyword_search() and one
>> additional parameter being passed here though.
>>
>> Should we include pkgfuncs.inc.php from aurjson.class.php now? How does
>> this currently get imported?
>>
>> Also, since the code for both name and name-desc are now very similar,
>> can we refactor and share most of the code instead of copy pasting? To
>> this end, it might be easier to switch the blocks (i.e. check for API
>> version first, then check for request type). That would allow us to
>> reuse the same assignment to $keyword_string as before and possibly the
>> same construct_keyword_search() invocation too.
>>
>
> I noticed an issue with pre-sanitizing $keyword_string as per `master` --
> when searching by maintainer in rpc, a WHERE Username = $keyword_string is
> used without wildcards (%...%). As far as I know, wildcards should only be
> used in a LIKE expression? Could be wrong, but that's what I thought when
> originally modifying `aurjson.inc.php`'s `function search`. Thoughts?
>
> On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org>
> wrote:
>
>> On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote:
>> > Alright, patch sent; I used `-v1` this time with `git send-email`... I
>> > can't find documentation for that switch yet. I've tested basic search
>> > through both paths; the only refactoring that needed to be done was to
>> > remove the extra "AND (" and ")" from the generic function, and add it
>> > where we need it in `pkg_search_page`. Then, we can reuse the same
>> > `construct_keyword_search` in rpc.
>>
>> Thanks for the revision! I added some comments inline.
>>
>> It's common to not use any versioning for the first revision of a patch
>> (e.g. send without -v1) but then start using (-v2, -v3, ...) for
>> followup revisions. The -v argument is documented in the
>> git-format-patch(1) man page, if you are using git-format-patch and
>> git-send-email separately, you need to specify the argument when
>> formatting the patch (however, in most cases, you can also run `git
>> format patch` directly).
>>
>
>
> --
> Kevin Morris
> Software Developer
>
> Personal Inquiries: kevr.gtalk@gmail.com
> Personal Phone: (415) 583-9687
>
> Technologies: C, C++, Python, Django, Ruby, Rails, ReactJS, jQuery,
> Javascript, SQL, Redux
>
Kevin Morris July 5, 2020, 7:03 p.m. UTC | #6
Okay, I was totally mistaken - `$keyword_string` is only ever sanitized in
the right place (name/name-desc) in master. Yeah, I can rework this to be a
bit cleaner as you recommended -- no problem. ++ on this, will be included
in next rev.

On Sun, Jul 5, 2020 at 11:51 AM Kevin Morris <kevr.gtalk@gmail.com> wrote:

> Do name and name-desc now behave exactly the same (see below)? If so,
>> this change in behavior should be documented at least. I would have
>> expected some more refactoring in construct_keyword_search() and one
>> additional parameter being passed here though.
>>
>> Should we include pkgfuncs.inc.php from aurjson.class.php now? How does
>> this currently get imported?
>>
>> Also, since the code for both name and name-desc are now very similar,
>> can we refactor and share most of the code instead of copy pasting? To
>> this end, it might be easier to switch the blocks (i.e. check for API
>> version first, then check for request type). That would allow us to
>> reuse the same assignment to $keyword_string as before and possibly the
>> same construct_keyword_search() invocation too.
>>
>
> I noticed an issue with pre-sanitizing $keyword_string as per `master` --
> when searching by maintainer in rpc, a WHERE Username = $keyword_string is
> used without wildcards (%...%). As far as I know, wildcards should only be
> used in a LIKE expression? Could be wrong, but that's what I thought when
> originally modifying `aurjson.inc.php`'s `function search`. Thoughts?
>
> On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org>
> wrote:
>
>> On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote:
>> > Alright, patch sent; I used `-v1` this time with `git send-email`... I
>> > can't find documentation for that switch yet. I've tested basic search
>> > through both paths; the only refactoring that needed to be done was to
>> > remove the extra "AND (" and ")" from the generic function, and add it
>> > where we need it in `pkg_search_page`. Then, we can reuse the same
>> > `construct_keyword_search` in rpc.
>>
>> Thanks for the revision! I added some comments inline.
>>
>> It's common to not use any versioning for the first revision of a patch
>> (e.g. send without -v1) but then start using (-v2, -v3, ...) for
>> followup revisions. The -v argument is documented in the
>> git-format-patch(1) man page, if you are using git-format-patch and
>> git-send-email separately, you need to specify the argument when
>> formatting the patch (however, in most cases, you can also run `git
>> format patch` directly).
>>
>
>
> --
> Kevin Morris
> Software Developer
>
> Personal Inquiries: kevr.gtalk@gmail.com
> Personal Phone: (415) 583-9687
>
> Technologies: C, C++, Python, Django, Ruby, Rails, ReactJS, jQuery,
> Javascript, SQL, Redux
>

Patch

diff --git a/doc/rpc.txt b/doc/rpc.txt
index 3148ebea..b0f5c4e1 100644
--- a/doc/rpc.txt
+++ b/doc/rpc.txt
@@ -39,6 +39,10 @@  Examples
   `/rpc/?v=5&type=search&by=makedepends&arg=boost`
 `search` with callback::
   `/rpc/?v=5&type=search&arg=foobar&callback=jsonp1192244621103`
+`search` with API Version 6 for packages containing `cookie` AND `milk`::
+  `/rpc/?v=6&type=search&arg=cookie%20milk`
+`search` with API Version 6 for packages containing `cookie milk`::
+  `/rpc/?v=6&type=search&arg="cookie milk"`
 `info`::
   `/rpc/?v=5&type=info&arg[]=foobar`
 `info` with multiple packages::
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
index 0ac586fe..83ce502a 100644
--- a/web/lib/aurjson.class.php
+++ b/web/lib/aurjson.class.php
@@ -80,7 +80,7 @@  class AurJSON {
 		if (isset($http_data['v'])) {
 			$this->version = intval($http_data['v']);
 		}
-		if ($this->version < 1 || $this->version > 5) {
+		if ($this->version < 1 || $this->version > 6) {
 			return $this->json_error('Invalid version specified.');
 		}
 
@@ -140,7 +140,7 @@  class AurJSON {
 	}
 
 	/*
-	 * Check if an IP needs to be  rate limited.
+	 * Check if an IP needs to be rate limited.
 	 *
 	 * @param $ip IP of the current request
 	 *
@@ -192,7 +192,7 @@  class AurJSON {
 		$value = get_cache_value('ratelimit-ws:' . $ip, $status);
 		if (!$status || ($status && $value < $deletion_time)) {
 			if (set_cache_value('ratelimit-ws:' . $ip, $time, $window_length) &&
-			    set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
+				set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
 				return;
 			}
 		} else {
@@ -370,7 +370,7 @@  class AurJSON {
 		} elseif ($this->version >= 2) {
 			if ($this->version == 2 || $this->version == 3) {
 				$fields = implode(',', self::$fields_v2);
-			} else if ($this->version == 4 || $this->version == 5) {
+			} else if ($this->version >= 4 && $this->version <= 6) {
 				$fields = implode(',', self::$fields_v4);
 			}
 			$query = "SELECT {$fields} " .
@@ -492,13 +492,26 @@  class AurJSON {
 			if (strlen($keyword_string) < 2) {
 				return $this->json_error('Query arg too small.');
 			}
-			$keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%");
 
 			if ($search_by === 'name') {
-				$where_condition = "(Packages.Name LIKE $keyword_string)";
+				if ($this->version >= 6) {
+					$where_condition = construct_keyword_search($this->dbh,
+						$keyword_string, false);
+				} else {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string)";
+				}
 			} else if ($search_by === 'name-desc') {
-				$where_condition = "(Packages.Name LIKE $keyword_string OR ";
-				$where_condition .= "Description LIKE $keyword_string)";
+				if ($this->version >= 6) {
+					$where_condition = construct_keyword_search($this->dbh,
+						$keyword_string, true);
+				} else {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string ";
+					$where_condition .= "OR Description LIKE $keyword_string)";
+				}
 			}
 		} else if ($search_by === 'maintainer') {
 			if (empty($keyword_string)) {
diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php
index 8c915711..f6108e5a 100644
--- a/web/lib/pkgfuncs.inc.php
+++ b/web/lib/pkgfuncs.inc.php
@@ -697,7 +697,9 @@  function pkg_search_page($params, $show_headers=true, $SID="") {
 		}
 		elseif (isset($params["SeB"]) && $params["SeB"] == "k") {
 			/* Search by keywords. */
+			$q_where .= " AND ( ";
 			$q_where .= construct_keyword_search($dbh, $params['K'], false);
+			$q_where .= " )";
 		}
 		elseif (isset($params["SeB"]) && $params["SeB"] == "N") {
 			/* Search by name (exact match). */
@@ -709,7 +711,9 @@  function pkg_search_page($params, $show_headers=true, $SID="") {
 		}
 		else {
 			/* Keyword search (default). */
+			$q_where .= " AND ( ";
 			$q_where .= construct_keyword_search($dbh, $params['K'], true);
+			$q_where .= " )";
 		}
 	}
 
@@ -876,11 +880,7 @@  function construct_keyword_search($dbh, $keywords, $namedesc) {
 		$op = "AND ";
 	}
 
-	if (!empty($q_keywords)) {
-		$where_part = "AND (" . $q_keywords . ") ";
-	}
-
-	return $where_part;
+	return $q_keywords;
 }
 
 /**