Bump RPC API Version 6

Message ID 20200703031040.27579-1-kevr.gtalk@gmail.com
State New
Headers show
Series Bump RPC API Version 6 | expand

Commit Message

Kevin Morris July 3, 2020, 3:10 a.m. UTC
API Version 6 modifies `type=search` functionality: Split `arg` into
keywords separated by spaces (quoted keywords are preserved).

Search for packages containing the literal keyword `blah blah` AND `haha`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg="blah blah"%20haha

Search for packages containing the literal keyword `abc 123`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg="abc 123"

The following example searches for packages that contain `blah` AND `abc`:
https://aur.archlinux.org/rpc/?v=6&type=search&arg=blah%20abc

The legacy method still searches for packages that contain 'blah abc':
https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc

API Version 6 is only considered during a `search` of `name` or
`name-desc`.

Note: This change was written as a solution to
https://bugs.archlinux.org/task/49133.

PS: + Some spacing issues fixed in comments.

Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
---
 doc/rpc.txt               |  4 +++
 web/lib/aurjson.class.php | 66 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 8 deletions(-)

Comments

Lukas Fleischer July 4, 2020, 1:05 p.m. UTC | #1
On Thu, 02 Jul 2020 at 23:10:40, Kevin Morris wrote:
> [...]
> Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
> ---
>  doc/rpc.txt               |  4 +++
>  web/lib/aurjson.class.php | 66 ++++++++++++++++++++++++++++++++++-----
>  2 files changed, 62 insertions(+), 8 deletions(-)

Thanks Kevin! A few comments below.

> diff --git a/doc/rpc.txt b/doc/rpc.txt
> index 3148ebea..b0f5c4e1 100644
> @@ -472,6 +472,33 @@ class AurJSON {
>                 return array('ids' => $id_args, 'names' => $name_args);
>         }
>  
> +       /*
> +        * Prepare a WHERE statement for each keyword: append $func($keyword)
> +        * separated by $delim. Each keyword is sanitized as wildcards before
> +        * it's passed to $func.

What does that last part of the comment mean? Doesn't sanitization
happen in $func itself?

> +        *
> +        * @param $delim Delimiter to use in the middle of two keywords.
> +        * @param $keywords Array of keywords to prepare.
> +        * @param $func A function that returns a string. This value is concatenated.
> +        *
> +        * @return A WHERE condition statement of keywords separated by $delim.
> +        */
> +       private function join_where($delim, $keywords, $func) {
> +               // Applied to each item to concatenate our entire statement.
> +               $reduce_func = function($carry, $item) use(&$func) {
> +                       array_push($carry, $func($item));
> +                       return $carry;
> +               };
> +
> +               // Manual array_reduce with a local lambda.
> +               $acc = array(); // Initial
> +               foreach ($keywords as &$keyword) {
> +                       $acc += $reduce_func($acc, $keyword);

I am slightly puzzled by the implementation here; why is there a need to
have array_push() above and also use the += operator? Can't we simply
call $func() directly and use array_push() here?

> +               }
> +
> +               return join(" $delim ", $acc);

It might make sense to just use $delim here and use " AND " in the
caller (I actually skipped this function on my first read and was asking
myself why the caller doesn't put spaces around "AND", that's the
behavior people expect from a join function). Either that or rename the
function slightly; in the latter case it could even make sense to drop
the parameter altogether and hardcode the AND, unless you think the
function could be used for something else in the future.
Kevin Morris July 4, 2020, 9:13 p.m. UTC | #2
Thanks for the look/review Lukas; fixed up some things in the code that
clears all these points up -- the comments for `join_where` were outdated,
forgot to update it -- renamed some variables and swapped to `array_reduce`
instead of the manual loop. Now that I can see, there's no reason that it
wasn't `array_reduce` before other than some of my personal confusion with
php and it's error messages.

`$delim` is fixed up so the caller just includes spaces if they want like
you thought -- and i do like it more that it's standardized.

I'd like to leave join_where there because I use it in two places and
didn't really want to repeat the logic -- I'm also foreseeing that it can
be used later to create more WHERE statements out of lists.

(Hopefully) final patch being submitted within a few minutes.

On Sat, Jul 4, 2020 at 6:05 AM Lukas Fleischer <lfleischer@archlinux.org>
wrote:

> On Thu, 02 Jul 2020 at 23:10:40, Kevin Morris wrote:
> > [...]
> > Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
> > ---
> >  doc/rpc.txt               |  4 +++
> >  web/lib/aurjson.class.php | 66 ++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 62 insertions(+), 8 deletions(-)
>
> Thanks Kevin! A few comments below.
>
> > diff --git a/doc/rpc.txt b/doc/rpc.txt
> > index 3148ebea..b0f5c4e1 100644
> > @@ -472,6 +472,33 @@ class AurJSON {
> >                 return array('ids' => $id_args, 'names' => $name_args);
> >         }
> >
> > +       /*
> > +        * Prepare a WHERE statement for each keyword: append
> $func($keyword)
> > +        * separated by $delim. Each keyword is sanitized as wildcards
> before
> > +        * it's passed to $func.
>
> What does that last part of the comment mean? Doesn't sanitization
> happen in $func itself?
>
> > +        *
> > +        * @param $delim Delimiter to use in the middle of two keywords.
> > +        * @param $keywords Array of keywords to prepare.
> > +        * @param $func A function that returns a string. This value is
> concatenated.
> > +        *
> > +        * @return A WHERE condition statement of keywords separated by
> $delim.
> > +        */
> > +       private function join_where($delim, $keywords, $func) {
> > +               // Applied to each item to concatenate our entire
> statement.
> > +               $reduce_func = function($carry, $item) use(&$func) {
> > +                       array_push($carry, $func($item));
> > +                       return $carry;
> > +               };
> > +
> > +               // Manual array_reduce with a local lambda.
> > +               $acc = array(); // Initial
> > +               foreach ($keywords as &$keyword) {
> > +                       $acc += $reduce_func($acc, $keyword);
>
> I am slightly puzzled by the implementation here; why is there a need to
> have array_push() above and also use the += operator? Can't we simply
> call $func() directly and use array_push() here?
>
> > +               }
> > +
> > +               return join(" $delim ", $acc);
>
> It might make sense to just use $delim here and use " AND " in the
> caller (I actually skipped this function on my first read and was asking
> myself why the caller doesn't put spaces around "AND", that's the
> behavior people expect from a join function). Either that or rename the
> function slightly; in the latter case it could even make sense to drop
> the parameter altogether and hardcode the AND, unless you think the
> function could be used for something else in the future.
>

Patch

diff --git a/doc/rpc.txt b/doc/rpc.txt
index 3148ebea..b0f5c4e1 100644
--- a/doc/rpc.txt
+++ b/doc/rpc.txt
@@ -39,6 +39,10 @@  Examples
   `/rpc/?v=5&type=search&by=makedepends&arg=boost`
 `search` with callback::
   `/rpc/?v=5&type=search&arg=foobar&callback=jsonp1192244621103`
+`search` with API Version 6 for packages containing `cookie` AND `milk`::
+  `/rpc/?v=6&type=search&arg=cookie%20milk`
+`search` with API Version 6 for packages containing `cookie milk`::
+  `/rpc/?v=6&type=search&arg="cookie milk"`
 `info`::
   `/rpc/?v=5&type=info&arg[]=foobar`
 `info` with multiple packages::
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
index 0ac586fe..a3224e57 100644
--- a/web/lib/aurjson.class.php
+++ b/web/lib/aurjson.class.php
@@ -80,7 +80,7 @@  class AurJSON {
 		if (isset($http_data['v'])) {
 			$this->version = intval($http_data['v']);
 		}
-		if ($this->version < 1 || $this->version > 5) {
+		if ($this->version < 1 || $this->version > 6) {
 			return $this->json_error('Invalid version specified.');
 		}
 
@@ -140,7 +140,7 @@  class AurJSON {
 	}
 
 	/*
-	 * Check if an IP needs to be  rate limited.
+	 * Check if an IP needs to be rate limited.
 	 *
 	 * @param $ip IP of the current request
 	 *
@@ -192,7 +192,7 @@  class AurJSON {
 		$value = get_cache_value('ratelimit-ws:' . $ip, $status);
 		if (!$status || ($status && $value < $deletion_time)) {
 			if (set_cache_value('ratelimit-ws:' . $ip, $time, $window_length) &&
-			    set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
+				set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
 				return;
 			}
 		} else {
@@ -370,7 +370,7 @@  class AurJSON {
 		} elseif ($this->version >= 2) {
 			if ($this->version == 2 || $this->version == 3) {
 				$fields = implode(',', self::$fields_v2);
-			} else if ($this->version == 4 || $this->version == 5) {
+			} else if ($this->version >= 4 && $this->version <= 6) {
 				$fields = implode(',', self::$fields_v4);
 			}
 			$query = "SELECT {$fields} " .
@@ -472,6 +472,33 @@  class AurJSON {
 		return array('ids' => $id_args, 'names' => $name_args);
 	}
 
+	/*
+	 * Prepare a WHERE statement for each keyword: append $func($keyword)
+	 * separated by $delim. Each keyword is sanitized as wildcards before
+	 * it's passed to $func.
+	 *
+	 * @param $delim Delimiter to use in the middle of two keywords.
+	 * @param $keywords Array of keywords to prepare.
+	 * @param $func A function that returns a string. This value is concatenated.
+	 *
+	 * @return A WHERE condition statement of keywords separated by $delim.
+	 */
+	private function join_where($delim, $keywords, $func) {
+		// Applied to each item to concatenate our entire statement.
+		$reduce_func = function($carry, $item) use(&$func) {
+			array_push($carry, $func($item));
+			return $carry;
+		};
+
+		// Manual array_reduce with a local lambda.
+		$acc = array(); // Initial
+		foreach ($keywords as &$keyword) {
+			$acc += $reduce_func($acc, $keyword);
+		}
+
+		return join(" $delim ", $acc);
+	}
+
 	/*
 	 * Performs a fulltext mysql search of the package database.
 	 *
@@ -492,13 +519,36 @@  class AurJSON {
 			if (strlen($keyword_string) < 2) {
 				return $this->json_error('Query arg too small.');
 			}
-			$keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%");
 
 			if ($search_by === 'name') {
-				$where_condition = "(Packages.Name LIKE $keyword_string)";
+				if ($this->version >= 6) {
+					$keywords = preg_split('/"([^"]*)"|\h+/', $keyword_string, -1, 
+						PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
+					$func = function($keyword) {
+						$str = $this->dbh->quote("%" . addcslashes($keyword, '%_') . "%");
+						return "(Packages.Name LIKE $str)";
+					};
+					$where_condition = $this->join_where("AND", $keywords, $func);
+				} else {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string)";
+				}
 			} else if ($search_by === 'name-desc') {
-				$where_condition = "(Packages.Name LIKE $keyword_string OR ";
-				$where_condition .= "Description LIKE $keyword_string)";
+				if ($this->version >= 6) {
+					$keywords = preg_split('/"([^"]*)"|\h+/', $keyword_string, -1, 
+						PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
+					$func = function($keyword) {
+						$str = $this->dbh->quote("%" . addcslashes($keyword, '%_') . "%");
+						return "(Packages.Name LIKE $str OR Description LIKE $str)";
+					};
+					$where_condition = $this->join_where("AND", $keywords, $func);
+				} else {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string ";
+					$where_condition .= "OR Description LIKE $keyword_string)";
+				}
 			}
 		} else if ($search_by === 'maintainer') {
 			if (empty($keyword_string)) {