Add search_type url parameter

Message ID 20200630233601.32314-1-kevr.gtalk@gmail.com
State New
Headers show
Series
  • Add search_type url parameter
Related show

Commit Message

Kevin Morris June 30, 2020, 11:36 p.m. UTC
This commit introduces new functionality: When `search_type` is `web`,
search behavior matches aurweb's HTML search page.

New parameters: `search_type`
Valid search_type values: `rpc` (default), `web`

The `rpc` search type uses the existing legacy search method.

The `web` search type clones the web-based aurweb search page's behavior
as closely as possible (see note at the bottom of this message).

The following example searches for packages that contain `blah` AND `abc`:
https://aur.archlinux.org/rpc/?v=5&type=search&search_type=web&arg=blah%20abc

The legacy method still searches for packages that contain 'blah abc':
https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
https://aur.archlinux.org/rpc/?v=5&type=search&search_type=rpc&arg=blah%20abc

An invalid `search_type` returns a json error message about it being
unsupported.

Additionally, `search_type` is only parsed and considered during a search of
type `name` or `name-desc`.

Note: This change was written as a solution to
https://bugs.archlinux.org/task/49133.

PS: + Some spacing issues fixed in comments.

Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com>
---
 doc/rpc.txt               |  7 ++++
 web/lib/aurjson.class.php | 75 +++++++++++++++++++++++++++++++++++----
 2 files changed, 76 insertions(+), 6 deletions(-)

Comments

Lukas Fleischer July 2, 2020, 11:46 p.m. UTC | #1
Hi Kevin,

Thanks for the patch!

On Tue, 30 Jun 2020 at 19:36:01, Kevin Morris wrote:
> This commit introduces new functionality: When `search_type` is `web`,
> search behavior matches aurweb's HTML search page.
> 
> New parameters: `search_type`
> Valid search_type values: `rpc` (default), `web`
> 
> The `rpc` search type uses the existing legacy search method.
> 
> The `web` search type clontes the web-based aurweb search page's behavior
> as closely as possible (see note at the bottom of this message).
> 
> The following example searches for packages that contain `blah` AND `abc`:
> https://aur.archlinux.org/rpc/?v=5&type=search&search_type=web&arg=blah%20abc
> 
> The legacy method still searches for packages that contain 'blah abc':
> https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
> https://aur.archlinux.org/rpc/?v=5&type=search&search_type=rpc&arg=blah%20abc

Before having a look at the implementation, is there any reason the two
search types are called "rpc" and "web" (other than "rpc" is how the RPC
interface used to behave and "web" is how the RPC interface used to
behave)? If not, I think we should choose a different naming that is
more intuitive and self-explanatory.

Going a step further, would it be much more work to instead make the
interface more powerful, use conjunctive search by default and perform
exact matches for phrases put in quotes, such as

    "arch linux" package

matching everything that contains the strings "arch linux" and
"package"? With that, we wouldn't even have to distinguish different
search types (but we may need to bump the API version).

Best regards,
Lukas
Kevin Morris July 3, 2020, 12:04 a.m. UTC | #2
No problem; I love to help where I can!

To address your two points:

1. There is no particular reason for `rpc` and `web` being chosen; I just
needed separate values for this patch to be tested with. I'm up to renaming
them to anything if we decide to use `search_type`.
2. It would not be much more work to add the quoted stanza -- I had an idea
to at one point, but just decided to back off from it; figured just
introducing the patch would get some conversation about it on the ML. I'll
start working on a follow-up patch that implements this as well.

The refraining from #2 is due to me being a bit worried due to it changing
how the API behaves, didn't think about the complete picture ("arch linux"
second_keyword). I'll start working on that now.

My tasks:

1. Rename search_type values to something more sensible (waiting on this
for feedback from ML)
2. Implement support for quoted strings in the search arg: "arch linux".
Quoted keywords will be taken literally -- that is, "arch linux" would
trigger a search for the literal 'arch linux' string.

Thanks for the follow-up, Lukas; I'm on it!


On Thu, Jul 2, 2020 at 4:46 PM Lukas Fleischer <lfleischer@archlinux.org>
wrote:

> Hi Kevin,
>
> Thanks for the patch!
>
> On Tue, 30 Jun 2020 at 19:36:01, Kevin Morris wrote:
> > This commit introduces new functionality: When `search_type` is `web`,
> > search behavior matches aurweb's HTML search page.
> >
> > New parameters: `search_type`
> > Valid search_type values: `rpc` (default), `web`
> >
> > The `rpc` search type uses the existing legacy search method.
> >
> > The `web` search type clontes the web-based aurweb search page's behavior
> > as closely as possible (see note at the bottom of this message).
> >
> > The following example searches for packages that contain `blah` AND
> `abc`:
> >
> https://aur.archlinux.org/rpc/?v=5&type=search&search_type=web&arg=blah%20abc
> >
> > The legacy method still searches for packages that contain 'blah abc':
> > https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc
> >
> https://aur.archlinux.org/rpc/?v=5&type=search&search_type=rpc&arg=blah%20abc
>
> Before having a look at the implementation, is there any reason the two
> search types are called "rpc" and "web" (other than "rpc" is how the RPC
> interface used to behave and "web" is how the RPC interface used to
> behave)? If not, I think we should choose a different naming that is
> more intuitive and self-explanatory.
>
> Going a step further, would it be much more work to instead make the
> interface more powerful, use conjunctive search by default and perform
> exact matches for phrases put in quotes, such as
>
>     "arch linux" package
>
> matching everything that contains the strings "arch linux" and
> "package"? With that, we wouldn't even have to distinguish different
> search types (but we may need to bump the API version).
>
> Best regards,
> Lukas
>
Lukas Fleischer July 3, 2020, 12:23 a.m. UTC | #3
Hi Kevin,

Thanks for your quick reply and your willingness to improve on your
first submission!

On Thu, 02 Jul 2020 at 20:04:10, Kevin Morris wrote:
> The refraining from #2 is due to me being a bit worried due to it changing
> how the API behaves, didn't think about the complete picture ("arch linux"
> second_keyword). I'll start working on that now.

Yes, we need to bump the API version and keep the old behavior for the
current API version. But I don't see any issues with that.

> 1. Rename search_type values to something more sensible (waiting on this
> for feedback from ML)
> 2. Implement support for quoted strings in the search arg: "arch linux".
> Quoted keywords will be taken literally -- that is, "arch linux" would
> trigger a search for the literal 'arch linux' string.

Would (1) still be relevant if (2) is implemented?

Best,
Lukas
Kevin Morris July 3, 2020, 1:02 a.m. UTC | #4
If we bump the API version, #1 is unnecessary and would be removed.

On Thu, Jul 2, 2020, 5:23 PM Lukas Fleischer <lfleischer@archlinux.org>
wrote:

> Hi Kevin,
>
> Thanks for your quick reply and your willingness to improve on your
> first submission!
>
> On Thu, 02 Jul 2020 at 20:04:10, Kevin Morris wrote:
> > The refraining from #2 is due to me being a bit worried due to it
> changing
> > how the API behaves, didn't think about the complete picture ("arch
> linux"
> > second_keyword). I'll start working on that now.
>
> Yes, we need to bump the API version and keep the old behavior for the
> current API version. But I don't see any issues with that.
>
> > 1. Rename search_type values to something more sensible (waiting on this
> > for feedback from ML)
> > 2. Implement support for quoted strings in the search arg: "arch linux".
> > Quoted keywords will be taken literally -- that is, "arch linux" would
> > trigger a search for the literal 'arch linux' string.
>
> Would (1) still be relevant if (2) is implemented?
>
> Best,
> Lukas
>
Kevin Morris July 3, 2020, 3:23 a.m. UTC | #5
Alright. Sent a new patch that implements all this -- wasn't able to figure
out how to `git send-email` into the same thread as this one, it's titled
"[PATCH] Bump RPC API Version 6". I included the `Message ID` for this
thread.

Patch

diff --git a/doc/rpc.txt b/doc/rpc.txt
index 3148ebea..eb97547d 100644
--- a/doc/rpc.txt
+++ b/doc/rpc.txt
@@ -21,6 +21,11 @@  The _by_ parameter can be skipped and defaults to `name-desc`.
 If a maintainer search is performed and the search argument is left empty, a
 list of orphan packages is returned.
 
+The _search_type_ parameter can be skipped and defaults to `rpc`.
+
+_search_type_ is only ever used during a `name` or `name-desc` search and
+supports the following values: `rpc`, `web`.
+
 Package Details
 ---------------
 
@@ -39,6 +44,8 @@  Examples
   `/rpc/?v=5&type=search&by=makedepends&arg=boost`
 `search` with callback::
   `/rpc/?v=5&type=search&arg=foobar&callback=jsonp1192244621103`
+`search` with `web` _search_type_ for packages containing `cookie` AND `milk`::
+  `/rpc/?v=5&type=search&search_type=web&arg=cookie%20milk`
 `info`::
   `/rpc/?v=5&type=info&arg[]=foobar`
 `info` with multiple packages::
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
index 0ac586fe..0e94b578 100644
--- a/web/lib/aurjson.class.php
+++ b/web/lib/aurjson.class.php
@@ -23,6 +23,9 @@  class AurJSON {
 	private static $exposed_depfields = array(
 		'depends', 'makedepends', 'checkdepends', 'optdepends'
 	);
+	private static $exposed_search_types = array(
+		'rpc', 'web'
+	);
 	private static $fields_v1 = array(
 		'Packages.ID', 'Packages.Name',
 		'PackageBases.ID AS PackageBaseID',
@@ -140,7 +143,7 @@  class AurJSON {
 	}
 
 	/*
-	 * Check if an IP needs to be  rate limited.
+	 * Check if an IP needs to be rate limited.
 	 *
 	 * @param $ip IP of the current request
 	 *
@@ -192,7 +195,7 @@  class AurJSON {
 		$value = get_cache_value('ratelimit-ws:' . $ip, $status);
 		if (!$status || ($status && $value < $deletion_time)) {
 			if (set_cache_value('ratelimit-ws:' . $ip, $time, $window_length) &&
-			    set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
+				set_cache_value('ratelimit:' . $ip, 1, $window_length)) {
 				return;
 			}
 		} else {
@@ -472,6 +475,33 @@  class AurJSON {
 		return array('ids' => $id_args, 'names' => $name_args);
 	}
 
+	/*
+	 * Prepare a WHERE statement for each keyword: append $func($keyword)
+	 * separated by $delim. Each keyword is sanitized as wildcards before
+	 * it's passed to $func.
+	 *
+	 * @param $delim Delimiter to use in the middle of two keywords.
+	 * @param $keywords Array of keywords to prepare.
+	 * @param $func A function that returns a string. This value is concatenated.
+	 *
+	 * @return A WHERE condition statement of keywords separated by $delim.
+	 */
+	private function join_where($delim, $keywords, $func) {
+		// Applied to each item to concatenate our entire statement.
+		$reduce_func = function($carry, $item) use(&$func) {
+			array_push($carry, $func($item));
+			return $carry;
+		};
+
+		// Manual array_reduce with a local lambda.
+		$acc = array(); // Initial
+		foreach ($keywords as &$keyword) {
+			$acc += $reduce_func($acc, $keyword);
+		}
+
+		return join(" $delim ", $acc);
+	}
+
 	/*
 	 * Performs a fulltext mysql search of the package database.
 	 *
@@ -480,6 +510,7 @@  class AurJSON {
 	 * @return mixed Returns an array of package matches.
 	 */
 	private function search($http_data) {
+
 		$keyword_string = $http_data['arg'];
 
 		if (isset($http_data['by'])) {
@@ -488,17 +519,49 @@  class AurJSON {
 			$search_by = 'name-desc';
 		}
 
+		if (isset($http_data['search_type'])) {
+			$search_type = $http_data['search_type'];
+		} else {
+			// Default search_type: rpc
+			$search_type = 'rpc';
+		}
+
+		if (!in_array($search_type, self::$exposed_search_types)) {
+			return $this->json_error('Unsupported search type.');
+		}
+
 		if ($search_by === 'name' || $search_by === 'name-desc') {
 			if (strlen($keyword_string) < 2) {
 				return $this->json_error('Query arg too small.');
 			}
-			$keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%");
 
 			if ($search_by === 'name') {
-				$where_condition = "(Packages.Name LIKE $keyword_string)";
+				if ($search_type === 'rpc') {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string)";
+				} else {
+					$keywords = explode(' ', $keyword_string);
+					$func = function($keyword) {
+						$str = $this->dbh->quote("%" . addcslashes($keyword, '%_') . "%");
+						return "(Packages.Name LIKE $str)";
+					};
+					$where_condition = $this->join_where("AND", $keywords, $func);
+				}
 			} else if ($search_by === 'name-desc') {
-				$where_condition = "(Packages.Name LIKE $keyword_string OR ";
-				$where_condition .= "Description LIKE $keyword_string)";
+				if ($search_type === 'rpc') {
+					$keyword_string = $this->dbh->quote(
+						"%" . addcslashes($keyword_string, '%_') . "%");
+					$where_condition = "(Packages.Name LIKE $keyword_string ";
+					$where_condition .= "OR Description LIKE $keyword_string)";
+				} else {
+					$keywords = explode(' ', $keyword_string);
+					$func = function($keyword) {
+						$str = $this->dbh->quote("%" . addcslashes($keyword, '%_') . "%");
+						return "(Packages.Name LIKE $str OR Description LIKE $str)";
+					};
+					$where_condition = $this->join_where("AND", $keywords, $func);
+				}
 			}
 		} else if ($search_by === 'maintainer') {
 			if (empty($keyword_string)) {