Message ID | 20200705051007.30145-1-kevr.gtalk@gmail.com |
---|---|
State | New |
Headers | show |
Series | Support conjunctive keyword search in RPC interface | expand |
Alright, patch sent; I used `-v1` this time with `git send-email`... I can't find documentation for that switch yet. I've tested basic search through both paths; the only refactoring that needed to be done was to remove the extra "AND (" and ")" from the generic function, and add it where we need it in `pkg_search_page`. Then, we can reuse the same `construct_keyword_search` in rpc.
On Sun, 05 Jul 2020 at 01:10:07, Kevin Morris wrote: > Newly supported API Version 6 modifies `type=search` functionality; it > now behaves the same as `name` or `name-desc` search through the > https://aur.archlinux.org/packages/ search page. > > Search for packages containing the literal keyword `blah blah` AND `haha`: > https://aur.archlinux.org/rpc/?v=6&type=search&arg="blah blah"%20haha > > Search for packages containing the literal keyword `abc 123`: > https://aur.archlinux.org/rpc/?v=6&type=search&arg="abc 123" > > The following example searches for packages that contain `blah` AND `abc`: > https://aur.archlinux.org/rpc/?v=6&type=search&arg=blah%20abc > > The legacy method still searches for packages that contain `blah abc`: > https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc > https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc > > API Version 6 is currently only considered during a `search` of `name` or > `name-desc`. > > Note: This change was written as a solution to > https://bugs.archlinux.org/task/49133. > > PS: + Some spacing issues fixed in comments. > > Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com> > --- > doc/rpc.txt | 4 ++++ > web/lib/aurjson.class.php | 29 +++++++++++++++++++++-------- > web/lib/pkgfuncs.inc.php | 10 +++++----- > 3 files changed, 30 insertions(+), 13 deletions(-) > [...] > @@ -492,13 +492,26 @@ class AurJSON { > if (strlen($keyword_string) < 2) { > return $this->json_error('Query arg too small.'); > } > - $keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%"); > > if ($search_by === 'name') { > - $where_condition = "(Packages.Name LIKE $keyword_string)"; > + if ($this->version >= 6) { > + $where_condition = construct_keyword_search($this->dbh, > + $keyword_string, false); Do name and name-desc now behave exactly the same (see below)? If so, this change in behavior should be documented at least. I would have expected some more refactoring in construct_keyword_search() and one additional parameter being passed here though. Should we include pkgfuncs.inc.php from aurjson.class.php now? How does this currently get imported? Also, since the code for both name and name-desc are now very similar, can we refactor and share most of the code instead of copy pasting? To this end, it might be easier to switch the blocks (i.e. check for API version first, then check for request type). That would allow us to reuse the same assignment to $keyword_string as before and possibly the same construct_keyword_search() invocation too. > + } else { > + $keyword_string = $this->dbh->quote( > + "%" . addcslashes($keyword_string, '%_') . "%"); > + $where_condition = "(Packages.Name LIKE $keyword_string)"; > + } > } else if ($search_by === 'name-desc') { > - $where_condition = "(Packages.Name LIKE $keyword_string OR "; > - $where_condition .= "Description LIKE $keyword_string)"; > + if ($this->version >= 6) { > + $where_condition = construct_keyword_search($this->dbh, > + $keyword_string, true); See above. > + } else { > + $keyword_string = $this->dbh->quote( > + "%" . addcslashes($keyword_string, '%_') . "%"); > + $where_condition = "(Packages.Name LIKE $keyword_string "; > + $where_condition .= "OR Description LIKE $keyword_string)"; > + } > } > } else if ($search_by === 'maintainer') { > if (empty($keyword_string)) { > diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php > index 8c915711..f6108e5a 100644 > --- a/web/lib/pkgfuncs.inc.php > +++ b/web/lib/pkgfuncs.inc.php > @@ -697,7 +697,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") { > } > elseif (isset($params["SeB"]) && $params["SeB"] == "k") { > /* Search by keywords. */ > + $q_where .= " AND ( "; Do we need the space at the end? > $q_where .= construct_keyword_search($dbh, $params['K'], false); > + $q_where .= " )"; Same here, do we need the space before the closing parentheses? More importantly though, do we want to add a space after the parenthesis instead to match the previous behavior of SQL strings always ending with a space, so we can immediately append without a space? > } > elseif (isset($params["SeB"]) && $params["SeB"] == "N") { > /* Search by name (exact match). */ > @@ -709,7 +711,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") { > } > else { > /* Keyword search (default). */ > + $q_where .= " AND ( "; > $q_where .= construct_keyword_search($dbh, $params['K'], true); > + $q_where .= " )"; Same here. > } > } > > @@ -876,11 +880,7 @@ function construct_keyword_search($dbh, $keywords, $namedesc) { > $op = "AND "; > } > > - if (!empty($q_keywords)) { > - $where_part = "AND (" . $q_keywords . ") "; > - } > - > - return $where_part; > + return $q_keywords; > } > > /** > -- > 2.20.1
On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote: > Alright, patch sent; I used `-v1` this time with `git send-email`... I > can't find documentation for that switch yet. I've tested basic search > through both paths; the only refactoring that needed to be done was to > remove the extra "AND (" and ")" from the generic function, and add it > where we need it in `pkg_search_page`. Then, we can reuse the same > `construct_keyword_search` in rpc. Thanks for the revision! I added some comments inline. It's common to not use any versioning for the first revision of a patch (e.g. send without -v1) but then start using (-v2, -v3, ...) for followup revisions. The -v argument is documented in the git-format-patch(1) man page, if you are using git-format-patch and git-send-email separately, you need to specify the argument when formatting the patch (however, in most cases, you can also run `git format patch` directly).
> > Do name and name-desc now behave exactly the same (see below)? If so, > this change in behavior should be documented at least. I would have > expected some more refactoring in construct_keyword_search() and one > additional parameter being passed here though. > > Should we include pkgfuncs.inc.php from aurjson.class.php now? How does > this currently get imported? > > Also, since the code for both name and name-desc are now very similar, > can we refactor and share most of the code instead of copy pasting? To > this end, it might be easier to switch the blocks (i.e. check for API > version first, then check for request type). That would allow us to > reuse the same assignment to $keyword_string as before and possibly the > same construct_keyword_search() invocation too. > I noticed an issue with pre-sanitizing $keyword_string as per `master` -- when searching by maintainer in rpc, a WHERE Username = $keyword_string is used without wildcards (%...%). As far as I know, wildcards should only be used in a LIKE expression? Could be wrong, but that's what I thought when originally modifying `aurjson.inc.php`'s `function search`. Thoughts? On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org> wrote: > On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote: > > Alright, patch sent; I used `-v1` this time with `git send-email`... I > > can't find documentation for that switch yet. I've tested basic search > > through both paths; the only refactoring that needed to be done was to > > remove the extra "AND (" and ")" from the generic function, and add it > > where we need it in `pkg_search_page`. Then, we can reuse the same > > `construct_keyword_search` in rpc. > > Thanks for the revision! I added some comments inline. > > It's common to not use any versioning for the first revision of a patch > (e.g. send without -v1) but then start using (-v2, -v3, ...) for > followup revisions. The -v argument is documented in the > git-format-patch(1) man page, if you are using git-format-patch and > git-send-email separately, you need to specify the argument when > formatting the patch (however, in most cases, you can also run `git > format patch` directly). >
> > Same here, do we need the space before the closing parentheses? More > importantly though, do we want to add a space after the parenthesis > instead to match the previous behavior of SQL strings always ending with > a space, so we can immediately append without a space? > This one can't hurt at all, the only thing it'd do is make construct_keyword_search more flexible. ++ on this, change will be in the next rev. On Sun, Jul 5, 2020 at 11:51 AM Kevin Morris <kevr.gtalk@gmail.com> wrote: > Do name and name-desc now behave exactly the same (see below)? If so, >> this change in behavior should be documented at least. I would have >> expected some more refactoring in construct_keyword_search() and one >> additional parameter being passed here though. >> >> Should we include pkgfuncs.inc.php from aurjson.class.php now? How does >> this currently get imported? >> >> Also, since the code for both name and name-desc are now very similar, >> can we refactor and share most of the code instead of copy pasting? To >> this end, it might be easier to switch the blocks (i.e. check for API >> version first, then check for request type). That would allow us to >> reuse the same assignment to $keyword_string as before and possibly the >> same construct_keyword_search() invocation too. >> > > I noticed an issue with pre-sanitizing $keyword_string as per `master` -- > when searching by maintainer in rpc, a WHERE Username = $keyword_string is > used without wildcards (%...%). As far as I know, wildcards should only be > used in a LIKE expression? Could be wrong, but that's what I thought when > originally modifying `aurjson.inc.php`'s `function search`. Thoughts? > > On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org> > wrote: > >> On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote: >> > Alright, patch sent; I used `-v1` this time with `git send-email`... I >> > can't find documentation for that switch yet. I've tested basic search >> > through both paths; the only refactoring that needed to be done was to >> > remove the extra "AND (" and ")" from the generic function, and add it >> > where we need it in `pkg_search_page`. Then, we can reuse the same >> > `construct_keyword_search` in rpc. >> >> Thanks for the revision! I added some comments inline. >> >> It's common to not use any versioning for the first revision of a patch >> (e.g. send without -v1) but then start using (-v2, -v3, ...) for >> followup revisions. The -v argument is documented in the >> git-format-patch(1) man page, if you are using git-format-patch and >> git-send-email separately, you need to specify the argument when >> formatting the patch (however, in most cases, you can also run `git >> format patch` directly). >> > > > -- > Kevin Morris > Software Developer > > Personal Inquiries: kevr.gtalk@gmail.com > Personal Phone: (415) 583-9687 > > Technologies: C, C++, Python, Django, Ruby, Rails, ReactJS, jQuery, > Javascript, SQL, Redux >
Okay, I was totally mistaken - `$keyword_string` is only ever sanitized in the right place (name/name-desc) in master. Yeah, I can rework this to be a bit cleaner as you recommended -- no problem. ++ on this, will be included in next rev. On Sun, Jul 5, 2020 at 11:51 AM Kevin Morris <kevr.gtalk@gmail.com> wrote: > Do name and name-desc now behave exactly the same (see below)? If so, >> this change in behavior should be documented at least. I would have >> expected some more refactoring in construct_keyword_search() and one >> additional parameter being passed here though. >> >> Should we include pkgfuncs.inc.php from aurjson.class.php now? How does >> this currently get imported? >> >> Also, since the code for both name and name-desc are now very similar, >> can we refactor and share most of the code instead of copy pasting? To >> this end, it might be easier to switch the blocks (i.e. check for API >> version first, then check for request type). That would allow us to >> reuse the same assignment to $keyword_string as before and possibly the >> same construct_keyword_search() invocation too. >> > > I noticed an issue with pre-sanitizing $keyword_string as per `master` -- > when searching by maintainer in rpc, a WHERE Username = $keyword_string is > used without wildcards (%...%). As far as I know, wildcards should only be > used in a LIKE expression? Could be wrong, but that's what I thought when > originally modifying `aurjson.inc.php`'s `function search`. Thoughts? > > On Sun, Jul 5, 2020 at 6:33 AM Lukas Fleischer <lfleischer@archlinux.org> > wrote: > >> On Sun, 05 Jul 2020 at 01:13:25, Kevin Morris wrote: >> > Alright, patch sent; I used `-v1` this time with `git send-email`... I >> > can't find documentation for that switch yet. I've tested basic search >> > through both paths; the only refactoring that needed to be done was to >> > remove the extra "AND (" and ")" from the generic function, and add it >> > where we need it in `pkg_search_page`. Then, we can reuse the same >> > `construct_keyword_search` in rpc. >> >> Thanks for the revision! I added some comments inline. >> >> It's common to not use any versioning for the first revision of a patch >> (e.g. send without -v1) but then start using (-v2, -v3, ...) for >> followup revisions. The -v argument is documented in the >> git-format-patch(1) man page, if you are using git-format-patch and >> git-send-email separately, you need to specify the argument when >> formatting the patch (however, in most cases, you can also run `git >> format patch` directly). >> > > > -- > Kevin Morris > Software Developer > > Personal Inquiries: kevr.gtalk@gmail.com > Personal Phone: (415) 583-9687 > > Technologies: C, C++, Python, Django, Ruby, Rails, ReactJS, jQuery, > Javascript, SQL, Redux >
diff --git a/doc/rpc.txt b/doc/rpc.txt index 3148ebea..b0f5c4e1 100644 --- a/doc/rpc.txt +++ b/doc/rpc.txt @@ -39,6 +39,10 @@ Examples `/rpc/?v=5&type=search&by=makedepends&arg=boost` `search` with callback:: `/rpc/?v=5&type=search&arg=foobar&callback=jsonp1192244621103` +`search` with API Version 6 for packages containing `cookie` AND `milk`:: + `/rpc/?v=6&type=search&arg=cookie%20milk` +`search` with API Version 6 for packages containing `cookie milk`:: + `/rpc/?v=6&type=search&arg="cookie milk"` `info`:: `/rpc/?v=5&type=info&arg[]=foobar` `info` with multiple packages:: diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php index 0ac586fe..83ce502a 100644 --- a/web/lib/aurjson.class.php +++ b/web/lib/aurjson.class.php @@ -80,7 +80,7 @@ class AurJSON { if (isset($http_data['v'])) { $this->version = intval($http_data['v']); } - if ($this->version < 1 || $this->version > 5) { + if ($this->version < 1 || $this->version > 6) { return $this->json_error('Invalid version specified.'); } @@ -140,7 +140,7 @@ class AurJSON { } /* - * Check if an IP needs to be rate limited. + * Check if an IP needs to be rate limited. * * @param $ip IP of the current request * @@ -192,7 +192,7 @@ class AurJSON { $value = get_cache_value('ratelimit-ws:' . $ip, $status); if (!$status || ($status && $value < $deletion_time)) { if (set_cache_value('ratelimit-ws:' . $ip, $time, $window_length) && - set_cache_value('ratelimit:' . $ip, 1, $window_length)) { + set_cache_value('ratelimit:' . $ip, 1, $window_length)) { return; } } else { @@ -370,7 +370,7 @@ class AurJSON { } elseif ($this->version >= 2) { if ($this->version == 2 || $this->version == 3) { $fields = implode(',', self::$fields_v2); - } else if ($this->version == 4 || $this->version == 5) { + } else if ($this->version >= 4 && $this->version <= 6) { $fields = implode(',', self::$fields_v4); } $query = "SELECT {$fields} " . @@ -492,13 +492,26 @@ class AurJSON { if (strlen($keyword_string) < 2) { return $this->json_error('Query arg too small.'); } - $keyword_string = $this->dbh->quote("%" . addcslashes($keyword_string, '%_') . "%"); if ($search_by === 'name') { - $where_condition = "(Packages.Name LIKE $keyword_string)"; + if ($this->version >= 6) { + $where_condition = construct_keyword_search($this->dbh, + $keyword_string, false); + } else { + $keyword_string = $this->dbh->quote( + "%" . addcslashes($keyword_string, '%_') . "%"); + $where_condition = "(Packages.Name LIKE $keyword_string)"; + } } else if ($search_by === 'name-desc') { - $where_condition = "(Packages.Name LIKE $keyword_string OR "; - $where_condition .= "Description LIKE $keyword_string)"; + if ($this->version >= 6) { + $where_condition = construct_keyword_search($this->dbh, + $keyword_string, true); + } else { + $keyword_string = $this->dbh->quote( + "%" . addcslashes($keyword_string, '%_') . "%"); + $where_condition = "(Packages.Name LIKE $keyword_string "; + $where_condition .= "OR Description LIKE $keyword_string)"; + } } } else if ($search_by === 'maintainer') { if (empty($keyword_string)) { diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php index 8c915711..f6108e5a 100644 --- a/web/lib/pkgfuncs.inc.php +++ b/web/lib/pkgfuncs.inc.php @@ -697,7 +697,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") { } elseif (isset($params["SeB"]) && $params["SeB"] == "k") { /* Search by keywords. */ + $q_where .= " AND ( "; $q_where .= construct_keyword_search($dbh, $params['K'], false); + $q_where .= " )"; } elseif (isset($params["SeB"]) && $params["SeB"] == "N") { /* Search by name (exact match). */ @@ -709,7 +711,9 @@ function pkg_search_page($params, $show_headers=true, $SID="") { } else { /* Keyword search (default). */ + $q_where .= " AND ( "; $q_where .= construct_keyword_search($dbh, $params['K'], true); + $q_where .= " )"; } } @@ -876,11 +880,7 @@ function construct_keyword_search($dbh, $keywords, $namedesc) { $op = "AND "; } - if (!empty($q_keywords)) { - $where_part = "AND (" . $q_keywords . ") "; - } - - return $where_part; + return $q_keywords; } /**
Newly supported API Version 6 modifies `type=search` functionality; it now behaves the same as `name` or `name-desc` search through the https://aur.archlinux.org/packages/ search page. Search for packages containing the literal keyword `blah blah` AND `haha`: https://aur.archlinux.org/rpc/?v=6&type=search&arg="blah blah"%20haha Search for packages containing the literal keyword `abc 123`: https://aur.archlinux.org/rpc/?v=6&type=search&arg="abc 123" The following example searches for packages that contain `blah` AND `abc`: https://aur.archlinux.org/rpc/?v=6&type=search&arg=blah%20abc The legacy method still searches for packages that contain `blah abc`: https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc https://aur.archlinux.org/rpc/?v=5&type=search&arg=blah%20abc API Version 6 is currently only considered during a `search` of `name` or `name-desc`. Note: This change was written as a solution to https://bugs.archlinux.org/task/49133. PS: + Some spacing issues fixed in comments. Signed-off-by: Kevin Morris <kevr.gtalk@gmail.com> --- doc/rpc.txt | 4 ++++ web/lib/aurjson.class.php | 29 +++++++++++++++++++++-------- web/lib/pkgfuncs.inc.php | 10 +++++----- 3 files changed, 30 insertions(+), 13 deletions(-)