[pacman-dev] Order downloads by descending max_size

Message ID 20210812035914.39053-1-softwaresale01@gmail.com
State Accepted, archived
Headers show
Series [pacman-dev] Order downloads by descending max_size | expand

Commit Message

Charlie Sale Aug. 12, 2021, 3:59 a.m. UTC
When downloading in parallel, sort by package size so that the larger
packages are queued first to fully leverage parallelism.
Addresses FS#70172

Signed-off-by: Charlie Sale <softwaresale01@gmail.com>
---
 lib/libalpm/dload.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Charlie Sale Aug. 12, 2021, 4:12 a.m. UTC | #1
On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
>
> When downloading in parallel, sort by package size so that the larger
> packages are queued first to fully leverage parallelism.
> Addresses FS#70172
>
> Signed-off-by: Charlie Sale <softwaresale01@gmail.com>
> ---
>  lib/libalpm/dload.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
> index ca6be7b6..58d2122f 100644
> --- a/lib/libalpm/dload.c
> +++ b/lib/libalpm/dload.c
> @@ -825,6 +825,19 @@ cleanup:
>         return ret;
>  }
>
> +/*
> + * Use to sort payloads by max size in decending order (largest -> smallest)
> + */
> +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
> +{
> +       struct dload_payload *left, *right;
> +
> +       left = (struct dload_payload *) left_ptr;
> +       right = (struct dload_payload *) right_ptr;
> +
> +       return right->max_size - left->max_size;
> +}
> +
>  /* Returns -1 if an error happened for a required file
>   * Returns 0 if a payload was actually downloaded
>   * Returns 1 if no files were downloaded and all errors were non-fatal
> @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle,
>         int max_streams = handle->parallel_downloads;
>         int updated = 0; /* was a file actually updated */
>         CURLM *curlm = handle->curlm;
> +       size_t payloads_size = alpm_list_count(payloads);
> +
> +       /* Sort payloads by package size */
> +       payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
>
>         while(active_downloads_num > 0 || payloads) {
>                 CURLMcode mc;
> --
> 2.32.0
>

I'll also add that this is my first contribution to pacman. Please
inform me of anything I did incorrectly regarding the patch process.
Thank you in advance for your patience :)

Cheers!
~Charlie
morganamilo Aug. 12, 2021, 5:38 a.m. UTC | #2
On 12/08/2021 05:12, Charlie Sale wrote:
> On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
>>
>> When downloading in parallel, sort by package size so that the larger
>> packages are queued first to fully leverage parallelism.
>> Addresses FS#70172
>>
>> Signed-off-by: Charlie Sale <softwaresale01@gmail.com>
>> ---
>>  lib/libalpm/dload.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
>> index ca6be7b6..58d2122f 100644
>> --- a/lib/libalpm/dload.c
>> +++ b/lib/libalpm/dload.c
>> @@ -825,6 +825,19 @@ cleanup:
>>         return ret;
>>  }
>>
>> +/*
>> + * Use to sort payloads by max size in decending order (largest -> smallest)
>> + */
>> +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
>> +{
>> +       struct dload_payload *left, *right;
>> +
>> +       left = (struct dload_payload *) left_ptr;
>> +       right = (struct dload_payload *) right_ptr;
>> +
>> +       return right->max_size - left->max_size;
>> +}
>> +
>>  /* Returns -1 if an error happened for a required file
>>   * Returns 0 if a payload was actually downloaded
>>   * Returns 1 if no files were downloaded and all errors were non-fatal
>> @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle,
>>         int max_streams = handle->parallel_downloads;
>>         int updated = 0; /* was a file actually updated */
>>         CURLM *curlm = handle->curlm;
>> +       size_t payloads_size = alpm_list_count(payloads);
>> +
>> +       /* Sort payloads by package size */
>> +       payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
>>
>>         while(active_downloads_num > 0 || payloads) {
>>                 CURLMcode mc;
>> --
>> 2.32.0
>>
> 
> I'll also add that this is my first contribution to pacman. Please
> inform me of anything I did incorrectly regarding the patch process.
> Thank you in advance for your patience :)
> 
> Cheers!
> ~Charlie
> 

In my totally untested theory this would slow things down. The ideal
situation is to have 1 large download running to soak up bandwidth and
then many small downloads running so they can all be set up and handshake.

Have you got any benchmarks to comfirm/deny the ones in the bug report?
Allan McRae Aug. 12, 2021, 12:10 p.m. UTC | #3
On 12/8/21 3:38 pm, Morgan Adamiec wrote:
> 
> 
> On 12/08/2021 05:12, Charlie Sale wrote:
>> On Wed, Aug 11, 2021 at 11:59 PM Charlie Sale <softwaresale01@gmail.com> wrote:
>>>
>>> When downloading in parallel, sort by package size so that the larger
>>> packages are queued first to fully leverage parallelism.
>>> Addresses FS#70172
>>>
>>> Signed-off-by: Charlie Sale <softwaresale01@gmail.com>
>>> ---
>>>  lib/libalpm/dload.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
>>> index ca6be7b6..58d2122f 100644
>>> --- a/lib/libalpm/dload.c
>>> +++ b/lib/libalpm/dload.c
>>> @@ -825,6 +825,19 @@ cleanup:
>>>         return ret;
>>>  }
>>>
>>> +/*
>>> + * Use to sort payloads by max size in decending order (largest -> smallest)
>>> + */
>>> +static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
>>> +{
>>> +       struct dload_payload *left, *right;
>>> +
>>> +       left = (struct dload_payload *) left_ptr;
>>> +       right = (struct dload_payload *) right_ptr;
>>> +
>>> +       return right->max_size - left->max_size;
>>> +}
>>> +
>>>  /* Returns -1 if an error happened for a required file
>>>   * Returns 0 if a payload was actually downloaded
>>>   * Returns 1 if no files were downloaded and all errors were non-fatal
>>> @@ -838,6 +851,10 @@ static int curl_download_internal(alpm_handle_t *handle,
>>>         int max_streams = handle->parallel_downloads;
>>>         int updated = 0; /* was a file actually updated */
>>>         CURLM *curlm = handle->curlm;
>>> +       size_t payloads_size = alpm_list_count(payloads);
>>> +
>>> +       /* Sort payloads by package size */
>>> +       payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
>>>
>>>         while(active_downloads_num > 0 || payloads) {
>>>                 CURLMcode mc;
>>> --
>>> 2.32.0
>>>
>>
>> I'll also add that this is my first contribution to pacman. Please
>> inform me of anything I did incorrectly regarding the patch process.
>> Thank you in advance for your patience :)
>>
>> Cheers!
>> ~Charlie
>>
> 
> In my totally untested theory this would slow things down. The ideal
> situation is to have 1 large download running to soak up bandwidth and
> then many small downloads running so they can all be set up and handshake.
> 
> Have you got any benchmarks to comfirm/deny the ones in the bug report?

I have done rough numbers...  Compared to random order, there is a net
gain.  Not a big gain over the range of parameters I tried, but a gain.
 This sort is unlikely to optimal, but optimality would be much harder
to achieve.

Allan

Patch

diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
index ca6be7b6..58d2122f 100644
--- a/lib/libalpm/dload.c
+++ b/lib/libalpm/dload.c
@@ -825,6 +825,19 @@  cleanup:
 	return ret;
 }
 
+/*
+ * Use to sort payloads by max size in decending order (largest -> smallest)
+ */
+static int compare_dload_payload_sizes(const void *left_ptr, const void *right_ptr)
+{
+	struct dload_payload *left, *right;
+
+	left = (struct dload_payload *) left_ptr;
+	right = (struct dload_payload *) right_ptr;
+
+	return right->max_size - left->max_size;
+}
+
 /* Returns -1 if an error happened for a required file
  * Returns 0 if a payload was actually downloaded
  * Returns 1 if no files were downloaded and all errors were non-fatal
@@ -838,6 +851,10 @@  static int curl_download_internal(alpm_handle_t *handle,
 	int max_streams = handle->parallel_downloads;
 	int updated = 0; /* was a file actually updated */
 	CURLM *curlm = handle->curlm;
+	size_t payloads_size = alpm_list_count(payloads);
+
+	/* Sort payloads by package size */
+	payloads = alpm_list_msort(payloads, payloads_size, &compare_dload_payload_sizes);
 
 	while(active_downloads_num > 0 || payloads) {
 		CURLMcode mc;