Ok, so there's a few reasons this was done.
1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all communication indefinitely, not switch in and out depending on the track or service.
2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly download manifests and segments e.t.c under my home connection. A competent service could see that as an indicator of bad play and flag you.
3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are setup/used by Requests in the Session. There's no way to tell a request session to temporarily disable the proxy and turn it back on later, without having to get the proxy from the session (in an annoying way) store it, then remove it, make the calls, then assuming your still in the same function you can add it back. If you're not in the same function, well, time for some spaghetti code.
---
tldr; -1 ux/design/expectations with CLI, -1 security aspect, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
Chardet was detecting a mixture of mostly cp1252 and MacRoman encoding, where it should just be left as-is when parsing. The actual text within it perhaps may want to go through `try_ensure_utf8` when parsed, but not the entire box.
* Add option for automatic subtitle character encoding normalization
The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).
* Remove Subtitle parameter `auto_fix_encoding`
Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.
* Move Subtitle encoding fixing code out of if drm tree
* Use chardet as a last ditch effort fixing Subs, or return original data
* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8
* Add Shivelight as a contributor
---------
Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
For aria2c I've simplified the operation by offloading most of the work for creating a cookie header by just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or such. It supports multiple of the same-name cookies under different domains/paths based on the URI of the mock request.
This used to be used even before devine was public, but it was constantly changed back and forth between an urljoin(), another form of urljoin (something custom or something I can't remember), and an if check + addition.
However, I can confirm that a simple if check will not work as the Base URI might not even be in the same relative root. The if checks have also been inconsistent with some checking if it starts with http(s)://, and some checking if it does not have the base URI at the start of the string.
This if check method does not work as well as an urljoin() has the potential to. It also fixes some services as some HLS playlists would have the m3u8 URL on a completely different root, subdomain, or even domain, causing it to completely break when trying to download segments.
DASH and normal URL downloads now both decrypt one large single or merged file after all downloads are finished. This leaves a bit of a "pause" between progress bar movement which looks a bit odd. So mark the track as being in a Decrypting state.
Since DASH doesn't have the ability to change keys dynamically per-track (Representation), there's no need for the DASH downloader to decrypt segments as they are downloaded (like HLS).
This halves the amount of processes needing to be opened as well as the I/O usage. It may result in noticeably lower CPU usage. Since the IOPS is lowered, you may even see an increase in download speed if downloading to something like a meh HDD.
This also fixes decryption in some weird edge-cases where decrypting each segment individually resulted in timestamp anomalies causing shaka to fail.
Most people used --skip-dl just to license the DRM pre-v1.3.0. Which makes sense, --skip-dl is otherwise a pointless feature. I've fixed it so that --skip-dl worked like before, allowing license calls, while still supporting the new per-segment features post-v1.3.0.
Fixes#37
segments[0] is the first tuple, of two values. The URL and an optional byte range. So this accidentally passed the tuple rather than the URL within the tuple.
Fixes#54
This will improve efficiency and accuracy of getting appropriate DRM systems when downloading segments.
This can dramatically improve download speed from less than 50 kb/s to full speed if the HLS playlist used a lot of AES-128 EXT-X-KEYs. E.g., a unique key for each segment.
This was caused because the HLS.get_drm function took EVERY EXT-X-KEY, checked for supported systems, loaded them, and returned the supported objects. This meant it could load possibly 100s of AES-128 ClearKey objects (likely requiring URL downloads for the key URI) causing a huge delay before downloading each segment.
This results in a noticeably faster speed cancelling segmented track downloads on CTRL+C and Errors. It's also reducing code duplication as the dl code will now handle the exception and cleanup for them.
This also simplifies the STOPPING/STOPPED and FAILING/FAILED status messages by quite a bit.
Since I'm using `futures.as_completed()`, it will never ever for loop over all tracks and segments and will forever be stuck in the primary thread of the operation. I.e., main thread for the download track threads, or the track thread for the download segment threads.
I've also removed all future cancelled checks as they will never be cancelled before they get the chance to run, because no future cancel calls are made anymore.