Ok, so there are a few reasons this was done.
1) Design-wise, it isn't valid to have --proxy (or the config, or otherwise) set a proxy and then unpredictably have it bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all communication indefinitely, not switch in and out depending on the track or service.
2) Following from reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This meant I would authenticate and call APIs under a proxy, then suddenly download manifests, segments, etc. under my home connection. A competent service could see that as an indicator of foul play and flag you.
3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up and used by Requests in the Session. There's no way to tell a requests Session to temporarily disable the proxy and turn it back on later without getting the proxy from the session (in an annoying way), storing it, removing it, making the calls, and then, assuming you're still in the same function, adding it back. If you're not in the same function, well, time for some spaghetti code (see the sketch below).
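For illustration, a minimal sketch of that dance against a plain requests Session (the proxy URL and request target are placeholders):

```python
import requests

session = requests.Session()
session.proxies = {"all": "http://127.0.0.1:8080"}  # placeholder proxy

# There is no "pause" switch for Session proxies; you have to stash the
# mapping, clear it, make the direct calls, then restore it yourself.
saved_proxies = dict(session.proxies)
session.proxies = {}
try:
    session.get("https://example.com/manifest")  # goes out unproxied
finally:
    session.proxies = saved_proxies  # only easy if you're still in the same function
```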
---
tl;dr: -1 UX/design/expectations with the CLI, -1 security, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
Chardet was detecting a mixture of mostly CP-1252 and MacRoman encodings, when the box should just be left as-is when parsing. The actual text within it may want to go through `try_ensure_utf8` when parsed, but not the entire box.
* Add option for automatic subtitle character encoding normalization
The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).
* Remove Subtitle parameter `auto_fix_encoding`
Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.
* Move Subtitle encoding-fixing code out of the `if drm` tree
* Use chardet as a last-ditch effort when fixing subs, or return the original data
* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8 (see the sketch after this list)
* Add Shivelight as a contributor
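For reference, a minimal sketch of what such a utility might look like (an illustration of the approach, not devine's exact implementation):

```python
import chardet


def try_ensure_utf8(data: bytes) -> bytes:
    """Try to return `data` re-encoded as UTF-8; illustrative sketch only."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        # Common case: the service sent CP-1252 (or latin1) instead of UTF-8.
        return data.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        # Last-ditch effort: let chardet guess, otherwise return data as-is.
        encoding = chardet.detect(data).get("encoding")
        if encoding:
            try:
                return data.decode(encoding).encode("utf-8")
            except UnicodeDecodeError:
                pass
        return data
```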
---------
Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
For aria2c I've simplified the operation by offloading most of the work of creating a cookie header and just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or the like. It supports multiple same-name cookies under different domains/paths, based on the URI of the mock request.
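A minimal sketch of that approach, using requests' own `get_cookie_header` helper from `requests.cookies` (the wrapper function name here is hypothetical):

```python
from typing import Optional

import requests
from requests.cookies import get_cookie_header


def cookie_header_for(session: requests.Session, url: str) -> Optional[str]:
    # Build a mock (prepared) request for the URI, then let requests select
    # matching cookies exactly as it would for a real session.get(url) call.
    mock_request = requests.Request("GET", url).prepare()
    return get_cookie_header(session.cookies, mock_request)
```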
DASH and normal URL downloads now both decrypt a single (or merged) file after all downloads are finished. This leaves a bit of a "pause" in progress bar movement that looks odd, so the track is now marked as being in a Decrypting state.
Since DASH doesn't have the ability to change keys dynamically per-track (Representation), there's no need for the DASH downloader to decrypt segments as they are downloaded (like HLS).
This halves the number of processes that need to be opened, as well as the I/O usage, and may result in noticeably lower CPU usage. Since the IOPS are lowered, you may even see an increase in download speed if downloading to something like a meh HDD.
This also fixes decryption in some weird edge-cases where decrypting each segment individually resulted in timestamp anomalies causing shaka to fail.
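For illustration, decrypting the merged file once might look something like this, assuming shaka-packager's raw-key decryption flags (paths, stream selector, and key values are placeholders):

```python
import subprocess
from pathlib import Path


def decrypt_merged(path: Path, key_id: str, key: str) -> None:
    # One shaka-packager run for the whole merged track instead of one per
    # segment: fewer processes, less I/O, and no per-segment timestamp
    # anomalies for shaka to trip over.
    out = path.with_suffix(".decrypted.mp4")
    subprocess.run([
        "packager",
        f"input={path},stream=0,output={out}",
        "--enable_raw_key_decryption",
        "--keys", f"label=:key_id={key_id}:key={key}",
    ], check=True)
    out.replace(path)
```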
Most people used --skip-dl just to license the DRM pre-v1.3.0, which makes sense; --skip-dl is otherwise a fairly pointless feature. I've fixed it so that --skip-dl works like before, allowing license calls, while still supporting the new per-segment features post-v1.3.0.
Fixes #37
segments[0] is the first tuple, of two values: the URL and an optional byte range. So this accidentally passed the whole tuple rather than the URL within it (see the sketch below).
Fixes #54
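A minimal sketch of the bug and the fix (names are stand-ins for devine's actual code):

```python
import requests

# `segments` stands in for devine's actual list of (url, byte_range) tuples,
# where byte_range may be None.
segments = [("https://example.com/seg0.mp4", None)]

url, byte_range = segments[0]
res = requests.get(url)          # correct: pass the URL string
# res = requests.get(segments[0])  # the bug: passed the whole tuple
```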
This results in noticeably faster cancellation of segmented track downloads on CTRL+C and errors. It also reduces code duplication, as the dl code now handles the exception and cleanup for them.
This also simplifies the STOPPING/STOPPED and FAILING/FAILED status messages by quite a bit.
Since I'm using `futures.as_completed()`, the loop that waits on results never iterates over every track and segment up front, and it always runs in the primary thread of the operation, i.e., the main thread for the download track threads, or the track thread for the download segment threads.
I've also removed all future cancelled checks as they will never be cancelled before they get the chance to run, because no future cancel calls are made anymore.
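For reference, a minimal sketch of the pattern (function and variable names are stand-ins, not devine's actual code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(segments, download_segment):
    # Submit every segment, then wait with as_completed() in the calling
    # thread; the first worker exception re-raises here, where the dl code
    # can handle cleanup, so no future.cancel() calls are needed.
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [pool.submit(download_segment, seg) for seg in segments]
        for future in as_completed(futures):
            future.result()  # re-raises any worker exception in this thread
```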
I've removed asyncio usage as it's generally unnecessary. If you want to run aria2c under a thread, run it under a thread. In devine's case this would take another thread, another thread layer deep. Pointless, and it would affect speed.
With this change I've been able to improve the aria2c progress capture code quite a bit.
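For illustration, synchronous progress capture can be as simple as reading the subprocess's stdout (the flags, URL, and handling here are placeholders, not devine's actual code):

```python
import subprocess

# Run aria2c synchronously and read its progress output line-by-line; no
# asyncio needed, since the blocking read already happens in whatever
# thread runs this code.
proc = subprocess.Popen(
    ["aria2c", "--summary-interval=1", "https://example.com/file.mp4"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    print(line.rstrip())  # e.g., parse this into a progress bar instead
proc.wait()
```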
While the commit I made to do this change initially seemed to help, it was actually pointless; the issues I had were caused by other problems.
For consistency it seems best to stick with the logging module, with the RichHandler applied. Using just console.log means being unable to control the log level or which levels of logs appear.
I've moved log.info calls to console.log calls, both to reduce clashes between logs and console refreshes happening at the same time, and to cut unnecessary log-level text being printed to the console.
We don't need to know that a log is an `info`-level log, but I've kept log.error and the like, since we do want to know when a log is an error.
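As an illustration, this is roughly the setup that implies (handler options here are assumptions, not devine's exact configuration):

```python
import logging

from rich.logging import RichHandler

# Keep the logging module (so levels stay controllable/filterable) but
# render through Rich for consistent console output.
logging.basicConfig(
    level=logging.INFO,
    format="%(message)s",
    handlers=[RichHandler(show_time=False)],
)
log = logging.getLogger("devine")
log.error("this keeps its level label, unlike a bare console.log() call")
```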
Turns out, even if you manually set the Range header AND the server has full support, it does not work. It will act like it works, but it seems aria2c internally gets confused about which bytes it requested versus which it received, and it will either just download the full file, or the requested range (but still complain, and freeze!).
Yikes.
Just like the commit for HLS multi-threading, this mimics aria2c's -j=16 behavior, but manually via a ThreadPoolExecutor. The benefit is that we keep support for the new system, and we now get a useful progress bar via tqdm on segmented downloads, unlike aria2c, which essentially fills the terminal with jumbled download progress stubs.
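A minimal sketch of that setup (`fetch` and `urls` are placeholders, not devine's actual names):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm


def download_segments(urls, fetch):
    # The same 16-worker concurrency as aria2c's -j=16, but with one clean
    # tqdm bar that ticks as each segment finishes.
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
        for future in tqdm(as_completed(futures), total=len(futures), unit="seg"):
            future.result()
```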