In this article, we describe several libraries that Python offers for downloading files, with a short example for each.
Here are some different ways to download a file in Python.
Download File with Wget Function
With the wget package, we do not need the extra step of writing the fetched data into a local file ourselves. The package is not part of the standard library and can be installed with pip install wget. It offers a function named "download", which accepts two parameters:
- 1st parameter: URL to the downloadable resource file
- 2nd parameter: Path to the local file system where the downloaded file is to be stored.
Example:
import wget
myurl = input("Enter url: ")
wget.download(myurl, r'D:\python')
Output:
Enter url: https://d2fg0sxb1esmnr.cloudfront.net/site-img/logo.png
100% [..............................................................................] 11231 / 11231
Download File with urllib Package
This package lets Python developers add file downloading to their websites, cross-platform applications, etc.
urllib.request.urlretrieve() is the function that downloads the file. It takes two parameters:
- 1st parameter: URL to the downloadable resource file
- 2nd parameter: Path to the local file system where the downloaded file is to be stored.
In Python 3, urllib is part of the standard library, so nothing needs to be installed before running the sample code. There is no urllib package to install with pip.
urllib3 is a separate third-party package with its own API, not a newer version of urllib. If you want to use it in your project, it can be installed by executing the following command:
python -m pip install urllib3
Example:
import urllib.request
myUrl = input("Enter url:")
#Linux
urllib.request.urlretrieve(myUrl, '/User/Downloads/xyz.jpg')
#Windows
#urllib.request.urlretrieve(myUrl,"D:\\python\\xyz.jpg")
In the 1st line of the above code, we imported the required module. Then we created a variable that holds the URL of the downloadable resource as a string.
In the last line of the code, we called urlretrieve() with two parameters: the URL pointing to the online resource and the path where the downloaded file is to be stored.
After running the above code snippet, we can access the downloaded file at the path passed as the second parameter, here a file named "xyz.jpg" in the Downloads folder.
We can give the downloaded file any name through the code. We only have to make sure that the target folder is writable by the current user, without special permission from the administrator of the local system.
Note that the Python documentation describes urlretrieve as part of a legacy interface carried over from Python 2, so it might become deprecated in a future Python release.
In Python 2, the same function is available as urllib.urlretrieve(), and it remains one of the simplest ways to download a file from an online resource.
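The two-parameter call described above can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: filename_from_url and download_with_progress are hypothetical names, "download.bin" is an assumed fallback name, and the progress output relies on the optional reporthook parameter of urlretrieve().

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of the URL, or a default name."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default


def download_with_progress(url, target_dir="."):
    """Download url into target_dir and print progress while it runs."""

    def report(block_num, block_size, total_size):
        # total_size is -1 when the server sends no Content-Length header
        if total_size > 0:
            done = min(block_num * block_size, total_size)
            print(f"\r{done} / {total_size} bytes", end="")

    target = os.path.join(target_dir, filename_from_url(url))
    path, _headers = urllib.request.urlretrieve(url, target, reporthook=report)
    print()
    return path
```

Calling download_with_progress(myUrl, '/User/Downloads') would then store the file under the name taken from the URL.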
Download File with Proxy Module
Some files cannot be downloaded from networks that belong to certain regions. To help their users, developers route such downloads through a proxy service.
Example:
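import urllib.request
# Map each URL scheme to the proxy address that should handle it
setNewProxy = urllib.request.ProxyHandler({'http': '123.12.21.98', 'https': '123.12.21.98'})
connProxy = urllib.request.build_opener(setNewProxy)
# Install the opener so that urlretrieve() sends its request through the proxy
urllib.request.install_opener(connProxy)
urllib.request.urlretrieve('https://nameYourWebsite.com/')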
In the above code, we created a proxy handler object named setNewProxy by passing it the address of the proxy server.
We then created an opener that routes requests through that proxy by passing the proxy object to build_opener(). In the final step, we retrieve the resource using the urlretrieve() function of the urllib.request module.
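If the proxy should apply to a single download only, the opener can also be used directly instead of being shared globally. The sketch below is one way to do this under stated assumptions: download_via_proxy is a hypothetical helper name, the 30 second timeout is an arbitrary choice, and the proxies argument is a dictionary of the same form as the one passed to ProxyHandler above.

```python
import shutil
import urllib.request


def download_via_proxy(url, target, proxies):
    """Fetch url through the given proxies and save it to target.

    proxies maps a URL scheme to a proxy address, for example
    {"http": "123.12.21.98"}. An empty dict means "use no proxy".
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # opener.open() uses the proxy for this request only; the module-level
    # urlopen() and urlretrieve() functions are left untouched.
    with opener.open(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```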
Download File with urllib2 Package
This is an alternate way to download a file from an online resource. The function used here requires only one parameter to fetch the document. The urllib2 package exists only in Python 2. In Python 3, its functionality was moved into urllib.request and urllib.error, which ship with the standard library, so nothing needs to be installed. It is advised to move your project to Python 3 to avoid incompatibility issues.
The urllib2 package contains a urlopen() function that accepts the URL of the downloadable online resource. It returns a file-like response object from which the data of that resource can be read.
Example:
import urllib2  # available in Python 2 only
myurl = raw_input("Enter url :")
fileFetchededata = urllib2.urlopen(myurl)
dataforwrite = fileFetchededata.read()
with open('/your/local/path/xyz.jpg', 'wb') as myfile:
    myfile.write(dataforwrite)
Firstly, in the above code, we imported the urllib2 package, which offers the urlopen() function to fetch the file data from an online resource. urlopen() accepts one parameter, i.e., the URL, in the form of a string.
fileFetchededata is the variable that holds the response object for the fetched file. We need to read the data from this object and write it to the desired file on our local system.
After reading the fetched data, we used open() to create the local file and wrote the data to it through the file object named myfile. open() accepts two parameters here:
- Local system path where the downloaded file is going to be stored.
- Mode in which the file is opened. Here "wb" opens the file for writing in binary mode, so the bytes are written exactly as they were fetched.
We can look at the downloaded file by navigating to the directory mentioned in the Python script.
In Python 3, this functionality lives in urllib.request, so the urllib2 import shown above works only in Python 2.
So, before starting the project, we need to decide which Python version we are going to use and select the packages accordingly; otherwise, there may be a chance of version incompatibility.
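When one script has to run on both Python versions, a common pattern is to try the Python 3 import first and fall back to urllib2. The following is a minimal sketch of that idea; fetch_to_file is a hypothetical helper name, and the rest mirrors the urlopen() example above.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_to_file(url, path):
    """Read the whole resource into memory and write it to path."""
    response = urlopen(url)
    try:
        data = response.read()
    finally:
        response.close()
    with open(path, "wb") as myfile:
        myfile.write(data)
    # Return the number of bytes written
    return len(data)
```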
Download File with Request Function
requests is a third-party package that can be installed with pip install requests. It covers the downloading features shown for urllib2 behind a simpler interface.
The package returns the body of the response as bytes in the content attribute. As in the previous code example, we use the open() method in "wb" mode and write these bytes into the desired file.
Like the above scenarios, this code creates the file at the path passed to open().
Example:
import requests
myurl = input("Enter url :")
req = requests.get(myurl)
with open('/your/local/path/myCar.jpg', 'wb') as myfile:
    myfile.write(req.content)
# Accessing HTTP meta-data
print(req.encoding)
print(req.headers['content-type'])
print(req.status_code)
Developers often build cross-platform APIs and multi-page websites. In such scenarios, it may be required to access some information about the file, such as its meta-data. The response object returned by requests.get() exposes several attributes for this (a few of them are shown in the above code).
This meta-data can be used to build another HTTP request or to perform other development-related activities. (This is just an example.)
The requests package provides a wide range of features that make web-scraping-related activities easier for Python developers.
One advantage of the requests package is that its older releases also support Python 2.7, while current releases require Python 3. So, in general, developers can use this package in many projects, provided they pick a release that matches their Python version.
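As one possible use of this meta-data, the content-type header can help choose an extension for the saved file. The sketch below is only an illustration: extension_for is a hypothetical helper name, ".bin" is an assumed fallback, and the lookup comes from the standard library's mimetypes module, so the exact extension returned for some types can vary between systems.

```python
import mimetypes


def extension_for(content_type, default=".bin"):
    """Guess a file extension from a Content-Type header value."""
    if not content_type:
        return default
    # Drop parameters such as "; charset=utf-8" before the lookup
    mime = content_type.split(";")[0].strip().lower()
    return mimetypes.guess_extension(mime) or default


# Possible use with a requests response named req:
# filename = "download" + extension_for(req.headers.get("content-type"))
```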
Download File with Subprocess Module
The subprocess module lets us run system commands from Python code. On Linux, the two most popular commands for downloading files from a URL are wget and curl.
Example:
import subprocess
# Pass each command as a list of arguments so that no shell is needed
subprocess.run(['curl', 'www.picsum.photos/200', '--output', 'abc.jpg'])
subprocess.run(['wget', 'www.picsum.photos/200'])
Here, we use subprocess to run commands on the system. Any system command can be run this way; curl and wget are the Linux commands that download files from a URL.
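A slightly more defensive variant checks that the command exists and that it finished successfully. This is a sketch under assumptions: curl_command and download_with_curl are hypothetical helper names, and the --fail and --location options of curl are added here so that HTTP errors produce a non-zero exit code and redirects are followed.

```python
import shutil
import subprocess


def curl_command(url, output):
    """Build the argument list for a curl download."""
    return ["curl", "--fail", "--location", "--output", output, url]


def download_with_curl(url, output):
    """Run curl and raise an error if it is missing or the download fails."""
    if shutil.which("curl") is None:
        raise RuntimeError("curl is not installed or not on PATH")
    # check=True raises CalledProcessError when curl exits with an error code
    subprocess.run(curl_command(url, output), check=True)
    return output
```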
Handling Large File Downloads
The requests package offers more parameters that make it easier to download large files.
There is a flag named "stream", which can be set to True. This tells requests.get() to download only the response headers at first and to keep the connection open. The body is fetched only as we read it.
The built-in iterator iter_content() then fetches the body in a large number of small chunks, which we write to the desired file one by one.
Example:
import requests
myurl = input("Enter url :")
req = requests.get(myurl, stream=True)
with open("myfilename.pdf", 'wb') as myPypdf:
    for current_chunk in req.iter_content(chunk_size=1024):
        if current_chunk:
            myPypdf.write(current_chunk)
As the above code shows, we can set the chunk size as we wish. iter_content() is the built-in iterator that goes through the response data chunk by chunk, and each chunk is written to the specified file on our local system.
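The same chunked pattern can also be sketched with the standard library instead of requests. Here urllib.request.urlopen() stands in for requests.get(), and download_in_chunks is a hypothetical helper name; the point is only that the file never has to fit into memory as a whole.

```python
import urllib.request


def download_in_chunks(url, path, chunk_size=1024):
    """Copy the resource at url to path, chunk_size bytes at a time."""
    written = 0
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                # An empty read means the whole body has been consumed
                break
            out.write(chunk)
            written += len(chunk)
    return written
```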
Advantage of Request Package Over other Methods
In some scenarios, clicking a download button first redirects us to another website before the file is served. These redirections can become complicated to handle.
The requests package offers additional functionality that lets developers handle them easily.
Example:
import requests
myurl = 'insert url'
myresponse = requests.get(myurl, allow_redirects=True)
with open('filename.pdf', 'wb') as myPypdf:
    myPypdf.write(myresponse.content)
To follow the redirections, we set the allow_redirects parameter to True. For requests.get(), this is already the default, so passing it explicitly mainly documents the intent.
Download File with Asyncio Module
There may be a situation where a developer needs to download multiple files at the same time instead of one after another. Multiple files can be downloaded asynchronously using the asyncio module.
The asyncio module runs an event loop. Whenever one download is waiting for data from the network, the loop switches to another one, so several downloads make progress at the same time.
We need to install the aiohttp module to implement the feature successfully. asyncio itself is part of the standard library since Python 3.4, so only aiohttp has to be installed, using the following command in cmd:
pip install aiohttp
Example:
import asyncio
import aiohttp

async def FileDownload(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as response:
        assert response.status == 200
        # For large files we can use response.content.read(chunk_size) instead.
        return url, await response.read()

async def DownloadMultipleFiles(session: aiohttp.ClientSession):
    myUrls = (
        'http://youtube.com',
        'http://gaana.com',
        'http://xyzabc.com'
    )
    myDownloads = [FileDownload(session, url) for url in myUrls]
    print('Results')
    # as_completed yields the downloads in the order in which they finish
    for download_future in asyncio.as_completed(myDownloads):
        result = await download_future
        print('finished:', result)
    return myUrls

async def main():
    async with aiohttp.ClientSession() as period:
        myresult = await DownloadMultipleFiles(period)
        print('Download finished:', myresult)

asyncio.run(main())
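When the list of URLs is long, it can help to limit how many downloads run at once. The sketch below shows one way to do this with asyncio.Semaphore. download_all is a hypothetical helper, and fetch stands for any coroutine function that downloads a single URL, for example a wrapper around the FileDownload coroutine above; the limit of 3 is an arbitrary assumption.

```python
import asyncio


async def download_all(urls, fetch, limit=3):
    """Run fetch(url) for every URL with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        # Wait here until one of the `limit` slots is free
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True collects failures as results
    # instead of raising the first one
    results = await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))
```

The returned dictionary maps each URL either to the value returned by fetch or to the exception it raised, so failed downloads can be retried separately.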
Conclusion
We saw that urllib2 is not available in Python 3, where its functionality lives in urllib.request, and that urlretrieve() belongs to a legacy interface. For the same functionality in Python 3, we can use urllib.request from the standard library, or install the requests or urllib3 package.
To avoid version incompatibility, it is advised to use the urllib3 or requests module to perform the above operations.
The requests package handles large file downloads through streaming. It also lets developers handle redirections between websites easily.
In our opinion, the wget function is very easy to use because we do not need to explicitly copy the fetched data into a locally created file. So, this reduces our work.
Finally, we can prefer the requests package as it offers a wide range of built-in features. The wget package is handy for simple downloads. Developers often prefer the wget and requests packages for file downloading-related activities.