How to download files from the internet with Python

Hello world, in this tutorial we will learn how to download files from the internet with Python. When web scraping, you often need to download data from the internet: you might build a web crawler that finds and downloads pictures, or you might want to download audio or video files from a website. All of these actions need to be done autonomously, with little or no help from the user.

Setting up:

We will need Python’s requests module for this project. You can install it by running this command in your terminal:

pip3 install requests

Let’s import our installed module:

import requests

The Python code:

We need to declare our user agent; this allows us to download files from the internet without being flagged as a bot. Instead, our script will appear as a browser being used by a human.

We will be using a Mozilla Firefox user agent string, with Ubuntu as the operating system:

user_agent = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'}

For example purposes we will assume we already know the name of the file; let’s call it “test.jpg”. Let’s create the file and open it for writing in binary mode.

filename = "test.jpg"
with open(filename, 'wb') as filem:

Next we need to request our file link (stored in the variable url) using the requests module, passing our user agent in the headers. We also set the stream flag to True, which tells requests to stream the response body instead of loading it all into memory at once.

with open(filename, 'wb') as filem:
    data = requests.get(url, stream=True, headers=user_agent, allow_redirects=False)

Using the response returned by requests, we read the file size from the website’s Content-Length header.

with open(filename, 'wb') as filem:
    data = requests.get(url, stream=True, headers=user_agent, allow_redirects=False)
    print(data.headers)
    print("************")
    chunks = int(data.headers['Content-length'])
    # Parentheses matter here: % binds tighter than /,
    # so "%f" % chunks / 1024 would raise a TypeError
    print("FILESIZE : %.2f KB" % (chunks / 1024))
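One caveat: not every server sends a Content-Length header, so indexing it directly can raise a KeyError. A minimal sketch of reading it defensively, using a plain dict as a stand-in for the response headers (the '2048' value is made up for illustration):

```python
# Hypothetical response headers; a real data.headers behaves like this dict
# (except that requests makes the lookup case-insensitive).
headers = {'Content-Length': '2048'}

# .get() falls back to 0 when the header is absent instead of raising KeyError
size = int(headers.get('Content-Length', 0))
print("FILESIZE : %.2f KB" % (size / 1024))
```

When the size comes back as 0, you can still download the file; you just cannot report its size up front.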

We use requests’ iter_content function to iterate over the file content in chunks and write them to our file, catching exceptions as we go.

try:
    # Write in 1024-byte chunks so large files never sit in memory all at once
    for chunk in data.iter_content(chunk_size=1024):
        filem.write(chunk)
    print("Done downloading!")
except Exception as e:
    print("Sorry, couldn't download the file because %s" % str(e))

Putting it together:

To make the code above cleaner, we turn it into a function that we can call any time we want instead of rewriting the whole thing.


def download(filename, url):
    with open(filename, 'wb') as filem:
        data = requests.get(url, stream=True, headers=user_agent, allow_redirects=False)
        print(data.headers)
        print("************")
        chunks = int(data.headers['Content-length'])
        print("FILENAME: %s" % filename)
        print("FILESIZE : %.2f KB" % (chunks / 1024))
        print("Downloading %s........." % filename)
        try:
            for chunk in data.iter_content(chunk_size=1024):
                filem.write(chunk)
            return "Done downloading!"
        except Exception as e:
            return "Sorry, couldn't download the file because %s" % str(e)
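The download function itself needs a live URL, but the chunked-write pattern it relies on can be exercised offline. Here is a self-contained sketch, with io.BytesIO standing in for the streamed response body (the payload and the temp-file path are made up for illustration):

```python
import io
import os
import tempfile

payload = b"x" * 10000       # stand-in for the downloaded bytes
src = io.BytesIO(payload)    # plays the role of the streamed response

path = os.path.join(tempfile.gettempdir(), "test.jpg")
with open(path, 'wb') as filem:
    # Mirrors "for chunk in data.iter_content(chunk_size=1024):"
    while True:
        chunk = src.read(1024)
        if not chunk:
            break
        filem.write(chunk)

print(os.path.getsize(path))  # 10000
```

The file on disk ends up byte-for-byte identical to the source, which is exactly what the chunked loop guarantees.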

That’s how to download files from the internet with Python using the requests module. There is still room for improvement; for example, you might want to add a progress bar to show how much has been downloaded.
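As a starting point for that improvement, here is one possible sketch of a textual progress bar. The function name and the ten-character width are choices made up for this example; inside the download loop you would keep a running byte count and call it after each chunk is written:

```python
def progress_bar(downloaded, total, width=10):
    # Build a bar like [#####-----] 50% from the byte counts
    filled = int(width * downloaded / total)
    percent = int(100 * downloaded / total)
    return "[%s] %d%%" % ("#" * filled + "-" * (width - filled), percent)

# Inside the download loop, after filem.write(chunk), you might do:
#   done += len(chunk)
#   print(progress_bar(done, chunks), end="\r")
print(progress_bar(5000, 10000))  # [#####-----] 50%
```

Printing with end="\r" redraws the bar on the same terminal line instead of scrolling.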
