10. More Python modules

  • gzip - compresses/decompresses binary data, reads/writes data in *.gz files.

  • zipfile - creates, adds to, reads, and writes zip archives.

  • getpass - reads a password from the keyboard.

  • glob - finds files whose names match shell-style wildcard patterns.

  • json - encodes/decodes Python objects to/from JSON, reads/writes JSON files.

  • argparse - parses command-line parameters passed to a Python script.

  • Exception handling with try ... except.

10.1. Module gzip (Exercise)

The module is used to

  • compress/decompress binary data

  • write data into compressed *.gz file

  • read data from compressed *.gz file

import gzip
## See all the methods in the module:
print(dir(gzip))

## Create some data and print it:
data = '''
 Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n 
 Line 4 \n
 Line 5 \n
 Line 6 \n 
 Line 7
 '''
print(data)

 Create some data. 
In this particular case, 
the data is just two sentences stored in a string
 Line 4 

 Line 5 

 Line 6 
 Line 7
## Store data into gzip file, data.txt.gz.  The procedure is very similar to writing data into a regular file. 
## Notice the write option 'wt' is for writing text:

with gzip.open("data.txt.gz","wt") as f:
    f.write(data)
## Check if file data.txt.gz shows up in the directory:

import os
%ls data.txt.gz
## Let's check if we can read data back from the file. Notice the read option 'rt' is for reading text:

with gzip.open("data.txt.gz","rt") as f:
    data_read = f.read()
print(data_read)
 Create some data. 
In this particular case, 
the data is just two sentences stored in a string
 Line 4 

 Line 5 

 Line 6 
 Line 7
## We can also convert the data into binary form, then write it into a gzip file with the 'wb' option:
#data_byte = bytes(data, 'utf-8')   # equivalent alternative
data_byte = data.encode('utf-8')
print(data_byte)
b'\n Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n \n Line 4 \n\n Line 5 \n\n Line 6 \n \n Line 7\n '
## Store the binary data into the gzip file:
with gzip.open("data.txt.gz","wb") as f:
    f.write(data_byte)
## This can also be done with the GzipFile class:
with gzip.GzipFile("data.txt.gz","wb") as f:
    f.write(data_byte)
## We can also read the gzip file into binary data type by using "rb" option:

with gzip.open("data.txt.gz","rb") as f:
    out_byte = f.read()
print(out_byte)
b'\n Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n \n Line 4 \n\n Line 5 \n\n Line 6 \n \n Line 7\n '
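## Compress data in memory with gzip.compress(). It works on bytes only, so text must be encoded first.
## Compare the sizes of the text data, the byte data, and the compressed byte data.
## Note: sys.getsizeof() returns the size of the Python object in memory, including interpreter overhead: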

import sys

size_txt = sys.getsizeof(data)

out_byte = data.encode('utf-8')
size_bin = sys.getsizeof(out_byte)

out_compressed = gzip.compress(out_byte)
size_compressed = sys.getsizeof(out_compressed)

print(size_txt, '>', size_bin, '>', size_compressed)
180 > 172 > 151
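Because sys.getsizeof() includes object overhead, it does not show how many data bytes gzip actually saves. A minimal sketch, assuming an illustrative repetitive sample text (not the data string above), that compares raw byte lengths with len() and checks that gzip.decompress() restores the original bytes:

```python
import gzip

# Illustrative sample: repetitive text compresses well
text = "Line of sample text\n" * 50
raw = text.encode("utf-8")

compressed = gzip.compress(raw)        # bytes in, gzip-formatted bytes out
restored = gzip.decompress(compressed) # gzip-formatted bytes back to the original bytes

print(len(raw), len(compressed))       # plain byte counts, no object overhead
print(restored == raw)                 # compression is lossless
print(restored.decode("utf-8") == text)
```

For very short inputs, the fixed gzip header and trailer can make the compressed output larger than the input, so the savings only show up on longer or repetitive data.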

10.2. Module zipfile (Exercise)

Works with zip archives:

  • Reads the list of files and directories in a zip archive

  • Extracts files from an archive

  • Reads a specific file from an archive

  • Creates a new archive

  • Adds/appends files to an archive

10.2.1. Download a zip file:

import os
os.system('wget https://linuxcourse.rutgers.edu/2024/html/files/test.zip')
--2025-01-01 17:18:41--  https://linuxcourse.rutgers.edu/2024/html/files/test.zip
Resolving linuxcourse.rutgers.edu (linuxcourse.rutgers.edu)...
Connecting to linuxcourse.rutgers.edu (linuxcourse.rutgers.edu)||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 844 [application/zip]
Saving to: ‘test.zip.11’

                                       100% 50.2M=0s

2025-01-01 17:18:41 (50.2 MB/s) - ‘test.zip.11’ saved [844/844]
import zipfile
with zipfile.ZipFile('test.zip') as file:
    # print a table of the archive contents with the 'printdir' method
    file.printdir()
File Name                                             Modified             Size
NEW_DIR1/                                      2021-05-25 14:34:44            0
py_script.py                                   2024-01-10 17:14:22          198
list.txt                                       2024-01-10 17:14:00          105
ip.txt                                         2024-01-11 11:55:14           12
with zipfile.ZipFile('test.zip') as file:
    # ZipFile.namelist() returns a list with the names of all members of the archive
    print(file.namelist())
['NEW_DIR1/', 'py_script.py', 'list.txt', 'ip.txt']
with zipfile.ZipFile('test.zip') as file:
    # Open and read the content of a file from the archive.
    # Note: namelist()[0] is the directory entry 'NEW_DIR1/', so pick a regular file instead
    with file.open(name='list.txt', mode='r') as text_file:
        print(text_file.read().decode('utf-8'))

with zipfile.ZipFile('test.zip') as file:
    # ZipFile.infolist() returns a list of ZipInfo objects, one per archive member
    print(file.infolist())
[<ZipInfo filename='NEW_DIR1/' filemode='drwxr-xr-x' external_attr=0x10>, <ZipInfo filename='py_script.py' compress_type=deflate filemode='-rwxr-xr-x' file_size=198 compress_size=134>, <ZipInfo filename='list.txt' compress_type=deflate filemode='-rw-rw-r--' file_size=105 compress_size=94>, <ZipInfo filename='ip.txt' filemode='-rw-rw-r--' file_size=12>]
## Extract the content of the zip file into a new directory, EXTRACT_DIR:

with zipfile.ZipFile('test.zip') as f:
    f.extractall('EXTRACT_DIR')
import os
print(os.listdir('EXTRACT_DIR'))
['list.txt', 'ip.txt', 'NEW_DIR1', 'py_script.py']
## Create an empty zip file (closing it writes a valid, empty archive):

archive_name = 'new_test.zip'
f = zipfile.ZipFile(archive_name, 'w')
f.close()
## Add a new file, /etc/hosts. Mode 'w' truncates the archive, so any existing members are overwritten.
with zipfile.ZipFile('new_test.zip', 'w') as f:
    f.write('/etc/hosts')
## Append a new file, /etc/passwd. Mode 'a' keeps the existing members.
with zipfile.ZipFile('new_test.zip', 'a') as f:
    f.write('/etc/passwd')
## Check the content of the archive new_test.zip:
with zipfile.ZipFile('new_test.zip','r') as f:
    f.printdir()
File Name                                             Modified             Size
etc/hosts                                      2024-12-03 19:56:42          179
etc/passwd                                     2024-07-27 16:12:20         1188
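The same create/append/read cycle can be tried without touching system files. A minimal sketch, assuming a hypothetical sample file notes.txt and archive demo.zip created in a temporary directory; it also uses writestr(), which adds a member directly from a string:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file to archive
    src = os.path.join(tmp, "notes.txt")
    with open(src, "w") as fh:
        fh.write("first line\nsecond line\n")

    archive = os.path.join(tmp, "demo.zip")

    # 'w' creates (or truncates) the archive; arcname sets the name stored inside it
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="notes.txt")

    # 'a' appends; writestr adds a member straight from a string
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr("extra/info.txt", "added later\n")

    # 'r' reads: list the members and read one back as text
    with zipfile.ZipFile(archive, "r") as zf:
        names = zf.namelist()
        content = zf.read("notes.txt").decode("utf-8")

print(names)
print(content)
```

Without arcname, ZipFile.write() stores the path as given (minus any leading slash), which is why /etc/hosts appears as etc/hosts in the listing above.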

10.3. Module getpass (Exercise)

  • Prompts for a password and reads it without echoing it to the screen

  • Figures out the user name from the environment

import getpass
## Prompt for password and read user password:
p = getpass.getpass(prompt='Enter the password: ')
In a Jupyter frontend that does not support input requests, this call fails with:

StdinNotImplementedError: getpass was called, but this frontend does not support input requests.

In that case, run the code from a Python session in a terminal.
if p == 'this':
    print('correct password')
else:
    print('password incorrect')
## Identify the user and prompt in a loop until success, with at most 10 attempts:
user = getpass.getuser()

count = 0
max_count = 10

#while True:   # alternative: no limit on the number of attempts
while count < max_count:
    pwd = getpass.getpass("Password for %s: " % user)
    count += 1

    if pwd == 'password1':
        print("login successful")
        break
    else:
        print("The password you entered is incorrect.")
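Because getpass needs an interactive terminal, the retry loop is hard to exercise in a notebook. A minimal sketch, assuming a hypothetical login() helper whose password reader and user name can be passed in, so the loop can be tested with fake input while still defaulting to getpass.getpass and getpass.getuser:

```python
import getpass

def login(expected, max_attempts=10, read_password=getpass.getpass, user=None):
    """Hypothetical helper: prompt up to max_attempts times, return True on success.

    read_password is injectable so the loop can run without a terminal.
    """
    if user is None:
        user = getpass.getuser()   # user name from the environment
    for attempt in range(1, max_attempts + 1):
        pwd = read_password(f"Password for {user} (attempt {attempt}/{max_attempts}): ")
        if pwd == expected:
            print("login successful")
            return True
        print("The password you entered is incorrect.")
    return False

# Simulated input: one wrong answer, then the right one
answers = iter(["wrong", "password1"])
ok = login("password1", max_attempts=3, read_password=lambda prompt: next(answers), user="demo")
```

In a terminal, calling login("password1") with the defaults prompts with getpass as in the loop above.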

10.4. Module glob (Exercise)

  • Finds path names matching a shell-style wildcard pattern (not a regular expression) and returns them in a list

  • * matches zero or more characters

  • ? matches exactly one character

  • [0-9] matches any single digit

  • [a-z] matches any lowercase letter

  • [A-Z] matches any uppercase letter

  • ** matches files and zero or more directories when recursive=True

import glob
## Search for files specifically in /etc:
print(glob.glob('/etc/*.conf'))
## Search for files in the subdirectories of /etc (one level down):
print(glob.glob('/etc/*/*.conf'))
## Recursive search in a directory tree:
glob.glob('/etc/**/*.conf', recursive=True)
## For a recursive search, it is better to use the iterator version, iglob, to save RAM.
## iglob yields matches one at a time instead of building the whole list in memory like glob:

for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    print(filename)
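The wildcards listed above can be checked against a known file layout. A minimal sketch, assuming a hypothetical set of empty files created in a temporary directory and a hypothetical helper matches() that reports results relative to that directory:

```python
import glob
import os
import tempfile

def matches(root, pattern):
    """Hypothetical helper: sorted matches of pattern under root, relative to root."""
    found = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(p, root) for p in found)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file layout for the demo
    for rel in ["a1.conf", "b2.conf", "ab.conf", "notes.txt", "sub/deep/c3.conf"]:
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()   # create an empty file

    star = matches(tmp, "*.conf")          # top level only
    digit = matches(tmp, "?[0-9].conf")    # one character, then one digit
    recursive = matches(tmp, "**/*.conf")  # every depth, because recursive=True

print(star)
print(digit)
print(recursive)
```

Note that "**/*.conf" also matches files directly in the top directory, since ** can stand for zero directories.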

10.5. Exception handling in Python (Exercise)

## Example: find all conf files that contain the keyword `network`.
## This will raise an error, since some files in /etc are not readable by the user.

for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    with open(filename) as f:
        line = f.read()
        if 'network' in line:
            print(filename)

To let the script keep running and report the errors, we run the code inside a try ... except block.

## Let's handle the exception above:
for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    try:
        with open(filename) as f:
            line = f.read()
            if 'network' in line:
                print(filename)
    except PermissionError as e:
        print(f"can't read {filename}: {e}")

Another error, FileNotFoundError, is raised when a symbolic link points to a missing file.

We can handle multiple exceptions in one clause:

for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    try:
        with open(filename) as f:
            line = f.read()
            if 'network' in line:
                print(filename)
    except (PermissionError, FileNotFoundError) as e:
        print(f"can't read {filename}: {e}")

Both exceptions are subclasses of OSError, so we can catch them with one generic clause:

except OSError as e:
    print(f"can't read {filename}: {e}")
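The search can be packaged as a function that collects failures instead of printing them. A minimal sketch, assuming a hypothetical helper files_containing() and a test layout with one readable file and one broken symbolic link; it also opens files with errors="replace", because reading a non-text file can raise UnicodeDecodeError, which is not an OSError:

```python
import os
import tempfile

def files_containing(keyword, filenames):
    """Hypothetical helper: return (matches, errors) for the given file names."""
    matches, errors = [], []
    for filename in filenames:
        try:
            # errors='replace' avoids UnicodeDecodeError on non-text files
            with open(filename, errors="replace") as f:
                if keyword in f.read():
                    matches.append(filename)
        except OSError as e:  # PermissionError, FileNotFoundError, IsADirectoryError, ...
            errors.append((filename, type(e).__name__))
    return matches, errors

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "net.conf")
    with open(good, "w") as fh:
        fh.write("network = on\n")
    broken = os.path.join(tmp, "broken.conf")
    os.symlink(os.path.join(tmp, "missing.conf"), broken)  # link to a file that does not exist

    found, failed = files_containing("network", [good, broken])

print(found)
print(failed)
```

On a real system, the file names can come straight from glob.iglob('/etc/**/*.conf', recursive=True).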

In numerical computations, you may need to handle division by zero. In the example below, the last denominator is zero, so the loop raises ZeroDivisionError:

denominator = [1.2, 4.0, 0.001, 45.0, 0.0]

numerator = 0.1

for i in denominator:
    s = numerator / i
    print(s)

We need to guard against it with try ... except ZeroDivisionError:

try:
    for i in denominator:
        s = numerator / i
        print(s)
except ZeroDivisionError as e:
    print('Division by zero, i = ', i)
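Wrapping the whole loop in try stops the loop at the first zero denominator. A minimal sketch, assuming a hypothetical helper safe_ratios() that moves the try inside the loop so the remaining items are still processed, recording None where the division fails:

```python
def safe_ratios(numerator, denominators):
    """Hypothetical helper: divide numerator by each denominator, None where it is zero."""
    results = []
    for d in denominators:
        try:
            results.append(numerator / d)
        except ZeroDivisionError:
            print('Division by zero, i = ', d)
            results.append(None)  # keep the position so results line up with the inputs
    return results

ratios = safe_ratios(0.1, [1.2, 4.0, 0.001, 45.0, 0.0])
print(ratios)
```

Whether to skip, record None, or substitute a value such as float('inf') depends on what the later computation expects.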

10.6. JSON (Exercise)

JSON (JavaScript Object Notation) is a text format used to store and transfer data.

  • json.dump() writes data to a json file

  • json.load() reads data from json file

  • json.dumps() encodes a python object (usually dictionary) into json

  • json.loads() decodes json into python object
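The four functions above form two round trips: dumps/loads for strings and dump/load for files. A minimal sketch, assuming a hypothetical config dictionary (the addresses are from the 192.0.2.0/24 documentation range) and a temporary file config.json:

```python
import json
import os
import tempfile

# Hypothetical example data; addresses are documentation placeholders
config = {"dns": ["192.0.2.1", "192.0.2.2"], "dhcp": "192.0.2.10", "enabled": True}

text = json.dumps(config, indent=4, sort_keys=True)  # Python object -> JSON string
decoded = json.loads(text)                            # JSON string -> Python object

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as out_file:
        json.dump(config, out_file, indent=4)         # Python object -> JSON file
    with open(path) as in_file:
        from_file = json.load(in_file)                # JSON file -> Python object

# JSON has no tuples and only string keys, so some types change on the way back
roundtrip = json.loads(json.dumps({1: (2, 3)}))

print(text)
print(decoded == config, from_file == config)
print(roundtrip)
```

Dictionaries of strings, numbers, booleans, lists, and nested dictionaries survive the round trip unchanged; tuples come back as lists and non-string keys come back as strings.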

10.6.1. Dump data to json file

import json

# Define a dictionary of services and their server IP addresses, grouped by building

services = {
  # All servers in NJ (building: re)
  're': {'dns': ['', ''],
         'dhcp': '',
         'ldapmaster': '',
         'ldap': ['', ''],
         'ravada': ''},
  # All servers in FL (building: fl)
  'fl': {'dns': ['', ''],
         'dhcp': '',
         'ldapmaster': '',
         'ldap': '[,]',
         'ravada': ''}
}

# The json file where the output is stored
with open("services.json", "w") as out_file:
    json.dump(services, out_file, indent=4)
%cat services.json

10.6.2. Read data from json file:

with open('services.json','r') as f:
    data= json.load(f)
%cat services.json
for keys in data:
    print(keys, ':', data[keys])

10.6.3. Encode a dictionary into json string:

json_string = json.dumps(services)
print(json.dumps(services, indent=4))

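10.6.4. Decode json string into a dictionary:

services_dict = json.loads(json_string)   # avoid the name 'dict', which shadows the built-in type

10.6.5. Get the values by referencing the keys in the dictionaries:

for keys in services_dict:
    dict2 = services_dict[keys]
    for key2 in dict2:
        print(keys, key2, dict2[key2])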

10.7. Parsing arguments with argparse (Exercise)

The argparse module parses command-line parameters passed to a Python script.

All the exercises below should be run at the Linux command prompt. There are 3 Python scripts in the directory: parser_1.py, parser_2.py, and parser_3.py


# Import the library
import argparse
# Create the parser
parser = argparse.ArgumentParser()
# Add an argument
parser.add_argument('--alpha', type=int, required=True)
# Parse the argument
args = parser.parse_args()

# use and print the argument
value = args.alpha * 10
print('alpha = ', args.alpha, f', value = {args.alpha} * 10 = ', value)
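parse_args() reads sys.argv by default, but it also accepts an explicit list of strings, which makes a parser easy to try inside a notebook before saving it as a script. A minimal sketch, assuming the same --alpha option with an added help text, showing type conversion and what happens when the required option is missing:

```python
import argparse

parser = argparse.ArgumentParser(description="Multiply --alpha by 10")
parser.add_argument('--alpha', type=int, required=True, help='an integer value')

# Passing a list instead of reading sys.argv lets the parser run inside a notebook
args = parser.parse_args(['--alpha', '15'])   # '15' is converted to int by type=int
value = args.alpha * 10
print('alpha = ', args.alpha, f', value = {args.alpha} * 10 = ', value)

# A missing required option makes argparse print a usage error and exit with status 2
try:
    parser.parse_args([])
except SystemExit as e:
    exit_code = e.code
```

Catching SystemExit is only for experimenting; in a script, letting argparse exit is the expected behavior.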

10.7.1. Run the script

Save the above cell in a script, say parser_1.py, and put a shebang line at the top to specify that it is a Python script:

#!/usr/bin/env python3

Make it executable:

chmod a+x parser_1.py

Run the script with the parameter:

./parser_1.py --alpha 15

10.7.2. Multiple arguments with help options

10.7.3. More arguments

import argparse
parser = argparse.ArgumentParser()

parser.add_argument('--a1', type = str, required=True, help = 'First coefficient')
parser.add_argument('--a2', type = str, required=True, help = 'Second coefficient')
parser.add_argument('--a3', type = str, help = 'Third coefficient')

args = parser.parse_args()

s = f'{args.a1}*x^2 + {args.a2}*x + {args.a3}'
print(s)


10.7.4. Invoking help

Save the content of the cell above in a file, say, parser_2.py, add the shebang line, make it executable, then run it as follows:

./parser_2.py -h
./parser_2.py  --a1 34 --a2 55 --a3 56

10.7.5. Multiple arguments

import argparse
parser = argparse.ArgumentParser()

parser.add_argument('--values', type=int, nargs=3)

args = parser.parse_args()
total = sum(args.values)   # 'total' avoids shadowing the built-in sum()
print('Sum:', total)

Save it as parser_3.py, make it executable, and run it with three values:

./parser_3.py --values 4 8 9

For any number of values (one or more), set nargs='+'; nargs='*' accepts zero or more.
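With nargs='+', the option collects every following value into a list, however many there are. A minimal sketch, assuming the same --values option as in parser_3.py and an explicit argument list in place of the command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--values', type=int, nargs='+', help='one or more integers')

# Four values this time; with nargs=3 this would be an error
args = parser.parse_args(['--values', '4', '8', '9', '10'])
total = sum(args.values)   # 'total' avoids shadowing the built-in sum()
print('Sum:', total)
```

From the shell, the equivalent call is ./parser_3.py --values 4 8 9 10 after changing nargs=3 to nargs='+' in the script.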

10.8. Reference

Good tutorial on subparsers