10. More Python modules#
gzip
- compresses/decompresses binary data, reads/writes data in *.gz files.
zipfile
- creates zip archives, adds files to them, and reads their contents.
getpass
- reads a password from the keyboard.
glob
- finds files whose names match shell-style wildcard patterns.
json
- encodes/decodes Python objects to and from JSON, reads/writes JSON files.
argparse
- passes parameters to a Python script.
Exception handling.
10.1. Module gzip (Exercise)#
The module is utilized to
compress/decompress binary data
write data into compressed *.gz file
read data from compressed *.gz file
## Import the module; dir(gzip) lists all the names it provides
import gzip
data = '''
Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n
Line 4 \n
Line 5 \n
Line 6 \n
Line 7
'''
print(data)
Create some data.
In this particular case,
the data is just two sentences stored in a string
Line 4
Line 5
Line 6
Line 7
## Store data into gzip file, data.txt.gz. The procedure is very similar to writing data into a regular file.
## Notice the write option 'wt' is for writing text:
with gzip.open("data.txt.gz","wt") as f:
    f.write(data)
## Check if file data.txt.gz shows up in the directory:
import os
os.listdir()
['test.zip.16',
'python_intro.ipynb',
'.Lesson_1.ipynb.layout',
'python_extra_modules.ipynb',
'test.zip.7',
'final_exam_fall24.ipynb',
'Lesson_1.ipynb',
'final_exam24.ipynb',
'test.zip.13',
'notebooks.ipynb',
'midterm2024_qa.ipynb',
'shell_scripting.ipynb',
'new_test.zip',
'test.zip.3',
'lessons',
'placeholder_midterm2023.ipynb',
'midterm_exercises.ipynb',
'NFS.ipynb',
'OpenMP.ipynb',
'index.md',
'_build',
'img',
'logo.png',
'homework4.ipynb',
'Final_exam_preparation.ipynb',
'references.bib',
'midterm2023_qa.ipynb',
'systemd.ipynb',
'final_exam22_Q-and-A.ipynb',
'_config.yml',
'EXTRACT_DIR',
'smb.ipynb',
'.ipynb_checkpoints',
'homework2.ipynb',
'passwd.txt',
'test.zip.11',
'compilation.ipynb',
'python_scripting2.ipynb',
'test.zip.9',
'test.zip.4',
'midterm2023.ipynb',
'homework1.ipynb',
'homework3.ipynb',
'test.zip.6',
'requirements.txt',
'conf.py',
'intro.md',
'test.zip.15',
'final_exam',
'test.zip.5',
'final_exam_fall24_qa.ipynb',
'Untitled.ipynb',
'markdown.md',
'test.zip.8',
'final_exam23.ipynb',
'test.zip.17',
'data.txt.gz',
'test.zip.2',
'test.zip.12',
'final_exam23qa.ipynb',
'test.zip.14',
'security.ipynb',
'test.zip.1',
'final_exam24qa.ipynb',
'linux_packaging.ipynb',
'midterm_fall_2024.ipynb',
'midterm_summer_2022.ipynb',
'markdown-notebooks.md',
'test.zip',
'midterm2024.ipynb',
'MPI.ipynb',
'test.zip.10',
'_toc.yml',
'homework6.ipynb',
'python_scripting.ipynb',
'virtualization.ipynb',
'networking.ipynb']
%ls data.txt.gz
data.txt.gz
## Let's check if we can read data back from the file. Notice the read option 'rt' is for reading text:
with gzip.open("data.txt.gz","rt") as f:
    data_read = f.read()
print(data_read)
Create some data.
In this particular case,
the data is just two sentences stored in a string
Line 4
Line 5
Line 6
Line 7
## We can also convert the data into binary form, then write it into a gzip file with the 'wb' option
#data_byte = bytes(data, 'utf-8')
data_byte = data.encode('utf-8')
type(data_byte)
bytes
print(data_byte)
b'\n Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n \n Line 4 \n\n Line 5 \n\n Line 6 \n \n Line 7\n '
## Store the binary data into gzip file
with gzip.open("data.txt.gz","wb") as f:
    f.write(data_byte)
## This can also be done with the GzipFile class:
with gzip.GzipFile("data.txt.gz","wb") as f:
    f.write(data_byte)
## We can also read the gzip file into binary data type by using "rb" option:
with gzip.open("data.txt.gz","rb") as f:
    out_byte = f.read()
print(out_byte)
b'\n Create some data. \nIn this particular case, \nthe data is just two sentences stored in a string\n \n Line 4 \n\n Line 5 \n\n Line 6 \n \n Line 7\n '
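## Compressing/decompressing data in memory. gzip.compress() and gzip.decompress() work on binary data only.
## Check the sizes of text data, byte data, and compressed byte data.
## Note that sys.getsizeof() reports the size of the whole Python object, including its fixed overhead: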
import sys
size_txt = sys.getsizeof(data)
out_byte = data.encode('utf-8')
size_bin = sys.getsizeof(out_byte)
out_compressed = gzip.compress(out_byte)
size_compressed = sys.getsizeof(out_compressed)
print(size_txt, '>', size_bin, '>', size_compressed)
180 > 172 > 151
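The comparison above uses sys.getsizeof, which also counts the per-object overhead. A minimal sketch of the same idea, assuming a hypothetical repetitive sample text, compares payload sizes with len() and checks that decompression restores the original bytes:

```python
import gzip

# Hypothetical sample text; repetitive data compresses well
text = "Line of sample text\n" * 200
raw = text.encode("utf-8")          # gzip.compress() accepts bytes only

packed = gzip.compress(raw)         # compress in memory
restored = gzip.decompress(packed)  # decompress back to the original bytes

print(len(raw), "bytes before,", len(packed), "bytes after compression")
print("round trip ok:", restored == raw)
```

Very short inputs may not shrink at all, since the gzip header and trailer add a fixed number of bytes.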
10.2. Module zipfile (Exercise)#
Works with zip archives:
Reads list of files and directories from zip archive
Extracts files from an archive
Reads specific file from an archive
Creates a new archive
Adds/appends file to the archive
10.2.1. Download a zip file:#
import os
os.system('wget https://linuxcourse.rutgers.edu/2024/html/files/test.zip')
--2025-01-21 19:28:21-- https://linuxcourse.rutgers.edu/2024/html/files/test.zip
Resolving linuxcourse.rutgers.edu (linuxcourse.rutgers.edu)... 128.6.238.10
Connecting to linuxcourse.rutgers.edu (linuxcourse.rutgers.edu)|128.6.238.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 844 [application/zip]
Saving to: ‘test.zip.18’
0K
0
100% 45.7M=0s
2025-01-21 19:28:21 (45.7 MB/s) - ‘test.zip.18’ saved [844/844]
import zipfile
with zipfile.ZipFile('test.zip') as file:
    # printdir() prints a table of the archive contents itself and returns None,
    # so it is called directly instead of being wrapped in print()
    file.printdir()
File Name                                             Modified             Size
NEW_DIR1/                                      2021-05-25 14:34:44            0
py_script.py                                   2024-01-10 17:14:22          198
list.txt                                       2024-01-10 17:14:00          105
ip.txt                                         2024-01-11 11:55:14           12
with zipfile.ZipFile('test.zip') as file:
    # ZipFile.namelist() returns a list with the names of all members of the archive
    print(file.namelist())
['NEW_DIR1/', 'py_script.py', 'list.txt', 'ip.txt']
with zipfile.ZipFile('test.zip') as file:
    # Open and read the content of a file from the list.
    # namelist()[0] is the directory entry 'NEW_DIR1/', which has no content, so read 'ip.txt' instead:
    with file.open(name = 'ip.txt', mode = 'r') as text_file:
        print(text_file.read().decode('utf-8'))
with zipfile.ZipFile('test.zip') as file:
    # ZipFile.infolist() returns a list of ZipInfo objects, one for each member of the archive
    print(file.infolist())
[<ZipInfo filename='NEW_DIR1/' filemode='drwxr-xr-x' external_attr=0x10>, <ZipInfo filename='py_script.py' compress_type=deflate filemode='-rwxr-xr-x' file_size=198 compress_size=134>, <ZipInfo filename='list.txt' compress_type=deflate filemode='-rw-rw-r--' file_size=105 compress_size=94>, <ZipInfo filename='ip.txt' filemode='-rw-rw-r--' file_size=12>]
## Extract the content of zip file into a new directory, EXTRACT_DIR
with zipfile.ZipFile('test.zip') as f:
    f.extractall('EXTRACT_DIR')
import os
os.listdir('EXTRACT_DIR')
['list.txt', 'ip.txt', 'NEW_DIR1', 'py_script.py']
## Create an empty zipfile:
archive_name = 'new_test.zip'
f = zipfile.ZipFile(archive_name, 'w')
f.close()
## Add a file, /etc/hosts, to the archive. Mode 'w' overwrites the archive, so any files already in it are lost.
with zipfile.ZipFile('new_test.zip', 'w') as f:
    f.write('/etc/hosts')
## Append a new file, /etc/passwd. Mode 'a' doesn't overwrite the existing files.
with zipfile.ZipFile('new_test.zip', 'a') as f:
    f.write('/etc/passwd')
## Check the content of archive new_test.zip
with zipfile.ZipFile('new_test.zip','r') as f:
    # printdir() prints the table itself, so print() is not needed
    f.printdir()
File Name                                             Modified             Size
etc/hosts                                      2024-12-03 19:56:42          179
etc/passwd                                     2024-07-27 16:12:20         1188
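The archive above stores the files under their full paths (etc/hosts, etc/passwd). A minimal sketch, assuming hypothetical file names inside a temporary directory, shows how the arcname argument controls the name stored in the archive and how writestr() appends a member directly from a string:

```python
import os
import tempfile
import zipfile

# Work in a temporary directory so no existing files are touched
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "notes.txt")          # hypothetical sample file
    with open(src, "w") as f:
        f.write("hello zip\n")

    archive = os.path.join(tmp, "demo.zip")       # hypothetical archive name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(src, arcname="notes.txt")         # store under a short name
    with zipfile.ZipFile(archive, "a") as z:
        z.writestr("extra.txt", "added from a string\n")  # no file on disk needed

    with zipfile.ZipFile(archive) as z:
        names = z.namelist()
        content = z.read("notes.txt").decode("utf-8")

print(names)
print(content)
```

Without arcname, the member would be stored under the temporary directory's full path, minus the leading slash.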
10.3. Module getpass (Exercise)#
Prompts for a password and reads it from the keyboard without echoing it
Figures out the user name from the environment
import getpass
## Prompt for password and read user password:
p = getpass.getpass(prompt='Enter the password: ')
StdinNotImplementedError: getpass was called, but this frontend does not support input requests.
The error means that the frontend running this cell cannot accept keyboard input; run the code in a terminal or an interactive notebook session to get the prompt.
if p == 'this':
    print('correct password')
else:
    print('password incorrect')
## Identify the user and prompt in a loop until success. Maximum 10 attempts
user = getpass.getuser()
count = 0
max_count = 10
#while True:
# '<' rather than '<=' so that exactly max_count attempts are allowed
while count < max_count:
    pwd = getpass.getpass("User Name : %s" % user)
    count += 1
    if pwd == 'password1':
        print("login successful")
        break
    else:
        print("The password you entered is incorrect.")
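The retry loop can also be wrapped in a function. The sketch below is only an illustration: the function name ask_password and its read parameter are hypothetical, and the scripted answers stand in for the keyboard so that the code runs even where getpass cannot prompt.

```python
import getpass

def ask_password(expected, max_attempts=3, read=getpass.getpass):
    """Return True if the expected password is entered within max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        if read(f"Password (attempt {attempt} of {max_attempts}): ") == expected:
            return True
        print("The password you entered is incorrect.")
    return False

# Scripted stand-in for the keyboard, used here only so the sketch runs anywhere
answers = iter(["wrong", "password1"])
result = ask_password("password1", read=lambda prompt: next(answers))
print("login successful" if result else "too many failed attempts")
```

In a terminal, calling ask_password('password1') without the read argument prompts through getpass.getpass.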
10.4. Module glob (Exercise)#
Finds files matching a specified pattern and returns them in a list [ ]
* matches zero or more characters
? matches one character
[0-9] matches any single digit
[a-z] matches any lowercase letter
[A-Z] matches any uppercase letter
import glob
glob.glob('*.zip')
glob.glob('*[0-9].zip')
glob.glob('*[a-z].zip')
glob.glob('[A-Z]*[0-9].zip')
glob.glob('[nI]e*')  # names starting with n or I followed by e; a comma inside [] would be matched literally
## Search for files specifically in /etc
glob.glob('/etc/*.conf')
## Search for files one level deeper, in the immediate subdirectories of /etc
glob.glob('/etc/*/*.conf')
## Recursive search in a directory tree
glob.glob('/etc/**/*.conf', recursive=True)
## For a recursive search, it is better to use the iterator version, `iglob`, to save RAM.
## iglob yields matches one at a time instead of building the whole list in memory as glob does.
for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    print(filename)
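The results of the patterns above depend on what is in the current directory and in /etc. A self-contained sketch, assuming a few hypothetical file names created in a temporary directory, makes the effect of each wildcard visible:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names created only for this demonstration
    for name in ["a1.zip", "b.zip", "notes.txt", os.path.join("sub", "c2.zip")]:
        path = os.path.join(tmp, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    def names(pattern, recursive=False):
        # Return matches relative to tmp, sorted for stable output
        hits = glob.glob(os.path.join(tmp, pattern), recursive=recursive)
        return sorted(os.path.relpath(h, tmp) for h in hits)

    top_level = names("*.zip")                      # zip files in tmp only
    with_digit = names("*[0-9].zip")                # name ends with a digit
    single_char = names("?.zip")                    # exactly one character before .zip
    all_levels = names("**/*.zip", recursive=True)  # tmp and all subdirectories

print(top_level)
print(with_digit)
print(single_char)
print(all_levels)
```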
10.5. Exception handling in Python (Exercise)#
## Example: find all conf files that contain keyword `network`.
## This will throw an error since some files in /etc are not readable by the user.
for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    with open(filename) as f:
        line = f.read()
    if 'network' in line:
        print(filename)
To let the script continue and report the errors instead of stopping, we run the code within a block
try:
    ...
except ...
## Let's handle the exception above
for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    try:
        with open(filename) as f:
            line = f.read()
        if 'network' in line:
            print(filename)
    except PermissionError as e:
        print(f"can't read {filename}: {e}")
There is another error, FileNotFoundError, caused by a symbolic link to a missing file.
We can handle multiple exceptions:
for filename in glob.iglob('/etc/**/*.conf', recursive=True):
    try:
        with open(filename) as f:
            line = f.read()
        if 'network' in line:
            print(filename)
    except (PermissionError, FileNotFoundError) as e:
        print(f"can't read {filename}: {e}")
Both of these exceptions are subclasses of OSError, so we can catch them with a generic OSError:
    except OSError as e:
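A minimal sketch of the generic OSError handler follows. The helper name files_containing and the file names are hypothetical, and a missing path in a temporary directory stands in for the unreadable files in /etc:

```python
import os
import tempfile

def files_containing(paths, keyword):
    """Return (matches, skipped): files with keyword, and files that could not be read."""
    matches, skipped = [], []
    for path in paths:
        try:
            with open(path) as f:
                if keyword in f.read():
                    matches.append(path)
        except OSError as e:   # covers PermissionError, FileNotFoundError, ...
            skipped.append((path, type(e).__name__))
    return matches, skipped

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "net.conf")        # hypothetical readable file
    with open(good, "w") as f:
        f.write("network=on\n")
    missing = os.path.join(tmp, "gone.conf")    # hypothetical path that does not exist
    matches, skipped = files_containing([good, missing], "network")

print(matches)
print(skipped)
```

Recording the exception class name keeps the report specific even though the handler itself is generic.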
In numerical computations, you may need to deal with division by zero. For example:
denominator = [1.2, 4.0, 0.001, 45.0, 0.0]
numerator = 0.1
for i in denominator:
    s = numerator / i
    print(s)
We need to add a safeguard with except ZeroDivisionError as e:
try:
    for i in denominator:
        s = numerator / i
        print(s)
except ZeroDivisionError as e:
    print('Division by zero, i = ', i)
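In the version above, the try block wraps the whole loop, so the first zero ends the loop. A sketch that guards each division separately, assuming a hypothetical list with the zero in the middle, lets the remaining values still be processed:

```python
denominators = [1.2, 0.0, 4.0]   # hypothetical values, with the zero in the middle
numerator = 0.1

results = []
for d in denominators:
    try:
        results.append(numerator / d)
    except ZeroDivisionError:
        print("Division by zero, d =", d)
        results.append(None)      # placeholder so positions still line up

print(results)
```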
10.6. JSON (Exercise)#
JSON is JavaScript Object Notation. It is used to store and transfer data.
json.dump() writes data into a JSON file
json.load() reads data from a JSON file
json.dumps() encodes a Python object (usually a dictionary) into a JSON string
json.loads() decodes a JSON string into a Python object
10.6.1. Dump data to json file#
import json
# Define dictionary with services and their server IP addresses
services = {
    # for all servers in NJ (building : re)
    're':
        {'dns': ['192.168.3.250', '192.168.11.104'],
         'dhcp': '192.168.3.250',
         'ldapmaster': '192.168.3.100',
         'ldap': ['192.168.3.100', '192.168.11.104'],
         'ravada': '192.168.3.40'
        },
    # For all the servers in FL (building : fl)
    'fl':
        {'dns': ['192.168.11.104', '192.168.3.250'],
         'dhcp': '192.168.11.104',
         'ldapmaster': '192.168.3.100',
         'ldap': ['192.168.11.104', '192.168.3.100'],
         'ravada': '192.168.11.100'
        }
}
# the json file where the output must be stored
with open("services.json", "w") as out_file:
    json.dump(services, out_file, indent = 4)
%cat services.json
10.6.2. Read data from json file:#
with open('services.json','r') as f:
    data = json.load(f)
%cat services.json
data
type(data)
for keys in data:
    print(keys,':',data[keys],'\n')
10.6.3. Encode a dictionary into json string:#
json_string = json.dumps(services)
print(json_string)
print(json.dumps(services, indent=4))
10.6.4. Decode json string into a dictionary:#
# 'decoded' is used as the name because 'dict' would shadow the built-in type
decoded = json.loads(json_string)
print(decoded)
10.6.5. Get the values by referencing the keys in the dictionaries:#
for key in decoded:
    building = decoded[key]
    print(building['dns'])
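Encoding and decoding do not always return exactly the same Python types. A small sketch, assuming a hypothetical record, shows a round trip through json.dumps() and json.loads() in which a tuple comes back as a list and None is written as null:

```python
import json

# Hypothetical record mixing the Python types that JSON can represent
record = {"dns": ["192.168.3.250", "192.168.11.104"], "port": 53,
          "enabled": True, "comment": None, "pair": (1, 2)}

encoded = json.dumps(record, sort_keys=True)   # Python object -> JSON string
decoded = json.loads(encoded)                  # JSON string -> Python object

print(encoded)
print(decoded["pair"], type(decoded["pair"]).__name__)
```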
10.7. Parsing arguments with argparse (Exercise)#
The argparse module allows parameters to be passed to a Python script on the command line.
All the exercises below should be run at the Linux command prompt. There are 3 Python scripts in the directory: parser_1.py, parser_2.py, and parser_3.py
Exercises:
# Import the library
import argparse
# Create the parser
parser = argparse.ArgumentParser()
# Add an argument
parser.add_argument('--alpha', type=int, required=True)
# Parse the argument
args = parser.parse_args()
# use and print the argument
value = args.alpha * 10
print('alpha = ', args.alpha, f', value = {args.alpha} * 10 = ', value)
10.7.1. Run the script#
Save the above cell in a script, say parser_1.py, and specify the Python interpreter in the first line of the script:
#!/home/hostadm/miniconda3/bin/python3
Make it executable:
chmod a+x parser_1.py
Run the script with the parameter:
./parser_1.py --alpha 15
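To try the same parser without leaving the notebook, the arguments can be given to parse_args() as a list. This is a sketch of that approach, with the argument list chosen to mirror the command above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=int, required=True)

# Passing a list makes parse_args() read it instead of sys.argv,
# which is convenient for trying a parser inside a notebook
args = parser.parse_args(['--alpha', '15'])
value = args.alpha * 10
print('alpha =', args.alpha, ', value =', value)
```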
10.7.2. Multiple arguments with help options#
10.7.3. More arguments#
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--a1', type = str, required=True, help = 'First coefficient')
parser.add_argument('--a2', type = str, required=True, help = 'Second coefficient')
parser.add_argument('--a3', type = str, help = 'Third coefficient')
args = parser.parse_args()
s = f'{args.a1}*x^2 + {args.a2}*x + {args.a3}'
print(s)
10.7.4. Invoking help#
Save the content of the cell above in file, say, parser_2.py, then run as follows:
./parser_2.py -h
./parser_2.py --a1 34 --a2 55 --a3 56
10.7.5. Multiple arguments#
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--values', type=int, nargs=3)
args = parser.parse_args()
total = sum(args.values)  # named 'total' so the built-in sum() is not shadowed
print('Sum:', total)
Save the cell above as parser_3.py and run it with values:
./parser_3.py --values 4 8 9
To accept one or more values, set nargs = '+'
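A minimal sketch of the nargs='+' variant, with a hypothetical argument list passed directly to parse_args() so that it can be tried without a separate script:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' collects one or more values into a list
parser.add_argument('--values', type=int, nargs='+')

# Hypothetical argument list, passed explicitly instead of reading sys.argv
args = parser.parse_args(['--values', '4', '8', '9', '10'])
total = sum(args.values)
print('Sum:', total)
```

With nargs='+', giving --values with no numbers after it is reported as an error; nargs='*' would also accept an empty list.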